b-editor · Copilot · Dec 4, 2025
diff --git a/AUDIOQUERY_IMPLEMENTATION.md b/AUDIOQUERY_IMPLEMENTATION.md
@@ -0,0 +1,145 @@
+# AudioQuery UI Implementation
+
+This document describes the implementation of the AudioQuery UI feature for adjusting voice accents and other parameters in the Beutl Voice Extension.
+
+## Overview
+
+The implementation follows the workflow specified in the requirements:
+
+1. User inputs text
+2. Generate AudioQuery
+3. Parse AudioQuery
+4. Display in UI
+5. User adjusts accent, pitch and other parameters
+6. Reflect changes in AudioQuery
+7. Generate audio
+
+## Architecture
+
+### Models (Created)
+
+#### AudioQuery.cs
+Represents the audio synthesis query with the following properties:
+- `AccentPhrases`: Array of accent phrases
+- `SpeedScale`: Overall speech speed (0.5-2.0)
+- `PitchScale`: Overall pitch adjustment (-0.15 to 0.15)
+- `IntonationScale`: Overall intonation (0.0-2.0)
+- `VolumeScale`: Overall volume (0.0-2.0)
+- `PrePhonemeLength`: Silence before audio (seconds)
+- `PostPhonemeLength`: Silence after audio (seconds)
+- `OutputSamplingRate`: Audio sampling rate
+- `OutputStereo`: Stereo output flag
+- `Kana`: AquesTalk-style notation (read-only)
+
+#### AccentPhrase.cs
+Represents an accent phrase with:
+- `Moras`: Array of mora (syllable units)
+- `Accent`: Accent position (1-indexed)
+- `IsInterrogative`: Whether it's a question
+- `PauseMora`: Optional pause mora after the phrase
+
+#### Mora.cs
+Represents a mora (smallest speech unit) with:
+- `Text`: Display text
+- `Consonant`: Consonant phoneme
+- `ConsonantLength`: Consonant duration (seconds)
+- `Vowel`: Vowel phoneme
+- `VowelLength`: Vowel duration (seconds)
+- `Pitch`: Pitch in Hz
+
+### ViewModels (Created/Modified)
+
+#### AccentPhraseViewModel.cs
+Wraps AccentPhrase for UI binding with:
+- `Accent`: Reactive property for accent position
+- `IsInterrogative`: Reactive property for question flag
+- `Moras`: Observable collection of MoraViewModel
+- Two-way binding that updates the underlying model
+
+#### MoraViewModel.cs
+Wraps Mora for UI binding with:
+- `Text`: Display text (read-only in UI)
+- `Pitch`: Reactive property for pitch adjustment
+- `VowelLength`: Reactive property for duration adjustment
+- Two-way binding that updates the underlying model
+
+#### TtsTabViewModel.cs (Modified)
+Added new properties and methods:
+- `CurrentAudioQuery`: Stores the generated AudioQuery
+- `IsAudioQueryGenerated`: Flag indicating whether an AudioQuery is available
+- `AccentPhrases`: Observable collection for UI binding
+- `GenerateAudioQuery()`: New method to generate AudioQuery from text
+- `Tts()` (modified): Uses the AudioQuery when one is available and falls back to direct TTS otherwise
+
+### Views (Modified)
+
+#### TtsTabView.axaml
+Enhanced UI with:
+
+1. **New "AudioQuery生成" button**: Generates AudioQuery from text
+2. **AudioQuery editor section**: Shows when AudioQuery is generated
+3. **Global parameter sliders**:
+   - Speech speed (話速): 0.5-2.0
+   - Pitch (音高): -0.15 to 0.15
+   - Intonation (抑揚): 0.0-2.0
+   - Volume (音量): 0.0-2.0
+   - Pre-silence (前の無音): 0.0-1.5 seconds
+   - Post-silence (後の無音): 0.0-1.5 seconds
+4. **Accent phrase editor**:
+   - Displays each accent phrase with its text
+   - Accent position selector (NumericUpDown)
+   - Question mark checkbox
+   - Per-mora pitch adjustment controls
+5. **Existing buttons** ("追加" and "読み上げ"): Now use AudioQuery when available
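+
+The slider ranges above could also be enforced in code, so that a value set programmatically cannot leave the range the UI allows. The sketch below is a hypothetical helper (`SliderRanges` is not part of this PR) that clamps values to the documented ranges with `Math.Clamp`:
+
+```csharp
+using System;
+
+// Hypothetical helper for illustration; the ranges are the ones listed for the sliders.
+public static class SliderRanges
+{
+    public static double Speed(double value) => Math.Clamp(value, 0.5, 2.0);
+    public static double Pitch(double value) => Math.Clamp(value, -0.15, 0.15);
+    public static double Intonation(double value) => Math.Clamp(value, 0.0, 2.0);
+    public static double Volume(double value) => Math.Clamp(value, 0.0, 2.0);
+    public static double Silence(double value) => Math.Clamp(value, 0.0, 1.5);
+}
+
+public static class Program
+{
+    public static void Main()
+    {
+        Console.WriteLine(SliderRanges.Speed(3.0));    // above the range, clamped to the upper bound
+        Console.WriteLine(SliderRanges.Pitch(-0.5));   // below the range, clamped to the lower bound
+        Console.WriteLine(SliderRanges.Silence(0.3));  // already in range, returned unchanged
+    }
+}
+```
+
+Clamping silently corrects out-of-range input. Rejecting the value with an exception would be an equally reasonable choice, depending on how the view model is used.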
+
+## Workflow
+
+### Without AudioQuery (Original behavior)
+1. User enters text
+2. Selects voice and style
+3. Clicks "追加" or "読み上げ"
+4. System directly generates audio using TTS API
+
+### With AudioQuery (New behavior)
+1. User enters text
+2. Selects voice and style
+3. Clicks "AudioQuery生成"
+4. System calls `CreateAudioQuery` API
+5. AudioQuery is parsed and displayed in UI
+6. User adjusts parameters:
+   - Global parameters (speed, pitch, intonation, volume)
+   - Per-phrase accent position
+   - Per-phrase question flag
+   - Per-mora pitch values
+7. User clicks "追加" or "読み上げ"
+8. System uses `Synthesis` API with modified AudioQuery
+9. Audio is generated with customized parameters
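+
+The two workflows differ only in which call produces the audio. The sketch below shows that branch in isolation. `ISpeechBackend`, `TtsFlowSketch`, and `FakeBackend` are hypothetical names introduced for illustration; they are not the VoicevoxCoreSharp API, and the real method signatures may differ:
+
+```csharp
+using System;
+
+// Hypothetical abstraction for illustration; it is not VoicevoxCoreSharp's API.
+public interface ISpeechBackend
+{
+    string CreateAudioQuery(string text);
+    byte[] Synthesis(string audioQueryJson);
+    byte[] Tts(string text);
+}
+
+public sealed class TtsFlowSketch
+{
+    private readonly ISpeechBackend _backend;
+    private string? _currentAudioQuery;
+
+    public TtsFlowSketch(ISpeechBackend backend) => _backend = backend;
+
+    public bool IsAudioQueryGenerated => _currentAudioQuery is not null;
+
+    public void GenerateAudioQuery(string text) =>
+        _currentAudioQuery = _backend.CreateAudioQuery(text);
+
+    // Use the (possibly edited) AudioQuery when present, otherwise fall back to direct TTS.
+    public byte[] Tts(string text) =>
+        _currentAudioQuery is not null
+            ? _backend.Synthesis(_currentAudioQuery)
+            : _backend.Tts(text);
+}
+
+// Fake backend that only reports which path was taken.
+public sealed class FakeBackend : ISpeechBackend
+{
+    public string CreateAudioQuery(string text) => "{}";
+
+    public byte[] Synthesis(string audioQueryJson)
+    {
+        Console.WriteLine("Synthesis path");
+        return Array.Empty<byte>();
+    }
+
+    public byte[] Tts(string text)
+    {
+        Console.WriteLine("Tts fallback path");
+        return Array.Empty<byte>();
+    }
+}
+
+public static class Program
+{
+    public static void Main()
+    {
+        var flow = new TtsFlowSketch(new FakeBackend());
+        flow.Tts("こんにちは");                // no AudioQuery yet, so the fallback is used
+        flow.GenerateAudioQuery("こんにちは");
+        flow.Tts("こんにちは");                // the stored AudioQuery is used
+    }
+}
+```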
+
+## Technical Details
+
+### JSON Serialization
+The AudioQuery models use `System.Text.Json` with `JsonPropertyName` attributes to match the VOICEVOX API schema. Because each property is mapped to its JSON name explicitly, both the snake_case and the camelCase names in the schema are serialized and deserialized correctly.
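+
+As a minimal sketch of this mapping, the trimmed-down models below round-trip through `System.Text.Json`. The classes are hypothetical and carry only a few properties. The JSON names are an assumption modeled on the VOICEVOX engine schema, and the sample values are made up:
+
+```csharp
+using System;
+using System.Text.Json;
+using System.Text.Json.Serialization;
+
+// Hypothetical, trimmed-down models for illustration only.
+// The JSON names are assumptions modeled on the VOICEVOX engine schema.
+public sealed class Mora
+{
+    [JsonPropertyName("text")]
+    public string Text { get; set; } = "";
+
+    [JsonPropertyName("vowel_length")]
+    public double VowelLength { get; set; }
+
+    [JsonPropertyName("pitch")]
+    public double Pitch { get; set; }
+}
+
+public sealed class AccentPhrase
+{
+    [JsonPropertyName("moras")]
+    public Mora[] Moras { get; set; } = Array.Empty<Mora>();
+
+    [JsonPropertyName("accent")]
+    public int Accent { get; set; }
+
+    [JsonPropertyName("is_interrogative")]
+    public bool IsInterrogative { get; set; }
+}
+
+public sealed class AudioQuery
+{
+    [JsonPropertyName("accent_phrases")]
+    public AccentPhrase[] AccentPhrases { get; set; } = Array.Empty<AccentPhrase>();
+
+    [JsonPropertyName("speedScale")]
+    public double SpeedScale { get; set; } = 1.0;
+}
+
+public static class Program
+{
+    public static void Main()
+    {
+        var query = new AudioQuery
+        {
+            AccentPhrases = new[]
+            {
+                new AccentPhrase
+                {
+                    Accent = 1,
+                    Moras = new[] { new Mora { Text = "ア", VowelLength = 0.1, Pitch = 120.0 } },
+                },
+            },
+        };
+
+        // Serialize with the explicit names, then read the JSON back.
+        string json = JsonSerializer.Serialize(query);
+        AudioQuery roundTripped = JsonSerializer.Deserialize<AudioQuery>(json)!;
+
+        Console.WriteLine(json);
+        Console.WriteLine(roundTripped.AccentPhrases[0].Moras[0].Text);
+    }
+}
+```
+
+Per-property attributes keep working when the naming style varies from field to field, which a single global naming policy would not handle.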
+
+### Reactive Programming
+The implementation relies heavily on reactive bindings:
+- Changes to sliders immediately update the AudioQuery model
+- ObservableCollection automatically updates the UI when accent phrases change
+- Two-way bindings ensure UI and model stay synchronized
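+
+The write-through idea behind these bindings can be sketched without any reactive library. The example below uses the plain `INotifyPropertyChanged` interface instead of the reactive properties used in this PR, and `MoraViewModelSketch` is a hypothetical name, so treat it as an illustration of the pattern and not as the actual `MoraViewModel`:
+
+```csharp
+using System;
+using System.ComponentModel;
+
+// Hypothetical stand-ins; the PR's MoraViewModel uses reactive properties instead.
+public sealed class Mora
+{
+    public string Text { get; set; } = "";
+    public double Pitch { get; set; }
+}
+
+public sealed class MoraViewModelSketch : INotifyPropertyChanged
+{
+    private readonly Mora _model;
+
+    public MoraViewModelSketch(Mora model) => _model = model;
+
+    public event PropertyChangedEventHandler? PropertyChanged;
+
+    // Read-only in the UI, so there is no setter.
+    public string Text => _model.Text;
+
+    public double Pitch
+    {
+        get => _model.Pitch;
+        set
+        {
+            if (_model.Pitch == value) return;
+            _model.Pitch = value; // write through so the model never lags behind the UI
+            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(Pitch)));
+        }
+    }
+}
+
+public static class Program
+{
+    public static void Main()
+    {
+        var mora = new Mora { Text = "ア", Pitch = 120.0 };
+        var viewModel = new MoraViewModelSketch(mora);
+        viewModel.PropertyChanged += (_, e) => Console.WriteLine($"changed: {e.PropertyName}");
+
+        viewModel.Pitch = 150.0;        // simulates a slider move
+        Console.WriteLine(mora.Pitch);  // the model now holds the new value
+    }
+}
+```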
+
+### API Integration
+The implementation uses the VoicevoxCoreSharp library:
+- `Synthesizer.CreateAudioQuery()`: Generates AudioQuery from text
+- `Synthesizer.Synthesis()`: Synthesizes audio from AudioQuery
+- `Synthesizer.Tts()`: Direct text-to-speech (fallback)
+
+## Future Enhancements
+
+Possible improvements:
+1. Pitch visualization graph
+2. Audio waveform preview
+3. Save/load AudioQuery presets
+4. Batch processing multiple AudioQueries
+5. Advanced phoneme editing
+6. Undo/redo for parameter changes
+7. Visual accent position indicator
+8. Mora duration adjustment UI
diff --git a/AUDIOQUERY_USER_GUIDE.md b/AUDIOQUERY_USER_GUIDE.md
@@ -0,0 +1,125 @@
+# AudioQuery UI User Guide
+
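+## Introduction
+
+This guide explains how to use the new UI for adjusting voice parameters such as accent and pitch.
+
+## Basic Usage
+
+### 1. Enter text
+
+1. Open the "テキスト読み上げ" (text-to-speech) tab
+2. Enter the text you want read aloud in the text field
+3. Select a speaker and a style
+
+### 2. Generate the AudioQuery
+
+1. Click the "AudioQuery生成" (generate AudioQuery) button
+2. The system analyzes the text and generates an AudioQuery
+3. When generation finishes, the parameter editing UI appears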
+
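+### 3. Adjust voice parameters
+
+You can adjust the following parameters of the generated AudioQuery:
+
+#### Global parameters
+
+- **Speech speed (話速, Speed Scale)**: 0.5 to 2.0
+  - 1.0 is the standard speed
+  - 0.5 is half speed (slower)
+  - 2.0 is double speed (faster)
+
+- **Pitch (音高, Pitch Scale)**: -0.15 to 0.15
+  - 0.0 is the standard pitch
+  - Negative values lower the voice, positive values raise it
+
+- **Intonation (抑揚, Intonation Scale)**: 0.0 to 2.0
+  - 1.0 is the standard intonation
+  - 0.0 is flat, 2.0 exaggerates the intonation
+
+- **Volume (音量, Volume Scale)**: 0.0 to 2.0
+  - 1.0 is the standard volume
+
+- **Pre-silence (前の無音)**: 0.0 to 1.5 seconds
+  - Silence inserted before the audio
+
+- **Post-silence (後の無音)**: 0.0 to 1.5 seconds
+  - Silence inserted after the audio
+
+#### Per-accent-phrase adjustments
+
+For each accent phrase (a unit into which the sentence is divided):
+
+- **Accent position**:
+  - Specifies the position where the pitch rises (counted from 1)
+  - Setting it to 0 gives a flat pattern (no accent)
+
+- **Question checkbox**:
+  - When checked, the phrase is treated as a question
+  - The pitch rises at the end of the sentence
+
+#### Per-mora pitch adjustment
+
+For each mora (a syllable-like unit):
+
+- **Pitch (P)**: 0 to 200 Hz
+  - Adjusts the pitch of an individual sound
+  - Allows finer control over pitch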
+
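+### 4. Generate the audio
+
+1. After adjusting the parameters, click the "追加" (add) button
+2. The audio is generated with the adjusted parameters and added to the timeline
+3. Alternatively, click the "読み上げ" (read aloud) button to play the audio immediately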
+
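+## Examples
+
+### Example 1: Speak slowly
+
+1. Enter text and generate the AudioQuery
+2. Set the speech speed to 0.7
+3. Click "追加" or "読み上げ"
+
+### Example 2: Emphasize a question
+
+1. Enter a question and generate the AudioQuery
+2. Check the question checkbox for the relevant accent phrase
+3. Increase the intonation to 1.3
+4. Click "追加" or "読み上げ"
+
+### Example 3: Raise a specific sound
+
+1. Enter text and generate the AudioQuery
+2. Increase the pitch value of the mora you want to emphasize (for example, 120 to 150)
+3. Click "追加" or "読み上げ"
+
+### Example 4: Speak in a low voice
+
+1. Enter text and generate the AudioQuery
+2. Set the pitch to -0.1
+3. Click "追加" or "読み上げ"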
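+
+For readers who prefer code, the global settings from Examples 1, 2, and 4 can be written as small changes to a set of defaults. `GlobalParams` is a hypothetical record introduced only for this sketch. Its default values are the standard values listed in this guide, and the per-phrase question checkbox from Example 2 is left out:
+
+```csharp
+using System;
+
+// Hypothetical value type; the names mirror the global parameters in this guide.
+public sealed record GlobalParams(
+    double SpeedScale = 1.0,
+    double PitchScale = 0.0,
+    double IntonationScale = 1.0,
+    double VolumeScale = 1.0);
+
+public static class Program
+{
+    public static void Main()
+    {
+        var defaults = new GlobalParams();
+
+        var slow = defaults with { SpeedScale = 0.7 };            // Example 1
+        var question = defaults with { IntonationScale = 1.3 };   // Example 2 (global part only)
+        var lowVoice = defaults with { PitchScale = -0.1 };       // Example 4
+
+        Console.WriteLine(slow);
+        Console.WriteLine(question);
+        Console.WriteLine(lowVoice);
+    }
+}
+```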
+
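+## Tips
+
+- Even without generating an AudioQuery, you can still generate audio directly with the "追加" and "読み上げ" buttons, as before
+- Generating an AudioQuery and then adjusting it gives you finer control
+- Parameters can be changed in real time, so experiment until you find the best settings
+- Per-mora pitch adjustment is for advanced users. Adjusting the global parameters is usually enough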
+
+## Troubleshooting
+
+### The AudioQuery is not generated
+
+- Check that text has been entered
+- Check that a speaker and a style are selected
+- Check that VOICEVOX is installed
+
+### Changing a parameter has no effect
+
+- Try regenerating the AudioQuery
+- Check that the changed parameter has been applied (check the slider value)
+
+### No audio is generated
+
+- Check the log for error messages
+- Check that VOICEVOX has loaded correctly