PhraseTrainer Manual
The complete manual for PhraseTrainer, a web app for training your pronunciation with your own video and audio. New here? Start with "Basic flow". Looking for a specific feature? Use the table of contents on the left.
1. What is PhraseTrainer
A pronunciation training app that lets you clip just the part you want from any material — movies, dramas, podcasts, meeting recordings — listen to it repeatedly, and record and AI-analyze your own pronunciation.
How it differs from conventional pronunciation apps:
- No fixed teaching material. Practice with the phrases you actually want to say (BYOC: Bring Your Own Content).
- Every phrase you practice is kept permanently — re-practice or export anytime.
- Works on both PC and smartphone.
- Separate background music and ambient sound to focus on the dialogue (Voice extraction).
- Japanese / English UI (German and French planned).
2. Basic flow
Step 1: SETUP (PC recommended)
Drop a video/audio file -> select a range on the waveform -> register the phrase
Step 2: PRACTICE (PC / smartphone)
Replay the original -> record -> the AI-recognized speech shows instantly -> side-by-side comparison with the original -> evaluate & AI commentary as needed
Step 3: HISTORY (PC / smartphone)
Pick a past phrase -> re-practice -> compare scores -> organize phrases with Active / Library
You can prepare material on a PC and practice on a smartphone.
3. Setup — Clipping phrases
Drag and drop a video or audio file and clip the section you want to practice.
- Fast even with large files — processed inside your browser; only the needed section is uploaded.
- Set the start and end points (A-B points) intuitively while watching the waveform.
- If an SRT (subtitle) file is available, it loads automatically and you can clip line by line.
- You can tag clipped sections (two tag systems: title tags / phrase tags).
Library — bulk import from a folder
The "Open folder" button in the "Library" panel on the left lets you load material in bulk, by folder.
- Register multiple video/audio files at once from within a folder.
- If an SRT subtitle file is in the same folder, it loads together with the matching video/audio (match the file names to ensure reliable pairing).
- If there are subfolders, items are grouped by subfolder.
Instead of dragging and dropping one file at a time, you can import a whole drama season or dozens of podcast episodes at once.
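The pairing rule described above (same folder, matching file name, grouped by subfolder) can be pictured with a small sketch. This is only an illustration under stated assumptions, not PhraseTrainer's actual importer: the function name `pair_media_with_subtitles` and the extension list are hypothetical, and the real app may match names differently.

```python
from pathlib import Path

# Assumed extension list for illustration; the app's supported formats may differ.
MEDIA_EXTENSIONS = {".mp4", ".mkv", ".mov", ".mp3", ".wav", ".m4a"}


def pair_media_with_subtitles(folder):
    """Group media files by subfolder and attach an SRT with the same base name."""
    root = Path(folder)
    # Index subtitles by (parent folder, lowercase base name).
    subtitles = {
        (p.parent, p.stem.lower()): p
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".srt"
    }
    groups = {}
    for media in sorted(root.rglob("*")):
        if not media.is_file() or media.suffix.lower() not in MEDIA_EXTENSIONS:
            continue
        # Group key is the subfolder path relative to the root ("." for the top level).
        group = media.parent.relative_to(root).as_posix()
        srt = subtitles.get((media.parent, media.stem.lower()))
        groups.setdefault(group, []).append((media.name, srt.name if srt else None))
    return groups
```

In this sketch a file such as `ep1.mp4` is paired with `ep1.srt` only when both sit in the same folder, which is why keeping the base names identical makes pairing reliable.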
Automatic range selection from SRT subtitles
When you load an SRT subtitle file, a list of subtitles (lines) appears on the Setup tab.
- Double-click a subtitle row to automatically set the A-B points (start/end) to that subtitle's time range.
- No need to drag the waveform manually — clip line by line, accurately and quickly.
- Subtitle rows can also be tagged, so you can quickly find a target line with search and filters.
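To show what "setting the A-B points from a subtitle's time range" amounts to, here is a minimal sketch that reads SRT text and returns each line's start and end in seconds. The function name `parse_srt` is hypothetical and the parsing is simplified; it is not the app's own code.

```python
import re

# An SRT timing line looks like: 00:00:01,500 --> 00:00:03,250
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, line_text) cues."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                # Everything after the timing line is the subtitle text.
                cues.append((start, end, " ".join(lines[i + 1:]).strip()))
                break
    return cues
```

Double-clicking a subtitle row would then correspond to taking that cue's `start` as point A and its `end` as point B.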
4. Practice — Practice, evaluation & AI analysis
Playback
- Repeat playback of the original (1x / 2x / 5x / infinite loop)
- Switch between original / Voice version / noise-reduced version
- Pinpoint-play just a specific word on the waveform (range select, touch supported)
- ▶ / ⏸ toggle button — during playback the button changes to ⏸; tap it once to stop
Recording
- Start / stop with the record button, a hotkey, or a foot pedal
- Designed so the start of your speech isn't cut off, even on consecutive recordings
- Trim unwanted parts of the recording
- Right after you stop recording, the "AI-recognized speech" appears within 1-2 seconds, without pressing the evaluate button
Comparison & evaluation
- Side-by-side comparison — your speech and the original lined up word by word, with differing words color-highlighted
- IPA notation — phoneme sequences shown in the International Phonetic Alphabet (e.g. h ə l oʊ). Switch to ARPAbet notation (HH AH0 L OW1) with the "IPA ⇄" header in the comparison table.
- Cambridge Dictionary integration — click a word in the original to open the Cambridge Dictionary (US) in a new tab
- Waveform linkage — word boxes on the waveform are colored to match the comparison result
- F0 (pitch) curve — visually compare your intonation with the original
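As a rough picture of the side-by-side word comparison, the sketch below aligns the original text with the recognized speech using Python's standard `difflib`. It assumes a plain word-level diff and ignores the symbols . , ! ? ; : before comparing. How PhraseTrainer actually aligns words is not documented here, and `normalize` and `compare_words` are hypothetical names.

```python
from difflib import SequenceMatcher

IGNORED_SYMBOLS = ".,!?;:"  # assumed default set of symbols skipped in comparison


def normalize(text):
    """Lowercase the text and strip ignored symbols from the edges of each word."""
    words = []
    for raw in text.lower().split():
        word = raw.strip(IGNORED_SYMBOLS)
        if word:
            words.append(word)
    return words


def compare_words(original, spoken):
    """Return (tag, original_words, spoken_words) rows.

    tag is 'equal', 'replace', 'delete' or 'insert'; anything other than
    'equal' marks words that would be highlighted as different.
    """
    a, b = normalize(original), normalize(spoken)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2]) for tag, i1, i2, j1, j2 in matcher.get_opcodes()]
```

For example, comparing "I want to go home." with a recognized "I want go home" yields a `delete` row for "to", the word that was skipped.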
AI pronunciation evaluation scores you automatically on 5 metrics plus a total:
| Metric | Description |
|---|---|
| Similarity | Acoustic similarity to the original |
| Accuracy | Phoneme-level correctness |
| Fluency | Fluency and natural rhythm |
| Completeness | Coverage of all words |
| Prosody | Intonation and stress |
| Total | Overall score of the 5 metrics (out of 100) |
Hover over a metric label to see a description in your selected UI language.
AI commentary & translation
- AI pronunciation advice — points out which words matched the original and which did not, and explains the differences in terms of natural sound changes (linking, assimilation, elision, flap T, weak forms). Returned in your selected UI language.
- AI translation, part-of-speech breakdown & grammar points — the AI explains the subtitle's English in a 3-part structure: translation + part-of-speech breakdown + grammar points. Generated automatically in the background after a new segment is created, and shown in the "Translation" section of the ANALYSIS panel.
- Sound-change rule detection — linking, elision, weak forms and flap T are detected and highlighted automatically
Collapsing Analysis items
Each Analysis item (word comparison, translation, sound-change rules, AI commentary) can be collapsed by clicking its section header (▼ / ▶ icon). Collapse the items you rarely use and keep only the ones you need expanded to keep the screen tidy. The learning curve cannot be collapsed.
Focus mode (hide the original)
The "Hide" button at the top right of the Original section hides the word labels on the waveform, the subtitle text, and the original side of the AI recognition result all at once. Your recording side stays visible, so you can do dictation-style learning: pronounce by ear without looking at the text, then check the answer afterward. The setting is saved in your browser.
5. History — History & Active / Library management
All the phrases you've practiced are saved in your history.
- Re-practice anytime with one tap — the original, your recordings and your scores are all restored
- Check score trends on the learning curve graph
- Search and filter by tag, date or text
Active and Library
Phrases are managed in two layers: Active and Library.
| Layer | Role | Limit |
|---|---|---|
| Active | Target of practice and AI evaluation | Regular 200 / Pro 500 |
| Library | Permanent storage vault | Unlimited |
- Switch the display with "All / Active / Library" at the top of the History tab
- Click each phrase's ●Active / ○Library badge to switch it
- The number of remaining Active slots is shown at the top right
- Click a Library phrase and you'll be asked whether to make it Active and start practicing
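The two-layer model can be sketched as a simple toggle with a per-plan cap on Active. The limits come from the table above. Everything else is an assumption made for illustration: the names `remaining_active` and `toggle_layer`, the dictionary shape of a phrase, and in particular the behavior when Active is full, which this manual does not specify.

```python
# Active limits per plan, taken from the Active / Library table.
ACTIVE_LIMITS = {"regular": 200, "pro": 500}


def remaining_active(plan, active_count):
    """Number of Active slots still available on the given plan."""
    return max(ACTIVE_LIMITS[plan] - active_count, 0)


def toggle_layer(phrase, plan, active_count):
    """Flip a phrase between Active and Library and return the new layer.

    Assumption: promoting a phrase while Active is full is rejected.
    """
    if phrase["layer"] == "active":
        phrase["layer"] = "library"
    elif remaining_active(plan, active_count) > 0:
        phrase["layer"] = "active"
    else:
        raise ValueError("Active is full; move a phrase to Library first")
    return phrase["layer"]
```

Moving a phrase to Library always succeeds in this sketch because Library is unlimited; only the move into Active is checked against the plan's limit.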
6. Export — Exporting your library
You can export your entire phrase library as a self-contained archive. Permanently yours — your learning assets stay with you even after you cancel.
- Regular: CSV / Excel — outputs phrase text, tags, scores and audio links as CSV and Excel files
- Pro: full HTML archive — a ZIP package that works fully in a browser, including the original audio, Voice version, noise-reduced version, learner recordings and all analysis data. Just open index.html to use waveforms, F0, analysis, the learning curve and audio playback offline.
Run Export from "Library management" inside the ⚙ settings modal.
- Eligibility: after 30 consecutive days of subscription
- Limit: up to 2 exports per 30 days. Exports are processed one at a time.
- The generated ZIP is downloadable for 7 days
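The three Export conditions can be read as a small eligibility check. The sketch below assumes the "per 30 days" limit is a rolling window counted back from the current time; the manual does not say whether the window is rolling or fixed, and the names `can_export` and `download_expires_at` are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)          # subscription minimum and export window
MAX_EXPORTS_PER_WINDOW = 2
DOWNLOAD_PERIOD = timedelta(days=7)  # how long the generated ZIP stays available


def can_export(subscribed_since, past_exports, now):
    """Return (allowed, reason) for a new export request."""
    if now - subscribed_since < WINDOW:
        return False, "subscription shorter than 30 consecutive days"
    # Assumption: only exports within the last 30 days count toward the limit.
    recent = [t for t in past_exports if now - t < WINDOW]
    if len(recent) >= MAX_EXPORTS_PER_WINDOW:
        return False, "export limit reached for this 30-day period"
    return True, "ok"


def download_expires_at(generated_at):
    """Last moment the generated ZIP can be downloaded."""
    return generated_at + DOWNLOAD_PERIOD
```

Under these assumptions, a subscriber of several months who exported twice in the last 30 days would have to wait until the older export falls out of the window.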
7. Audio processing — Voice extraction & noise reduction
Voice extraction (Pro only)
AI separates the background music and sound effects of movies and dramas to extract just the dialogue. Even with material that has loud BGM, you can practice with clean, easy-to-hear audio. After extraction, you can switch between "original" and "Voice version".
Noise reduction (all plans)
Automatically removes recording-environment noise such as air conditioning, fans and hiss. You can switch between the original and the noise-reduced version, and combining it with Voice extraction makes it even clearer.
8. Settings, hotkeys & foot pedal
Open the settings modal with the ⚙ button at the top right of the toolbar.
- Hotkeys — Space = play, R = record, E = evaluate, etc. Key assignments are customizable
- Theme — light / dark
- Language — switch UI language (Japanese / English)
- Font size — small / medium / large
- Pronunciation comparison — customize the symbols ignored in comparison (. , ! ? ; : etc.)
- Foot pedal — hands-free play/record with a USB foot switch (operable even with both hands full)
9. Plans & pricing
| Feature | Free (20-day trial) | Regular | Pro |
|---|---|---|---|
| Monthly | Free | ¥1,100 | ¥2,200 |
| Active phrases | 20 | 200 | 500 |
| Library (storage) | — | Unlimited | Unlimited |
| AI evaluation | 30 (total) | 200 / month | 400 / month |
| Practice / AI recognition | 100 (total) | 1500 / month | Unlimited |
| Voice extraction | 10 (total) | ✗ (Pro only) | Unlimited |
| Noise reduction | ✓ | ✓ | ✓ |
| Library Export | — | CSV / Excel | Full HTML archive |
Upgrade from the pricing cards on the landing page, or from the "Upgrade" item in the top-right menu after logging in. You can choose monthly or yearly (20% off) billing; payment is processed through Stripe and the new plan applies immediately. To cancel, use "Manage subscription"; your current plan stays active until the next billing date.
10. Using on a smartphone
- When you access from a smartphone, the History tab opens automatically
- You can prepare material on a PC and practice on a smartphone
- Smartphone microphones are often better than built-in PC microphones, so recordings may come out cleaner
- Full support for touch operations on the waveform (range select, playback, trimming)
11. Known issues (beta)
- Older iPads (iOS 14 or earlier): a click sound may occur on the first playback (it does not occur on later playbacks)
- Some versions of Safari: about the first 100ms after recording starts may be cut off
- Voice extraction: with material mixing classical music or loud BGM, parts of the dialogue may become faint
- Monthly recording-limit counter: deleting a phrase does not roll back the recording count (the count resets automatically each calendar month)
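The counter behavior in the last item can be modeled as follows. This is a sketch of the described behavior only: the count grows with each recording, a deletion leaves it unchanged, and it starts again from zero when the calendar month changes. The class `MonthlyUsageCounter` and its methods are hypothetical names, not part of the app.

```python
from datetime import date


class MonthlyUsageCounter:
    """Counts recordings per calendar month; deletions never decrease the count."""

    def __init__(self, monthly_limit):
        self.monthly_limit = monthly_limit
        self.month = None  # (year, month) the current count belongs to
        self.count = 0

    def _roll_over(self, today):
        # Reset the count when a new calendar month begins.
        key = (today.year, today.month)
        if key != self.month:
            self.month = key
            self.count = 0

    def record(self, today):
        """Register one recording; return False if the monthly limit is used up."""
        self._roll_over(today)
        if self.count >= self.monthly_limit:
            return False
        self.count += 1
        return True

    def delete_phrase(self, today):
        """Deleting a phrase leaves the count untouched; returns the current count."""
        self._roll_over(today)
        return self.count
```

With a limit of 1500 (the Regular plan's monthly Practice / AI recognition allowance), recordings made and then deleted in May still count toward May, and the count is back to zero on June 1.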
12. Contact
- Email: [email protected] (we reply within 1 business day as a rule)
- Bug reports: attaching a screenshot and the steps helps us identify the issue faster
- Feature requests: welcome, big or small
Before registering, please also review the Terms of Service / Privacy Policy / Legal notice (Specified Commercial Transactions Act).