15. Continuous sound sources
15-1. What are continuous sound sources
※ This chapter describes the construction of speech synthesis based on continuous sound sources. If you want to use continuous sound sources immediately, skip this chapter and read 15-2. How to use continuous sound sources.
In UTAU, the basic method for creating voice banks is to record the 50 sounds "a, i, u, e, o, ka, ki, ku, ke, ko, ..." with one separate file for each syllable. This is what UTAU users call "single sound" 「単独音」.
The method of compiling single sounds (the so-called 50 sounds) is easy to understand even for beginners and has the advantage of being easy to record. However, human speech can be separated into units called "phonemes" 「音素」 that are finer than the 50 sounds. Phonemes are connected in various patterns, e.g. [silence to consonant (or vowel) connection], [consonant to vowel connection], [vowel to consonant (or another vowel) connection], [vowel to silence connection], etc. With single sounds alone, some of these connection patterns are missing, so the sounds do not connect well.
For example, separating the lyrics "saita" 「さいた」 into phoneme units gives the following:
#-s (silence-consonant), s-a (consonant-vowel), a-i (vowel-other vowel), i-t (vowel-consonant), t-a (consonant-vowel), a-# (vowel-silence).
However, if the lyrics are built from single sounds only, they become "#-s-a (silence-consonant-vowel)", "i (vowel)", "t-a-# (consonant-vowel-silence)" as shown below: the "a-i (vowel-other vowel)" and "i-t (vowel-consonant)" connections are missing, so the sounds are not properly connected.
Therefore, recording the consecutive sounds "ai (vowel-other vowel)" and "ita (vowel-consonant-vowel-silence)" instead of the single sounds "i (vowel)" and "ta (consonant-vowel-silence)", and lining up "#-s-a (silence-consonant-vowel)", "a-i (vowel-other vowel)", "i-t-a-# (vowel-consonant-vowel-silence)" as in figure 2 below, makes it possible to reproduce connections identical to the original voice by linking the parts that share the same vowel (a and i) with a cross-fade. This is the basic idea behind continuous sound sources.
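The coverage argument above can be checked with a few lines of Python. This is only an illustrative sketch: the function names `transitions` and `covered` are made up for this example, and phonemes are written as plain strings with "#" standing for silence, as in the text.

```python
def transitions(phonemes):
    """List every connection in an utterance, with '#' as silence at both ends."""
    padded = ["#"] + list(phonemes) + ["#"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def covered(units):
    """Collect the connections that are contained inside the recorded units."""
    found = set()
    for unit in units:
        found.update(f"{a}-{b}" for a, b in zip(unit, unit[1:]))
    return found


# "saita" split into phonemes
needed = transitions(["s", "a", "i", "t", "a"])

# Single sounds only: sa, i, ta
single = [["#", "s", "a"], ["i"], ["t", "a", "#"]]
# Continuous sounds: sa, ai, ita
continuous = [["#", "s", "a"], ["a", "i"], ["i", "t", "a", "#"]]

missing_single = [t for t in needed if t not in covered(single)]
missing_continuous = [t for t in needed if t not in covered(continuous)]

print(missing_single)      # connections the single sounds cannot supply
print(missing_continuous)  # connections the continuous sounds cannot supply
```

With single sounds, the a-i and i-t connections are reported as missing; with the continuous units, nothing is missing.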
It is possible to create continuous sounds with as few as 2 moras (a mora is the unit for counting the syllables contained in one continuous utterance), like the "aI" 「あい」 and "iTA" 「いた」 of figure 2, but this is inefficient because each recording yields just one continuous sound (half the number of moras), namely "aI" 「aい」 or "iTA" 「iた」.
Therefore, if you record continuous sounds of 5 moras like "aaiau" 「ああいあう」 or "tatachitatsu" 「たたちたつ」, perform voice configuration at every point where the phoneme changes, and specify aliases like "aA, aI, iA, aU" 「aあ、aい、iあ、aう」 or "aTA, aCHI, iTA, aTSU" 「aた、aち、iた、aつ」, you can efficiently produce 4 continuous sounds out of 5 moras. (You can further increase the efficiency by recording continuous sounds of 7 moras.)
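As a rough illustration of how the aliases follow from a recording string, the sketch below pairs each mora with the vowel of the mora before it. The `VOWEL_OF` table and the function name are assumptions made for this example only; the table covers just the kana used here, and the alias spelling follows the notation of this manual.

```python
# Hypothetical lookup table: the vowel on which each mora ends.
VOWEL_OF = {
    "あ": "a", "い": "i", "う": "u",
    "た": "a", "ち": "i", "つ": "u",
}


def aliases_from_recording(moras):
    """Return one alias per phoneme change: previous vowel + current mora."""
    return [VOWEL_OF[prev] + cur for prev, cur in zip(moras, moras[1:])]


vowel_aliases = aliases_from_recording(["あ", "あ", "い", "あ", "う"])
ta_aliases = aliases_from_recording(["た", "た", "ち", "た", "つ"])

print(vowel_aliases)
print(ta_aliases)
```

A recording of n moras gives n - 1 aliases this way, which is why 5 moras produce 4 continuous sounds and a 7-mora recording would produce 6.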
Reference: List of continuous sounds (Ameya version with 5, 6 and 7 moras) download page -> About UTAU - Continuous sounds association.
For more information on how to configure a voice bank with continuous sounds, please refer to 15-3. Voice configuration of continuous sound sources.
15-2. How to use continuous sound sources
Please carry out steps (1) to (5) in order. For step (4), choose either (4A) or (4B).
※ You may also refer to the video courses -> 15. [UTAU Course Compiling continuous sound sources] and 16. [UTAU: NHP How to record continuous sounds [course]] (These videos were created by Loop sama.)
(1). What you need
What you absolutely need
- UTAU itself, at least version 0.2.46 (We recommend that you use the latest stable version -> Latest version download page)
- A continuous sound voice bank (if you are a beginner, we recommend the use of a sound source recorded with the Ameya list of continuous sounds. Introduction page -> UTAU continuous sounds installation page)
- In this manual, the Nagone Mako continuous sound voice (F4+B3/1 folder version), recorded with the Ameya list of 5 moras, is used as an example. -> Download page
Things that are nice to have
The UTAU plug-ins shown in this manual. Download them from UTAU Users Mutual Aid Society - Plug-ins. Continuous sounds can be used without them, but they make the work much easier, so their use is recommended.
※ For the installation method of plug-ins, please refer to 14. Extending features with Plug-ins.
- Continuous sounds batch setup plug-in 「連続音一括設定プラグイン」 (Its installation is recommended as it may simplify the work.)
- Adjusting lyrics for continuous sounds plug-in 「歌詞を連続音にするプラグイン」 (You can manage without it, but converting the lyrics by hand is far more tedious than letting this plug-in convert them in one go.)
- Extended envelope editor 「拡張エンベロープエディタ」 (There is no problem without it, but its use is recommended if you want to manipulate the envelope afterwards.)
(2). Setting Options
You need to configure the options when you use a continuous sound voice. If you don't, proper WAV files cannot be generated from the continuous sounds, so be sure to do this. Incidentally, (3). Setting note properties. Envelope initialization may be carried out to configure the timings at any time after saving the project (UST) file.
1. Select the menu "Tools" 「ツール」 -> "Options" 「オプション」 to open the options configuration screen.
2. Open the "General" 「全般」 tab of the options screen, then in "Application General Configurations" 「編集オプション」 check "No automatic copy of pre-utterance and overlap parameters from voice default on lyric change" 「歌詞変更時に原音から先行発音・オーバーラップ値をコピーしない」. (If unchecked, the pre-utterance and overlap values of the single sounds are applied to the continuous sound, which throws off the pronunciation timings.)
In addition, with continuous sound voices that have multiple pitch files, voice drops occur when sounds are missing at certain pitches, so it is also recommended to check "Check voice file existance on rendering to prevent voice drop (slower rendering)" 「レンダリング時にファイルの存在チェックを行い、なるべく音抜けしないようにする。（遅くなります）」 in "Rendering" 「レンダリング」. (It is better to check it even if you use only single sounds.)
3. Open the "Cache Config" 「キャッシュ」 tab of the Options settings screen, check "Cache intermediate files" 「中間ファイルをキャッシュする」 and select the radio button "Do not remove cache files" 「キャッシュを削除しない」. When you're done, press "OK" to close the screen.
※ Cache is required when you perform crossfade optimization. Basically, the "Do not remove cache files" 「キャッシュを削除しない」 setting is recommended but if you don't have enough hard disk space, select the radio button "Remove older files when cache size exceeds" 「一定の容量を超えたら古いものから削除」 and enter a value of about 500MB.
(3). Setting note properties. Envelope initialization
※ This description assumes that (2). Setting Options and the note input are completed. Furthermore, when importing a VSQ file or reusing a UST file made for single sounds, very short rests (e.g. 32nd or 64th rests) may have been inserted to improve articulation. With continuous sounds these rests degrade the sound connection, so extend the note before each such rest until it connects with the following note and the very short rest disappears.
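The clean-up of very short rests described above can be sketched as follows. This is not a UTAU feature or plug-in, just an illustration of the rule. It assumes notes are given as (lyric, length) pairs, that rests are written "R", and that lengths are in ticks with 480 ticks per quarter note (so a 32nd rest is 60 ticks and a 64th rest is 30 ticks); the function name and threshold are hypothetical.

```python
REST = "R"  # rest lyric, as written in this manual


def absorb_short_rests(notes, max_rest_ticks=60):
    """Extend the preceding note over any very short rest between two notes."""
    result = []
    for index, (lyric, length) in enumerate(notes):
        is_short_rest = lyric == REST and length <= max_rest_ticks
        has_prev_note = bool(result) and result[-1][0] != REST
        has_next_note = index + 1 < len(notes) and notes[index + 1][0] != REST
        if is_short_rest and has_prev_note and has_next_note:
            # Lengthen the previous note so it reaches the following note.
            prev_lyric, prev_length = result[-1]
            result[-1] = (prev_lyric, prev_length + length)
        else:
            result.append((lyric, length))
    return result


notes = [("さ", 450), ("R", 30), ("い", 480), ("R", 480),
         ("た", 420), ("R", 60), ("R", 480)]
cleaned = absorb_short_rests(notes)
print(cleaned)
```

Only rests that sit between two sung notes are absorbed; longer rests, and short rests that are not followed by a note, are left untouched, so the total length of the sequence does not change.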
1. Select the menu "Edit" 「編集」 -> "Select All" 「全て選択」, then with all the notes selected, select the menu "Edit" 「編集」 -> "Region Property" 「選択部分のプロパティ」 to open the Properties configuration screen. (You can also select "Region Property" 「選択部分のプロパティ」 from the note right-click menu.)
2. In the "Selected range properties" 「選択範囲のプロパティ」 screen, configure as follows.
Set "Intensity" 「音量」 (1) to 100%. (For improving the sound connection by aligning the volume of all the notes.)
Set "Modulation" 「モジュレーション」 (2) to 0%. (Always enter 0% to improve the sound connection. Be careful: an empty or blank value is treated as 100%.)
Press the "Clear" (3) button to blank out (white, empty) the "Preutterance" 「先行発音」 and "Overlap" 「オーバーラップ」 input fields. (If numerical values are entered in some of the notes, the input field is grayed as in the picture on the left below. If numerical values are pre-filled, the cross-fading process is carried out incorrectly, and the pronunciation timings could be shifted and/or sounds could be missing. Please be especially careful when reusing a UST file prepared for single sounds.)
Also blank out (white, empty) the "Consonant velocity" (4) and "STP" (7) input fields. If values are entered, there is a risk of pronunciation timings being shifted and/or of sounds being missing.
Caution: If the Clear button (3) does not clear the fields, temporarily enter an arbitrary value such as zero, then delete the value to leave the field blank.
Enter a small value not exceeding 40 in BRE (Breathiness) (5). Noise becomes noticeable if BRE is set to a high value. However, for whisper voices like e.g. the continuous voice Sekka Yufu (soft), it may be better to set a not too small value. In addition, setting the "b" flag (BRE is applied after the formant filter) in Flags (6) instead of setting the BRE value to 0 can reduce the noise. For more information, please refer to 7-2. The settings of the "Notes Properties" screen and 7-3. Setting the Flags.
Warning: With continuous sounds, the adjustment technique of setting the Y flag to zero or a small value not only fails to improve the articulation but also increases the noise, so don't use it.
Finally, press the "OK" button (8) to close the Properties screen.
3. After selecting all the notes, press the "Envelope reset" button on the upper right of the main screen to initialize the envelope. In particular, be sure to reset the envelope when using a UST file whose envelopes were edited for single sounds, because the crossfade for continuous sounds may fail if they are left as they are.
Reference: 9. Envelope and Vowel Blending
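The recommended property values from step 2 above can be summarized as a small checklist. The sketch below is purely illustrative: the dictionary keys (`intensity`, `modulation`, `preutterance`, `overlap`, `consonant_velocity`, `stp`, `bre`) are names invented for this example, not UTAU identifiers, and `None` stands for a blank input field.

```python
# Fields that this section says should be left blank for continuous sounds.
SHOULD_BE_BLANK = ("preutterance", "overlap", "consonant_velocity", "stp")


def check_note(props):
    """Return a list of deviations from the settings recommended in this section."""
    problems = []
    if props.get("intensity") != 100:
        problems.append("intensity should be 100")
    if props.get("modulation") != 0:
        # A blank modulation is treated as 100%, so it must be an explicit 0.
        problems.append("modulation should be 0")
    for key in SHOULD_BE_BLANK:
        if props.get(key) is not None:
            problems.append(f"{key} should be blank")
    bre = props.get("bre")
    if bre is not None and bre > 40:
        problems.append("bre should not exceed 40")
    return problems


ready_note = {"intensity": 100, "modulation": 0, "bre": 20}
reused_note = {"intensity": 100, "modulation": None, "preutterance": 60, "bre": 50}

print(check_note(ready_note))
print(check_note(reused_note))
```

The second note mimics one carried over from a single-sound UST file: its blank modulation, pre-filled preutterance and high BRE are all flagged. The BRE limit is only a guideline, since the text notes that whisper voices may need a larger value.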
(4A). Executing the plug-in "Continuous sound batch configuration"
※ This description assumes that (1). Continuous sounds batch configuration plug-in installation up to (3). Setting note properties. Envelope initialization are completed. However, if you are using a continuous sound voice recorded with the MimiRobo-P list, where some sounds are supplemented with single sounds, this method could shift the pronunciation timings of the single sound parts, so please refer to (4B). How to manually configure continuous sounds instead.
1. After ensuring that all the notes are selected and that the Mode2 button is pushed, select the menu "Tools" 「ツール」 -> "Plug-ins" 「プラグイン」 -> "Continuous sounds batch configuration" 「連続音一括設定」 to open the "Continuous sounds batch configuration" 「連続音一括設定」 screen. (If you don't configure the portamento, you cannot smoothly connect continuous sounds of different pitches. For more information about Mode2, please refer to 12. Mode2 features.)
2. The default values in the "Continuous sounds batch configuration" 「連続音一括設定」 screen are basically fine, but for a slow song, change "Start" 「開始」 to between -30 and -50 ms and "Length" 「長さ」 to between 60 and 100 ms in "Portamento" 「ポルタメント」 (1). Also, if you select "Apply prefix.map" 「prefix.mapを反映」 (2), the symbols 「↑」 「↓」 are attached to the notes according to the prefix.map settings.
Finally, press "OK" to close the screen.
3. The lyrics change automatically to continuous sound notation ("tsu" 「つ」 "me" 「め」 "to" 「と」 -> "-TSU" 「-つ」 "uME" 「uめ」 "eTO" 「eと」), while the envelopes are automatically adjusted and the paired vowels are blended with a crossfade. (To check that the notes are cross-faded, press the 「～」 button on the lower left of the main screen, or press F4.)
Here, select about four bars of notes and play them once. (Please note that playing too long a section at once takes time.)
When playing is done, please perform the following section (5). Crossfade optimization.
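The lyric rewriting in step 3 above follows a simple rule: each lyric is prefixed with the vowel of the preceding note, or with "-" when there is no preceding sound. The sketch below only illustrates that rule and is not the plug-in's actual code. The `VOWEL_OF` table is a hypothetical lookup covering just the kana in the examples, rests are assumed to be written "R", and the exact alias spelling depends on the voice bank.

```python
# Hypothetical lookup table: the vowel on which each mora ends.
VOWEL_OF = {"つ": "u", "め": "e", "と": "o", "ひ": "i", "ら": "a"}
REST = "R"


def to_continuous(lyrics):
    """Prefix each lyric with the previous vowel, or '-' after a rest or at the start."""
    converted = []
    previous_vowel = None
    for lyric in lyrics:
        if lyric == REST:
            converted.append(lyric)
            previous_vowel = None  # the next note starts from silence
            continue
        prefix = previous_vowel if previous_vowel is not None else "-"
        converted.append(prefix + lyric)
        previous_vowel = VOWEL_OF[lyric]
    return converted


print(to_continuous(["つ", "め", "と"]))
print(to_continuous(["R", "ひ", "ら"]))
```

The first call reproduces the example from step 3; the second shows a note after a rest receiving the "-" prefix.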
(4B). How to manually configure continuous sounds
※ This description assumes that (1). Continuous sounds batch configuration plug-in installation up to (3). Setting note properties are completed. However, if you performed (4A). Executing the plug-in "Continuous sound batch configuration", you can skip this part and perform the following section (5). Crossfade optimization.
1. After selecting all the notes, select the menu "Tools" 「ツール」 -> "Plug-Ins" 「プラグイン」 -> "Adjusting lyrics for continuous sounds" 「歌詞を連続音にする」 to change the lyrics for continuous sounds.
※ If you don't use the plug-in to convert the lyrics to continuous sounds, select the notes one at a time, right-click while holding the Shift key, and choose the appropriate entry from the displayed list.
For example, for a "ra" 「ら」 preceded by "hi" 「ひ」, select "iRA" 「iら」, in which the "i" of the vowel part of "hi" 「ひ」 is prepended. For notes immediately preceded by a rest "R", prepend a "-" to the lyric to indicate the lead pronunciation, e.g. "-HI" 「-ひ」.
2. Check that all the notes are selected, then press the Automatic parameters adjustment 「パラメータ自動調整」 button on the upper right of the screen. (This operation fixes the parts, marked with a 「!」, where the envelope is erroneously crossed, and the parts where 3 or more sounds are superposed. If this is not done, the pronunciation timings could be shifted and/or sounds could be missing, so be sure to perform this operation.)
Warning: If you are using a continuous sound voice recorded with the MimiRobo-P list, where some sounds are supplemented with single sounds, be sure to select only the continuous sound notes when pressing the Automatic parameters adjustment 「パラメータ自動調整」 button. If Automatic parameters adjustment is performed with single sounds included, there is a risk of shifting the pronunciation timings.
3. Check that all the notes are selected, then press the Set crossfade 「クロスフェード設定」 button on the upper right of the screen to join the paired vowels with a cross-fade. (To check that the notes are cross-faded, press the 「～」 button on the lower left of the main screen, or press F4.)
Warning: If you are using a continuous sound voice recorded with the MimiRobo-P list, where some sounds are supplemented with single sounds, be sure to select only the continuous sound notes when pressing the Set crossfade 「クロスフェード設定」 button. If Set crossfade is performed with single sounds included, there is a risk of shifting the pronunciation timings.
※ Starting from Ver0.2.61, the Set crossfade button comes in two types: "Set crossfade envelopes by p1 and p4" (the button enclosed in a red box) and "Set crossfade envelopes by p2 and p3" (the button on the left of the red box). The "p1, p4" button has the advantage of letting you shift and adjust afterwards the 3 points p2, p3 and p5 that are located between p1 and p4. However, if you mistakenly move the points p1 or p4, the crossfade becomes strange, so we recommend that you use the Extended Envelope Editor plug-in. Introduction Video -> [UTAU Extended Envelope Editor [Plugin]].
4. Enable Mode2 and configure the portamento. (Portamento is required to smoothly connect the sounds. Timings may be configured at any time before playing. For more details about Mode2, please refer to 12. Mode2 features.)
5. Select about four bars of notes and play them once. (Please note that playing too long a section at once takes time.)
When playing is done, please perform the following section (5). Crossfade optimization.
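To make the idea of joining paired vowels with a cross-fade more concrete, here is a minimal sketch of a generic linear crossfade between two sample sequences. It is not UTAU's implementation, which works through the note envelopes and is not described in this manual; the function name and the linear weighting are assumptions chosen for illustration.

```python
def crossfade(prev, nxt, overlap):
    """Join two sample lists, blending the last `overlap` samples of `prev`
    with the first `overlap` samples of `nxt` using linear weights."""
    if overlap < 0 or overlap > min(len(prev), len(nxt)):
        raise ValueError("overlap must fit inside both sounds")
    split = len(prev) - overlap
    head = list(prev[:split])      # part of the previous sound played alone
    tail = list(nxt[overlap:])     # part of the next sound played alone
    mixed = []
    for i in range(overlap):
        weight = (i + 1) / (overlap + 1)  # rises from near 0 to near 1
        mixed.append(prev[split + i] * (1 - weight) + nxt[i] * weight)
    return head + mixed + tail


# The same vowel at the same level on both sides stays at a steady level.
steady = crossfade([1.0] * 4, [1.0] * 4, overlap=2)
# A fade from full level to silence.
fading = crossfade([1.0] * 4, [0.0] * 4, overlap=2)
print(steady)
print(fading)
```

Because the two weights always sum to 1, a vowel that sounds the same on both sides of the join keeps a steady level through the overlap, which is why the parts being linked need to share the same vowel.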
(5). Crossfade optimization
1. After playing the continuous sounds once, check that the played notes are selected, then press the Crossfade Optimization 「クロスフェード最適化」 button. The "Crossfade Optimization" progress bar is displayed in the center of the screen. When the optimization process has finished, press "Yes" in the "Cache of selection is removed, OK?" pop-up screen that appears.
Warning: If you are using a continuous sound voice recorded with the MimiRobo-P list, where some sounds are supplemented with single sounds, be sure to select only the continuous sound notes when performing crossfade optimization. If crossfade optimization is performed with single sounds included, there is a risk of shifting the pronunciation timings.
2. You can sometimes obtain better results by repeating the crossfade optimization several times, until the following popup indicating "Optimization completed" 「最適化済み」 appears. (Depending on the voice, there are also cases where it does not change much.)
※ Crossfade optimization needs the cache files produced when playing, so be sure to play once beforehand.
※ If you want to modify some of the continuous sound lyrics, please refer to (4B). How to manually configure continuous sounds. After correcting the lyrics, Automatic parameter adjustment 「パラメータ自動調整」 and Set crossfade 「クロスフェード設定」, described in (3). Setting note properties and (4B). How to manually configure continuous sounds, must be carried out again for the corresponding notes.
※ If the pronunciation timings of some of the continuous sounds are shifted even though everything was configured properly with the methods described so far, the voice configuration of the voice bank you are using may be faulty. Please refer to the following section 15-3. Voice configuration of continuous sound sources and fix the voice settings.
15-3. Voice configuration of continuous sound sources
As a continuous sound voice stores the pronunciation of 5 moras (five syllables) in one WAV file, e.g. "aaiau" 「ああいあう」, some parts of the voice configuration differ from single sound voices. The pronunciation timings are shifted if the voice configuration is not done properly; fix it as follows.
※ For a basic user guide of the Voice configuration screen, please refer to 10. Setting voice banks.
※ When producing a continuous sound voice, please refer to the video course produced by Loop sama 16. [UTAU:NHP How to record a continuous sound [course]] on how to configure a voice from the beginning.
1. Select one note on which to perform voice configuration, select the menu "Tools" 「ツール」 -> "Voice Bank Setting" 「原音の設定」 or press Ctrl+G to open the Voice Configuration screen, then after checking that the corresponding note is selected in Alias 「エイリアス」, press the "Launch editor" 「エディタを起動」 button to start the Voice Configuration Editor.
2. Place the cursor on the boundary between the left blank area (in purple) on the left side of the Voice Configuration Editor and the fixed consonant part area (in pink) on its right side so that it changes into a crosshair "+". When you drag it to the right, the preutterance (the red vertical line) and the overlap (the green vertical line) move along with the end edge of the left blank. (This is in order to configure a negative value for the left blank of continuous sounds.)
In this manner, move the preutterance (the red vertical line) to the boundary between consonant and vowel. When this is done, close the Editor with the "Close" 「閉じる」 button.
Warning: In the voice configuration of continuous sounds, make sure that the preutterance (the red vertical line) and the overlap (the green vertical line) do not move independently of each other.