Documentation

Installation and use

The recognizer fbdtw uses a fairly standard implementation of the Dynamic Time Warping (DTW) algorithm. The user defines the vocabulary by specifying single words or a complete list of words. For each word, the user is prompted for a given number of reference utterances (default: 3). Each recorded utterance is displayed for inspection, because the automatic word boundary detection sometimes fails. The user can reject utterances with such obvious faults.
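The core of DTW can be sketched as follows. This is a minimal illustration, not the fbdtw source. It assumes that each utterance is a sequence of feature vectors (one per frame), that the local distance is Euclidean, and that the simple step pattern (horizontal, vertical, diagonal) is used without slope constraints or length normalization. The class name Dtw is hypothetical.

```java
import java.util.Arrays;

/**
 * Minimal DTW sketch (hypothetical class, not taken from fbdtw).
 * Assumptions: Euclidean local distance, steps (i-1,j), (i,j-1), (i-1,j-1),
 * no slope constraints and no normalization of the accumulated cost.
 */
final class Dtw {
    private Dtw() {}

    /** Euclidean distance between two feature vectors of equal length. */
    static double localDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Accumulated cost of the best warping path between test and reference. */
    static double distance(double[][] test, double[][] ref) {
        int n = test.length;
        int m = ref.length;
        // cost[i][j] = best cost aligning the first i test frames with the first j reference frames
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = localDistance(test[i - 1], ref[j - 1]);
                double best = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = d + best;
            }
        }
        return cost[n][m];
    }
}
```

A smaller accumulated cost means a better match. Because the path may repeat frames, a slowly spoken version of a word can still reach a low cost; a real recognizer may additionally normalize the cost by the path length so that words of different duration are comparable.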

Each utterance is stored in its own file. The file name is built from the label and a counter, as shown in the following example:

physik	#	5	.wav
label	separator	count	extension

In this case, the German word Physik was spoken, and this is the fifth utterance of that word. Both the waveform and the derived feature vectors are stored.
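The naming scheme can be sketched as a small helper that builds and parses such names. The class RefName is hypothetical, and the extension used for the feature files is not stated in this documentation, so only the wav case is shown.

```java
/**
 * Hypothetical helper for the naming scheme label#count.extension,
 * e.g. "physik#5.wav" (label "physik", counter 5, extension "wav").
 */
final class RefName {
    final String label;
    final int count;
    final String extension;

    RefName(String label, int count, String extension) {
        this.label = label;
        this.count = count;
        this.extension = extension;
    }

    /** Builds the file name, e.g. "physik#5.wav". */
    String fileName() {
        return label + "#" + count + "." + extension;
    }

    /** Parses a file name; returns null if it does not follow the scheme. */
    static RefName parse(String fileName) {
        int hash = fileName.lastIndexOf('#');
        int dot = fileName.lastIndexOf('.');
        // need a non-empty label, at least one counter digit and a non-empty extension
        if (hash <= 0 || dot < hash + 2 || dot == fileName.length() - 1) {
            return null;
        }
        try {
            int count = Integer.parseInt(fileName.substring(hash + 1, dot));
            return new RefName(fileName.substring(0, hash), count, fileName.substring(dot + 1));
        } catch (NumberFormatException e) {
            return null; // counter part is not a number
        }
    }
}
```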

Multiple users are supported. A separate directory is created for each user, and all reference utterances of that user are stored in it. All user directories are placed in a common directory (default name: userdata).

We use the term users, but the concept can equally be employed to handle different sets of words from the same speaker. During recognition, any number of users can be selected at the same time. For training a new word, however, exactly one user has to be active. Likewise, a number of operations, such as deleting words, require that exactly one user is active. In this way ambiguities are avoided, and the same word can be used concurrently in different subsets.
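The rule for active users can be illustrated as follows. This is a hypothetical sketch (the class ActiveUsers and its methods are not taken from fbdtw): recognition collects the directories of all selected users, whereas operations such as training or deleting words refuse to run unless exactly one user is active.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the set of active users. */
final class ActiveUsers {
    private final Set<String> active = new TreeSet<>();

    void select(String user) {
        active.add(user);
    }

    void deselect(String user) {
        active.remove(user);
    }

    /** Recognition: the references of every selected user are used. */
    List<Path> recognitionDirs(Path userdataRoot) {
        List<Path> dirs = new ArrayList<>();
        for (String user : active) {
            dirs.add(userdataRoot.resolve(user));
        }
        return dirs;
    }

    /** Training or deleting words: exactly one user must be active. */
    String requireSingle(String operation) {
        if (active.size() != 1) {
            throw new IllegalStateException(operation + " requires exactly one active user, but "
                    + active.size() + " are active");
        }
        return active.iterator().next();
    }
}
```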

No information besides the names of the directories and files is needed by fbdtw. Therefore, in addition to the commands in fbdtw, standard file operations can be used to maintain the data. For example, a user is copied simply by copying the user directory (and restarting fbdtw). It is also possible to edit the wav files and then calculate new feature files.
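Because the data are described entirely by directory and file names, the vocabulary of one user can be reconstructed from a directory listing alone. A minimal sketch, assuming the label#count.wav naming described above; the class VocabularyScanner is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical scanner: label -> reference wav files of one user directory. */
final class VocabularyScanner {
    private VocabularyScanner() {}

    static Map<String, List<Path>> scan(Path userDir) throws IOException {
        Map<String, List<Path>> vocab = new TreeMap<>();
        // '#' has no special meaning in glob patterns, so this matches names like physik#5.wav
        try (DirectoryStream<Path> files = Files.newDirectoryStream(userDir, "*#*.wav")) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                String label = name.substring(0, name.lastIndexOf('#'));
                vocab.computeIfAbsent(label, k -> new ArrayList<>()).add(file);
            }
        }
        return vocab;
    }
}
```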

In the example shown, four users are available, but currently only one speaker (steve) is loaded. The vocabulary consists of German city names. The last test utterance is shown in the window below the control window; the recognition yielded the word Darmstadt. For comparison, all scores are shown in an additional window.
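The scores can be understood with the following sketch. It assumes a nearest-neighbour rule: the score of a word is its smallest DTW distance over all of its references (from all selected users), and the word with the lowest score is recognized. Whether fbdtw combines the references of a word in exactly this way is an assumption; the class Decision and the distance callback are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToDoubleFunction;

/** Hypothetical decision rule: nearest neighbour over all references. */
final class Decision {
    private Decision() {}

    /** Score per word = smallest distance to any of its references (lower is better). */
    static <R> Map<String, Double> scores(Map<String, List<R>> references,
                                          ToDoubleFunction<R> distanceToTest) {
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, List<R>> e : references.entrySet()) {
            double best = Double.POSITIVE_INFINITY;
            for (R ref : e.getValue()) {
                best = Math.min(best, distanceToTest.applyAsDouble(ref));
            }
            result.put(e.getKey(), best);
        }
        return result;
    }

    /** The recognized word is the one with the lowest score (null if there are no words). */
    static String recognize(Map<String, Double> scores) {
        String bestWord = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (e.getValue() < bestScore) {
                bestScore = e.getValue();
                bestWord = e.getKey();
            }
        }
        return bestWord;
    }
}
```

In practice the callback would compute the DTW distance between the test utterance and one reference; in the check below, precomputed distances stand in for it.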

By default, all test utterances are stored in a directory autosave, with subdirectories for individual users. If only one user is active, the utterances are stored in that user's subdirectory. A special subdirectory unknown is provided for the case of multiple active users. In both cases, the utterances are named in#n.wav, where n denotes the counter.
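The choice of the autosave location can be sketched as follows, assuming that autosave/<user> is used when exactly one user is active and autosave/unknown otherwise. The class Autosave is hypothetical, and how fbdtw determines the next counter n is not shown.

```java
import java.nio.file.Path;
import java.util.Set;

/** Hypothetical helper for the autosave location of test utterances. */
final class Autosave {
    private Autosave() {}

    /** autosave/<user> for exactly one active user, otherwise autosave/unknown. */
    static Path directory(Path autosaveRoot, Set<String> activeUsers) {
        String sub = activeUsers.size() == 1 ? activeUsers.iterator().next() : "unknown";
        return autosaveRoot.resolve(sub);
    }

    /** Test utterances are named in#n.wav, n being the counter. */
    static String fileName(int n) {
        return "in#" + n + ".wav";
    }
}
```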

Menu options

System

OnOff+: Same as the recognize button
Verbose+: More information
Verbose-: Less information
SingleWord: Toggle between single-word mode (i.e., wait after recognition) and continuous mode.

User

New: create a new user
Rename: rename an existing user
Info: some information on currently selected users
Remove: remove one user (user has to be empty)

Vocab

New: Train one new word
NewFromList: Train all words in a list, i. e. a text file with one word per line
Rebuild: calculate feature files from audio files
Remove: delete selected words
Clear: delete all words
StoreLast: store the last utterance as a reference. The recognition result is used as its name, assuming the recognition was correct.
PlayLast: Play the last utterance.
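Two of the vocabulary operations can be sketched with small helpers. readWordList reads the list for NewFromList, assuming UTF-8 text and skipping blank lines; nextCount picks the counter for a new reference (as in StoreLast), assuming new references continue after the highest existing count of that label. Both names are hypothetical and not part of fbdtw.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical helpers for NewFromList and StoreLast. */
final class VocabHelpers {
    private VocabHelpers() {}

    /** One word per line; UTF-8 encoding and skipping blank lines are assumptions. */
    static List<String> readWordList(Path listFile) throws IOException {
        List<String> words = new ArrayList<>();
        for (String line : Files.readAllLines(listFile, StandardCharsets.UTF_8)) {
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    /** Next counter for label, given existing names such as "physik#5.wav". */
    static int nextCount(Collection<String> fileNames, String label) {
        int max = 0;
        String prefix = label + "#";
        for (String name : fileNames) {
            int dot = name.lastIndexOf('.');
            if (name.startsWith(prefix) && dot > prefix.length()) {
                try {
                    max = Math.max(max, Integer.parseInt(name.substring(prefix.length(), dot)));
                } catch (NumberFormatException ignored) {
                    // not a reference file of this label; skip it
                }
            }
        }
        return max + 1;
    }
}
```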

Download

jar-archive
file with properties (rename to fbdtw.ini)
