DeepSpeech setup
It has been a few months since I last built DeepSpeech (today is August 13th, 2018), so these instructions may be out of date. They cover building DeepSpeech on Debian or a derivative, but should be fairly easy to translate to other systems by changing the package manager and package names.
[~]$ sudo apt-get install pkg-config zip g++ zlib1g-dev unzip
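Before moving on, it can help to confirm that the tools from those packages are actually on your PATH. The following is a minimal sketch; the missing_tools helper is a hypothetical name introduced here for illustration and is not part of DeepSpeech or bazel.

```shell
#!/bin/sh
# Print the name of every listed command that is not found on PATH.
# (missing_tools is a hypothetical helper, not part of any package.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Commands provided by the packages installed above. zlib1g-dev is a
# library, so it has no command to look for.
missing_tools pkg-config zip unzip g++
```

If this prints nothing, all four commands were found. Any name it prints still needs to be installed.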
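Normally you need bazel in order to build bazel, but the -dist archive can bootstrap itself without an existing bazel binary, so download a copy and unzip it. The bootstrap also needs a JDK, which the package list above does not install.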
[~]$ wget https://github.com/bazelbuild/bazel/releases/download/0.4.5/bazel-0.4.5-dist.zip
[~]$ unzip -d bazel-0.4.5-dist bazel-0.4.5-dist.zip
[~]$ cd bazel-0.4.5-dist
The script is unpacked with mode 555 (read and execute only), so you have to make it writable:
[~/bazel-0.4.5-dist]$ chmod a+w scripts/bootstrap/compile.sh
If you are building for the Raspberry Pi, make the following changes. (Note that DeepSpeech did not run very well on the Raspberry Pi the last time I checked: the language model was too large to fit in memory, and without it DeepSpeech just returns the raw, uncorrected character output.)
- vi scripts/bootstrap/compile.sh
- Go to line 117 and add -J-Xmx500M (this sets the maximum Java heap size to 500 MB)
- Save and quit
The cc_configure.bzl script also seems to have trouble detecting the platform, so help it along by editing the _get_cpu_value function to always return “arm”.
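Since line numbers and file locations can shift between bazel releases, it may be worth looking at what you are about to edit first. This is a small sketch using sed and grep; print_line and find_name are hypothetical helper names introduced here, not part of bazel.

```shell
#!/bin/sh
# Print line number $2 of file $1.
print_line() {
    sed -n "${2}p" "$1"
}

# List every file:line under directory $2 that mentions the name $1.
find_name() {
    grep -rn -- "$1" "$2"
}

# Run from the unpacked bazel-0.4.5-dist directory:
#   print_line scripts/bootstrap/compile.sh 117
#   find_name _get_cpu_value .
```

The first call shows the line that should receive -J-Xmx500M, and the second shows where _get_cpu_value is defined, so you can check both before making changes.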
Now build it
[~/bazel-0.4.5-dist]$ ./compile.sh
Build successful! Binary is here: ~/bazel-0.4.5-dist/output/bazel
[~/bazel-0.4.5-dist]$ sudo cp -iv output/bazel /usr/local/bin/
[~/bazel-0.4.5-dist]$ cd ..
[~]$ git clone https://github.com/mozilla/tensorflow.git
[~]$ git clone https://github.com/mozilla/DeepSpeech.git
[~]$ cd tensorflow/
[~/tensorflow]$ ln -s ../DeepSpeech/native_client/ ./
[~/tensorflow]$ ./configure
[~/tensorflow]$ bazel build -c opt --copt=-O3 //native_client:libctc_decoder_with_kenlm.so
[~/tensorflow]$ bazel build --config=monolithic -c opt --copt=-O3 --copt=-fvisibility=hidden --incompatible_load_argument_is_label=false //native_client:libdeepspeech.so //native_client:deepspeech_utils //native_client:libctc_decoder_with_kenlm.so //native_client:generate_trie
[~/tensorflow]$ cd native_client
[~/tensorflow/native_client]$ make deepspeech
[~/tensorflow/native_client]$ sudo make install PREFIX=/usr/local
[~/tensorflow/native_client]$ make bindings
[~/tensorflow/native_client]$ pip install dist/deepspeech-*.whl
I recommend using PocketSphinx for passive listening and DeepSpeech for active listening. To use it as the active listener with Naomi, you will need to add a section like this to your profile.yml file:
active_stt:
  engine: deepspeech-stt
deepspeech:
  model: '/home/user/models/output_graph.pb'
  alphabet: '/home/user/models/alphabet.txt'
  language_model: '/home/user/models/lm.binary'
  trie: '/home/user/models/trie'
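A wrong path in this section is an easy mistake to make, so a quick check that all four files exist can save some debugging. The sketch below assumes the files all live in one directory, as in the example profile; missing_model_files is a hypothetical helper name, not part of Naomi or DeepSpeech.

```shell
#!/bin/sh
# Print each expected model file that is missing from directory $1.
# (missing_model_files is a hypothetical helper for illustration.)
missing_model_files() {
    for name in output_graph.pb alphabet.txt lm.binary trie; do
        [ -f "$1/$name" ] || echo "$1/$name"
    done
}

# /home/user/models is the placeholder directory from the profile above.
missing_model_files /home/user/models
```

No output means every file referenced in the profile was found. Any path printed is one to fix before starting Naomi.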