Downes.ca ~ Stephen's Web ~ Hello World in Speech Recognition

Stephen Downes

Knowledge, Learning, Community

This article is a bit complex (since it shows you how to write your own speech recognition system), but it's simple enough to let us grasp the essentials of one of the most intractable problems in learning: language. Let's take the simplest case possible: did a person say 'cat'? We solve this in two stages: we convert the audio into a 'spectrogram', a feature matrix of digital data points, then we feed this as input into a neural network that calculates the probability that we did indeed hear the word 'cat'. That's the gist of it. Everything else - how we prepare the audio, how we train the neural network, how we interpret the results - is detail. Messy, messy detail.
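
To make the two stages concrete, here is a minimal sketch in Python with NumPy. It is not the code from the article being discussed. The function names, frame sizes, layer width and the synthetic sine-wave "recording" are all assumptions made for illustration, and the network weights are random placeholders, so the probability it prints means nothing until the network has been trained on labelled audio.

```python
import numpy as np

def spectrogram(audio, frame_len=256, hop=128):
    """Stage 1: turn raw audio into a (frames x frequency bins) matrix."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(audio) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([audio[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Magnitude of the FFT of each frame, log-compressed.
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitudes)

def cat_probability(features, w1, b1, w2, b2):
    """Stage 2: one hidden layer and a sigmoid output, read as P('cat')."""
    x = features.ravel()
    hidden = np.tanh(w1 @ x + b1)
    logit = w2 @ hidden + b2
    return 1.0 / (1.0 + np.exp(-logit))

# Assumption: a one-second 440 Hz tone stands in for a real recording.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440 * t)

features = spectrogram(audio)

# Assumption: untrained random weights, used only to show the data flow.
rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.01, size=(16, features.size))
b1 = np.zeros(16)
w2 = rng.normal(scale=0.01, size=16)
b2 = 0.0

p = cat_probability(features, w1, b1, w2, b2)
print(features.shape, float(p))
```

The "messy detail" lives in everything this sketch leaves out: cleaning and aligning the audio, choosing better features, and above all training the weights so that the output actually tracks whether someone said 'cat'.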



Stephen Downes, Casselman, Canada
stephen@downes.ca

Copyright 2024
Last Updated: Apr 24, 2024 11:48 a.m.

Creative Commons License.
