This post reflects on a paper by Arun Narayanan et al., "Recognizing Long-Form Speech Using Streaming End-To-End Models." Mark Liberman observes:

"Modern AI (almost) works because of machine learning techniques that find patterns in training data, rather than relying on human programming of explicit rules. A weakness of this approach has always been that generalization to material different in any way from the training set can be unpredictably poor. (Though of course rule- or constraint-based approaches to AI generally never even got off the ground at all.) 'End-to-end' techniques, which eliminate human-defined layers like words, so that speech-to-text systems learn to map directly between sound waveforms and letter strings, are especially brittle."

What this means is that AI is still, and for the foreseeable future will remain, limited to specific, context-insensitive domains.