Downes.ca ~ Stephen's Web ~ How to Automatically Generate Textual Descriptions for Photographs with Deep Learning

Stephen Downes

Knowledge, Learning, Community

OK, you're not actually going to learn how to do this simply by reading the article, but you will learn how it's done, and more importantly, that it can be done. The task breaks down into three parts: classifying images (do you see a cat, a rabbit?), describing images (providing a natural language summary of the content), and annotating images (generating text descriptions for specific parts of the image). So basically we're associating object recognition with language strings (in English, in French, whatever). Going further, the neural networks can act as feature extractors, which map images to "an internal representation of the image, not something directly intelligible." Language generation algorithms, encoder-decoder models, and an attention mechanism round out the picture. It's pretty interesting.
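
To make the attention idea a little more concrete, here is a minimal sketch in plain Python. It is not the article's code: the feature vectors are made up, the function names are hypothetical, and dot-product scoring is a simplifying assumption (real captioning models learn these quantities with a neural network). It only shows the basic move: score each image region against the language model's current state, turn the scores into weights, and blend the region features into one context vector.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, decoder_state):
    """Weight image regions by their relevance to the decoder state.

    region_features: one feature vector per image region (the
        "internal representation" a feature extractor would produce).
    decoder_state: the language generator's current state vector.
    Dot-product scoring is an assumption made for this sketch.
    """
    scores = [
        sum(f * q for f, q in zip(features, decoder_state))
        for features in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    # The context vector is the weighted average of the region features.
    context = [
        sum(w * features[i] for w, features in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context


# Made-up features for three image regions and a made-up decoder state.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
state = [2.0, 0.0]

weights, context = attend(regions, state)
print("attention weights:", [round(w, 3) for w in weights])
print("context vector:", [round(c, 3) for c in context])
```

The first region lines up best with the state, so it gets the largest weight, and the context vector leans toward it. In a real system the decoder would use that context vector when choosing the next word of the description.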



Stephen Downes, Casselman, Canada
stephen@downes.ca

Copyright 2024
Last Updated: Apr 24, 2024 12:52 p.m.

Creative Commons License.
