Downes.ca ~ Stephen's Web ~ This could lead to the next big breakthrough in common sense AI

Stephen Downes

Knowledge, Learning, Community

This article is about combining AI language models such as GPT-3 with computer vision, through a process called 'vokenization'. The idea is that it will enable a system like GPT-3 to distinguish between the linguistic expression 'black sheep' and the visual recognition of black sheep and white sheep. This has been tried before; the usual process involves pairing images with captions and presenting both to the AI. But captions are hard to get right, so the training sets are (comparatively) tiny. Vokenization is the approach used to get around this problem: instead of starting with images and manually adding captions, the researchers start with language and automatically associate images with it using image recognition. Why would we do this? "If we want to build robotic assistants, for example, they need computer vision to navigate the world and language to communicate about it to humans."
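To make the "start with language, then attach images" idea concrete, here is a toy sketch in Python. It is not the method from the article: the three-dimensional vectors, the image file names, and the `vokenize` function are all hypothetical and hand-made for illustration, whereas a real system would presumably learn both kinds of embeddings with neural models. The sketch only shows the matching step, where each word is paired with the image whose vector is most similar to it.

```python
import math

# Hypothetical, hand-made embeddings in a shared 3-d space (assumption:
# a real system would learn these with neural text and image encoders).
TOKEN_VECTORS = {
    "black": [0.9, 0.1, 0.0],
    "sheep": [0.1, 0.9, 0.1],
    "grazes": [0.0, 0.3, 0.9],
}
IMAGE_VECTORS = {
    "img_dark_texture.jpg": [1.0, 0.0, 0.0],
    "img_sheep_in_field.jpg": [0.0, 1.0, 0.2],
    "img_animal_eating_grass.jpg": [0.0, 0.2, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vokenize(tokens):
    """Pair each token with its most similar image (its toy 'voken')."""
    pairs = []
    for token in tokens:
        vec = TOKEN_VECTORS[token]
        best = max(IMAGE_VECTORS, key=lambda name: cosine(vec, IMAGE_VECTORS[name]))
        pairs.append((token, best))
    return pairs


if __name__ == "__main__":
    for token, image in vokenize(["black", "sheep", "grazes"]):
        print(f"{token} -> {image}")
```

Because the images are attached automatically, this kind of pairing can in principle be run over any amount of text, which is the point of starting from language rather than from hand-captioned images.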



Stephen Downes, Casselman, Canada
stephen@downes.ca

Copyright 2024
Last Updated: Apr 26, 2024 10:12 p.m.

Creative Commons License.
