Stephen's Web ~ Let's reproduce GPT-2 (124M)

Stephen Downes

Knowledge, Learning, Community

I am not likely to ever find the time to work through this exercise - it's a four-hour video demonstrating every step involved in reproducing GPT-2. "We reproduce the GPT-2 (124M) from scratch," writes Andrej Karpathy. "This video covers the whole process: First we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 paper and their hyperparameters, then we hit run, and come back the next morning to see our results, and enjoy some amusing model generations." What I really appreciate about this is that it exists. As one commenter says, "Andrej is doing himself what OpenAi was supposed to do in the early days — make AI open." Expand the description for a full table of contents.
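For context, the "124M" in the title refers to the parameter count of the smallest GPT-2 model. As a rough sketch (not Karpathy's actual code; the names `GPTConfig` and `param_count` are illustrative), the published GPT-2 small hyperparameters imply that count directly, assuming tied input/output embeddings:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    vocab_size: int = 50257   # GPT-2 BPE vocabulary size
    block_size: int = 1024    # maximum context length
    n_layer: int = 12         # transformer blocks
    n_head: int = 12          # attention heads
    n_embd: int = 768         # embedding dimension

def param_count(cfg: GPTConfig) -> int:
    """Count trainable parameters, assuming tied token/output embeddings."""
    d = cfg.n_embd
    emb = cfg.vocab_size * d + cfg.block_size * d     # token + position embeddings
    attn = (d * 3 * d + 3 * d) + (d * d + d)          # qkv projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)       # two linears, 4x hidden expansion
    ln = 2 * (2 * d)                                  # two layer norms per block
    block = attn + mlp + ln
    return emb + cfg.n_layer * block + 2 * d          # + final layer norm

print(param_count(GPTConfig()))  # 124439808 — the "124M" in the name
```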


Stephen Downes, Casselman, Canada

Copyright 2024
Last Updated: Jul 23, 2024 11:43 a.m.

Creative Commons License.