
nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs

Impact Summary

Problem Solved

Strips away the boilerplate and complexity of large LLM training frameworks such as Megatron-LM, giving researchers a minimal codebase to experiment on.

Improves On

Hugging Face Accelerate / minGPT

Abstract Snapshot

A complete rewrite of minGPT, designed to be simple, hackable, and fast for academic and educational use. Capable of reproducing GPT-2 (124M) on OpenWebText in about four days of training on a single 8×A100 node.
Tags: LLM, PyTorch, GPT, Education
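The "124M" figure above can be sanity-checked from GPT-2's standard configuration (12 layers, 768-dim embeddings, a 50,257-token vocabulary, 1,024-token context, and input/output embeddings tied). A minimal back-of-the-envelope sketch, not code from the repository itself:

```python
def gpt2_param_count(n_layer=12, n_embd=768, vocab_size=50257, block_size=1024):
    """Rough parameter count for a GPT-2-style decoder with tied embeddings."""
    d = n_embd
    tok_emb = vocab_size * d          # token embedding (shared with the LM head)
    pos_emb = block_size * d          # learned position embedding
    per_block = (
        2 * (2 * d)                   # two LayerNorms (weight + bias each)
        + d * 3 * d + 3 * d           # fused q,k,v projection + bias
        + d * d + d                   # attention output projection + bias
        + d * 4 * d + 4 * d           # MLP up-projection + bias
        + 4 * d * d + d               # MLP down-projection + bias
    )
    final_ln = 2 * d                  # final LayerNorm
    return tok_emb + pos_emb + n_layer * per_block + final_ln

print(gpt2_param_count())  # 124439808, i.e. the "124M" model
```

With tied embeddings the LM head adds no parameters, which is why the total lands at roughly 124.4M rather than ~163M.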

1 Reproduction

  • Yann LeCun (verified reproduction)

1 Citation

  • OpenAI Research cited this contextually in related peer review
