Neural Scaling Laws and GPT-3

21 Oct 2020

Jared Kaplan, Johns Hopkins University, 12:00 EDT

Abstract: A variety of recent works suggest that scaling laws are ubiquitous in machine learning. In particular, neural network performance obeys scaling laws with respect to the number of parameters, the dataset size, and the training compute budget. I will explain these scaling laws and argue that they are both precise and highly universal. Then I will explain how this way of thinking about machine learning led to the GPT-3 language model, and what it suggests for the future.

A video of the talk is available.

Series: Physics ∩ ML seminars