On the Road to Universal Learners
Blog: Decision Management Community
Peter Norvig: “Given example inputs and outputs of any function that can be computed by any computer, a neural net can learn to approximate that function.” He refers to the paper “Auto-Regressive Next-Token Predictors are Universal Learners”, whose abstract reads:

“We demonstrate that even simple models such as linear next-token predictors, trained on Chain-of-Thought (CoT) data, can approximate any function efficiently computed by a Turing machine. We introduce a new complexity measure — length complexity — which measures the number of intermediate tokens in a CoT sequence required to approximate some target function, and analyze the interplay between length complexity and other notions of complexity. Our results demonstrate that the power of language models can be attributed, to a great extent, to the auto-regressive next-token training scheme, and not necessarily to a particular choice of architecture.”

Link
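To make the idea concrete, here is a small sketch (my own toy, not from the paper) of the training scheme the abstract describes: a purely linear next-token predictor (softmax regression over one-hot tokens) trained on chain-of-thought traces. The task, vocabulary, and trace format below are all illustrative choices — the model learns “n mod 3” by emitting the intermediate results of repeated subtraction by 3 before the answer token.

```python
# Toy illustration of auto-regressive next-token training on CoT data.
# Task, vocabulary, and trace format are assumptions for this sketch.
import numpy as np

V = list(range(10)) + ["A0", "A1", "A2"]   # digits 0-9 plus answer tokens
tok2id = {t: i for i, t in enumerate(V)}

def cot_sequence(n):
    """CoT trace for n mod 3: n, n-3, n-6, ..., then answer token A{n mod 3}."""
    seq = []
    while n >= 3:
        seq.append(n)
        n -= 3
    seq.append(n)
    seq.append(f"A{n}")
    return seq

# Training pairs (current token -> next token) harvested from every trace.
pairs = [(a, b) for n in range(10)
         for a, b in zip(cot_sequence(n), cot_sequence(n)[1:])]
X = np.eye(len(V))[[tok2id[a] for a, _ in pairs]]
y = np.array([tok2id[b] for _, b in pairs])

# Plain softmax regression -- a linear model -- trained by gradient descent
# on the next-token cross-entropy loss.
W = np.zeros((len(V), len(V)))
for _ in range(500):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0          # gradient of cross-entropy
    W -= 0.5 * X.T @ p / len(y)

def predict(n):
    """Autoregressively roll out the CoT until an answer token appears."""
    t = n
    for _ in range(10):
        t = V[int(np.argmax(W[tok2id[t]]))]
        if isinstance(t, str):              # reached an answer token A0/A1/A2
            return int(t[1])

print([predict(n) for n in range(10)])      # [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
```

The point of the sketch is the training scheme, not the model: each intermediate CoT token turns one step of the computation into a simple next-token lookup that even a linear predictor can fit, which is the trade-off the paper's “length complexity” measure quantifies.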