Ilya's AI Paper List for John Carmack
In an interview about AI and AGI, John Carmack shared an interesting story:
“So I asked Ilya Sutskever, OpenAI’s chief scientist, for a reading list. He gave me a list of like 40 research papers and said, ‘If you really learn all of these, you’ll know 90% of what matters today.’ And I did. I plowed through all those things and it all started sorting out in my head.”
Here is the list:
(The higher-ranked items represent fundamental breakthroughs, critical architectures, or core theories that one must understand to grasp the evolution of AI.)
- Attention Is All You Need - Introduces the transformer architecture, foundational for most modern NLP models, including GPT.
- CS231n: Convolutional Neural Networks for Visual Recognition - Fundamental course and resource for understanding CNNs, critical in computer vision tasks.
- Deep Residual Learning for Image Recognition - Introduces ResNets, a groundbreaking architecture for deep learning and addressing vanishing gradients.
- ImageNet Classification with Deep Convolutional Neural Networks (2012.12) - A pioneering paper demonstrating the power of CNNs in large-scale image classification.
- Scaling Laws for Neural Language Models - Critical for understanding how neural networks scale and generalize as model sizes increase, including the behavior of large models like GPT.
- Understanding LSTM Networks - Explains LSTMs, which revolutionized sequence-based tasks before transformers took over.
- The Annotated Transformer: Attention is All You Need - A detailed walkthrough of the transformer model, essential for understanding modern NLP.
- GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism - Important for understanding model parallelism, especially for scaling large models.
- Neural Turing Machines - Proposes models with memory capabilities, stepping toward models that can simulate algorithmic reasoning.
- Recurrent Neural Network Regularization - Advances in regularizing RNNs, relevant to improving the robustness of sequence-based models.
- Variational Lossy Autoencoder - A critical work in unsupervised learning and latent variable models.
- Kolmogorov Complexity and Algorithmic Randomness - Important theoretical background for complexity theory in AI.
- Pointer Networks - Key for understanding models that output sequences of discrete elements, relevant to tasks such as combinatorial optimization.
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin - Advances in speech recognition using end-to-end learning, essential for voice-based AI applications.
- Neural Message Passing for Quantum Chemistry - Important for applications of neural networks to graph structures and quantum chemistry.
- Multi-Scale Context Aggregation by Dilated Convolutions - Relevant for understanding improvements in convolutions, particularly in image segmentation tasks.
- Machine Super Intelligence (2008.6) - Philosophical and long-term view of superintelligence, important for understanding AGI.
- A Tutorial Introduction to the Minimum Description Length Principle (2004.6) - Theoretical foundation on model selection and complexity.
- The Unreasonable Effectiveness of Recurrent Neural Networks - A blog post that made RNNs more accessible and impactful.
- Relational recurrent neural networks - Focuses on extending RNNs with relational structures, useful in more complex AI reasoning tasks.
- Order Matters: Sequence to Sequence for Sets - Examines how input and output ordering affects sequence-to-sequence models and proposes ways to handle sets, where no natural order exists.
- Identity Mappings in Deep Residual Networks - Builds on the ResNet paper, further improving deep learning models.
- Keeping Neural Networks Simple by Minimizing the Description Length of the Weights (1993.8) - Focuses on regularization in neural networks, relevant for generalization and simplicity.
- Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton - A more theoretical work focusing on complexity in closed systems, which can be related to AGI.
- The First Law of Complexodynamics (2011.9) - Explores concepts related to complexity in computation, with a broad theoretical impact.
- A simple neural network module for relational reasoning - Focuses on improving relational reasoning in neural networks.
- Neural Machine Translation by Jointly Learning to Align and Translate - Introduced the attention mechanism for machine translation; a direct precursor to, and inspiration for, the transformer.
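Since attention runs through so many entries above, here is a minimal NumPy sketch of the scaled dot-product attention from “Attention Is All You Need” (shapes and variable names are illustrative, not from any of the papers):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax: subtract the row max before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (seq_q, seq_k) similarity
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ V                              # weighted average of values

# toy example: 3 queries attending over 4 key/value pairs, dimension 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

The 1/sqrt(d_k) scaling is the paper's fix for dot products growing with dimension, which would otherwise push the softmax into regions with tiny gradients.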
I will add:
- 1948: Claude E. Shannon, “A Mathematical Theory of Communication”. Bell System Technical Journal. Introduced the notions of information and entropy.
- 1986: David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams, “Learning Representations by Back-Propagating Errors”. Nature. Popularized backpropagation for training neural networks.
- 1995: Corinna Cortes and Vladimir Vapnik, “Support-Vector Networks”. Machine Learning. Introduced the soft-margin support vector machine.
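The central quantity in Shannon's paper is the entropy of a distribution, H(p) = -Σ p_i log2 p_i; a minimal sketch in plain Python (the examples are mine, not Shannon's):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(p) = -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))              # 1.0  — a fair coin carries one bit
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0  — four equally likely outcomes
```

Entropy is maximized by the uniform distribution and drops to zero when one outcome is certain, which is why it works as a measure of average surprise per symbol.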