Ilya's AI Paper List for John Carmack
In an interview about AI and AGI, John Carmack shared an interesting story:
“So I asked Ilya Sutskever, OpenAI’s chief scientist, for a reading list. He gave me a list of like 40 research papers and said, ‘If you really learn all of these, you’ll know 90% of what matters today.’ And I did. I plowed through all those things and it all started sorting out in my head.”
Here is the list:
(The higher-ranked items represent fundamental breakthroughs, critical architectures, or core theories that one must understand to grasp the evolution of AI.)
- Attention Is All You Need - Introduces the transformer architecture, foundational for most modern NLP models, including GPT.
- CS231n: Convolutional Neural Networks for Visual Recognition - Fundamental course and resource for understanding CNNs, critical in computer vision tasks.
- Deep Residual Learning for Image Recognition - Introduces ResNets, a groundbreaking architecture for deep learning and addressing vanishing gradients.
- ImageNet Classification with Deep Convolutional Neural Networks (2012.12) - A pioneering paper demonstrating the power of CNNs in large-scale image classification.
- Scaling Laws for Neural Language Models - Critical for understanding how neural networks scale and generalize as model sizes increase, including the behavior of large models like GPT.
- Understanding LSTM Networks - Explains LSTMs, which revolutionized sequence-based tasks before transformers took over.
- The Annotated Transformer: Attention is All You Need - A detailed walkthrough of the transformer model, essential for understanding modern NLP.
- GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism - Important for understanding model parallelism, especially for scaling large models.
- Neural Turing Machines - Proposes models with memory capabilities, stepping toward models that can simulate algorithmic reasoning.
- Recurrent Neural Network Regularization - Advances in regularizing RNNs, relevant to improving the robustness of sequence-based models.
- Variational Lossy Autoencoder - A critical work in unsupervised learning and latent variable models.
- Kolmogorov Complexity and Algorithmic Randomness - Important theoretical background for complexity theory in AI.
- Pointer Networks - Key for understanding models that output sequences of discrete elements, relevant to tasks such as combinatorial optimization.
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin - Advances in speech recognition using end-to-end learning, essential for voice-based AI applications.
- Neural Message Passing for Quantum Chemistry - Important for applications of neural networks to graph structures and quantum chemistry.
- Multi-Scale Context Aggregation by Dilated Convolutions - Relevant for understanding improvements in convolutions, particularly in image segmentation tasks.
- Machine Super Intelligence (2008.6) - Philosophical and long-term view of superintelligence, important for understanding AGI.
- A Tutorial Introduction to the Minimum Description Length Principle (2004.6) - Theoretical foundation on model selection and complexity.
- The Unreasonable Effectiveness of Recurrent Neural Networks - A blog post that made RNNs more accessible and impactful.
- Relational recurrent neural networks - Focuses on extending RNNs with relational structures, useful in more complex AI reasoning tasks.
- Order Matters: Sequence to Sequence for Sets - Examines how input and output ordering affects sequence-to-sequence models and proposes ways to handle sets, where no natural order exists.
- Identity Mappings in Deep Residual Networks - Builds on the ResNet paper, further improving deep learning models.
- Keeping Neural Networks Simple by Minimizing the Description Length of the Weights (1993.8) - Focuses on regularization in neural networks, relevant for generalization and simplicity.
- Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton - A more theoretical work focusing on complexity in closed systems, which can be related to AGI.
- The First Law of Complexodynamics (2011.9) - Explores concepts related to complexity in computation, with a broad theoretical impact.
- A simple neural network module for relational reasoning - Focuses on improving relational reasoning in neural networks.
- Neural Machine Translation by Jointly Learning to Align and Translate - Introduced the attention mechanism for machine translation; a direct precursor to, and inspiration for, the transformer.
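Since attention runs through so many entries above, here is a minimal NumPy sketch of the scaled dot-product attention from “Attention Is All You Need” (shapes and variable names are illustrative, not from any of the papers):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax: subtract the row max before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (seq_q, seq_k) similarity
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ V                              # weighted average of values

# toy example: 3 queries attending over 4 key/value pairs, dimension 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8)
```

The 1/sqrt(d_k) scaling is the paper's fix for dot products growing with dimension, which would otherwise push the softmax into regions with tiny gradients.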
I will add:
- 1948: Claude E. Shannon, “A Mathematical Theory of Communication”. Bell System Technical Journal. Introduced the notions of information and entropy.
- 1986: David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams, “Learning Representations by Back-Propagating Errors”. Nature. Popularized backpropagation for training neural networks.
- 1995: Corinna Cortes and Vladimir Vapnik, “Support-Vector Networks”. Machine Learning. Introduced the soft-margin support vector machine.
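The central quantity in Shannon's paper is the entropy of a distribution, H(p) = -Σ p_i log2 p_i; a minimal sketch in plain Python (the examples are mine, not Shannon's):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(p) = -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))              # 1.0  — a fair coin carries one bit
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0  — four equally likely outcomes
```

Entropy is maximized by the uniform distribution and drops to zero when one outcome is certain, which is why it works as a measure of average surprise per symbol.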