Section 1 | Bigger Isn’t Always Better
In recent years, we’ve seen a significant trend in the AI space: big tech companies such as Google, OpenAI, Meta, and Stability AI have been steadily increasing the compute resources allocated to training and running AI models. These companies are building enormous data centers filled with thousands of GPUs, TPUs, and custom chips, driving the development of state-of-the-art models. The assumption behind this approach is simple: larger models and more compute will result in more intelligent systems.
However, this strategy may be missing a critical insight: human intelligence isn’t a product of sheer computational power. The human brain operates with remarkable efficiency, using only about 20 watts of energy, less than a household light bulb, to perform tasks far beyond the capability of today’s AI models. This comparison between AI and the brain raises an important question: is scaling compute the right path to AGI, or are we overlooking smarter, more sustainable approaches?
Section 2 | The Race For More Compute: Where Does It Lead?
The logic behind scaling compute is clear at first glance: larger models like GPT-4, PaLM, or DALL·E, trained on billions of parameters, exhibit impressive capabilities. These models can generate text, answer questions, create images, and handle complex tasks that were unimaginable just a few years ago. Google, for instance, has been ramping up TPU clusters, and OpenAI’s GPT models have scaled to hundreds of billions of parameters, requiring massive computational resources to train and deploy.
However, the human brain offers a stark contrast in how intelligence is achieved. Neuronal signals in the brain propagate far more slowly than modern processors, yet the brain’s massively parallel architecture allows it to perform complex computations efficiently. This organic parallelism is missing in current AI systems, which rely on serial processing and hardware-based parallelization. Even with billions of operations per second, current AI systems still cannot match the adaptability and efficiency of the brain.
Moreover, scaling compute to achieve better AI results is reaching a point of diminishing returns. While it’s true that more GPUs and larger datasets can improve performance, this brute-force approach is inefficient.
There are several limitations:
- Training a model like GPT-4 costs tens of millions of dollars in compute resources alone. As model sizes increase, the cost of both training and inference scales non-linearly.
- Larger models consume dramatically more energy, driving up AI’s carbon footprint and making it less sustainable. The human brain, by contrast, operates on just 20 watts of power while performing tasks that AI models struggle to emulate.
- Even once a large model is trained, deploying it at scale for inference (i.e., making predictions or generating outputs in real-time) requires significant computational resources, especially for models with hundreds of billions of parameters.
At some point, throwing more compute at these models becomes unsustainable, especially when trying to move towards AGI. What we need is not just “bigger,” but “smarter” approaches that can do more with less, similar to how the brain optimizes its processing power through specialized regions and highly efficient parallelism.
Section 3 | The Efficiency of Reinforcement Learning
Reinforcement learning (RL) offers an alternative that may be more efficient, especially when it comes to scaling intelligence in a way that mimics human learning. RL-based systems learn by interacting with an environment, receiving feedback (in the form of rewards or penalties), and gradually improving their performance. Unlike static supervised deep learning models that require huge amounts of data and compute to learn from correlations, RL agents adapt over time through experience.
One of the main advantages of RL agents is that once they’re trained, they can complete tasks efficiently with significantly lower computational requirements compared to massive pre-trained models.
Let’s break down why this is important:
- Yes, the training phase for RL agents can be resource-intensive, as the agent explores its environment and learns from its mistakes. However, once the agent has been sufficiently trained, the computational cost of executing a task (inference) is much lower than for a massive model like GPT-4.
- The key difference is that large transformers demand vast compute at both training and inference time (all of their parameters are loaded and used for every forward pass), whereas a well-trained RL agent can execute its task without invoking an enormous number of parameters or massive compute.
This mirrors the human brain’s ability to process information with minimal energy once it has learned from experience. For example, while an RL agent might initially require thousands of episodes to learn a video game, it can play the game efficiently once trained, just as the brain processes tasks with minimal effort after learning.
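To make that training-versus-inference asymmetry concrete, here is a minimal, self-contained sketch: tabular Q-learning on a toy corridor environment. Everything in it (the environment, the hyperparameters, the episode count) is invented purely for illustration, not taken from any particular system. The point to notice is that the training loop runs thousands of interactions, while acting with the trained policy is just a table lookup.

```python
import numpy as np

# Toy environment: a 1-D corridor of 10 positions; reaching the last one pays reward 1.
N_STATES = 10
ACTIONS = [-1, +1]  # move left or right

def step(state, action_idx):
    next_state = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

# --- Training: the expensive phase, thousands of interactions with the environment ---
q = np.zeros((N_STATES, len(ACTIONS)))  # the learned policy lives in this table
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

for episode in range(2_000):
    state, done = 0, False
    for _ in range(500):  # cap episode length so early random wandering always terminates
        if rng.random() < epsilon:
            action = int(rng.integers(len(ACTIONS)))           # explore
        else:
            best = np.flatnonzero(q[state] == q[state].max())
            action = int(rng.choice(best))                     # exploit (ties broken randomly)
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward plus discounted future value
        q[state, action] += alpha * (reward + gamma * q[next_state].max() - q[state, action])
        state = next_state
        if done:
            break

# --- Inference: the cheap phase, acting is a single argmax over a small table ---
state, steps, done = 0, 0, False
while not done and steps < 100:
    state, _, done = step(state, int(np.argmax(q[state])))
    steps += 1
print(f"Trained agent reaches the goal in {steps} steps.")
```

The same asymmetry holds for deep RL: the policy network consulted at decision time is typically orders of magnitude smaller than the giant pre-trained models discussed above.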
Section 4 | Balancing RL with Guided Narrow AI
While RL offers an efficient pathway to AGI, narrow AI models like large language models (LLMs) still have their place. Rather than seeing these as competing approaches, we can combine them effectively. For example, LLMs can act as evaluators or teachers, providing feedback and shaping the RL agent’s behavior during training. This is particularly useful when the RL agent is learning language tasks: a language model can score the agent’s responses, assigning a reward or penalty that helps it learn faster. This is essentially Reinforcement Learning from AI Feedback (RLAIF).
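As a rough sketch of what that feedback loop could look like in code, the snippet below wires a hypothetical LLM judge into an RL update. The names `agent_generate`, `evaluator_score`, and `update_policy` are placeholders invented for illustration, not any particular library’s API; a real system would plug in an actual policy model, an actual judge model, and something like a policy-gradient update.

```python
from typing import Callable

def rlaif_training_step(
    prompt: str,
    agent_generate: Callable[[str], str],               # the RL policy: prompt -> response
    evaluator_score: Callable[[str, str], float],       # the LLM judge: (prompt, response) -> score in [0, 1]
    update_policy: Callable[[str, str, float], None],   # stand-in for a policy-gradient / PPO update
) -> float:
    """One training step: the agent acts, a narrow AI model grades it, the grade becomes the reward."""
    response = agent_generate(prompt)              # agent acts in its "environment"
    score = evaluator_score(prompt, response)      # language model provides the feedback
    reward = 2.0 * score - 1.0                     # map the [0, 1] grade to a [-1, 1] reward
    update_policy(prompt, response, reward)        # reinforce responses the judge liked
    return reward

# Toy usage with stub components, only to show the data flow:
rlaif_training_step(
    prompt="Summarize the water cycle in one sentence.",
    agent_generate=lambda p: "Water evaporates, condenses, and falls back as rain.",
    evaluator_score=lambda p, r: 0.9,              # pretend the judge liked the response
    update_policy=lambda p, r, rew: None,          # no-op stand-in for the real update
)
```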
This combination of RL and narrow AI allows us to create systems that are more efficient and adaptive. Instead of relying solely on brute-force compute, we use reinforcement learning to hone the agent’s abilities and apply narrow AI models to guide it along the way.
Section 5 | RL’s Long-Term Computational Efficiency
In this section, I want to dig a bit deeper into why RL-trained agents can be so efficient once trained, using a simple analysis:
- RL agents require significant exploration during the training phase. If we denote the number of interactions with the environment as T and the number of states and actions as S and A, respectively, the cost of training is often proportional to T × S × A. However, this cost occurs primarily during training.
- Once trained, an RL agent’s policy function is a mapping from states to actions. If we denote the size of the policy as P, the inference cost is O(P), which can be far smaller than the O(n) inference cost of a model like GPT-4, where n is the number of parameters (in the billions for large models).
- The key efficiency gain with RL is that the learned policy doesn’t need to push every input through billions of parameters at inference time. The agent uses its compact policy to make decisions directly, making it more scalable and adaptable to new environments without massive inference costs or retraining; a back-of-envelope sketch follows this list.
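To put rough numbers on the O(P) versus O(n) comparison, here is that back-of-envelope sketch. The parameter counts are assumptions chosen for illustration, and the “about 2 FLOPs per parameter per forward pass” figure is only the usual rule of thumb for dense models, not a measurement of any specific system.

```python
# Illustrative parameter counts (assumptions, not measurements)
n_llm_params = 175e9      # a very large transformer
p_policy_params = 5e6     # a small task-specific RL policy network

# Rule of thumb for dense models: roughly 2 FLOPs per parameter per forward pass
flops_per_llm_token = 2 * n_llm_params      # every parameter is used for every generated token
flops_per_policy_act = 2 * p_policy_params  # one small forward pass per decision

print(f"LLM:    ~{flops_per_llm_token:.1e} FLOPs per generated token")
print(f"Policy: ~{flops_per_policy_act:.1e} FLOPs per action")
print(f"Ratio:  ~{flops_per_llm_token / flops_per_policy_act:,.0f}x cheaper per decision")
```

Under these assumptions the small policy is tens of thousands of times cheaper per decision; the exact ratio matters far less than the shape of the argument.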
Thus, while the up-front cost of training an RL agent may be significant, the long-term gains in efficiency, both in terms of compute and scalability, make it an attractive approach for tasks that require adaptability and generalization. This mirrors the brain’s own efficiency once it has learned a task.
Section 6 | Smarter Approaches to AGI
As AI researchers, we are at a crossroads. The current strategy of building ever-larger models and throwing more compute at the problem has led to impressive advancements, but it’s not a sustainable path to AGI. We need to start thinking about intelligence in a more scalable and efficient way, and reinforcement learning offers a promising avenue.
RL’s ability to learn from experience, combined with narrow AI models that act as guides, presents a more human-like path to intelligence. The human brain, after all, doesn’t rely on brute force but rather learns through interaction and adapts over time. Similarly, RL agents, once trained, can adapt to tasks with far lower computational requirements than the large, static models we currently rely on.
AGI won’t come from more GPUs or larger data centers; it will come from creating systems that can learn, adapt, and generalize with fewer resources. That’s the real challenge, and opportunity, before us.