Montezuma's Revenge
— Curiosity, Humanity’s secret algorithm
If you’re careful enough, nothing bad or good will ever happen to you.
— Ashleigh Brilliant
During my travels, I often carry a few books to enrich the journey; after all, it is often the journey of the mind that benefits us most from each trip. On a recent trip to Mexico in December 2022, I immersed myself in Brian Christian's enlightening exploration of AI, "The Alignment Problem". Standing in the shadow of ancient pyramids, I was fortunate enough to avoid the colloquial Montezuma's Revenge, that common traveler's woe. Against the backdrop of these majestic structures and the rich narrative woven by Christian, I had an epiphany about another dimension of Montezuma's Revenge.
In 2015, a significant stride was made in artificial intelligence with the integration of deep learning and reinforcement learning. This leap forward was marked by a paper published in Nature, "Human-level control through deep reinforcement learning." In it, DeepMind showcased its hybrid AI solution, the Deep Q-Network, blending neural networks with reinforcement learning and achieving performance comparable to, and often surpassing, human players across dozens of Atari 2600 video games.
The Deep Q-Network (DQN) focused solely on rewards and punishments, aiming to gain more points and avoid death. This seemingly straightforward algorithm accomplished feats in gaming that far exceeded human capacities. In Video Pinball, it scored twenty-five times higher than a professional human game tester; in Boxing, it bested human performance seventeen-fold; and in Breakout, its proficiency outstripped humans thirteen times over. These successes were achieved with a single generic model that adapted seamlessly across many games without individual tweaking.
Yet certain games stood defiant against DQN's capabilities, and Montezuma's Revenge was one such example: a 1984 classic in which the player, assuming the role of an intrepid explorer named Panama Joe, navigates a complex Aztec pyramid filled with obstacles and traps. The player must scale ladders, traverse ropes, gather keys to unlock doors, and seize precious jewels, swords, and the coveted torch. In the face of slithering snakes, dancing skulls, and secretive spiders lurking within the game's depths, the ultimate triumph is the discovery of a treasure room concealed at the end of this subterranean labyrinth.
In this scenario, the otherwise formidable DQN could only manage a meager 0% of the human benchmark score - its scoreboard read a disheartening zero! You can catch a glimpse of DQN's rather dismal attempt at Montezuma's Revenge on YouTube.
What led to this unexpected defeat? Essentially, Montezuma's Revenge is a game that is unforgiving of errors and rewards players mostly at the end of a long series of exploratory, deliberate actions (quite like building a company and achieving a successful exit). DQN's exploration strategy - essentially random button pressing, a method known as epsilon-greedy exploration - proved ill-suited to this world of harsh penalties and sparse rewards. It created a trap: by its own risk-and-reward calculation, the AI was unwilling to leave the safety of the initial chamber, resulting in stagnation and an inability to progress. In effect, the AI kept Panama Joe safely in the first pyramid chamber, never venturing to the next screen or exploring the subsequent chambers.
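Epsilon-greedy exploration can be sketched in a few lines of Python. This is a minimal illustration, not DeepMind's actual implementation; the action values and the epsilon setting below are hypothetical:

```python
import random

def epsilon_greedy_action(q_values, epsilon=0.1):
    """With probability epsilon, press a random button (explore);
    otherwise pick the action with the highest estimated value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore: random button press
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# Hypothetical value estimates for four actions in some game state.
q_values = [0.0, 0.5, 0.2, -1.0]
print(epsilon_greedy_action(q_values, epsilon=0.0))  # epsilon=0: always greedy, prints 1
```

With sparse rewards, the greedy branch dominates and the occasional random press almost never strings together the long, precise action sequence the pyramid demands - which is exactly the trap described above.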
Why can humans do so much better at Montezuma's Revenge than machines? Why did the reward-and-punishment-driven AI fail? The reason seems hidden in plain sight: when human players play Montezuma's Revenge, it's not only for the score or the extra points. We naturally yearn to explore the unseen, to climb that ladder, to reach distant platforms and unlock closed doors just to see what happens, driven not by the prospect of points but by a deeper, purer curiosity.
Maybe the key to mastering such games isn't introducing more sophisticated rewards and punishments, but an entirely different perspective. Could the answer lie in designing an AI agent motivated intrinsically, rather than extrinsically? An AI agent driven by curiosity, willing to cross the road not for a reward, but simply to see what lies on the other side?
The reinforcement learning community took note and began to explore the concept of intrinsic novelty preference in computational terms. A straightforward idea was to encourage a learning agent to prefer actions it had never performed. Richard Sutton proposed this concept as early as the 1990s, suggesting rewarding an agent for exploring uncharted territory or revisiting an old path after a long hiatus - in other words, asking the AI to be more curious and try new things, even when there is no immediate gain or even a risk of penalties. That added pixie dust of curiosity changed everything: when a novelty reward (a curiosity algorithm) was layered onto the agent, the AI began to ace Montezuma's Revenge, scoring much higher than most experienced human players.
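A novelty bonus of this kind can be sketched as simple count-based reward shaping: the rarer a state, the larger the bonus. This is a toy illustration under my own assumptions (the state names and the `beta` weight are made up), not the specific algorithm behind the Montezuma's Revenge result:

```python
import math
from collections import defaultdict

visit_counts = defaultdict(int)  # how often each game state has been seen

def reward_with_curiosity(state, extrinsic_reward, beta=1.0):
    """Add a novelty bonus that shrinks as a state becomes familiar."""
    visit_counts[state] += 1
    bonus = beta / math.sqrt(visit_counts[state])  # first visit pays the most
    return extrinsic_reward + bonus

# A never-seen chamber pays a bonus even when the game gives zero points...
print(reward_with_curiosity("chamber_2", 0.0))  # 1.0 on the first visit
# ...and the bonus decays with repetition, nudging the agent ever onward.
print(reward_with_curiosity("chamber_2", 0.0))  # ~0.707 on the second visit
```

Because the bonus decays wherever the agent lingers, staying in the first chamber stops paying, and venturing to the next screen becomes the rewarding move.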
The true secret to creating an AI that mimics human intelligence lies in encapsulating a fundamental aspect of human nature - curiosity. This natural "algorithm" is easily observed in children, who have an instinctive drive to explore and understand their world without regard for rewards or punishments. As we transition into adulthood, this intrinsic curiosity often gets stifled or overshadowed by a world overwhelmingly focused on rewards, punishments, and the constant pursuit of "keeping score".
Our societal structures, workplace norms, and educational systems often prioritize tangible results and direct rewards (extrinsic motivation), causing us to neglect this innate curiosity (intrinsic motivation). This shift is the very antithesis of our childhood selves, who learn and explore purely for the joy and fascination it brings.
To develop an AI that truly mimics human intelligence, we need to reignite this curiosity in our approach. We should seek not merely to replicate adult logic or societal constructs in our AI, but to encapsulate this fundamental childlike curiosity - a trait that remains one of the most potent forces for learning, exploration, and creativity.
Going to the next pyramid chamber, navigating uncharted territories, and embracing novel ways undoubtedly come with risks, yet it's precisely this boldness that embodies the pioneering spirit of innovation and scientific discovery, a spirit deeply ingrained in the ethos of humanity.
Personally, in my younger and more vulnerable years, I ventured into many fruitless new experiences, not out of the pursuit of immediate rewards or the avoidance of punishments, but driven by curiosity and a love of exploration. I was fortunate to embark on countless new ventures, meet great people, and stray far beyond my starting point, forging a path distinctively my own.
Looking back on my own journey, I recognize my humble beginnings in Shandong, China, where I was born, raised, and educated. My subsequent move to Texas, the greatest frontier, was akin to Panama Joe daring to leave the safety of his initial Aztec chamber to delve deeper into the uncharted realms of the enigmatic pyramid. Curiosity led me to cross the ocean and move from one role to another, amid numerous failures and adversities; as an entrepreneur and as a wonderer, I found that the path of staying curious is both compelling and rewarding.
To you, my fellow entrepreneurs, the innovators, the trailblazers - understand that AI strives to decode the very essence of your curiosity, one of the secret ingredients that set us apart as humans. We go to the new chamber in the pyramid not because it is safe, but because we want to explore; we try new things not because they give us points in life, but because they satisfy our curiosity.
Never curb your curiosity. Remain eager to try new things; let the thirst for a new horizon drive you, the spark of innovation guide you, and the pursuit of understanding enlighten you. Curiosity is not only a distinctive human trait; it is a transformative force, a catalyst for learning and innovation, igniting the flame of knowledge and discovery that brightens the path of progress.
Psychologist Daniel Berlyne once said, "My first interest is interest." So, as you forge ahead on your entrepreneurial journey, let curiosity guide your path, and don't pay too much attention to the scoreboard; your curiosity is the shining spark of the human spirit. It is the secret code that AI is striving to copy from us, one of humanity's most potent and yet often overlooked algorithms. Montezuma may have his revenge once in a while, but humanity's secret algorithm, curiosity, always finds the path forward.
By Isaac Shi
June 1, 2023
References:
Brian Christian, The Alignment Problem (2020)
Mnih et al., "Human-level control through deep reinforcement learning," Nature, Feb 26, 2015
Google DeepMind research publications