Artificial intelligence (AI) programs have been developed to play Minecraft without human intervention, and researchers have invested heavily in several different approaches. OpenAI, the creator of ChatGPT, has spent enormous sums to hire human players of the game and capture video footage that is used to train AI to play by imitating people’s moves. In February, a team led by Zihao Wang of Peking University in Beijing described what it believes is “the first multi-task agent that can robustly accomplish 70+ Minecraft tasks.” However, the state of the art moves fast. Last week, a team led by Nvidia said it had built the first “lifelong learning agent,” one that refines its approach to the game by trying out different techniques and then saving its successes to a library of techniques.
The program, called Voyager, is described in a paper posted on the arXiv preprint server by Guanzhi Wang of Nvidia and Caltech and colleagues from UT Austin, Stanford, and Arizona State University. Nvidia’s senior director of AI research, Anima Anandkumar, advises the team. Voyager makes use of GPT-4, the latest “large language model” from ChatGPT creator OpenAI. GPT-4 was unveiled in March, although OpenAI declined to describe the program’s technical details. According to OpenAI, GPT-4 outperforms its predecessors, as well as many other large language models (LLMs), on many of the tasks for which ChatGPT is used, such as answering natural-language questions and writing code.
GPT-4 is used in three ways in Voyager. First, it takes the current inventory of possessions in Minecraft and uses it to come up with a new challenge for the Voyager program. Second, it takes that new challenge as input and generates program code to make the next move in Minecraft. Each piece of code is run in Minecraft, and the game’s feedback is returned to GPT-4, which then refines the code. The authors call this trial-and-error process “iterative prompting,” because of the loop of code, feedback, and recode via the GPT-4 prompt. Third, a separate instance of GPT-4 acts as a critic that checks each piece of code and determines whether it succeeded, a step known as “self-verification.”
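The code, feedback, and recode loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Voyager’s code: the functions passed in as `generate_code`, `execute`, and `critic` are hypothetical stand-ins for the code-writing GPT-4 call, running code in Minecraft, and the GPT-4 critic, and the stubs in the usage example simply return canned answers with placeholder item names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    code: str
    error: Optional[str]  # execution error reported by the game, if any
    verified: bool        # critic's verdict on whether the task succeeded


def iterative_prompting(
    task: str,
    generate_code: Callable[[str, Optional[str]], str],
    execute: Callable[[str], Optional[str]],
    critic: Callable[[str, str], bool],
    max_rounds: int = 4,
) -> list[Attempt]:
    """Generate code, run it, feed problems back, and stop once the critic approves."""
    history: list[Attempt] = []
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        code = generate_code(task, feedback)  # stand-in for the code-writing GPT-4 call
        error = execute(code)                 # stand-in for running the code in Minecraft
        # Only consult the critic (stand-in for the second GPT-4 instance) if the code ran.
        verified = error is None and critic(task, code)
        history.append(Attempt(code, error, verified))
        if verified:
            break
        feedback = error or "critic: task not yet accomplished"
    return history


# Hypothetical stubs returning canned answers in place of GPT-4 and the game.
def stub_generate(task: str, feedback: Optional[str]) -> str:
    return "craft('wooden_pickaxe')" if feedback else "craft('wood_pickaxe')"


def stub_execute(code: str) -> Optional[str]:
    return "No item named wood_pickaxe" if "'wood_pickaxe'" in code else None


def stub_critic(task: str, code: str) -> bool:
    return "pickaxe" in code


attempts = iterative_prompting("craft a wooden pickaxe", stub_generate, stub_execute, stub_critic)
for attempt in attempts:
    print(attempt)
```

Keeping the critic separate from the code generator follows the description above: the game reports execution errors, but only the critic decides whether the task as a whole was accomplished.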
For example, if the initial program code instructs Minecraft to fashion an “acacia axe,” an axe made of acacia wood, it will fail, because there is no such item as an acacia axe in Minecraft. Voyager handles the failure of that instruction as an “execution error,” revises its Minecraft code, and tries again.
The most interesting part is what’s called a skill library, where Voyager stores the bits of code it has tried, tested, and found successful, which are known as “skills.” Much as GPT-4 predicts the next word in a sentence from what it has seen before, Voyager can mine this library for suggested actions in the future. The lookup starts with a “query,” such as “craft an iron pickaxe”; Voyager then searches the library for the matching “key,” the stored description of a skill, and retrieves the required skill as the output, the “value” of that query-key pairing, much like a database search.
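The query-key-value lookup can be illustrated with a toy skill library. This sketch assumes a bag-of-words vector and cosine similarity as the matching rule; a real system would more likely use learned text embeddings. The class name `SkillLibrary` and the stored code strings such as `craftIronPickaxe(bot)` are hypothetical placeholders.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a text embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SkillLibrary:
    def __init__(self) -> None:
        self._keys: list[Counter] = []             # embedded skill descriptions (keys)
        self._entries: list[tuple[str, str]] = []  # (description, code) pairs (values)

    def add(self, description: str, code: str) -> None:
        """Store a verified skill, keyed by its description."""
        self._keys.append(embed(description))
        self._entries.append((description, code))

    def retrieve(self, query: str, k: int = 1) -> list[tuple[str, str]]:
        """Return the k skills whose description keys best match the query."""
        q = embed(query)
        ranked = sorted(
            range(len(self._entries)),
            key=lambda i: cosine(q, self._keys[i]),
            reverse=True,
        )
        return [self._entries[i] for i in ranked[:k]]


library = SkillLibrary()
library.add("mine iron ore with a stone pickaxe", "mineIronOre(bot)")
library.add("craft an iron pickaxe from iron ingots and sticks", "craftIronPickaxe(bot)")
library.add("smelt raw iron into iron ingots in a furnace", "smeltIron(bot)")

best_description, best_code = library.retrieve("craft an iron pickaxe")[0]
print(best_code)  # craftIronPickaxe(bot)
```

Here the task description plays the role of the query, each stored description is a key, and the stored code is the value that comes back.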
Using what are called ablation studies — removing parts of the program — Wang and team find that the most critical element in the entire Voyager construction is the critic, the self-verification unit. “Self-verification is the most important among all the feedback types” that Voyager receives, they write. “Removing the module leads to a significant drop (-73%) in the discovered item count,” from which they deduce that “Self-verification serves as a critical mechanism to decide when to move on to a new task or reattempt a previously unsuccessful task.”
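Assuming the -73% figure is a relative change in the number of discovered items between the full agent and the agent without self-verification, the arithmetic works as follows. The counts below are hypothetical, chosen only to reproduce a 73% drop; they are not the paper’s data.

```python
def relative_change(full: float, ablated: float) -> float:
    """Percent change in a metric after removing a module (negative means a drop)."""
    if full == 0:
        raise ValueError("the full agent's metric must be nonzero")
    return 100 * (ablated - full) / full


# Hypothetical item counts, not figures from the paper.
items_full_agent = 100
items_without_self_verification = 27
print(relative_change(items_full_agent, items_without_self_verification))  # -73.0
```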
A comparison of Nvidia’s Voyager against other automated agents working their way up the game’s so-called tech tree of achievements shows that the program accomplishes new tasks measurably faster, and that it is so far the only automated Minecraft agent able to unlock the highly prized diamond level of tools. In the comparison graphic, the numbers along the bottom represent the number of prompting iterations each program used.
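One plausible way to produce iteration counts like those in the graphic is to record the agent’s inventory after every prompting iteration and note when each tier of tools first appears. This sketch is an assumption for illustration, not the paper’s evaluation code; the tier names follow Minecraft’s wooden, stone, iron, and diamond tools, and the inventory log is made up.

```python
TECH_TREE = ["wooden", "stone", "iron", "diamond"]  # tool tiers, in unlock order
TOOLS = ["pickaxe", "axe", "shovel", "hoe", "sword"]


def unlock_iterations(log: list[set[str]]) -> dict[str, int]:
    """Map each tier to the first 1-based prompting iteration at which any
    tool of that tier appears in the inventory snapshot."""
    unlocked: dict[str, int] = {}
    for iteration, inventory in enumerate(log, start=1):
        for tier in TECH_TREE:
            if tier not in unlocked and any(f"{tier}_{tool}" in inventory for tool in TOOLS):
                unlocked[tier] = iteration
    return unlocked


# Made-up inventory snapshots, one per prompting iteration.
log = [
    {"oak_log"},
    {"oak_log", "wooden_pickaxe"},
    {"wooden_pickaxe", "cobblestone"},
    {"wooden_pickaxe", "stone_pickaxe"},
    {"stone_pickaxe", "iron_ingot"},
    {"stone_pickaxe", "iron_pickaxe"},
]
print(unlock_iterations(log))  # {'wooden': 2, 'stone': 4, 'iron': 6}
```

A tier that never appears, here diamond, is simply absent from the result, which corresponds to an agent that fails to reach that level.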
The ability to fashion a diamond tool in Minecraft, one of the game’s high-level challenges, is becoming mundane for artificial intelligence. Now AI agents in the popular game are gaining something like memory: matched against other automated systems, Voyager consistently reaches Minecraft milestones faster.
To test Voyager against the state of the art in automated Minecraft, the authors cobble together baselines from other AI programs because, as they put it, “there are no LLMs that play Minecraft out of the box.” The programs they test against are ReAct, a prompting technique from researchers at Princeton and Google; Reflexion, which adds self-reflection on top of ReAct; and AutoGPT, a popular open-source tool for building autonomous LLM agents. Voyager’s own simulation environment builds on MineDojo, a Minecraft framework developed last year by some of the same contributors that won an “outstanding paper award” at the NeurIPS AI conference.
Voyager is a significant development for artificial intelligence in gaming. By pairing GPT-4 with a self-verification critic and a growing library of skills, it refines its approach to the game and reaches milestones faster than other automated systems. As AI technology continues to advance, it remains to be seen how agents like it will affect the gaming industry and what new challenges they will be asked to overcome.