ByteDance, the owner of TikTok, has developed a system called “Self-Controlled Memory” that extends the capabilities of language models such as ChatGPT. The system lets an AI program draw on a vast store of past dialogue, enabling it to answer questions about earlier exchanges with much greater accuracy. Separately, researchers from the University of California at Santa Barbara and Microsoft have published a paper titled “Augmenting Language Models with Long-Term Memory,” which adds a new memory component to language models. The paper addresses a limitation of current models such as ChatGPT: they cannot process long-form information beyond a fixed-size input window.
One of the main obstacles to expanding a language model’s input window is the quadratic computational complexity of the attention operation: doubling the amount of text fed into the model roughly quadruples the cost of producing an answer. To get around this, some scholars have built crude memory systems. Google’s Memorizing Transformer, for example, caches representations of earlier text so the model can look them up again later. That approach has its own limitation: the cached representations can become “stale,” because they were produced by an earlier version of the model and are never updated.
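To make the scaling problem concrete, here is a minimal NumPy sketch of ordinary self-attention; the shapes and variable names are illustrative and come from neither paper. The n-by-n score matrix is what makes the cost grow quadratically with input length.

```python
import numpy as np

def attention(q, k, v):
    """Plain self-attention: the score matrix is n-by-n, so compute and
    memory both grow quadratically with the number of tokens n."""
    scores = q @ k.T / np.sqrt(q.shape[-1])                  # shape (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

n, d = 1024, 64
q = k = v = np.random.randn(n, d)
out = attention(q, k, v)   # the score matrix already holds n*n ~= 1M entries
# Doubling n to 2048 quadruples that matrix to ~4M entries, which is why
# simply widening the input window gets expensive so quickly.
```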
The solution proposed by Wang and his team, called “Language Models Augmented with Long-Term Memory,” or LongMem, pairs a conventional language model with a second neural network called the SideNet. As the language model processes the input, it deposits relevant information in a memory bank; the SideNet then compares the current prompt with the contents of that memory to find relevant matches. Unlike the Memorizing Transformer, the SideNet is trained separately while the main language model stays frozen, which lets it learn to pick out information from the memory without the staleness problem.
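The Python sketch below illustrates the general idea of a frozen backbone feeding a memory bank that a separately trained network queries. Everything in it, the `MemoryBank` class, the `process_chunk` helper, and the single-matrix “SideNet,” is a simplified assumption for illustration, not the architecture from the LongMem paper.

```python
import numpy as np

class MemoryBank:
    """Illustrative cache of (key, value) pairs taken from earlier text chunks."""
    def __init__(self):
        self.keys, self.values = [], []

    def add(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def top_k(self, query, k=4):
        """Return the k stored entries most similar to the current query."""
        if not self.keys:
            return np.empty((0, query.shape[-1]))
        keys = np.stack(self.keys)                        # (m, d)
        scores = keys @ query / np.sqrt(query.shape[-1])  # (m,)
        idx = np.argsort(-scores)[:k]
        return np.stack([self.values[i] for i in idx])

def process_chunk(chunk_repr, memory, side_net_weight):
    """Hypothetical fusion step: the frozen backbone's representation of the
    current chunk is combined with retrieved memories by a small, separately
    trained network (reduced here to a single linear mix)."""
    retrieved = memory.top_k(chunk_repr)
    context = retrieved.mean(axis=0) if len(retrieved) else np.zeros_like(chunk_repr)
    fused = chunk_repr + side_net_weight @ context        # SideNet-style residual fusion
    memory.add(chunk_repr, chunk_repr)                    # cache this chunk for later
    return fused

# Illustrative usage: process two chunks of a long document in sequence.
d = 64
rng = np.random.default_rng(0)
memory = MemoryBank()
side_net_weight = rng.normal(size=(d, d)) * 0.01
for chunk_repr in rng.normal(size=(2, d)):
    fused = process_chunk(chunk_repr, memory, side_net_weight)
```

Because only the small fusion network is trained while the backbone stays frozen, the cached representations never fall out of sync with the model that produced them, which is the staleness problem described above.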
To evaluate LongMem, the researchers ran tests on three datasets of very long texts, including whole articles and textbooks. LongMem outperformed both the Memorizing Transformer and OpenAI’s GPT-2 language model, and despite having far fewer neural parameters than GPT-3, it achieved a state-of-the-art score of 40.5%. That result demonstrates the program’s ability to comprehend long-context information stored in its memory and use it to complete language-modeling tasks.
The Microsoft research parallels recent work at ByteDance. In a paper titled “Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System,” ByteDance researcher Xinnian Liang and colleagues introduce a program that lets language models store and access very long sequences of information. Their Self-Controlled Memory system, or SCM, improves the model’s ability to put prompts in context and generate appropriate responses: it first evaluates the user’s input and decides whether the memory stream, a record of past exchanges between the user and the program, needs to be consulted at all.
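A rough sketch of that decide-then-retrieve loop is below. The `MemoryStream` class, the word-overlap ranking, and the `llm` callable are placeholders invented for illustration, not ByteDance’s implementation; SCM uses the model itself to judge relevance rather than anything this simple.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStream:
    """Illustrative store of past user/assistant turns (not ByteDance's code)."""
    turns: list = field(default_factory=list)

    def add(self, user_msg: str, reply: str) -> None:
        self.turns.append((user_msg, reply))

    def retrieve(self, query: str, k: int = 3) -> list:
        """Crude relevance ranking by word overlap, standing in for SCM's
        model-based scoring of which past turns matter for the query."""
        q_words = set(query.lower().split())
        scored = sorted(
            self.turns,
            key=lambda t: len(q_words & set((t[0] + " " + t[1]).lower().split())),
            reverse=True,
        )
        return scored[:k]

def answer(user_msg: str, memory: MemoryStream, llm) -> str:
    # Step 1: a controller decides whether past context is needed at all.
    needs_memory = llm(f"Does answering '{user_msg}' require earlier dialogue? yes/no") == "yes"
    # Step 2: only then is the memory stream searched and folded into the prompt.
    history = memory.retrieve(user_msg) if needs_memory else []
    reply = llm(f"Relevant history: {history}\nUser: {user_msg}")
    memory.add(user_msg, reply)          # every turn is appended for later recall
    return reply
```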
These advancements in language models and memory systems have the potential to revolutionize AI technology. By enhancing the models’ ability to process long-form information and access relevant memories, AI programs can provide more accurate and contextually appropriate responses. This research paves the way for future developments in the field of generative AI and opens up new possibilities for applications in various industries.