GPT-3, the language model that powered the popular chatbot ChatGPT, has shown problem-solving abilities that rival, and in some cases surpass, those of undergraduate students, according to a recent study by psychologists at the University of California, Los Angeles (UCLA). When presented with reasoning problems of the kind found on intelligence tests or exams such as the SAT, GPT-3 performed at a level comparable to that of US college undergraduates.
To evaluate GPT-3’s capabilities, the researchers converted complex arrays of shapes into a text format the model could process, ensuring it had never encountered the questions before. They gave the same problems to 40 UCLA undergraduates. Surprisingly, GPT-3 solved 80% of the problems correctly, well above the human participants’ average of just under 60%.
The researchers also tested GPT-3 on SAT “analogy” questions, which ask test-takers to select pairs of words that share the same relationship. They deliberately chose questions they believed had not been published on the internet, since GPT-3 had been trained on a vast amount of online data. Comparing the AI’s performance with college applicants’ SAT scores, the UCLA team found that GPT-3 outperformed the average human score.
GPT-3 fared less well on a different test, in which both the model and the student volunteers were asked to match a passage of prose with a short story that conveyed the same meaning. Here GPT-3 performed worse than the students, although the study found that GPT-4, its improved successor, did better than GPT-3 on this task.
The study highlighted GPT-3’s impressive ability to spot patterns and infer relationships, often matching or surpassing human performance. Lead author Taylor Webb stressed, however, that the model underlying ChatGPT is not artificial general intelligence or human-level intelligence: it struggles with social interactions, mathematical reasoning, and problems that require an understanding of physical space. For instance, GPT-3 has difficulty determining which tools would be best for transferring sweets from one bowl to another. Nonetheless, the technology has made significant progress within its particular area of strength.
Webb, a postdoctoral researcher in psychology at UCLA, stated, “It’s definitely not fully general human-level intelligence. But it has definitely made progress in a particular area.” The researchers acknowledged that without access to the internal workings of GPT-3, developed by OpenAI in San Francisco, they cannot ascertain how the model’s reasoning abilities function or whether it exhibits a new form of intelligence.
Keith Holyoak, a psychology professor at UCLA, expressed curiosity about GPT-3’s thinking processes, stating, “GPT-3 might be kind of thinking like a human. But on the other hand, people did not learn by ingesting the entire internet, so the training method is completely different. We’d like to know if it’s really doing it the way people do, or if it’s something brand new – a real artificial intelligence – which would be amazing in its own right.”