New Study Shows ChatGPT’s Inaccuracy in Answering Software Engineering Prompts
A recent study by researchers at Purdue University suggests that while OpenAI's ChatGPT is a convenient source of conversational answers, it may not be reliable for software engineering questions. Programmers have traditionally turned to Stack Overflow, a question-and-answer platform, for advice and guidance on their projects. However, because waiting for human responses there can take time, many software engineers have begun turning to ChatGPT for instant answers.
To assess how well ChatGPT answers software engineering questions, the Purdue researchers gave the chatbot 517 Stack Overflow questions and evaluated the correctness and quality of its responses. The findings were concerning: 52% of ChatGPT's answers were incorrect, and 77% were overly verbose.
Despite the high error rate, the study found that ChatGPT's responses addressed all aspects of the question 65% of the time. To gain further insight into the quality of the chatbot's responses, the researchers also recruited 12 participants with varying levels of programming expertise. The participants overwhelmingly preferred Stack Overflow's responses to ChatGPT's across different categories. However, they correctly identified incorrect ChatGPT answers only 60.66% of the time.
The study found that the polished, well-articulated style of ChatGPT's responses often led users to overlook incorrect information. "Users overlook incorrect information in ChatGPT answers (39.34% of the time) due to the comprehensive, well-articulated, and humanoid insights in ChatGPT answers," the authors of the study noted. Generating plausible but incorrect answers is a widely recognized problem for AI chatbots in general, and it can contribute to the spread of misinformation.
Beyond the risk of spreading misinformation, the low accuracy reported by the study should give software engineers and programmers pause before relying on ChatGPT for programming questions. Comprehensive-sounding responses are of limited value when they are wrong, and this lack of accuracy undermines the chatbot's reliability as a source of programming advice.
As the field of artificial intelligence continues to evolve, it is crucial to critically evaluate the capabilities and limitations of AI systems like ChatGPT. While chatbots can be valuable tools for certain applications, it is essential to exercise caution and verify the accuracy of their responses, particularly in domains as critical as software engineering.