Are We Offloading Critical Thinking to Chatbots?
Research, much of it by companies with deep investment in AI, suggests that chatbot interactions alter how users think.
In January, researchers at Microsoft and Carnegie Mellon University posted a study online examining how artificial intelligence tools like ChatGPT affect critical thinking. They wanted to know how knowledge workers, people who rely on their intellectual skills for their jobs, were interacting with the tools. Through detailed surveys, they found that the more confidence those workers placed in generative AI, the less they relied on their own critical thinking.
As one of the test subjects noted in the study, “I use AI to save time and don’t have much room to ponder over the result.”
The Microsoft paper is part of a nascent but growing body of research: Over the past two years, as more people have experimented with generative AI tools like ChatGPT at work and at school, cognitive scientists and computer scientists, some working as independent academics and many employed by the very companies that make these AI tools, have tried to tease out the effects of these products on how humans think.
Research from major tech companies on their own products often involves promoting them in some way. And indeed, some of the new studies emphasize new opportunities and use cases for generative AI tools. But the research also points to significant potential drawbacks, including hindered skill development and a general overreliance on the tools. Researchers also suggest that users place too much trust in AI chatbots, which often provide inaccurate information. Some experts say that such findings, coming from the tech industry itself, may signal that major Silicon Valley companies are taking seriously the potential adverse effects of their own AI on human cognition, at a time when there is little government regulation.
“I think across all the papers we’ve been looking at, it does show that there’s less effortful cognitive processes,” said Briana Vecchione, a technical researcher at Data & Society, a nonprofit research organization in New York. Vecchione has been studying people’s interactions with chatbots like ChatGPT and Claude, the latter made by the company Anthropic, and has observed a range of concerns among her study’s participants, including dependence and overreliance. Vecchione notes that some people take chatbot output at face value, without critically considering the text the algorithms produce. In some fields, experts say, those errors could have significant consequences, for instance if the chatbots are used in medical or health contexts.
Every technological development naturally comes with both benefits and risks, from word processors to rocket launchers to the internet. But experts like Vecchione and Viktor Kewenig, a cognitive neuroscientist at Microsoft Research Cambridge in the United Kingdom, say that the advent of the technology that undergirds today’s AI products (large language models, or LLMs) could be something different. Unlike other modern computer-based inventions, such as automation and robotics inside factories, internet search engines, and GPS-powered maps on devices in our pockets, AI chatbots often sound like a thinking person, even if they’re not.
As such, the tools could present new, unforeseen challenges. Compared to older technologies, AI chatbots “are different in that they are a thinking partner to a certain extent, where you’re not just offloading some memory, like memory about dates, to Google,” said Kewenig, who’s not involved in the Microsoft study but collaborates with some of its co-authors. “You are in fact offloading many other critical faculties as well, such as critical thinking.”
Large language models are powerful, or appear powerful, because of the sheer breadth of information on which they’re based. Such models are trained on colossal amounts of digital data, a process that may have involved copyright violations, and in response to a user’s prompt they’re able to generate new material, unlike older AI products like Siri or Alexa, which simply regurgitate what’s already published online.
As a result, some people may be more likely to trust the chatbot’s output, Kewenig said: “Anthropomorphizing might sometimes be tricky, or dangerous even. You might think the model has a certain thinking process that it actually doesn’t.”
AI chatbots have occasionally been observed to produce flawed outputs, such as recommending that people eat rocks and put glue on pizza. Such inaccurate and absurd AI outputs have become widely known as hallucinations, and they arise in part because the LLMs powering the chatbots are trained on a broad array of websites and digital content. Because of the models’ complexity and the reams of data fed into them, they have significant hallucination rates: 33 percent in the case of OpenAI’s o3 model and higher in its successor, according to a technical report the company released in April.
In the Microsoft study, which was published in the proceedings of the Conference on Human Factors in Computing Systems in April, the authors characterized critical thinking with a widely used framework known as Bloom’s taxonomy, which distinguishes types of cognitive activities from simpler to more complex ones, including knowledge, comprehension, application, analysis, synthesis, and evaluation. In general, the researchers found that using chatbots tends to change the nature of the effort these workers invest in critical thinking: it shifts from information gathering to information verification, from problem-solving to incorporating the AI’s output, and from other kinds of higher-level thinking to stewarding the AI, steering the chatbot with prompts and assessing whether its responses are sufficient for the work at hand.
The researchers surveyed 319 knowledge workers in the U.K., U.S., Canada, and other countries, in occupations ranging from computer science and mathematics to design and business. The participants were first introduced to concepts and examples of critical thinking in the context of AI use, such as “checking the tone of generated emails, verifying the accuracy of code snippets, and assessing potential biases in data insights.” Then the participants responded to a list of multiple-choice and free-response questions, providing 936 examples of work-related AI usage, mostly involving generating ideas and finding information, while assessing their own critical thinking.
According to the paper, the connections to critical thinking were nuanced. Higher confidence in generative AI, for instance, was associated with less critical thinking, while respondents with more confidence in their own abilities reported engaging in more critical thinking.