Introduction
In the mid-20th century, the British mathematician and logician Alan Turing posed a provocative question: "Can machines think?" The experiment he proposed, now known as the "Turing Test", asks whether a machine can imitate a human so convincingly that an interrogator struggles to tell the difference. In such a case, the machine (or computer) would be considered to think along the lines humans do, and hence to possess "artificial intelligence".

Fast forward to today, and we are experiencing a reality where AI systems, particularly Large Language Models (LLMs) such as GPT-4.5 and LLaMa-3.1, have passed the Turing Test under controlled conditions. The results constitute the first empirical evidence that an AI system can pass a standard three-party Turing Test. But beyond the technical feat, this pushes us to think more deeply: what happens when machines become "more human than human"?

The Turing Test Revisited
The Turing Test evolved from Turing's adaptation of a parlour game, popularly known as the "Imitation Game". It involves three parties: a man (A), a woman (B), and a human interrogator (C). The interrogator, seated in a separate room, must determine through a series of questions and answers which of the other two is the man and which is the woman. Turing asked: "What will happen when a machine takes the part of A in this game? Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman?"
When Turing first proposed his test in the 1950s, the world was rebuilding after the horrors of World War II and barely stepping into the digital age. Computers of that time were room-sized behemoths that performed calculations and solved problems in numerical analysis but exhibited no hint of humanlike reasoning.
Back then, intelligence was largely perceived through the lens of logic and problem-solving. Through his question, Turing effectively redefined intelligence itself, suggesting that if a machine can imitate human conversation convincingly, the question of whether it is truly "thinking" might be irrelevant.

Today, what the Turing Test measures has evolved significantly. It is no longer just about logic or facts. What has become central is social intelligence: the ability to understand and convey emotion, humor, sarcasm, and sympathy, among other traits. Machines aren't merely solving puzzles or breaking codes; they are now navigating the complex intricacies and layered subtleties of human conversation.
This evolution in the Turing Test’s significance highlights a deeper shift: intelligence isn’t purely cognitive; it’s also emotional, cultural, and relational.
“I believe that in about fifty years’ time, it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 percent chance of making the right identification after five minutes of questioning. … I believe that at the end of the century, the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.”
– Alan Turing
Passing the Turing Test
Researchers at the University of California San Diego conducted randomized, controlled Turing Tests with two distinct groups of people: one consisted of undergraduate students at UC San Diego, while the other comprised people recruited through Prolific, an online platform for research studies. The former group was young and tech-savvy; the latter had more diverse backgrounds.

Key highlights of the study are as follows:
- The AI systems under study were GPT-4.5 and GPT-4o (from OpenAI), LLaMa-3.1 (developed by Meta), and ELIZA (the first chatbot, developed at MIT in the mid-1960s). The researchers evaluated GPT-4.5 and LLaMa-3.1, and used GPT-4o and ELIZA as baseline models to contextualize their performance.
- Researchers tested two types of prompts: a NO-PERSONA prompt and a PERSONA prompt.
- The NO-PERSONA prompt simply told the AI to convince the interrogator it was human, along with some background information such as the game rules and recent news updates. The initial prompt was: "You are about to participate in a Turing test. Your goal is to convince the interrogator that you are a human."
- The PERSONA prompt gave extra guidance, helping the AI shape its behavior and language style to seem more naturally human. It asked the AI to act like a young, introverted person familiar with internet culture and slang.
- The researchers first ran an exploratory study testing 5 different prompts and 7 different AI models, finding that LLaMa-3.1 and GPT-4.5 combined with the PERSONA prompt performed the best.
- They also included ELIZA, an old and simple AI, as a “manipulation check” to make sure participants could still recognize a weaker machine.
- In the tests, participants played the role of interrogators, chatting with two witnesses at the same time – one was a real human, and the other was an AI model.
- A total of 284 participants were randomly assigned to either be an interrogator or a witness.
- Each conversation lasted five minutes using a split-screen messaging interface that looked like a typical chat app.
- After chatting, participants had to guess which one was the human and which was the AI.
- Results showed that when prompted to adopt a humanlike persona, GPT-4.5 was judged to be human 73% of the time. This is significantly higher than the 50 percent expected from random guessing, indicating that the Turing Test had been passed. LLaMa-3.1 was judged to be human 56% of the time, while ELIZA and GPT-4o fooled participants only 23% and 21% of the time, respectively.
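The comparison against chance in these results can be illustrated with a simple two-sided exact binomial test. The sketch below is illustrative only: the per-model trial counts (100) are assumed for demonstration, not taken from the paper, which reports rates rather than the raw counts quoted here.

```python
from math import comb

def binom_p_value(successes: int, trials: int, p: float = 0.5) -> float:
    """Two-sided exact binomial test: probability, under the chance
    hypothesis `p`, of an outcome at least as extreme as `successes`."""
    pmf = [comb(trials, k) * p**k * (1 - p)**(trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    # Sum the probabilities of all outcomes no more likely than the observed one.
    return min(1.0, sum(q for q in pmf if q <= observed + 1e-12))

# Illustrative counts (assumed): 73 "judged human" verdicts out of 100 trials.
print(f"73/100 vs chance: p = {binom_p_value(73, 100):.6f}")   # well below 0.05
# 56/100 (the LLaMa-3.1 persona rate) sits much closer to chance.
print(f"56/100 vs chance: p = {binom_p_value(56, 100):.4f}")
```

With these assumed sample sizes, a 73% rate is extremely unlikely under pure guessing, while 56% is not clearly distinguishable from chance; the paper's own statistics, computed on the real trial counts, are what support its claims.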
“So do LLMs pass the Turing test? We think this is pretty strong evidence that they do. People were no better than chance at distinguishing humans from GPT-4.5 and LLaMa (with the persona prompt). And 4.5 was even judged to be human significantly more often than actual humans!”
– Cameron Jones, Lead Author and Researcher at UC San Diego’s Language and Cognition Lab
What This Really Means Today
For decades, passing the Turing Test seemed impossible, but that barrier has now fallen. With LLMs beginning to pass variations of the test, we are left to wonder: what exactly does it mean to pass it today?
It turns out the Turing Test doesn't necessarily measure intelligence in the traditional sense. Instead, it judges human likeness: the ability to be perceived as human, to understand and behave in ways aligned with the social fabric of humankind, and to hold natural, human-like conversations.
But why does this distinction matter? Experiments show that when assigned creative "personas", these modern LLMs pass for humans even more successfully. Their success often hinges not on exhibiting deep reasoning skills or solving complex problems, but on capturing the right style: the rhythm, humor, hesitation, and even flaws of human speech. Moreover, the interrogators weren't just looking for "intelligence" when guessing whether their conversation partner was human. They examined the social, emotional, and cultural aspects of intelligence; ironically, these signify humanity more than cold logic ever could. This may indicate that traditional notions of intelligence are no longer viewed as diagnostic of humanity. In a sense, the LLMs' victory is also a mirror, reflecting how we recognize ourselves.

The Ethical and Social Implications of “Counterfeit People”
The ability of machines to mimic human conversation is thus no longer a hurdle; it is already shaping our social and economic landscape. We are now confronted with "counterfeit people": AI systems indistinguishable from humans in short interactions, occupying a space that until now was uniquely ours.
The implications are profound. AI models could supplement or even substitute for humans in roles such as customer service agents, support representatives, salespersons, or even counselors. They may gradually and quietly slip into our day-to-day online interactions across countless forums.
But this is not just about economic disruption; it is about social reality itself. Just as counterfeit currency devalues real money, these simulated interactions pose a grave risk to genuine human connection. If we cannot tell whether we are communicating with a person or a program, our trust in authentic interaction, that deep, unspoken faith in the realness of the other human being, will slowly fade away.

Moreover, those who control these AI agents will wield unprecedented influence over human behavior. They can subtly reshape opinions, desires, and even political beliefs without our conscious realization.
We are entering an age where the line between authentic and artificial is not just blurred – it is actively dissolving.
The Future of Human-AI Relationships
“To be human is to be ‘a’ human, a specific person with a life history and idiosyncrasy and point of view; artificial intelligence suggests that the line between intelligent machines and people blurs most when a puree is made of that identity.”
– Brian Christian, The Most Human Human: What Talking with Computers Teaches Us About What It Means to Be Alive
The Turing Test, in a sense, was never meant to be a static benchmark. It evolves with us. As AI systems become better at copying our traits, we are pushed to cultivate the deeper, richer layers of our humanity. Emotional depth, creativity, authentic empathy, ethical wisdom: these dimensions of being human cannot be easily faked, nor easily quantified for replication.
Passing the Turing Test no longer signals merely a triumph of machine intelligence. It highlights a call for human reinvention. It invites us to sharpen the qualities that make us distinctly and irreplaceably ourselves.
The real victory is not simply building machines that seem human. It is ensuring that humanity itself continues to be something machines cannot truly counterfeit.
As we step into this new era, it is not just machines that must evolve.
It is us.
References:
- Large Language Models Pass the Turing Test — https://doi.org/10.48550/arXiv.2503.23674
- Computing Machinery and Intelligence — https://www.csee.umbc.edu/courses/471/papers/turing.pdf
- The Turing Test — https://plato.stanford.edu/entries/turing-test/
This article is authored by Abhinav Singh.
