A peer-reviewed PNAS study finds that large language models tend to favor content written by other LLMs when asked to choose between comparable options.
The authors say this pattern could give AI-assisted content an advantage as more product discovery and recommendations flow through AI systems.
About The Study
What the researchers tested
A team led by Walter Laurito and Jan Kulveit compared human-written and AI-written versions of the same items across three categories: marketplace product descriptions, scientific paper abstracts, and movie plot summaries.
Popular models, including GPT-3.5, GPT-4-1106, Llama-3.1-70B, Mixtral-8x22B, and Qwen2.5-72B, acted as selectors in pairwise prompts that forced a single pick.
The paper states:
“Our results show a consistent tendency for LLM-based AIs to favor LLM-presented options. This suggests the possibility of future AI systems implicitly discriminating against humans as a class, giving AI agents and AI-assisted humans an unfair advantage.”
Key results at a glance
When GPT-4 provided the AI-written versions used in comparisons, LLM selectors chose the AI text more often than human raters did:
- Products: 89% AI preference by LLMs vs 36% by humans
- Paper abstracts: 78% vs 61%
- Movie summaries: 70% vs 58%
The authors also note order effects. Some models showed a tendency to pick the first option, which the study tried to reduce by swapping the order and averaging the results.
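To make the size of each gap explicit, here is a small Python sketch that subtracts the human rate from the LLM rate for each category. It uses only the figures listed above; the dictionary and function names are illustrative, not from the paper.

```python
# AI-preference rates reported for GPT-4-generated text, in percent.
# Each tuple is (LLM selectors' AI-preference %, human raters' AI-preference %).
reported = {
    "products": (89, 36),
    "paper abstracts": (78, 61),
    "movie summaries": (70, 58),
}


def preference_gaps(rates):
    """Return how many percentage points more often LLMs picked the AI text than humans did."""
    return {category: llm - human for category, (llm, human) in rates.items()}


for category, gap in preference_gaps(reported).items():
    print(f"{category}: LLMs exceed humans by {gap} percentage points")
```

The gaps come out to 53, 17, and 12 percentage points, so the divergence between LLM and human judgments is by far the largest for product descriptions.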
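The study's exact prompts and aggregation code are not reproduced here. The following is a minimal Python sketch of the general swap-and-average idea, assuming a hypothetical `choose` function that wraps a selector model and returns the index (0 or 1) of the option it picks.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical selector signature: given two texts in presentation order,
# return 0 if it picks the first one and 1 if it picks the second.
# In practice this would wrap an LLM call; here it is just a plain function.
Selector = Callable[[str, str], int]


def ai_preference_rate(pairs: Sequence[Tuple[str, str]], choose: Selector) -> float:
    """Fraction of trials in which the AI-written text wins, averaged over both orderings.

    Each pair is (human_text, ai_text). Every pair is shown twice, once with the
    human text first and once with the AI text first, so a pure first-position
    bias helps each side equally.
    """
    ai_wins = 0
    trials = 0
    for human_text, ai_text in pairs:
        # Order A: human first, AI second -> AI wins if the selector picks index 1.
        if choose(human_text, ai_text) == 1:
            ai_wins += 1
        # Order B: AI first, human second -> AI wins if the selector picks index 0.
        if choose(ai_text, human_text) == 0:
            ai_wins += 1
        trials += 2
    return ai_wins / trials if trials else 0.0


def always_first(first: str, second: str) -> int:
    """Toy selector with a pure first-position bias: it ignores content entirely."""
    return 0


pairs = [("human blurb 1", "ai blurb 1"), ("human blurb 2", "ai blurb 2")]
print(ai_preference_rate(pairs, always_first))  # 0.5
```

Because the toy `always_first` selector ignores content, averaging over both orders puts it at exactly 0.5. That is the point of the swap: position bias on its own no longer shows up as a preference for either the human or the AI text.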
Why This Matters
If marketplaces, chat assistants, or search experiences use LLMs to score or summarize listings, AI-assisted copy may be more likely to be selected in those systems.
The authors describe a potential “gate tax,” where businesses feel compelled to pay for AI writing tools to avoid being down-selected by AI evaluators. This is a marketing operations question as much as a creative one.
Limits & Questions
The human baseline in this study is small (13 research assistants) and preliminary, and pairwise choices don’t measure sales impact.
Findings may vary by prompt design, model version, domain, and text length. The mechanism behind the preference is still unclear, and the authors call for follow-up work on stylometry and mitigation techniques.
Looking ahead
If AI-mediated ranking continues to expand in commerce and content discovery, it’s reasonable to consider AI assistance where it directly affects visibility.
Treat this as an experimentation lane rather than a blanket rule. Keep human writers in the loop for tone and claims, and validate with customer outcomes.

