AI news recap: New Meta AI app, ChatGPT’s bad model behavior [May 2025]

by Bella Baker


Just like AI models, AI news never sleeps.

Every week, we’re inundated with new models, products, industry rumors, legal and ethical crises, and viral trends. If that’s not enough, the rival AI hype/doom chatter online makes it hard to keep track of what’s really important. But we’ve sifted through it all to recap the most notable AI news of the week from the heavyweights like OpenAI and Google, as well as the AI ecosystem at large. Read our last recap, and check back next week for a new edition.

Another week, another batch of AI news coming your way.

This week, Meta held its inaugural LlamaCon event for AI developers, OpenAI struggled with model behavior, and LM Arena was accused of helping AI companies game the system. Congress also passed new laws protecting victims of deepfakes, and new research examines AI’s current and potential harms. Plus, Duolingo and Wikipedia have very different approaches to their new AI strategies.

What happened at Meta’s first LlamaCon


At LlamaCon, Meta’s first conference for AI developers, the two big announcements were the Llama API, now in limited preview, and a standalone Meta AI app meant to compete more directly with ChatGPT. After reports that the app was in the works, OpenAI CEO Sam Altman joked that maybe OpenAI should do its own social media app, and now that is reportedly happening for real.

We also went hands-on with the new Llama-powered Meta AI app. For more details about Meta AI’s top features, read Mashable’s breakdown.

During LlamaCon’s closing keynote, Mark Zuckerberg interviewed Microsoft CEO Satya Nadella about a bunch of trends, ranging from agentic AI capabilities to how we should measure AI’s advancements. Nadella also revealed that up to 30 percent of Microsoft’s code is written by AI. Not to be outdone, Zuckerberg said he wants AI to write half of Meta’s code by next year. 

ChatGPT has safety issues, goes shopping

Meta AI and ChatGPT both got busted this week for sexting minors.

OpenAI said this was a bug and they’re working to fix it. Another ChatGPT issue this week made the latest GPT-4o update too much of a suck-up. Altman described the model’s behavior as “sycophant-y and annoying,” but users were concerned about the dangers of releasing a model like this, highlighting problems with iterative deployment and reinforcement learning.

OpenAI was even accused of intentionally tuning the model to keep users more engaged. Joanne Jang, OpenAI’s head of model behavior, jumped on a Reddit AMA to do damage control. “Personally, the most painful part of the latest sycophancy discussions has been people assuming that my colleagues are irresponsibly trying to maximize engagement for the sake of it,” wrote Jang.

Earlier in the week, OpenAI announced new features to make products mentioned in ChatGPT responses more shoppable. The company said it isn’t earning purchase commissions, but it smells an awful lot like the beginnings of a Google Shopping competitor. Did we mention OpenAI would buy Chrome if Google is forced to divest it? Because they totally would, FYI.


The ChatGPT maker has had a few more problems with its recent models. Last week, we reported that o3 and o4-mini hallucinate more than previous models, by OpenAI’s own admission.

Anyone in the U.S. can now sign up for Google AI Mode

Meanwhile, Google is barreling ahead with AI-powered search features. On Thursday, the tech giant announced that it’s removing the waitlist to test out AI Mode in Labs, so anyone over 18 in the U.S. can try it out. We spoke with Robby Stein, VP of product for Google Search, about how users have responded to its AI features, the future of search, and Google’s responsibility to publishers.

Google also updated Gemini with image editing tools and expanded NotebookLM, its AI podcast generator, to over 50 languages. Bloomberg also reported that Google has been quietly testing ads inside third-party chatbot responses.

We’re keeping a close eye on that final development, and we are very curious how Google plans to inject ads into AI search. Would you trust a chatbot that gave you sponsored answers?

Leaderboard drama 

Researchers from AI company Cohere, Princeton, Stanford, MIT, and Ai2 published a paper this week calling out Chatbot Arena for essentially helping AI heavyweights rig their benchmarking results. The study said the popular crowdsourced benchmarking tool from UC Berkeley allowed Meta, Google, OpenAI, and Amazon “extensive private testing” and gave them more prompt data, which “significantly” improved their rankings.

In response, LM Arena, the group behind Chatbot Arena, said “there are a number of factual errors and misleading statements in this writeup” and posted a point-by-point rebuttal to the paper’s claims on X.

Benchmarking AI models has become increasingly problematic. Benchmark results are largely self-reported by the companies that release the models, and the AI community has called for more transparency and accountability from objective third parties. Chatbot Arena seemed to provide a solution by allowing users to choose the best responses in blind tests. But now LM Arena’s practices have come into question, further fueling the conversation around objective evaluations.

A few weeks ago, Meta got in trouble for using an unreleased version of its Llama 4 Maverick model on LM Arena, which scored a high ranking. LM Arena updated its leaderboard policies, and the publicly available version of Llama 4 Maverick was added instead, ranking way lower than the unreleased version. 

Lastly, LM Arena recently announced plans to form a company of its own.

Regulators and researchers tackle AI’s real-world harms

Now that generative AI has been in the wild for a few years, the real-world implications have started to crystallize. 

This week, the U.S. Congress passed the “Take It Down” Act, which requires tech companies to remove nonconsensual intimate imagery within 48 hours of a request. The law also outlines strict punishment for deepfake creators. The legislation had bipartisan support and is expected to be signed by President Donald Trump.

The nonpartisan U.S. Government Accountability Office (GAO) published a report on generative AI’s impact on humans and the environment. The conclusion is that the potential impacts are huge, but their exact scale is unknown because “private developers do not disclose some key technical information.”

And in the realm of the frighteningly real and specific harms of AI, a study from Common Sense Media said AI companion apps like Character.AI and Replika are unequivocally unsafe for teens. The researchers say if you’re too young to buy cigarettes, you’re too young for your own AI companion.

Then there was the report that researchers from the University of Zurich secretly deployed AI bots in the r/changemyview subreddit to try and convince people to change their minds. Some of the bot identities included a statutory rape victim, “a trauma counselor specializing in abuse,” and “a black man opposed to Black Lives Matter.”

Other AI news…

In other news, Duolingo is taking an “AI-first” approach, which means replacing its contract workers with AI whenever possible. On the flip side, Wikipedia announced it’s taking a “human-first” approach to its AI strategy. It won’t replace its volunteers and editors with AI, but will instead “use AI to build features that remove technical barriers to allow the humans at the core of Wikipedia.”

Yelp deployed a bunch of AI features this week, including an AI-powered answering service that takes calls for restaurants, and Governor Gavin Newsom wants to use genAI to solve California’s legendary traffic jams.




