When you purchase through links on our site, we may earn an affiliate commission.Heres how it works.

The research document “How Is ChatGPTs Behavior Changing over Time?”

Arguably, the success or failure of chatbots is determined by their accuracy.

ChatGPT privacy settings

But they later determined their performance and behavior were completely different.

They further cited that their performance on certain tasks has been negatively impacted.

Interestingly GPT-3.5 (June 2023) was much better than GPT-3.5 (March 2023) in this task.

GPT-4 performance comparison to GPT-3.5

We hope releasing the datasets and generations can help the community to understand how LLM services drift better.

The above figure gives a [quantitative] summary.

As for GPT-3.5, it stuck to the chain-of-thought format but gave out the wrong answer initially.

LLMs solving Maths problems

However, the issue was patched in June, with the model showing enhancements in terms of its performance.

On the other hand, there was about 40% growth in GPT-3.5s response length.

The answer overlap between their March and June versions was also small for both services."

Artificial Intelligence AI Assistant Apps - ChatGPT, Anthropic Claude, Google Gemini, Microsoft Copilot, Perplexity, Poe.

stated the Stanford Researchers.

They further attributed the disparities to the “drifts of chain-of-thoughts effects.”

Whereas, in June, both models blatantly refused to give a response to the same query.

Microsoft Edge Vertical Tabs

So they are tweaking gpt to provide same quality answers with less resources and test them a lot.

If they see regressions, they roll back and try something different.

So in their view, it didnt get any dumber, but it did got a lot cheaper.

Claude AI app by Anthropic is seen displayed on a smartphone screen.

More benchmarks need to be conducted to study these trends.

Admittedly, the company has consistently pushed new updates to the platform, andseveral improvements can be cited.

This highlights the need to continuously evaluate and assess the behavior of LLMs in production applications.

Microsoft 365 Copilot rebrand for Microsoft Office

Marathon gameplay reveal

U.S. President Donald Trump speaks during the United Nations General Assembly seen on a laptop computer in Hastings on the Hudson, New York, U.S., on Tuesday, Sept. 22. 2020.

Lenovo IdeaPad 5x 2-in-1 deal

Promotional screenshot of characters from Zenless Zone Zero

MSI Claw 8 AI+ propped up next to an ROG Ally X on a table.