ChatGPT use declines as users complain about ‘dumber’ answers, and the reason might be AI’s biggest threat for the future::AI for the smart guy?
ChatGPT use declines as users complain about ‘dumber’ answers, and the reason might be AI’s biggest threat for the future::AI for the smart guy?
None of these points are true though. Context has been extended in the webui, markedly. 3.5 turbo is only that, 3.5 but faster. Gpt-4 is a marked improvement on 3.5 and I definitely haven’t seen any conclusive evidence it’s been nerfed in my daily use. Prompts have and still need to be carefully crafted for best results, but the results have been steadily improving not degrading over time.
All of these points are true though. Chatgpt 4 max token is now half of from the webui compared to when gtp-4 was launched. It used to be >8k, it is now >4k. Max number of tokens for the api hasn’t changed for gpt-4, while it was greatly increased for chatgpt-3.5-turbo. The article is however talking about the service chatgpt, used via webui.
ChatGPT-3.5-turbo are different models than those used in the past. You can literally read it in the https://platform.openai.com/docs/models/gpt-3-5
Prompt engineering has been limited as demonstrated by the fact that most jailbreaking techniques don’t work anymore. The way to avoid jailbreaking is exactly to limit ability of users to instruct the model.
Source on the halved token limit for gpt- 4 in the webui? Because that has not been my experience at all. There are now 16k and 32k models for 3.5-turbo, but there’s no evidence 3.5-turbo is nerfed at all from 3.5 and it absolutely out performs 3. Yes, you can see that they offer different snapshots of models, but that doesn’t indicate at all that there’s been a any reduction in their ability. “Breaking” jail breaking isn’t a bug, and it certainly hasn’t been demonstrated that the model is less capable.
Unless they reverted the chance recently, you can test yourself the max number of tokens of gpt-4 from webui, that is now ~4k. It used to be ~ 8k.
What you are talking about are the APIs, that are different, and are not discussed in the news. They are even different models, in the sense that depending on the size of the context you get different results because of the attention mechanism (unless they are using the same model restricting the number of tokens, we don’t know). Unfortunately there is no official benchmark from openai as a comparison between gpt-3.5-turbo models with different context size, but I would not trust them much anyway. They are very defensive on their data, and push out mainly marketing stuff. I would wait for a 3rd party to do the benchmark.
“Breaking” jailbreaking is not a bug, but it limits the ability to instruct the model, i.e. prompt engineering, because it is literally meant to limit prompt engineering, it is the whole idea behind it
Edit. Here a link of a guide where they have the ~4k limit as well for gpt-4 https://the-decoder.com/chatgpt-guide-prompt-strategies/