@The_AI_Skeptic – Long Post: Why Generative AI is Currently Doomed
We’re so used to technology getting better. Every year there’s a new iPhone with a faster processor. It’s the way of the world… or so it seems. Sometimes, bigger doesn’t mean better. Take, for example, LLMs like ChatGPT. If you keep scaling them up, they eventually become worse. This inverse scaling leads to them becoming actively bad: (youtube.com/watch?v=viJt_D)
(When @OpenAI were developing the highly anticipated ChatGPT-4, it seems they may have hit this problem already. Instead of scaling up their training sets, as they did for previous iterations of their model, leaks indicate that ChatGPT-4 may actually be 8 x ChatGPT-3 models tethered together (thealgorithmicbridge.substack.com/p/gpt-4s-secre), which would explain why it was delayed and why the dataset size was not revealed.)
But even if you believe they’ll find a way around that problem, there are still plenty of others waiting for us. What if I told you that language model technology isn’t actually new? What if I told you it’s largely the same as it was in the 1980s, and that the only things that have changed are the transformer technology, which allows for more efficient training, and the sheer size of the training data: the public internet?
Yes, the thing that gives ChatGPT (and other LLMs) their “magic” is the fact that the internet exists now and can be scraped. (After it’s been manually catalogued by hundreds of thousands of foreign workers, of course (theglobeandmail.com/business/artic).) 300 billion words from the internet were used to train ChatGPT-4. It’s the scale of this training dataset that allows it to sound so human and knowledgeable. There’s nothing else like it.
Not only are major companies preventing their content from being used in future AI training datasets (deadline.com/2023/10/bbc-wi), but there’s also a lingering question over whether it was even legal for AI developers to use their original datasets in the first place (theverge.com/2023/9/11/2386). Worse than that, the internet is increasingly being polluted with error-ridden AI-generated content, so much so that it’s infecting search results (wired.com/story/fast-for)…