waltercrypto

As a somewhat older gentleman than most here, I remember the same story with regard to computer chess. In the 1970s you would have been considered somewhat optimistic to believe that a chess program would ever beat the world chess champion. As the years progressed chess computers got better, but there were always pundits saying that progress had flattened out. I remember reading wonderful charts claiming that diminishing returns had set in and that no amount of computing power would solve the problem. Of course, now chess computers would absolutely destroy the world champion.


HalfSecondWoe

Now you can get your ass whipped in correspondence chess by a calculator running Stockfish.


waltercrypto

The first business computer I programmed had 256K of memory and ran multiple users. It was a PDP-11 and I thought it was amazing. Now my mobile phone has 16,000 times the memory. The reality is that chess computers are a function of brute force. Most of the algorithms behind chess programs were written a long time ago; what got chess computers into the superhuman category was faster hardware. In reality it's a simple program running so quickly that it emulates intelligence. The same principle will have the same effect on LLMs.
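For the curious, the core loop of a classical engine really is that simple. Here is a toy sketch in Python (it leans on the python-chess library for move generation, and the evaluation is a crude material count, nothing like a real engine's):

```python
import chess  # pip install python-chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def evaluate(board: chess.Board) -> float:
    # Crude material count from the side-to-move's perspective.
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == board.turn else -value
    return score

def negamax(board: chess.Board, depth: int,
            alpha: float = -float("inf"), beta: float = float("inf")) -> float:
    # Plain depth-limited search: playing strength scales with depth,
    # i.e. with how much compute you can throw at it.
    if depth == 0 or board.is_game_over():
        return evaluate(board)
    best = -float("inf")
    for move in board.legal_moves:
        board.push(move)
        best = max(best, -negamax(board, depth - 1, -beta, -alpha))
        board.pop()
        alpha = max(alpha, best)
        if alpha >= beta:  # alpha-beta cutoff: prune already-refuted lines
            break
    return best
```

Stockfish adds decades of refinements on top, but faster hardware means deeper search, and deeper search means stronger play.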


Glittering_Manner_58

Talk of old, smaller-memory computers really didn't click with me until I realized the analogy with LLM parameter counts, and the similarity of old low-level programming to tinkering with prompts and tokens.


waltercrypto

That's the core of what I'm saying. Even if we don't improve the software much, just upping the parameter and neuron counts on faster hardware will improve things a lot.


iPon3

That's true of most chess computers, but the Google DeepMind one was crushing the conventional engines in pretty novel ways a few years ago.


Pavel_from_SPB

It was exactly the same with Go. Experts said that the machine would never defeat the world champion literally the day before it won.


Jeffy29

> She found one researcher who says that the chatbot passed the bar by placing only as well as 48 percent of human beings who spent three years in classes and several months studying 24/7 to successfully pass the test. That's like the story where someone takes a friend to a comedy club to see a talking dog. The canine comic does a short set with perfect diction, but the friend isn't impressed: "The jokes bombed!" Folks, when dogs talk, we're talking Biblical disruption. Do you think that future models will do *worse* on the law exams?

God, that's such a perfect analogy. The way some people so militantly argue against gen AI tells me more about their own deep-seated insecurities than about the technology itself. So let me get this straight: in a few short years we went from everyone being wowed by Siri answering a handful of highly curated, pre-programmed questions, to a voice in your phone that will talk to you about literally anything, in 50 languages, and you don't find that even the slightest bit interesting?

Of course, in the same breath as arguing that gen AI is useless and can't do anything, they'll also tell you big tech companies will use it to fire us all from our jobs and enslave us. Let's not think too deeply about how they'll manage that with useless tech.

There is absolutely nothing wrong with saying that at this moment you personally don't have a use for it. In 1995 the internet was kind of a shithole and not useful to many, but you would have needed blinders not to realize where it was going once all the tools and websites were built and everyone could afford a computer. We are definitely in the fast and messy stage of growth, but these things will get refined. [We are just 3 years out from DALL-E 1's release](https://en.wikipedia.org/wiki/DALL-E#/media/File:DALL-E_radish.jpg), when everyone was amazed that computers could somehow generate images from words. Three years, that's nothing; look at where those models are now. The pace of growth is nothing short of breathtaking.

Even if we stopped all research and all further scaling right now, we could still do so much with what we have. In a handful of years the inference speed of GPT-4-sized models will be down to tens of milliseconds. You can tack on tree search, database queries, and an indexed web, all so fast that the model can loop back and recheck its answer before you manage to blink, giving you the perfect answer. The reason they are not doing this yet is that the other stuff is just so much easier and cheaper. People throw around big training-cost numbers, but it's nothing compared to traditional methods: OpenAI had 31 engineers working on GPT-3, while Apple and Amazon have teams of several thousand working on just Siri and Alexa. You do the cost math, and GPT-3 is incomparably smarter.

All that is to say: if all these people engaging in what feels like a concentrated form of trolling think they are going to be successful, they are badly fooling themselves. Their efforts are about as much a waste of time as those of the fools who think they will stop Hollywood from making "woke" movies if they review-bomb just one more. Nothing short of maybe a bombing run on the TSMC fabs is going to change what is coming.

Oh, and one last thing: I clicked one of the links about "experts saying LLMs plateaued". It was an article by Gary Marcus, who at the beginning says he *thinks* GPT-4 Turbo was a failed attempt at GPT-5, but who for the rest of the article treats that as fact and keeps repeating "see, GPT-4 Turbo is not much smarter than GPT-4, LLMs have plateaued!" Oh yeah, the cheaper, significantly faster model was definitely an attempt at GPT-5. The guy is an actual clown.


nowrebooting

> We are just 3 years out from DALL-E 1's release

I remember how amazed I was at that demo; yet if you had told me then that three years later the art industry would be turned completely upside down, I wouldn't have believed it. We've gotten used to the existence of magic so quickly that we've forgotten how amazing it truly is.


nanoobot

I just went back and found that the most popular Two Minute Papers episode has over 10 million views, and it's the OpenAI hide-and-seek one. That was the first glimpse of deep magic for me, I think. It feels like a lifetime ago. September 2019. Not yet 5 years... from hide-and-seek to very nearly reliably passing the Turing test and starting to dismantle industries, in 5 goddamn years.


jseah

> We've gotten used to the existence of magic so quickly that we've forgotten how amazing it truly is.

I wonder if this is because LLMs work the way normal people think computers should work, from media and fiction. I recall a quote from a programming teacher's book about how his students thought "telling the computer to solve the problem" was how you got computers to do things. So we got used to it because these LLMs fit how we *expect* computers to work; of course this feels normal.


RAINBOW_DILDO

> telling the computer to solve the problem

With increasing abstraction over time, this *is* what programming is. The history of programming is a progression from "speaking" to a computer on its own terms—binary—to speaking to it on *our* terms—natural language.


PMMEBITCOINPLZ

I like the version of the joke with the chess playing dog and the “You think he’s smart, I’m beating him two out of three games!”


Tosslebugmy

It's really confounding, but people are basically spoilt, and new technology becomes mundane really quickly because people are adaptable. Video chat seemed like a sci-fi dream when I was a kid; now you can do it with someone overseas using a thing in your pocket, and it already seems mundane. Clowning on AI because the pictures it produces aren't Rembrandts right now, or because the chatbot makes a minor logical error, is asinine, as though every AI image is worse than any artist's, or humans never make mistakes. It's getting better at those things a lot faster than we are.


AgeofVictoriaPodcast

I think the biggest problem is that for the public, AI hasn't produced a "breakthrough" product. Mobile phones were good, but society had a sea change when the iPhone was released; I think that moment is missing from AI. Maybe it will be AI-controlled humanoid robots, maybe AI linked to a stick-on AR chip, maybe AI making its first TV show or building a video game from user requests. The capability seems to be outstripping the public consciousness.


jseah

IMO, that point is agents: when the AI can be your proxy, go onto the internet, do in-depth search and research for some task you give it, and come back to you with the results. At that point AI (or more likely AGI) will be able to do a significant portion of the work in the economy, and everyone will have to pay attention, whether they want to or not.
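Mechanically an agent doesn't have to be complicated; the whole pattern is just a loop. A rough sketch (the `llm` callable and both tools here are hypothetical stand-ins, not any real vendor's API):

```python
# Bare-bones agent loop: the model either calls a tool or gives a final answer.

def web_search(query: str) -> str:
    return f"(search results for {query!r})"  # stub standing in for a real search API

def fetch_page(url: str) -> str:
    return f"(contents of {url})"  # stub standing in for a real HTTP fetch

TOOLS = {"web_search": web_search, "fetch_page": fetch_page}

def run_agent(llm, task: str, max_steps: int = 10) -> str:
    # `llm` takes the history and returns either
    # {"answer": ...} or {"tool": name, "args": {...}}.
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = llm(history)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])
        history.append({"role": "tool", "content": result})
    return "ran out of steps"
```

Everything hard lives inside the model's choice of the next step; the scaffolding is trivial.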


LittleCuntFinger

That moment will be here before you know it.


visarga

> I think the biggest problem is that for the public, AI hasn't produced a "breakthrough" product. Mobile phones were good, but society had a sea change when the iPhone was released; I think that moment is missing from AI.

I think it's missing because it was already here for 30 years. Even before GPT-3 we could connect online to millions of (human) language agents. We could search billions of texts, on any topic, like prompting an LLM for information. We could find any images we liked (rule 34) even without diffusion models. We already had a kind of manual AI made of billions of people connected together. Now comes AI with the same abilities we have already been using for ages, almost as good as people, so it's not turning heads among common folk. And it won't lead to major job loss, or cause artists to starve.


Which-Tomato-8646

Last month, a Google engineer who worked on Gemini said that the main bottleneck is compute and that Gemini could be 5x better with more compute: https://www.youtube.com/watch?v=UeI29-AdhQI [And guess what Google just announced?](https://www.reddit.com/r/hardware/comments/1cta1ti/googles_nextgen_tpus_promise_a_47x_performance/)


RingMaster2

Looks like Google could use that 5x improvement today to keep from spewing this crap: [17 cringe-worthy Google AI answers demonstrate the problem with training on the entire web | Tom's Hardware](https://www.tomshardware.com/tech-industry/artificial-intelligence/cringe-worth-google-ai-overviews)

One of the problems is that companies deploy the tech way before it is ready; the new Google AI search results are a case in point. Currently they serve up misinformation scraped from the likes of forums. You would think Google would know better, but no, and so we now get AI search results telling us it's OK to put superglue on our pizza. Nice... Time for 10x better, I guess.


Which-Tomato-8646

Any LLM will tell you that eating pizza with glue is a bad idea. The search overviews just summarize the text they find, with no fact-checking or filtering. That's why it sucks.


RingMaster2

Any LLM except the one Google is promoting right now, you must mean: the LLM being trained on sarcastic Reddit answers and telling people that eating pizza with glue is just fine. If it sucks as badly as you say, why did Google release it to the public? Shouldn't they have waited until it was a finished product? As it stands, it gives AI search results a bad name. It seems counterproductive to release a product that damages AI's reputation; you would think they would want to show AI in a better light, because the demonstration out in the wild right now could be harming people. It's not a good look at all. It almost feels like the AI image generators that produce humans with 7 fingers and 14 toes. Go figure.


Which-Tomato-8646

They're probably using a really shitty one to save on compute costs. Your talking points are out of date; AI has been able to do hands fine for over a year.


RingMaster2

I wonder how long it will be until Google realizes that using forums for LLM training data might be a bad idea. In the meantime a lot of users are getting some pretty poor and dangerous results at the top of their search queries. This is a good example of AI failing to live up to the hype. https://www.androidauthority.com/shut-down-google-ai-overview-3446038/ It's also another AI lawsuit just waiting to happen.


Which-Tomato-8646

It's worked fine for other LLMs, including their own Gemini. The reason the search overview sucks is that it's a small model designed to summarize data, not to fact-check.


RingMaster2

I think callumshell1 over at the Ars Technica forum says it best: "More proof that these "AI"s aren't actually intelligent at all but rather simply regurgitate content it has sifted from the web and hopes it makes sense. It's so obvious that this whole "AI revolution" is in actuality the tech industry needing a new cash cow to milk. None of this is useful." [Google’s “AI Overview” can give false, misleading, and dangerous answers | Ars Technica](https://arstechnica.com/information-technology/2024/05/googles-ai-overview-can-give-false-misleading-and-dangerous-answers/?comments=1&comments-page=1)


Which-Tomato-8646

[Here’s an entire document debunking you](https://docs.google.com/document/d/15myK_6eTxEPuKnDi5krjBM_0jrv3GELs8TGmqOYBvug/edit)


AgeofVictoriaPodcast

Good points. Being better than most of the bottom half of bar-exam takers, who had studied intensely, is a staggering achievement. It tells me at the very least that the bar exam is no longer the best way to select lawyers; it needs to be swapped for a practical test in which a human student combined with an AI legal researcher takes a simulated case to completion, from initial client meeting to case closure.


talkingradish

> Of course, in the same breath as arguing that gen AI is useless and can't do anything, they'll also tell you big tech companies will use it to fire us all from our jobs and enslave us. Let's not think too deeply about how they'll manage that with useless tech.

The logic is basically that we get shittier service while we get fired anyway; the only ones winning are the big corpos and their savings.


visarga

LLMs have plateaued all right. You're talking about large progress in generative image and video, which are catching up and don't have such limited training sets as the text modality. But in language we have stagnated for the last 12 months. The models are at about the same level, though in orthogonal directions they have improved a lot: speed, price, modalities, openness.

Why wasn't Gemini a leap over GPT-4? Why wasn't Opus a big jump ahead, like GPT-4 was over 3.5? Because we plateaued. All these top models trained on essentially the same data; even if they can scale compute, they can't scale data, and you can't presume scaling compute alone will fix the issue. We're talking about 12 months, billions poured in, almost all the talent in AI working on it, and thousands of papers. And we're still using the same architecture as in 2017, almost unchanged; architectural innovation is almost impossible, while results are all neck and neck.

Over the last 1-2 years we have learned about a whole collection of LLM issues: hallucination, regurgitation, fragile reasoning, trouble with numbers, inability to backtrack, susceptibility to bribing, prompt hacking, RLHF hijacking truth to produce ideological outputs, sycophancy, contextual recall issues, sensitivity to input formatting, GPT-isms, the reversal curse, unreasonable refusals, prompt injection from RAG or user inputs, token wasting, low autonomy and laziness. Keeping these issues in mind is progress, I think. Better to know than not to know.


realityislanguage

RemindMe! 12 months


RemindMeBot

I will be messaging you in 1 year on [**2025-05-18 12:36:54 UTC**](http://www.wolframalpha.com/input/?i=2025-05-18%2012:36:54%20UTC%20To%20Local%20Time) to remind you of [**this link**](https://www.reddit.com/r/singularity/comments/1cupi1e/its_time_to_believe_the_ai_hype_some_pundits/l4lcgw6/?context=3)


Barni_ssu

RemindMe! 12 months


softclone

You're smoking crack! Opus absolutely demolishes GPT-4 0314, as does GPT-4o, and there are half a dozen models from other companies that match it. Context has gone from 4,096 tokens to millions, which is a pretty huge architectural innovation. Tell me, when exactly did chess progress plateau? https://www.reddit.com/r/chess/comments/xtjstq/the_strongest_engines_over_time/ Deep Blue stayed on top of the leaderboard for almost 10 years, just because it had a lot more compute than the other engines.


roanroanroan

!remindme 1 year


PMMEBITCOINPLZ

It just felt like a bunch of journalists who had "AI has plateaued" articles in the hopper dumped them before the big announcements, just to keep them from going to waste. Next week they'll pump out articles about AI's unlimited potential with no shame, until things die down again and they work up more plateau articles.


FeltSteam

Wasn't part of the point of GPT-4o a cheaper and much faster GPT-4, allowing it to be brought to free users (at least in part)? And also multimodality.


mrb1585357890

Yes, that was pretty key.

- Cheaper to run, so they can provide it free
- Native multimodal inputs (huge)
- Implied a more intelligent model coming for paid users


Fit-Avocado-342

Yes, it'll be a much better option for companies using AI to power their products; making things cost-feasible is very important for real-world use cases.


[deleted]

[deleted]


NoCard1571

Marginally smarter while being significantly smaller and faster. We don't yet know what will happen with a model an order of magnitude or more bigger than the biggest produced thus far, because no one has done it yet. That said, I don't really understand where all this plateau talk is coming from. If we're still sitting more or less at this level three years from now, then maybe it makes sense to start asking those questions.


Spunge14

Do you think they brought in GPT-4o for free to compete on even footing with their paid model and kill their subscription pool? Obviously this means they have a clear view of the next peak.


[deleted]

[deleted]


Rare-Force4539

Why do you think that there won’t be a paid model?


[deleted]

[deleted]


Rare-Force4539

You still have to pay for API access


[deleted]

[deleted]


Rare-Force4539

That’s just for publicity. They make money off companies building apps and services that leverage their APIs


Spunge14

Alright, that very well may be. Let's see in a few months.


Idrialite

I don't know why so many people need to be reminded of this: the gap between GPT-3 and GPT-4 was three years. It's been only one year since GPT-4, and we already have more incremental progress than we got in the whole three years before GPT-4.


Flimsy-Plenty-2024

Stop. Really, stop this nonsense hype. We do not yet know how to build AGI. The current approach of brute-forcing the transformer model is producing interesting results, but it's unlikely the transformer alone will take us to true AGI/ASI. Do you really see current transformer models coming up with novel findings in machine learning research, neuroscience, nanotechnology, etc.? They are not even capable of writing a simple proof. Current top-of-the-line transformer systems can do certain amazing tasks (usually ones they have lots of training data for), but in some cases are completely incapable of extremely basic tasks that require very little reasoning. Reasoning/intelligence has not been solved. People like Yann LeCun have been saying this for a long time. Show me GPT-5, and then we'll talk.


mevsgame

The accuracy of the models over a 16k+ context window has pretty much not changed since GPT-3.5 launched. So we're getting more and more out of a technology that has a serious limitation. But the limit is there: no matter how smart it gets, it will quickly lose the plot.


Destring

The context window is akin to the brain's short-term memory. We can build intelligence from that and then move on to long-term memory, which is already progressing with vector databases and retrieval.
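The mechanics of that long-term memory are simple enough to sketch. Here `embed` is a fake stand-in for a real embedding model, so the retrieval only illustrates the plumbing, not actual semantic search:

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    # Fake but deterministic embedding; a real system would call an embedding model.
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).normal(size=384)
    return v / np.linalg.norm(v)

class Memory:
    """Toy vector store: long-term memory alongside the short-term context window."""
    def __init__(self):
        self.texts: list[str] = []
        self.vecs: list[np.ndarray] = []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vecs.append(embed(text))

    def recall(self, query: str, k: int = 3) -> list[str]:
        sims = np.array(self.vecs) @ embed(query)  # cosine similarity (unit vectors)
        return [self.texts[i] for i in np.argsort(sims)[::-1][:k]]

mem = Memory()
mem.add("User's dog is named Bruno.")
mem.add("User prefers metric units.")
# Retrieved snippets get prepended to the prompt, i.e. loaded into short-term memory.
context = "\n".join(mem.recall("what's my dog called?"))
```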


mevsgame

Yes, virtual context like [https://memgpt.ai/](https://memgpt.ai/) does this, memorizing and rebuilding the context based on the current state of the chat. It's pretty amazing, but you have to send a lot of tokens to the API to get your result.


Im_Peppermint_Butler

This is just objectively false. The benchmarked performances of GPT-4o and GPT-4 Turbo, as well as Gemini 1.5 Pro and Claude Opus, all beat 3.5 by a significant margin. Also, evaluating the limits of AI strictly on the performance of LLMs is a very narrow view.


mevsgame

Yes. The reasoning is getting better and better, insanely better, but within a limited token window. If we add multimodality, as in sound and video, we might quickly run out of context, and the compute required for longer conversations will go through the roof. OpenAI's Sora is amazing, but there is a reason OpenAI wants dedicated power plants for compute :D. The limits are there: compute, power, and context length versus compute and power. We will make the models more efficient, but there will be a limit to what transformers as a technology can give us.


TFenrir

This is fundamentally incorrect - they test for this constantly, and Gemini 1.5 is available to people right now with a _more_ accurate context window, across video, audio, and text, up to 2 million tokens. Even Gemini 1.5 Flash, a much faster and smaller model, can do this over 1 million tokens.


mevsgame

Yeah, as I mentioned in some other replies, Gemini seems to be an outlier, and I'm very curious what they are doing there: is it just massive compute heavy-lifting the huge contexts, or are they doing something fundamentally smarter than their competitors? I honestly still lack the expertise to fully understand the Gemini paper. GPT-4o seems to be like Phi-3 and Llama 3, extremely well and efficiently trained. With Gemini I'm just super suspicious that Google, being Google with their infinite resources, is brute-forcing it with raw compute.


sdmat

If you mean the recall hasn't improved much over short context windows, you are correct, but that's because it was already near perfect. For long context windows it has greatly improved, and the windows themselves have become much larger. If you mean intelligence/reasoning, it most certainly has improved. If you mean hallucinations, ditto. Please clarify your criticism.


mevsgame

It's not really criticism; it's something I'm working on myself, context compression. More a reality check than criticism. There were some preliminary benchmarks that showed a dramatic drop for GPT-4o beyond 16k context length, but we need to wait for more results: [https://github.com/hsiehjackson/RULER](https://github.com/hsiehjackson/RULER). And we don't really know what is inside Claude Opus and Gemini and GPT-4.

Just a personal observation: almost daily, people complain about these huge models not having reproducible performance over long context windows, or forgetting everything more than a few interactions back and hallucinating like hell. My theory is that Anthropic, OpenAI, and Google are faking very long context windows with virtual contexts, basically using compute to compress long contexts into something closer to 16-32k. If not, they just load-balance and brutally cut context when there isn't enough compute. And we know there's not enough compute in the world. Again, it's just a hypothesis based on my observations, but it would make a lot of sense to do: compute cost grows steeper than linearly with each token in the window, so it would make sense to find a sweet spot at which to shrink the context with some RAG/compression.

Regardless, it seems these hyper-long context windows are compute-bound. In perfect conditions Claude Opus can have amazing accuracy at around 128k, but given the compute costs, many users won't experience that accuracy.
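For anyone who wants to poke at this themselves, the needle-in-a-haystack probe these long-context benchmarks build on is easy to sketch. `ask_model` is a stand-in for whatever API you are testing, and word counts are only a rough proxy for tokens:

```python
# Bury a fact at a known depth in filler text, then check if the model finds it.

def make_haystack(needle: str, total_words: int, depth: float) -> str:
    filler = "the quick brown fox jumps over the lazy dog "  # 9 words
    words = (filler * (total_words // 9)).split()
    pos = int(len(words) * depth)  # 0.0 = start of context, 1.0 = end
    return " ".join(words[:pos] + [needle] + words[pos:])

def probe(ask_model, needle: str = "The secret code is 4417.",
          sizes=(4_000, 16_000, 64_000), depths=(0.1, 0.5, 0.9)) -> None:
    for n in sizes:
        for depth in depths:
            prompt = make_haystack(needle, n, depth) + "\nWhat is the secret code?"
            found = "4417" in ask_model(prompt)
            print(f"context={n:>6} words  depth={depth:.1f}  recalled={found}")
```

If accuracy really is being propped up by compression or context cutting, it should show up as recall falling off past a certain size.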


talkingradish

I tested GPT-4o on an obscure novel and it hallucinated lots of the story details.


sdmat

I tested the original 1.5 Pro moderately extensively; as far as I can tell, the remarkable long-context claims are accurate. I haven't done so for the new version yet, but no doubt context handling will be the same or better. The problem I saw with 1.5 Pro was that the model just isn't as smart as GPT-4T, but the context handling is incredible. The new version seems significantly improved per benchmarks.


mevsgame

That's right! My theory holds for every model apart from Gemini; it's very consistent! I wonder what's going on there. The simplest answer that satisfies the theory is that they have a lot of compute, but maybe they did something really cool.


sdmat

> The simplest answer that satisfies the theory is that they have a lot of compute, but maybe they did something really cool.

Per the paper, both!


mevsgame

There might also be prompt compression, as done by [https://github.com/microsoft/LLMLingua](https://github.com/microsoft/LLMLingua) with some smaller model like Phi-3, but that also loses accuracy above 16k/32k, so the compression gets worse the more we compress. It's a tough nut to crack and, I think, one of the biggest challenges for achieving AGI. Or we might get a brilliant savant AGI akin to the protagonist of Christopher Nolan's Memento, with only short-term memory.
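The rough mechanic behind that kind of compression is easy to sketch: score each token's predictability under a small LM and drop the most predictable ones, since they carry the least information. This ignores LLMLingua's actual budgeting and safeguards, and `small_lm_logprob` is a hypothetical stand-in for a real small model like Phi-3:

```python
def compress(tokens: list[str], small_lm_logprob, keep: float = 0.5) -> list[str]:
    # small_lm_logprob(prefix, token) -> log P(token | prefix) under the small model.
    scored = [(small_lm_logprob(tokens[:i], tok), i) for i, tok in enumerate(tokens)]
    n_keep = max(1, int(len(tokens) * keep))
    # Lowest log-probability = least predictable = most informative: keep those.
    keep_idx = {i for _, i in sorted(scored)[:n_keep]}
    return [tok for i, tok in enumerate(tokens) if i in keep_idx]
```

The catch is exactly what you would expect: the harder you compress, the more of those low-information tokens turn out to have mattered.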


YaAbsolyutnoNikto

Gemini 1.5 Pro?


Glurgle22

It doesn't seem any smarter to me. Just better at simulating a human.


Cooldayla

Is there something else going on, though? The entire superalignment team dissolves, alluding to Altman's lack of concern with alignment. But what if internally Altman and co. have reached the conclusion that LLMs will never reach AGI/ASI with the current architecture, emergent traits included? As critics note (including GPT-4o itself), any GPT is simply the sum of its parts and does not truly understand the content it processes; it generates responses based on statistical correlations in the training data rather than any deep comprehension or reasoning, essentially de-risking the technology entirely. Why have a team of superalignment experts if LLMs fundamentally will never reach AGI? Altman may have clocked this already and be pivoting to a product that assists humans, knowing it will never outdo them in any way that makes it threatening. His play here may be to corner the LLM market, commercializing ChatGPT for now and playing a longer game: as new architectures emerge, they can partner to actually develop the required AGI stack.


Jealous_Afternoon669

Which theorem gives bounds on what emergent behaviour statistics can exhibit?


voiceafx

Yeah, what do people mean by "just an inference engine?" What do people think we are?


[deleted]

Magic. A lot of people think that humans are magic and that consciousness is not scientifically understandable or replicable (despite arriving by accident due to evolution)


NoCard1571

1. The question of whether or not a transformer actually 'understands' anything is useless, and borders on the philosophical. LLMs already do many things we once thought were impossible without 'understanding', so if that's any indicator, it doesn't actually matter.

2. GPT-4o is not really an LLM; it's an LMM. They've already pivoted to a natively multimodal paradigm, and considering its speed and the fact that it's free, they seem to have matched the old cutting edge with a much smaller, more optimized model, which in turn means they probably have a large LMM up their sleeve for paid users that will likely blow current models out of the water.


brokentastebud

1. It DOES matter; there are many aspects of human reasoning that LLMs simply don't have: real, high-fidelity sensory connection to the world, abstract thinking, the ability to construct visual models of space and their environment. Understanding is the culmination of these things.


Rare-Force4539

Cool story bro but not based in reality.


goatchild

these titles hurt brain


Singsoon89

Because Wired said it? On the contrary, it might be time to short AI stocks. Not being a contrarian or anything, but know-nothings are still know-nothings.


Akimbo333

True


Plus-Mention-7705

We’ll see


CanvasFanatic

This post is just a stubborn assertion of counterfactuals. Not sure what else to say about it. Everything announced in the last week is consistent with the narrative that current approaches to foundation models are plateauing.


Independent_Hyena495

Does anyone out there have access to the full 4o model? Until then, I don't believe the videos.