Discussion about this post

Shawn C:

By the way, if, as you said, the intelligence of current large models is already sufficient, then GPT-5 clearly shouldn't fail to distinguish between 5.11 and 5.9. In that case the model is given a clear problem specification and sufficient context, and its world model contains enough knowledge about how numbers are counted and what each digit represents. As long as its reasoning is functioning normally, it should be impossible for it to make such a mistake.

Ref: http://x.com/TimelessMartian/status/19539
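For concreteness, here is a minimal Python sketch of the comparison in question under both readings. The version-string reading is offered only as one hypothesis for the confusion; it is not a claim made anywhere in this thread.

```python
# Comparing 5.11 and 5.9 as decimal numbers: 5.9 equals 5.90, so it is larger.
print(5.9 > 5.11)    # True
print(5.11 - 5.9)    # -0.7899999999999996 (floating-point rounding of -0.79)

# The same strings read as version numbers, where "5.11" sorts after "5.9".
# (An illustrative hypothesis for the mistake, not established in the thread.)
as_version = lambda s: tuple(int(p) for p in s.split("."))
print(as_version("5.11") > as_version("5.9"))   # True
```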

Shawn C:

You mentioned in your article that the bottleneck is not the model's intelligence but the problem specification and context. I think that is a bit too optimistic. In practice, the specifications and context we give humans are often neither precise nor complete, yet people use their intelligence to infer the missing but necessary information. And when their output doesn't meet our needs, we give feedback and they revise their answers; after a few iterations we usually arrive at a better result. In this respect, models are at no greater disadvantage than humans. But the current reality is that when a model's output is unsatisfactory, even multiple rounds of feedback and additional information often fail to bring it to the level a human would reach in the same situation. Humans can frequently solve problems that models cannot, which clearly shows that the models' intelligence is still insufficient.
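A minimal sketch of the feedback loop described above, under stated assumptions: `ask_model` is a hypothetical stand-in for whatever chat-completion client is used, and `refine`, `satisfied`, and `feedback` are illustrative names, not anything from the thread or the article.

```python
# Iterative refinement: send a spec, inspect the output, fold feedback back in.

def ask_model(messages: list[dict]) -> str:
    """Hypothetical model call; replace with a real API client."""
    raise NotImplementedError

def refine(spec: str, satisfied, feedback, max_rounds: int = 3) -> str:
    messages = [{"role": "user", "content": spec}]
    answer = ask_model(messages)
    for _ in range(max_rounds):
        if satisfied(answer):          # human judges the output
            break
        # Append the model's answer and the human's correction, then retry.
        messages.append({"role": "assistant", "content": answer})
        messages.append({"role": "user", "content": feedback(answer)})
        answer = ask_model(messages)
    return answer
```

The comment's claim is that for humans this loop converges after a few rounds, while for current models it often does not.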
