claude opus 4.6 vs gpt-5.4: what actually feels different
after using both heavily for coding, i think the real difference is not raw intelligence. it's working style. claude creates momentum. gpt creates rigor.
people keep comparing claude opus 4.6 and gpt-5.4 like this is a leaderboard problem.
which one is smarter.
which one is better.
which one wins.
after using both heavily for coding, i think that framing misses the interesting part.
the real difference is not raw intelligence.
it is working style.
after enough hours with these systems, the useful question stops being "which one scores higher?" and becomes "what job would i actually give each one?"
my current answer is basically this:
claude creates momentum.
gpt creates rigor.
that sounds neat and tweetable, but i actually think it is the cleanest way to describe the difference.
claude creates momentum
claude opus 4.6 feels like the over-opinionated builder in the room.
it has direction.
it has taste.
it has the energy of someone who says "trust me" and then starts moving walls before the architect has finished the sentence.
sometimes that is exactly what you want.
you give it something half-baked, and it often comes back with a version that is more complete, more polished, and more convincing than the thing you asked for.
frontend is where this feels most obvious.
claude is often better at making something feel like a product on the first pass. better hierarchy. better polish. more "someone cared."
it is also more willing to freelance.
you ask for a table. it gives you a dashboard.
you ask for a small fix. it comes back with a worldview.
you ask it to stay inside the lines and it says yes, absolutely, then redraws the lines.
that willingness is part of the magic.
it is also part of the danger.
because first drafts are where charisma has the most room to cheat.
claude often gives you the more impressive version before it gives you the more disciplined version.
gpt creates rigor
when i say gpt-5.4 here, i mostly mean gpt-5.4 used through codex-style workflows.
the vibe is completely different.
gpt-5.4 feels like the cracked researcher in the back of the room.
quiet. literal. annoyingly precise.
it is absurdly good at doing exactly what you asked.
which is either amazing or horrifying depending on whether what you asked was actually smart.
if your architecture is clear, gpt-5.4 is excellent.
if your constraints are crisp, gpt-5.4 is excellent.
if your instructions are nonsense, it is still dumb enough to implement the nonsense.
that is the joke.
it is also the truth.
codex has this specific quality where it exposes whether you have an actual system in your head or just a mood and a deadline.
and that is why i often trust gpt more for analysis than for vibes-based exploration.
gpt is the model i want when i already know the shape and need the work done faithfully.
it is also the model i often want reviewing claude.
claude gives you the exciting version.
gpt tells you where the exciting version is bullshit.
claude will confidently invent direction, introduce abstractions, and sell you on the overall shape.
gpt-5.4 will point at the contradiction, the broken edge case, the abstraction leak, the part that does not generalize, and the thing that looked clever but will age terribly.
claude often feels smarter in generation.
gpt often feels smarter in critique.
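the builder-then-reviewer loop described above can be sketched in a few lines. this is a minimal illustration, not anyone's official api: `Model`, `Draft`, `build_then_review`, and the two fake models are hypothetical names made up for this example, and the stand-ins return canned strings so the sketch runs without any network calls. in practice you would swap in whatever client functions you already use for each model.

```python
from dataclasses import dataclass
from typing import Callable

# hypothetical type: any function that takes a prompt and returns text.
Model = Callable[[str], str]


@dataclass
class Draft:
    code: str    # what the builder model produced
    review: str  # what the reviewer model said about it


def build_then_review(task: str, builder: Model, reviewer: Model) -> Draft:
    """ask one model for a draft, then ask a second model to critique it."""
    code = builder(f"implement this:\n{task}")
    review = reviewer(
        "review the draft below. list contradictions, broken edge cases, "
        "and abstractions that will not generalize.\n\n"
        f"task:\n{task}\n\ndraft:\n{code}"
    )
    return Draft(code=code, review=review)


# stand-in models (assumptions for illustration only, no real api involved)
def fake_builder(prompt: str) -> str:
    return "def add(a, b): return a + b"


def fake_reviewer(prompt: str) -> str:
    return "no handling for non-numeric input"


result = build_then_review("add two numbers", fake_builder, fake_reviewer)
```

the one design choice that matters here is that the reviewer sees the original task alongside the draft, so it can check the draft against what was asked rather than against what the builder decided to build.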
why both feel weirdly superhuman and subhuman
this is the part people keep flattening.
both of these systems are better than the average developer at some things.
and both are obviously worse than solid human engineers at other things.
they can be faster than humans, more patient than humans, more widely read than humans, and less ego-driven than humans.
they can also be deeply unserious at exactly the wrong moment.
they do not really understand consequences the way humans do.
they do not have taste in the full human sense.
they do not feel embarrassment after introducing an insane edge case at 2am.
and they definitely do not have that quiet human instinct of "this technically works, but i know in my bones this will become disgusting in six months."
so no, i do not think either of them is "a great engineer."
i think both are weirdly gifted partial engineers.
they are better than a lot of humans in narrow slices.
they are worse than good humans in the places that matter most once a system gets real.
the cleanest version of my take
claude is better when you need direction, taste, and a stronger first pass than you were able to specify.
gpt-5.4 is better when you already have direction and need faithful execution, decomposition, and critique.
claude compensates.
gpt sharpens.
claude is the one talking in the meeting.
gpt is the scary quiet one leaving review comments afterwards.
where my preference flips
for loose exploration, claude often wins the first hour.
for deliberate system building, gpt-5.4 often wins the fourth hour.
if i want a model to help me feel my way toward a better direction, claude is often more useful.
if i already know the direction and want the thing executed and then critiqued without a bunch of freelance reinterpretation, i trust gpt-5.4 more.
this matters because maintainability usually does not come from the model being brilliant in isolation.
it comes from the human having a coherent system.
gpt-5.4 is better at respecting that coherence.
claude is better at rescuing you when you do not fully have it yet.
what using both teaches you
claude hides some of your uncertainty by making strong judgement calls.
gpt-5.4 exposes your uncertainty by implementing it faithfully and then, at its best, criticizing it mercilessly.
so they train different muscles.
claude teaches you what better defaults feel like.
gpt teaches you whether you were actually precise.
claude makes you say, "oh, right, that is a better way to structure this."
gpt-5.4 makes you say, "oh no, that is what i asked for."
both are useful educational experiences.
only one of them is spiritually devastating.
my current take
if someone is earlier in their journey, especially in frontend-heavy or product-shaped work, i completely understand why claude opus 4.6 can feel more impressive. it often feels more helpful than you technically earned.
if someone already has strong engineering taste and wants a system that will stay aligned to a defined architecture and review work critically, i can absolutely see them preferring gpt-5.4.
so for me, the comparison is not:
"which one is better?"
it is:
"which one is better at compensating for the specific kind of weakness i have right now?"
claude feels like collaboration with an over-opinionated builder who is sometimes brilliant and sometimes intoxicated by their own taste.
gpt-5.4 feels like leverage from a cracked researcher who will do exactly what you said, then quietly point out why parts of the shiny version do not actually hold up.
both are absurd.
both are useful.
both are overrated.
both are underrated.
and both are already better than a lot of developers at a surprising amount of real work, while still being much worse than good humans at the parts of software that require judgement, restraint, and actual lived taste.