Portia AI: Initial Thoughts on GPT-5



This content originally appeared on DEV Community and was authored by Robbie Heywood

At Portia AI, we’ve been playing around with GPT-5 since it was released a few days ago and we’re excited to announce it will be available to SDK users in tomorrow’s SDK release 🎉

After playing with it for a bit, it definitely feels like an incremental improvement rather than a step-change (despite my LinkedIn feed being full of people pronouncing it ‘game-changing!’). To pick out some specific aspects:

  • Equivalent accuracy: on our benchmarks, GPT-5’s performance is equal to the existing top model, so any improvement here is incremental at best.
  • Handles complex tools: GPT-5 is definitely keener to use tools. We’re still playing around with this, but it does seem like it can handle (and prefers) broader, more complex tools. This is exciting – it should make it easier to build more powerful agents, but also means a re-think of the tools you’re using.
  • Slow: With the default parameters, the model is seriously slow – generally 5-10x slower across each of our benchmarks. This makes tuning the new reasoning_effort and verbosity parameters important (there’s a short sketch of this after the list).
  • I actually miss the model picker! With the model picker gone, you’re left to rely on the fuzzier world of natural language (and the new reasoning_effort and verbosity parameters) to control the model. This is tricky enough that OpenAI have released a new prompt guide and prompt optimiser. I think there will be real changes when there are models that you don’t feel you need to control in this way – but GPT-5 isn’t there yet.
  • Solid pricing: While it is a little more token-hungry (10-20% more tokens across our benchmarks), at half the price of GPT-4o / 4.1 / o3 it is a good price for the level of intelligence (there’s a great article on this from Latent Space).
  • Reasonable context window: At 256k tokens, the context window is fine – but we’ve had several use-cases that use GPT-4.1 / Gemini’s 1m token windows, so we’d been hoping for more…
  • Coding: In Cursor, I’ve found GPT-5 a bit difficult to work with – it’s slow and often over-thinks problems. I’ve moved back to claude-4, though I do use GPT-5 when looking to one-shot something rather than working with the model.
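
To make the reasoning_effort / verbosity point concrete, here’s a minimal sketch of dialling both down for latency-sensitive calls. It assumes the openai Python SDK’s Responses API and the reasoning-effort / text-verbosity controls OpenAI documents for GPT-5; it illustrates raw OpenAI usage rather than how the Portia SDK configures models.

```python
# Minimal sketch: trading reasoning depth for latency with GPT-5.
# Assumes the `openai` Python SDK (Responses API) and OpenAI's documented
# reasoning-effort / text-verbosity controls for GPT-5.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "minimal"},  # "minimal" | "low" | "medium" | "high"
    text={"verbosity": "low"},        # "low" | "medium" | "high"
    input="Outline a three-step plan to triage a failing CI pipeline.",
)

print(response.output_text)
```

The trade-off is the usual one: lower effort and verbosity mean faster, shorter responses but shallower multi-step reasoning, so it’s worth tuning per task rather than globally.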

There are also two aspects that we haven’t dug into yet, but I’m really looking forward to putting them through their paces:

  • Tool preambles: GPT-5 has been trained to give progress updates in ‘tool preamble’ messages. It’s often really important to keep the user informed as an agent progresses, which can be difficult if the model is being used as a black box. I haven’t seen much talk about this as a feature, but I think it has the potential to be incredibly useful for agent builders (there’s a rough sketch of this after the list).
  • Replanning: In the past, we’ve got ourselves stuck in loops (particularly with OpenAI models) where the model keeps trying the same thing even when it doesn’t work. GPT-5 is supposed to handle these cases that require a replan much better – it’ll be interesting to dive into this more and see if that’s the case.
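
For the tool-preamble point above, here’s a rough sketch of what surfacing those progress updates might look like with the openai Python SDK’s Responses API. The tool definition and the instruction wording are purely illustrative (not Portia’s actual setup); the idea is that preambles show up as ordinary message items ahead of the function-call items in the response output, so they can be forwarded to the user as the agent works.

```python
# Sketch: surfacing GPT-5 'tool preamble' progress updates to the user.
# Assumes the `openai` Python SDK (Responses API); the tool and instruction
# text below are illustrative only.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "name": "search_flights",  # hypothetical tool, for illustration
    "description": "Search for flights between two cities on a given date.",
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string"},
            "destination": {"type": "string"},
            "date": {"type": "string", "description": "ISO 8601 date"},
        },
        "required": ["origin", "destination", "date"],
        "additionalProperties": False,
    },
}]

response = client.responses.create(
    model="gpt-5",
    instructions=(
        "Before each tool call, briefly tell the user what you are about to do and why."
    ),
    input="Find me a flight from London to Berlin next Friday.",
    tools=tools,
)

# Preamble messages appear as 'message' items before the 'function_call' items,
# so an agent framework can stream them straight to the user.
for item in response.output:
    if item.type == "message":
        for part in item.content:
            if part.type == "output_text":
                print("PREAMBLE:", part.text)
    elif item.type == "function_call":
        print("TOOL CALL:", item.name, item.arguments)
```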

In summary, this is still an incremental improvement (if any). It’s sad to see it still can’t count the letters in various fruit, and I’m still mostly using claude-4 in Cursor.

