What I Learned Spending 40+ Hours Building a Custom GPT

I created a custom GPT for what seemed like a simple reason: I needed a faster way to deliver my GTM Org Readiness Assessment to founders. The assessment is comprehensive—38 questions across six pillars that diagnose where early-stage SaaS companies break down as they try to scale beyond founder-led sales.

Originally, I planned to build it as an online survey. Clean, straightforward, functional. Then a custom GPT felt like the obvious answer: it could walk founders through the assessment conversationally, score it, and deliver personalized recommendations.

The basics came together fast. Anyone can build a GPT. The interface is intuitive, the setup is simple, and you can knock out a working version in under an hour.

But there’s a massive gap between working and useful.


When simple becomes complex

Not long into the build, I changed the goal. I wanted a virtual version of me: a consultant that could do meaningful discovery, recognize whether a founder needed strategic guidance or tactical help, and adapt accordingly.

That’s where it got complicated.

Every founder’s situation is different. Some are still driving 75% of deals themselves. Others have hired a sales leader but can’t step back. The GPT needed to understand these contexts without me there to steer it. It had to ask smart follow-ups, identify root causes (not just symptoms), and avoid the generic advice that makes people tune out.

Anyone can walk through a house and point out cracks in the foundation. A good inspector knows which cracks signal structural risk, what to ask about history, and whether you need immediate repairs or routine maintenance. The assessment was my inspection checklist. The GPT needed to be the inspector.

It also had to administer a structured 38-question assessment with specific scoring anchors, calculate pillar totals, generate an overall readiness score, and map founders to the right maturity band (At Risk, Emerging, Scaling, Ready to Scale). Then translate that into specific, actionable recommendations—without overwhelming the reader.

And it needed to sound like me, not a robot reading a script.
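The scoring piece, at least, is mechanical and easy to pin down. Here's a rough sketch of the pillar-total and band-mapping math in Python. The question-to-pillar split, the 0-4 anchor scale, and the band thresholds are simplified placeholders rather than the real assessment; only the four band labels are the actual ones.

```python
# Minimal sketch of the scoring mechanics, not the production logic.
# The question-to-pillar mapping, 0-4 anchor scale, and band thresholds
# are illustrative assumptions; only the four band labels are real.

PILLARS = {  # hypothetical split of the 38 questions across six pillars
    "Pillar 1": range(1, 7),
    "Pillar 2": range(7, 14),
    "Pillar 3": range(14, 20),
    "Pillar 4": range(20, 27),
    "Pillar 5": range(27, 33),
    "Pillar 6": range(33, 39),
}

BANDS = [  # (minimum overall %, band label) -- thresholds are placeholders
    (80, "Ready to Scale"),
    (60, "Scaling"),
    (40, "Emerging"),
    (0, "At Risk"),
]

def score_assessment(answers: dict[int, int], max_per_question: int = 4):
    """answers maps question number -> score on that question's anchor scale."""
    pillar_totals = {
        name: sum(answers.get(q, 0) for q in questions)
        for name, questions in PILLARS.items()
    }
    earned = sum(pillar_totals.values())
    possible = 38 * max_per_question
    overall_pct = round(100 * earned / possible)
    band = next(label for floor, label in BANDS if overall_pct >= floor)
    return pillar_totals, overall_pct, band
```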


Falling down the prompt-engineering rabbit hole

This is where I spent 40+ hours: writing and rewriting the system prompt.

It had to be detailed enough to handle the complexity above, but not so bloated it slowed response times. Guardrails needed to be strict enough to stop drift and boilerplate, but flexible enough to handle edge cases. I kept running scenarios, testing, and tweaking.

A founder says pipeline is inconsistent → the GPT asks about ICP definition. Good.

Another mentions high churn → it probes onboarding and CS alignment. Good.

Someone asks a simple tactical question → it launches into full strategic discovery. Not good.

Fix, test, break, repeat.

Somewhere along the way I stopped thinking of it as a “GPT” and started treating it like a product. That shift raised the bar: I wanted it to match the quality of a real consulting engagement. I chased perfect—exactly the trap I warn founders about. I knew better, but I struggled to ship anything that didn’t meet my (unrealistic) standard.


What I wish I’d known about building a GPT

You’ll spend 95% of your time testing and editing your prompt.

Setup is trivial. The real work is scenario testing: finding where your instructions break and refining until responses are consistently useful. I ran dozens of test conversations and edited the prompt after each one. That is what prompt testing actually looks like.
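If it helps, here's the kind of harness I mean, using the OpenAI Python SDK to re-run the same scenarios after every prompt edit. The file name, scenario wording, and keyword checks are placeholders, not my exact test suite.

```python
# Rough harness for re-running the same scenarios after each prompt edit.
# The prompt file, scenario text, and keyword checks are illustrative.
from openai import OpenAI

client = OpenAI()
SYSTEM_PROMPT = open("system_prompt.txt").read()  # placeholder path

SCENARIOS = [
    # (founder message, topics the reply should touch on)
    ("Our pipeline is really inconsistent quarter to quarter.", ["ICP", "ideal customer"]),
    ("Churn jumped sharply this year.", ["onboarding", "customer success"]),
    ("Quick one: what should a discovery call agenda look like?", ["agenda"]),
]

for message, expected in SCENARIOS:
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": message},
        ],
    ).choices[0].message.content
    hit = any(term.lower() in reply.lower() for term in expected)
    print(f"{'PASS' if hit else 'REVIEW'}: {message[:50]}")
```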

Optimize for the right model—and plan fallbacks.

I burned hours on GPT-4.1 before simply asking which model fit my use case. For me, complex strategic questions ran better on GPT-5, with GPT-4o handling lighter tasks. I added fallback logic to route accordingly. Should’ve done that on day one.
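Here's a rough sketch of that route-and-fall-back idea, assuming you're calling the models through the API. The keyword heuristic, model IDs, and error handling are simplified placeholders, not my exact setup.

```python
# Sketch of route-then-fall-back: send heavier strategic questions to the
# stronger model, lighter tactical ones to the cheaper model, and fall back
# if the preferred model errors out. The keyword heuristic is a placeholder
# for whatever classification you actually use.
from openai import OpenAI, OpenAIError

client = OpenAI()

STRATEGIC_HINTS = ("scale", "strategy", "org", "readiness", "hiring plan")

def pick_models(user_message: str) -> list[str]:
    heavy = any(hint in user_message.lower() for hint in STRATEGIC_HINTS)
    return ["gpt-5", "gpt-4o"] if heavy else ["gpt-4o", "gpt-5"]

def ask(system_prompt: str, user_message: str) -> str:
    last_error = None
    for model in pick_models(user_message):
        try:
            return client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_message},
                ],
            ).choices[0].message.content
        except OpenAIError as err:  # fall through to the next model
            last_error = err
    raise last_error
```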

Cross-platform testing is non-negotiable.

What works through the API can wobble in the web UI. I'd get a response I loved through the API, update the prompt, then watch the web version behave differently. Don't assume consistency. Test where your users will actually interact with it.

Perfect doesn’t exist.

I wanted the GPT to perform exactly like I would on a consulting call. That’s not realistic. At some point, 85% accuracy is valuable—especially compared to no tool at all. Useful beats flawless.

I’m still not fully comfortable with that trade-off. I shipped it anyway.


The unexpected lesson

I built this to accelerate delivery of my assessment—a shortcut. Instead, it forced me to codify my entire consulting methodology in precise, repeatable terms. Every decision tree, every diagnostic question, every interpretive framework had to be translated into instructions a machine could follow.

The custom GPT became a mirror for my process. That alone was worth the effort. It’s done, and it works well enough.


Final thoughts

If you’re thinking about building a custom GPT for your business use case: do it. It’ll teach you more about how GPTs actually work than any article ever will. The messy process of getting it wrong, fixing it, and getting it wrong again is where the learning happens. You can read about prompt engineering all day, but until you watch your own instructions fail in real time, you won’t really understand what these tools can and can’t do.

Start simple; don’t underestimate the work required to make it useful; budget most of your time for prompt engineering and testing; and when you catch yourself chasing perfect, ship.

I wasn’t expecting to spend 40+ hours building this thing. But I understand GPTs now in a way I didn’t before—which should make building the next one that much easier (I hope). That’s worth something.
