There are a million articles about what AI agents are. This isn't one of them. This is about how to actually build one for a real business process, from start to finish, based on how I do it at BUILD+SHIP.
I'll use a real example throughout: an agent I built for a recruitment consultancy that processes incoming candidate CVs.
Step 1: define the job, not the technology
Before touching any code, I write a one-paragraph job description for the agent. Not a technical spec. A job description, like you'd write for a human.
For the CV agent: "Review incoming candidate CVs. Extract key information (skills, experience, location, salary expectations). Match against open roles. Score the fit. If above 70% match, draft an outreach email. If below, file and tag for future reference."
That's it. If you can't describe what the agent does in plain English, you're not ready to build it.
The recruitment company had been doing this manually. Two people, roughly 4 hours a day between them, processing about 60 CVs. Some days more, some days fewer. They weren't slow at it. They just had better things to do.
Step 2: map the inputs and outputs
What goes in? What comes out?
Inputs: PDF or Word document (the CV), plus a list of open roles with requirements (from their ATS).
Outputs: A structured record per CV containing extracted data, a match score per open role, and either a draft email or a filing action.
I document every field. Not loosely. Exactly. "Years of experience" means what? Total career length? Years in the relevant skill? Both? This is where vagueness kills you later.
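To make that concrete, here's roughly what a pinned-down record looks like as code. The field names below are illustrative, not the client's actual schema — the point is that every field gets a precise meaning, in writing:

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical schema -- the real field definitions came out of the
# workshop with the client. Note the comment on experience: that
# ambiguity is exactly what step 2 exists to resolve.
@dataclass
class CandidateRecord:
    name: str
    location: str
    skills: list[str]
    relevant_experience_years: float  # years in the relevant skill, NOT total career length
    experience_summary: str
    salary_expectation: str | None = None  # often absent from the CV itself

@dataclass
class MatchResult:
    role_id: str
    score: int   # 0-100 fit against this role
    action: str  # "draft_email" if score >= 70, else "file_and_tag"
```

Writing the schema down like this forces the "years of experience" conversation to happen before the build, not after.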
The recruitment company and I spent about 2 hours going through 15 example CVs and defining exactly what "good extraction" looked like. They'd been doing it intuitively for years. Turning intuition into explicit rules is one of the hardest parts of agent building.
Step 3: build the simplest possible version
I'm allergic to overengineering on the first pass. The first version of any agent I build does the minimum.
For the CV agent, version 1 did three things:
- Extract structured data from the CV (name, skills, experience summary, location)
- Compare against open roles using a simple keyword match
- Output a match score
No email drafting. No filing. No fancy scoring algorithm. Just: read the CV, pull out the basics, check it against the roles.
I built this in about 4 hours using Claude's API. The prompt was maybe 200 words. The surrounding code was a simple Python script that read from a folder and wrote to a spreadsheet.
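To give a sense of how basic version 1's matching was, here's a scorer in the same spirit. This is my reconstruction for illustration, not the production code:

```python
import re

def keyword_match_score(cv_text: str, role_keywords: list[str]) -> int:
    """Version-1 style scoring: what fraction of the role's keywords
    appear anywhere in the CV text. Crude on purpose."""
    # Tokenise; keep + and # so "c++" and "c#" survive
    words = set(re.findall(r"[a-z0-9+#]+", cv_text.lower()))
    hits = sum(1 for kw in role_keywords if kw.lower() in words)
    return round(100 * hits / len(role_keywords)) if role_keywords else 0
```

Multi-word skills, synonyms, and context all defeat this — which is precisely why the scoring gets replaced later.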
It worked on about 80% of CVs first time. The other 20% had issues: unusual formatting, CVs in languages other than English, scanned PDFs that needed OCR first. I noted the failures and moved on.
Step 4: test with real data, not test data
This is where most tutorials fail you. They test with 5 clean examples and call it done.
I ran version 1 against the last 200 CVs the company had processed manually. Then I compared the agent's extraction against the human extraction. Field by field.
Results: name extraction was 99% accurate. Skills extraction was about 85%. Experience summary was 75% (the agent was interpreting "relevant experience" differently from the humans). Location was 98%. Match scoring was all over the place because the keyword matching was too simplistic.
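The comparison itself is simple code. Something like this, assuming each extraction is a dict keyed by field name and agent and human records are paired in the same order:

```python
from __future__ import annotations
from collections import defaultdict

def field_accuracy(agent_records: list[dict],
                   human_records: list[dict]) -> dict[str, float]:
    """Compare agent extraction to the human baseline, field by field.
    Exact-match comparison is a simplification -- skills lists really
    want set comparison, free text wants fuzzier matching."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for agent, human in zip(agent_records, human_records):
        for fld, truth in human.items():
            total[fld] += 1
            if agent.get(fld) == truth:
                correct[fld] += 1
    return {fld: correct[fld] / total[fld] for fld in total}
```

The boring day of work is mostly in producing the human baseline and eyeballing the mismatches, not in this function.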
This took a full day. Boring work. Essential work. If I'd shipped version 1 without this step, the recruitment team would've lost trust in it within a week.
Step 5: iterate on the failures
The 75% experience extraction was the biggest problem. I looked at the failures and found two patterns. First, the agent was counting total career years rather than years in the relevant field. Second, it was struggling with career breaks and non-linear career paths.
I rewrote that section of the prompt. Added explicit instructions about what counts as relevant experience. Included two examples of non-linear careers and how to handle them. Re-ran the 200-CV test.
Experience extraction went from 75% to 91%. Good enough for version 2.
For the match scoring, I replaced the keyword matching with a proper semantic comparison. Instead of "does the CV contain the word 'Python'?" it became "does the candidate's experience align with the requirements of this role?" This is where the language model earns its keep. Pattern matching is cheap. Judgement is where AI adds value.
Match scoring accuracy went from about 60% to 88%.
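Mechanically, the semantic version boils down to asking the model for a judgement and parsing a number back out. A sketch of the shape — the prompt wording is illustrative, and the model call is injected as a function so the parsing logic stands on its own without API details:

```python
from __future__ import annotations
import json
from typing import Callable

# Illustrative prompt -- not the production wording
JUDGE_PROMPT = """You are screening a candidate for a role.

Role requirements:
{requirements}

Candidate experience:
{experience}

Reply with JSON only: {{"score": <0-100 integer>, "reason": "<one sentence>"}}"""

def semantic_match_score(experience: str, requirements: str,
                         ask_model: Callable[[str], str]) -> int:
    """ask_model wraps the LLM call (e.g. the Claude API); injecting it
    keeps this testable without a network round-trip."""
    reply = ask_model(JUDGE_PROMPT.format(requirements=requirements,
                                          experience=experience))
    data = json.loads(reply)
    return max(0, min(100, int(data["score"])))  # clamp against model drift
```

Asking for a reason alongside the score costs almost nothing and makes the review queue far easier to audit.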
Step 6: add the human layer
Version 2 went live with a human review step. Every CV the agent processed went into a queue. A human reviewed the extraction and the match score, approved or corrected it, and only then did the system take action (draft email or file).
I track correction rates obsessively. Week 1: 22% correction rate. Week 2: 18%. Week 3: I made another round of prompt adjustments based on the patterns in the corrections. Week 4: 9% correction rate.
At 9%, the humans were spending about 30 seconds per CV instead of 4 minutes. The 4 hours of daily work dropped to about 45 minutes of review. Real, measured, defensible.
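Tracking the correction rate needs nothing clever. If each review is logged as a (week, was_corrected) pair, the weekly rate falls out directly:

```python
from __future__ import annotations
from collections import defaultdict

def weekly_correction_rates(reviews: list[tuple[int, bool]]) -> dict[int, float]:
    """reviews: one (week_number, was_corrected) entry per reviewed CV.
    Returns the correction rate per week -- the number to watch before
    trusting any automation threshold."""
    corrected: dict[int, int] = defaultdict(int)
    seen: dict[int, int] = defaultdict(int)
    for week, was_corrected in reviews:
        seen[week] += 1
        corrected[week] += was_corrected
    return {w: corrected[w] / seen[w] for w in sorted(seen)}
```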
Step 7: automate the confident ones
After 6 weeks of tracking, I identified that CVs with a match score above 85% had a correction rate under 2%. So we automated those. The agent processes them end-to-end without human review. Everything below 85% still gets reviewed.
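The routing rule itself ends up being a few lines. The 85 threshold came from six weeks of measured correction data, not from theory; the 70 cut-off is the one from the original job description:

```python
AUTO_THRESHOLD = 85  # from the data: scores above this had <2% corrections

def route(match_score: int) -> str:
    """Decide whether a processed CV needs a human."""
    if match_score > AUTO_THRESHOLD:
        return "auto"    # agent acts end-to-end, no review
    return "review"      # queued for human approval or correction

def planned_action(match_score: int) -> str:
    """The rule from the step-1 job description."""
    return "draft_email" if match_score >= 70 else "file_and_tag"
```

The code is trivial; the threshold is the valuable part, because it's backed by measurement rather than gut feel.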
That took the human review time down to about 20 minutes a day. For 60 CVs.
The tools
People always ask about the stack. For this agent:
- Python for the orchestration
- Claude API for the language model
- A simple SQLite database for tracking
- PyPDF2 for PDF extraction (with a fallback to an OCR service for scanned documents)
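The SQLite tracking layer is about as simple as it sounds. A sketch of the kind of schema and queries involved — table and column names here are mine, not the production ones:

```python
import sqlite3

def init_db(conn: sqlite3.Connection) -> None:
    # One row per processed CV; 'corrected' is flipped by the review UI
    conn.execute("""
        CREATE TABLE IF NOT EXISTS cv_log (
            id INTEGER PRIMARY KEY,
            filename TEXT NOT NULL,
            match_score INTEGER,
            routed TEXT,             -- 'auto' or 'review'
            corrected INTEGER DEFAULT 0
        )""")

def log_cv(conn: sqlite3.Connection, filename: str, score: int, routed: str) -> None:
    conn.execute(
        "INSERT INTO cv_log (filename, match_score, routed) VALUES (?, ?, ?)",
        (filename, score, routed))

def correction_rate(conn: sqlite3.Connection) -> float:
    """Share of reviewed CVs that a human had to correct."""
    total, corrected = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(corrected), 0) "
        "FROM cv_log WHERE routed = 'review'").fetchone()
    return corrected / total if total else 0.0
```

One flat table carried the whole feedback loop here; the correction-rate query is the one that earned its keep.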
Nothing exotic. The value isn't in the technology. It's in the prompt engineering, the testing methodology, and the feedback loop.
What I'd tell you before you start
Pick a process where you can measure the before and after. Build the stupidest version first. Test it on real data, lots of it. Put humans in the loop until you trust the outputs. Measure everything.
If you're looking at a business process right now and thinking "an agent could do that," it probably can. The question is whether it's worth the build and maintenance cost. That's something we help companies figure out through our agent strategy and build service.
The CV agent cost about £8k to build and deploy. It saves the company roughly £45k a year in labour reallocation. The maths worked. But it only worked because we picked the right process and built it properly.