What makes a good user story (INVEST applied, with side-by-side examples)

Most user stories are written because the template requires it, then ignored. Here's the practical version — when the role template helps and when it lies, INVEST applied with concrete tests, three side-by-side bad-vs-good examples (feature, infrastructure, bug), the 30-second rule that fixes 80% of ticket quality, and five anti-patterns to refuse on sight.

May 5, 2026  ·  12 min read  ·  SprintFlint Team

User stories are simultaneously the most-written and most-skipped artefact in agile. Most teams write them because the template requires it. Most teams ignore them in standup because they don’t actually shape the work.

Here’s the practical version: the INVEST checklist that most agile training references but rarely teaches in concrete terms, what good and bad stories look like side by side, the role-template debate, and the one habit that fixes 80% of ticket-quality problems.

The “as a… I want… so that…” template — when it helps and when it lies

The classic format:

As a [role], I want [feature] so that [benefit].

This template is useful in two cases:

  1. The team is unclear who the work is for. Forcing the role makes you face the question.
  2. The team is unclear why the work matters. Forcing the “so that” makes you face the question.

The template is useless in three cases:

  1. The role and benefit are obvious. “As a logged-in user, I want to log out, so that I can stop being logged in” — yes, fine, we all knew. The template adds noise without adding signal.
  2. The role and benefit are fake. Engineers writing stories often invent a plausible-sounding role/benefit to satisfy the format, then ignore both. “As a system administrator, I want database connection pooling, so that performance is improved.” Nobody is checking whether the system administrator wanted that. It’s an internal task.
  3. The story is a chore, not a feature. “Upgrade Rails to 7.1.” There’s no user-facing benefit; the benefit is internal. Forcing user-story format makes the writer lie.

The fix: write whatever clearly conveys the work. Use the template when it sharpens thinking, drop it when it adds ceremony.

INVEST, applied

INVEST is the most-cited story-quality acronym: Independent, Negotiable, Valuable, Estimable, Small, Testable. Most training presents it abstractly. Here’s the practical version of each, with the question to ask.

I — Independent

Can this story be picked up in any order, without depending on another story that isn’t done yet?

Test: “If we picked this story alone, with nothing else from the backlog, could we finish it?”

If no, the dependency needs to be explicit (linked tickets) or the story needs to be merged with its dependency. Hidden dependencies are how mid-sprint discoveries kill goal coherence.

Common failure: “Add caching layer” depends on “Migrate to new Postgres instance” but doesn’t say so. Mid-sprint, the engineer realises the order matters. Sprint goal slips.

N — Negotiable

Is the outcome fixed but the implementation open? Or is the story a step-by-step prescription?

Test: “Could two different engineers deliver this story two different ways and both satisfy the goal?”

If no, the story is a task list disguised as a story. That’s fine sometimes (compliance work, regression tests) but it’s not a story — it’s a recipe. Don’t pretend it’s open when it isn’t.

Common failure: stories with 14 acceptance criteria that read like an implementation plan. The team has no room to find a better solution.

V — Valuable

Does the story produce something a stakeholder can name as a value?

Test: “If we shipped only this story this sprint, could we point to a benefit and a beneficiary?”

If no, the story is probably a sub-task of something else. Either merge it with the parent story or be honest that it’s enabling work, not deliverable work, and accept that it shouldn’t be the headline of the sprint.

Common failure: “Refactor checkout controller” — by itself, no user sees a difference. It’s enabling. The valuable story is “reduce checkout error rate to 0.5%” and the refactor is one possible path to it.

E — Estimable

Can the team agree on a rough size?

Test: “Three engineers look at this and say small/medium/large. Do they agree within one bucket?”

If no, the story has too much uncertainty. The team doesn’t know enough to commit. Either invest in refinement (research, prototype, breakdown) or don’t put it in the sprint.

Common failure: “Add real-time notifications” — without spec on where, how, what triggers them, what the failure mode is. Estimates wander between “1 day” and “3 sprints.”

S — Small

Does the story fit comfortably in one sprint, ideally in 2-5 days?

Test: “Can a single engineer (or pair) finish this within the sprint, even with normal interruptions?”

If no, split it. Stories larger than half a sprint are the single biggest reason teams miss goals — they can’t be paused, parallelised, or de-risked.

Common failure: stories sized at “13 points” or “20 points” on a 30-point sprint. They consume the team for two weeks; if anything goes wrong, the sprint is gone.

T — Testable

Is there a clear, observable check that determines whether the story is done?

Test: “Can the engineer demo done-ness without explanation?”

If no, the acceptance criteria need work. “Should feel snappy” isn’t testable. “Page loads under 800ms p95” is.

Common failure: tickets where “done” is whatever the engineer thinks looks right. PR review becomes the only check, and reviewers don’t have time to validate functional behaviour.
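The point of a measurable criterion is that a script, not a conversation, decides done-ness. A minimal sketch in Python of checking the “page loads under 800ms p95” criterion above — the sample data and threshold are illustrative, not from a real system:

```python
import math

def p95(samples_ms):
    """95th-percentile latency via the nearest-rank method."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-indexed nearest rank
    return ordered[rank - 1]

# The acceptance criterion becomes a pass/fail check instead of an opinion:
samples = [120, 340, 95, 780, 410, 650, 200, 310, 560, 430]
assert p95(samples) <= 800  # "page loads under 800ms p95"
```

A check like this can run in CI against staging load-test data, which is what makes the criterion demo-able without explanation.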

Side-by-side: bad story → good story

Example 1 — feature

Bad:

Add user profile picture support.

Good:

As an active user, I want to upload a profile picture so my comments are recognisable in busy threads.

Acceptance:

  • Upload PNG/JPG up to 2MB from the profile settings page
  • Picture appears next to my display name in comments within 30 seconds of upload
  • Old picture replaced (no orphan storage)
  • Default avatar shown if no upload

Non-goals: cropping UI, multiple pictures, animated images.

The good version is testable, small (1-2 days), valuable, and clearly scoped. The “non-goals” line saves an hour of debate.

Example 2 — infrastructure / platform

Bad:

Improve API performance.

Good:

Reduce P95 latency on GET /api/v1/issues from 850ms to under 300ms.

Acceptance:

  • 24-hour rolling P95 under 300ms in production
  • No degradation of P99 (currently 1.4s, must stay ≤ 1.6s)
  • No regressions on /api/v1/issues/* sibling endpoints

Implementation guidance: probably the N+1 on assignee preload, but the engineer is free to pick.

The good version has a concrete outcome, measurable acceptance, and negotiability — engineer picks the implementation.

Example 3 — bug

Bad:

Login is broken sometimes.

Good:

Login times out for 4-6% of users at peak hours (2-4pm UK), specifically users with > 500 sprints in their account.

Acceptance:

  • Login under 2 seconds at the 95th percentile, peak hours included
  • Reproducible in staging with seeded data; reproduction script committed
  • Root cause identified and called out in PR description

Test data: see staging account loadtest-1.

The good version is reproducible, has a measurable target, and includes a non-obvious detail (segment + time of day) that prevents the engineer from chasing the wrong cause.

The 30-second rule that fixes 80% of ticket quality

Before you accept a story into the sprint, anyone on the team should be able to read it in under 30 seconds and answer:

  1. What’s the outcome?
  2. How will we know it’s done?
  3. What’s not in scope?

If any answer takes longer than 5 seconds to find in the ticket, the ticket isn’t ready. Rewrite it now (15 minutes) or cut it from the sprint. You’ll save 4x that time in the sprint.
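Teams that track tickets as structured data can even lint for this before planning. A hypothetical sketch, assuming tickets are plain dicts with outcome, acceptance, and non_goals fields — the field names are an assumption for illustration, not any tracker’s real schema:

```python
# Map hypothetical ticket fields to the three 30-second questions.
REQUIRED = {
    "outcome": "What's the outcome?",
    "acceptance": "How will we know it's done?",
    "non_goals": "What's not in scope?",
}

def unanswered(ticket: dict) -> list[str]:
    """Return the 30-second questions this ticket fails to answer."""
    return [q for field, q in REQUIRED.items() if not ticket.get(field)]

# A ticket with an outcome but no done-test and no scope line gets flagged:
ticket = {"outcome": "Reduce checkout error rate to 0.5%"}
print(unanswered(ticket))  # lists the two missing answers
```

Flagged tickets go back for the 15-minute rewrite; empty list means the story passes the 30-second test.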

Three follow-on consequences once teams adopt this:

  • Refinement becomes 30-min sessions, not 90-min ones, because most stories show up already structured.
  • Sprint planning takes half as long, because the team isn’t writing tickets at planning.
  • Mid-sprint “wait, what does this mean?” conversations drop ~70%.

Anti-patterns to kill on sight

A few story patterns that always indicate bad inputs to the sprint:

The dump-paragraph story. 400 words of context, no acceptance criteria, no clear scope. Usually a stakeholder pasted in a Slack message and someone made a ticket out of it.

The 17-acceptance-criteria story. Either the story is too big (split it) or the criteria are masquerading as a spec (push them into a linked design doc and reference it).

The “TBD” story. “Acceptance criteria: TBD.” The story isn’t ready. Don’t sprint-start with TBD tickets.

The chained-bug-fixes story. “Fix X, Y, Z, A, B, C.” Each is its own ticket. A story with five symptoms in the title is five stories.

The “we’ll figure it out” story. The engineer says “I’ll work it out as I go.” Sometimes valid for spikes (call it a spike, time-box it). Almost always invalid for feature work.

How to run a 5-minute story-quality check at planning

If your team’s stories are inconsistent, add this to sprint planning:

For each candidate story:

  1. Read the title aloud. If anyone goes “wait, what?”, flag it.
  2. One person reads the acceptance criteria. If they can’t read them in 30 seconds, flag it.
  3. Ask “what’s not in scope?” If the answer is “nothing was specified,” flag it.

Flagged stories don’t enter the sprint until they’re rewritten. This sounds aggressive. It saves more time than it costs in week one.

The summary

  • Use the role-template when it sharpens thinking; drop it when it adds noise.
  • INVEST is decisive when applied concretely — Independent, Negotiable, Valuable, Estimable, Small, Testable.
  • The 30-second test (outcome, done-test, out-of-scope) catches 80% of bad tickets.
  • Five anti-patterns to refuse: dump-paragraph, 17-criteria, TBD, chained-bugs, “we’ll figure it out.”

Stories aren’t a documentation artefact — they’re an alignment tool. If they don’t align the team, they’re failing the only job they have.


Tracking story sizes against velocity? SprintFlint calculates rolling velocity, points completed, and goal-hit rate automatically — no spreadsheet drift. Try it free — 300 tickets, no card required.

Stop estimating in hours.

SprintFlint runs your sprints with story points, velocity, capacity, and retros built in. First 300 tickets free, no credit card.