Most “how to estimate story points” content reads like a corporate training course: vague principles, planning poker rituals, and 1,500 words that don’t help on a Monday morning. This post is the practical version. By the end you’ll have a way to estimate that your team will actually use, the relative scale that works in practice, and an honest take on the rituals (planning poker, t-shirt sizing) — what’s worth keeping and what to skip.
What story points actually measure
Story points represent relative effort, not time. A 5-point story is roughly five times as much work as a 1-point story. Whether that takes a day, a week, or two engineers depends on the team — and that’s the point. The number says “this much, relative to that other thing.”
The reason teams adopted points over hours: humans are bad at predicting how long something will take but reasonably good at saying “this is bigger than that.” Removing the time-clock pressure also removes the temptation to pad.
Effort here is composed of three things, fused into one number:
- Volume: how many units of work
- Complexity: how much novelty or technical difficulty
- Uncertainty: how confident the team is in the estimate itself
A 5-point story might be a chunky-but-known piece of work, or a small-but-novel investigation. Both deserve the same number because both will take roughly the same effort.
The scale that actually works: modified Fibonacci
The accepted scale: 1, 2, 3, 5, 8, 13, ?
Why these numbers? Because the gaps grow as estimates get bigger, which mirrors human uncertainty. You can usually tell the difference between a 1 and a 2. You probably can’t tell the difference between an 11 and a 12. Forcing the team into 8 vs 13 is more honest.
The “?” replaces 21+. If a story feels bigger than 13, it isn’t a story — it’s an epic, and you should split it before committing to it. (See: story splitting, below.)
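If it helps to see the scale as code, here is a minimal Python sketch. The `to_bucket` helper and its round-up-on-ties rule are illustrative assumptions, not part of any tool: it takes a gut-feel “about N times the smallest ticket” and snaps it to the nearest bucket, returning “?” for anything past 13.

```python
SCALE = [1, 2, 3, 5, 8, 13]

# The gaps between neighbouring buckets grow as the numbers get bigger.
GAPS = [b - a for a, b in zip(SCALE, SCALE[1:])]  # [1, 1, 2, 3, 5]


def to_bucket(raw_size: float) -> str:
    """Snap a rough relative size to the nearest bucket on the scale.

    Hypothetical helper: ties round up to the larger bucket, which is an
    assumption made here, not a rule from the post.
    """
    if raw_size > SCALE[-1]:
        return "?"  # bigger than 13: treat as an epic and split it
    nearest = min(SCALE, key=lambda b: (abs(b - raw_size), -b))
    return str(nearest)
```

An 11 and a 12 both land on 13, which is the point: the scale refuses to pretend the team can tell them apart.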
Some teams use t-shirt sizes (XS, S, M, L, XL) instead of points. Functionally the same. Use whichever your team prefers — fight battles that matter.
How to set the baseline (this is what most teams skip)
Most failed estimation comes from skipping the baseline-setting step. A team’s points only mean something within that team, because the scale is relative. So before you estimate anything, you need an anchor.
The cheapest baseline-setting exercise:
- Pull 10 representative tickets from the last two sprints
- Ask: “which one was the smallest? Call that 1.”
- Ask: “is each remaining ticket about 1×, 2×, 3×, 5×, 8×, or 13× as big?”
- Don’t argue about specific numbers — argue about which bucket
- Write it down and pin it in the team channel
Now you have an anchor: the team knows what a “3” looks like. Future estimation references this. After 5 sprints, expect the anchor to have drifted to whatever the team’s consensus has become. That’s fine.
Without this step, every team member estimates against their own private scale and the points are noise. With it, you have something useful within 2 sprints.
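The output of the baseline exercise is just a small lookup table. A rough Python sketch of it follows; the ticket names are invented, and `build_anchor` and `render` are hypothetical helpers. The team still does the bucketing by discussion. The code only checks the result and formats it for pinning.

```python
from collections import defaultdict

SCALE = (1, 2, 3, 5, 8, 13)


def build_anchor(bucketed: dict[str, int]) -> dict[int, list[str]]:
    """Group reference tickets by the bucket the team agreed on."""
    if 1 not in bucketed.values():
        raise ValueError("the smallest reference ticket should be anchored at 1")
    anchor = defaultdict(list)
    for ticket, points in bucketed.items():
        if points not in SCALE:
            raise ValueError(f"{ticket!r}: {points} is not a bucket on the scale")
        anchor[points].append(ticket)
    # Keep buckets in scale order so the pinned message reads small to large.
    return {p: sorted(anchor[p]) for p in SCALE if p in anchor}


def render(anchor: dict[int, list[str]]) -> str:
    """Format the anchor as plain text for the team channel."""
    return "\n".join(f"{p:>2}: {', '.join(tickets)}" for p, tickets in anchor.items())


# Invented example tickets, bucketed by the team.
reference = build_anchor({
    "Fix typo": 1,
    "Add CSV export": 5,
    "Rename field": 1,
    "New billing page": 8,
})
```

Calling `render(reference)` gives one line per bucket, which is all the pinned message needs to be.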
The actual estimation: 3 questions, 30 seconds per ticket
This is the practical loop. For each ticket up for estimation:
Q1: Is this similar to anything we’ve done?
If yes, point it like that thing. Done.
Q2: If no, what would it take?
Each engineer says one of: “trivial / small / medium / large / huge / can’t tell.”
Q3: Convergence check.
If everyone said the same: point it (1/2/3/5/8/13). Move on.
If estimates diverge by more than one bucket, the highest and lowest estimator each give a 30-second reason. This is where the real value comes from: usually the high estimator knows about a hidden complexity, or the low estimator knows about an existing utility that solves it. After they explain, re-estimate. Don’t average. Pick a number, move on.
This is essentially planning poker, just stripped of the cards and the hour-long ceremony. A 6-person team can estimate 15 tickets in 25 minutes this way.
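The convergence check in Q3 is mechanical enough to sketch in Python. Two assumptions to flag: votes are given directly as points rather than as “small / medium / large” words, and when votes sit exactly one bucket apart (a case the loop above doesn’t spell out) the sketch hands both candidate buckets back to the team rather than choosing. `convergence_check` is a hypothetical name.

```python
SCALE = [1, 2, 3, 5, 8, 13]


def convergence_check(votes: dict[str, int]):
    """Decide what to do with one round of votes (engineer -> points).

    Returns one of:
      ("point", value)             everyone agreed
      ("pick", (low, high))        adjacent buckets; team picks one (assumption)
      ("discuss", (name, name))    lowest and highest estimator explain
    """
    positions = {name: SCALE.index(points) for name, points in votes.items()}
    low = min(positions, key=positions.get)
    high = max(positions, key=positions.get)
    spread = positions[high] - positions[low]  # distance in buckets, not points

    if spread == 0:
        return ("point", votes[low])
    if spread == 1:
        return ("pick", (votes[low], votes[high]))
    return ("discuss", (low, high))
```

Note that the result is never an average: the function either returns a bucket, two adjacent buckets, or the two people who should talk.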
What planning poker is good at, and where it goes wrong
Planning poker — where everyone reveals their estimate at the same time on a card — is genuinely useful for one specific reason: it prevents anchoring. If a senior engineer says “this is a 5” out loud, juniors will agree even if they think it’s an 8. Hidden reveal stops that.
It goes wrong when:
- It becomes a 90-minute ceremony for 12 tickets
- The team argues about “5 vs 8” instead of moving on
- Estimators fold to the most senior person rather than holding their estimate when they have insight
If your team is junior-heavy and prone to deferring to seniority, keep planning poker as a hidden-reveal mechanism. If your team is experienced and the dynamic is healthy, drop the cards — just have everyone say a number out loud at the same time.
Splitting stories that are too big
Anything 13+ is too big for a sprint. Split it.
Six practical splits, in order of preference:
1. Split by happy path vs edge cases. “Implement login” → “Login (happy path)” + “Login (forgot password / locked / expired token edge cases).”
2. Split by user role. “Build admin dashboard” → “Admin dashboard for super-admin” + “Admin dashboard for org-admin.”
3. Split by data input. “Import from CSV” → “Import schema A (most users)” + “Import schema B (10% of users).”
4. Split by interface layer. “Build the API” → “API endpoint” + “Frontend wiring” + “Background job for the slow part.”
5. Split by validation. “Add the new flow” → “Add the flow” + “Add validation + error handling for the flow.”
6. Split by experiment. “Build the new feature” → “Build it behind a feature flag for 5% of users” + “Roll out + monitor + clean up.”
If you can’t split a 13-pointer into smaller pieces using one of these patterns, the work isn’t actually understood yet. Add a 1-point spike: “investigate X, propose split.”
Things that are NOT story points
Common confusions worth naming:
- Time. A 5-pointer isn’t 5 hours or 5 days. The team’s velocity tells you how points convert to time.
- Difficulty. A novel-but-small problem can be 8 points; a tedious-but-easy migration can be 8 points. Same number, different texture.
- Productivity score. Velocity is a forecasting tool, not a manager’s whip. The moment “we need to push velocity up” enters standup, your numbers stop meaning anything.
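To make the “velocity converts points to time” idea concrete, here is a small Python sketch. It assumes velocity is the plain average of the last few sprints, which is one common convention rather than the only one, and the numbers are made up.

```python
import math
from statistics import mean


def sprints_needed(backlog_points: int, recent_velocities: list[int]) -> int:
    """Forecast how many sprints a backlog takes at the team's recent pace."""
    velocity = mean(recent_velocities)  # points completed per sprint, averaged
    if velocity <= 0:
        raise ValueError("velocity must be positive to forecast")
    # Round up: a partly used sprint still has to happen.
    return math.ceil(backlog_points / velocity)


# A team that recently completed 18, 22 and 20 points per sprint,
# facing a 60-point backlog.
forecast = sprints_needed(60, [18, 22, 20])
```

The same 60 points would be a different number of sprints for a different team, which is why the conversion lives in velocity and not in the points themselves.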
What about new teams with no history?
You don’t have a baseline. Three options:
Option A: Borrow. Use the modified Fibonacci scale, anchored to “a typical small bug fix is a 2.” Point everything relative to that. You’ll calibrate within 3-4 sprints.
Option B: Hours-first. For the first two sprints, estimate in ideal hours. Treat one ideal day as 1 point, then transition to relative pointing once you have anchors. This gets you forecasting faster but trains the team to think time-first, which is the wrong mental model long-term.
Option C: Don’t estimate. Use #NoEstimates: track ticket throughput per sprint instead. Works for steady streams of similar-sized tickets. Less useful for forecasting feature delivery.
Option A is what most teams should do. Tell the team explicitly: “we’re recalibrating our scale every retro for the first 5 sprints, then it stabilises.”
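For teams that do go with Option C, throughput forecasting can be sketched in a few lines of Python. Reporting a best-to-worst range from the fastest and slowest past sprints is an assumption made for this sketch, and `forecast_range` is a hypothetical helper.

```python
import math


def forecast_range(open_tickets: int, throughput_history: list[int]) -> tuple[int, int]:
    """Return (best_case, worst_case) sprints, from tickets closed per past sprint."""
    fastest = max(throughput_history)
    slowest = min(throughput_history)
    if slowest <= 0:
        raise ValueError("every past sprint must have closed at least one ticket")
    return math.ceil(open_tickets / fastest), math.ceil(open_tickets / slowest)


# 30 open tickets; past sprints closed 8, 10 and 12 tickets.
best, worst = forecast_range(30, [8, 10, 12])
```

This only holds up when tickets are roughly the same size, which is the caveat already attached to Option C.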
What about AI-generated estimates?
LLMs can estimate stories from the description text. They’re surprisingly OK at it — usually within one bucket of what the team would say.
Useful as a sanity check (“the AI says 5, I estimated 8 — what am I missing?”). Not a replacement for the team’s judgement, because the AI doesn’t know your codebase, your team’s velocity history, or what’s “trivial here because we have a helper for it.”
SprintFlint does AI-suggested estimates on every ticket. The team can override; the suggestion is for the conversation.
TL;DR
- Story points = relative effort, fusing volume + complexity + uncertainty.
- Use 1, 2, 3, 5, 8, 13, ?. Anything bigger gets split.
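- Set a baseline before you start: bucket about 10 recent reference tickets and pin them to the channel.
- 30 seconds per ticket: if it’s similar to past work, point it like that work. If not, vote; a spread of more than one bucket triggers a 30-second explanation from the high and low estimators, then re-estimate and move on.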
- Planning poker is useful only if your team has a deferring-to-seniority dynamic. Otherwise drop it.
- Six story-splitting patterns; if none fits, you don’t understand the work yet.
- Don’t confuse points with time, with difficulty, or with productivity.