How to estimate story points: the only practical guide you'll actually use

Most story points content is vague training-deck filler. Here's the practical version — what story points actually measure, the modified-Fibonacci scale that works, the baseline-setting step most teams skip, the 30-seconds-per-ticket loop, six patterns for splitting too-big stories, and what AI estimates are worth.

March 15, 2026  ·  10 min read  ·  SprintFlint Team

Most “how to estimate story points” content reads like a corporate training course: vague principles, planning poker rituals, and 1,500 words that don’t help on a Monday morning. This post is the practical version. By the end you’ll have a way to estimate that your team will actually use, the relative scale that works in practice, and an honest take on the rituals (planning poker, t-shirt sizing) — what’s worth keeping and what to skip.

What story points actually measure

Story points represent relative effort, not time. A 5-point story is roughly five times as much work as a 1-point story. Whether that takes a day, a week, or two engineers depends on the team — and that’s the point. The number says “this much, relative to that other thing.”

The reason teams adopted points over hours: humans are bad at predicting how long something will take but reasonably good at saying “this is bigger than that.” Removing the time-clock pressure also removes the temptation to pad.

Effort here is composed of three things, fused into one number:

  1. Volume: how many units of work
  2. Complexity: how much novelty or technical difficulty
  3. Uncertainty: how confident you are in the estimate itself

A 5-point story might be a chunky-but-known piece of work, or a small-but-novel investigation. Both deserve the same number because both will take roughly the same effort.

The scale that actually works: modified Fibonacci

The accepted scale: 1, 2, 3, 5, 8, 13, ?

Why these numbers? Because the gaps grow as estimates get bigger, which mirrors human uncertainty. You can usually tell the difference between a 1 and a 2. You probably can’t tell the difference between an 11 and a 12. Forcing the team to choose between 8 and 13 is more honest than pretending to precision nobody has.

The “?” replaces 21+. If a story feels bigger than 13, it isn’t a story — it’s an epic, and you should split it before committing to it. (See: story splitting, below.)
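To make the widening gaps concrete, here is a minimal sketch that snaps a raw gut-feel number onto the scale and returns "?" for anything past 13. The names SCALE and snap_to_scale are hypothetical, and rounding ties upward is an assumption, not something the scale itself prescribes.

```python
# Hypothetical helper: snap a raw "gut feel" number onto the 1, 2, 3, 5, 8, 13 scale.
SCALE = [1, 2, 3, 5, 8, 13]


def snap_to_scale(raw: float) -> str:
    """Return the nearest bucket as a string, or "?" if raw is bigger than 13."""
    if raw <= 0:
        raise ValueError("an estimate must be positive")
    if raw > SCALE[-1]:
        # Bigger than 13: treat it as an epic that needs splitting first.
        return "?"
    # Assumption: ties (e.g. 4 sits between 3 and 5) round up to the larger bucket.
    nearest = min(SCALE, key=lambda bucket: (abs(bucket - raw), -bucket))
    return str(nearest)


for guess in (1, 4, 6, 11, 12, 21):
    print(guess, "->", snap_to_scale(guess))
```

Under these assumptions an 11 and a 12 both land on 13, which is the point of the scale: the distinction the team could not reliably make is simply not available.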

Some teams use t-shirt sizes (XS, S, M, L, XL) instead of points. Functionally the same. Use whichever your team prefers — fight battles that matter.

How to set the baseline (this is what most teams skip)

Most failed estimation comes from skipping the baseline-setting step. A team’s points only mean something to that team, because the scale is relative. So before you estimate anything, you need an anchor.

The cheapest baseline-setting exercise:

  1. Pull 10 representative tickets from the last two sprints
  2. Ask: “which one was the smallest? Call that 1.”
  3. Ask: “is each remaining ticket about 1×, 2×, 3×, 5×, 8×, or 13× as big?”
  4. Don’t argue about specific numbers — argue about which bucket
  5. Write it down and pin it in the team channel

Now you have an anchor: the team knows what a “3” looks like. Future estimation references this. After 5 sprints, the anchor will probably have drifted toward whatever the team’s consensus has become, and that’s fine.

Without this step, every team member estimates against their own private scale and the points are noise. With it, you have something useful within 2 sprints.
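The output of the exercise is just a short list grouped by points. A sketch of what "write it down and pin it" could look like, assuming the team has already agreed a bucket for each ticket; the function name and the ticket titles are made up for illustration.

```python
from collections import defaultdict

SCALE = (1, 2, 3, 5, 8, 13)


def build_reference_card(buckets: dict[str, int]) -> str:
    """Group agreed ticket buckets by points and format them for pinning."""
    if 1 not in buckets.values():
        raise ValueError("the smallest ticket should be anchored at 1")
    by_points: dict[int, list[str]] = defaultdict(list)
    for ticket, points in buckets.items():
        if points not in SCALE:
            raise ValueError(f"{ticket}: {points} is not on the scale")
        by_points[points].append(ticket)
    lines = ["Reference tickets (our baseline):"]
    for points in SCALE:
        if by_points[points]:
            lines.append(f"  {points}: " + ", ".join(sorted(by_points[points])))
    return "\n".join(lines)


# Hypothetical tickets and buckets, as the team might agree them in step 3.
agreed = {
    "Fix typo in footer": 1,
    "Add CSV export button": 3,
    "Rename settings field": 1,
    "Rework billing webhook": 8,
}
print(build_reference_card(agreed))
```

The check for a ticket at 1 mirrors step 2: the card is only an anchor if the smallest ticket defines the bottom of the scale.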

The actual estimation: 3 questions, 30 seconds per ticket

This is the practical loop. For each ticket up for estimation:

Q1: Is this similar to anything we’ve done?
If yes, point it like that thing. Done.

Q2: If no, what would it take?
Each engineer says one of: “trivial / small / medium / large / huge / can’t tell.”

Q3: Convergence check.
If everyone said the same: point it (1/2/3/5/8/13). Move on.

If estimates diverge by more than one bucket, the highest and lowest estimators each give a 30-second reason. This is where the real value comes from: usually the high estimator knows about a hidden complexity, or the low estimator knows about an existing utility that solves it. After they explain, re-estimate. Don’t average. Pick a number, move on.

This is essentially planning poker, just stripped of the cards and the hour-long ceremony. A 6-person team can estimate 15 tickets in 25 minutes this way.
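The convergence check can be sketched in a few lines. This version assumes engineers vote directly in points, since no mapping from the trivial-to-huge labels to numbers is given here, and it does not model “can’t tell” votes. The function name and the “adjacent” status are hypothetical.

```python
SCALE = [1, 2, 3, 5, 8, 13]


def convergence_check(votes: dict[str, int]) -> dict:
    """Classify one round of votes (engineer name -> points on SCALE)."""
    if not votes:
        raise ValueError("need at least one vote")
    for name, points in votes.items():
        if points not in SCALE:
            raise ValueError(f"{name}: {points} is not on the scale")
    lowest = min(votes, key=votes.get)
    highest = max(votes, key=votes.get)
    spread = SCALE.index(votes[highest]) - SCALE.index(votes[lowest])
    if spread == 0:
        return {"status": "agreed", "points": votes[highest]}
    if spread == 1:
        # Assumption: the loop above does not say what to do when votes sit in
        # adjacent buckets, so this sketch just hands both back to the team.
        return {"status": "adjacent", "options": [votes[lowest], votes[highest]]}
    # More than one bucket apart: the two outliers each get 30 seconds.
    return {"status": "discuss", "explain": [lowest, highest]}


print(convergence_check({"Ana": 3, "Ben": 8, "Cy": 5}))
```

Note that nothing here averages the votes. The function only decides whether a conversation is needed and who should start it.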

What planning poker is good at, and where it goes wrong

Planning poker — where everyone reveals their estimate at the same time on a card — is genuinely useful for one specific reason: it prevents anchoring. If a senior engineer says “this is a 5” out loud, juniors will agree even if they think it’s an 8. Hidden reveal stops that.

It goes wrong when:

  • It becomes a 90-minute ceremony for 12 tickets
  • The team argues about “5 vs 8” instead of moving on
  • Estimators fold to the most senior person rather than holding their estimate when they have insight

If your team is junior-heavy and prone to deferring to seniority, keep planning poker as a hidden-reveal mechanism. If your team is experienced and the dynamic is healthy, drop the cards — just have everyone say a number out loud at the same time.
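If the team estimates remotely, the hidden-reveal part is easy to reproduce without cards. A minimal sketch, with a hypothetical HiddenRevealRound class that refuses to show any vote until everyone has voted:

```python
class HiddenRevealRound:
    """Collect votes privately and reveal them only once everyone has voted."""

    def __init__(self, voters: list[str]) -> None:
        self._expected = set(voters)
        self._votes: dict[str, int] = {}

    def vote(self, voter: str, points: int) -> None:
        if voter not in self._expected:
            raise ValueError(f"{voter} is not part of this round")
        # A voter may change their mind until the reveal.
        self._votes[voter] = points

    def reveal(self) -> dict[str, int]:
        missing = self._expected - set(self._votes)
        if missing:
            raise RuntimeError(f"still waiting on: {', '.join(sorted(missing))}")
        return dict(self._votes)


round_ = HiddenRevealRound(["Ana", "Ben"])
round_.vote("Ana", 8)
round_.vote("Ben", 5)
print(round_.reveal())
```

The only design choice that matters is that reveal fails while anyone is missing, so nobody can see a senior engineer’s number before committing to their own.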

Splitting stories that are too big

Anything 13+ is too big for a sprint. Split it.

Six practical splits, in order of preference:

1. Split by happy path vs edge cases. “Implement login” → “Login (happy path)” + “Login (forgot password / locked / expired token edge cases).”

2. Split by user role. “Build admin dashboard” → “Admin dashboard for super-admin” + “Admin dashboard for org-admin.”

3. Split by data input. “Import from CSV” → “Import schema A (most users)” + “Import schema B (10% of users).”

4. Split by interface layer. “Build the API” → “API endpoint” + “Frontend wiring” + “Background job for the slow part.”

5. Split by validation. “Add the new flow” → “Add the flow” + “Add validation + error handling for the flow.”

6. Split by experiment. “Build the new feature” → “Build it behind a feature flag for 5% of users” + “Roll out + monitor + clean up.”

If you can’t split a 13-pointer into smaller pieces using one of these patterns, the work isn’t actually understood yet. Add a 1-point spike: “investigate X, propose split.”

Things that are NOT story points

Common confusions worth naming:

  • Time. A 5-pointer isn’t 5 hours or 5 days. The team’s velocity tells you how points convert to time.
  • Difficulty. A novel-but-small problem can be 8 points; a tedious-but-easy migration can be 8 points. Same number, different texture.
  • Productivity score. Velocity is a forecasting tool, not a manager’s whip. The moment “we need to push velocity up” enters standup, your numbers stop meaning anything.
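As a worked example of velocity turning points into time, the sketch below divides a backlog by the average of recent sprint velocities. The function name and the numbers are hypothetical, and a plain average is only one reasonable way to summarise velocity.

```python
import math
from statistics import mean


def sprints_needed(backlog_points: int, recent_velocities: list[int]) -> int:
    """Forecast whole sprints to finish a backlog from average recent velocity."""
    if not recent_velocities:
        raise ValueError("need at least one completed sprint")
    velocity = mean(recent_velocities)
    if velocity <= 0:
        raise ValueError("average velocity must be positive")
    return math.ceil(backlog_points / velocity)


# Hypothetical numbers: 60 points of backlog, last three sprints closed 18, 22 and 20.
print(sprints_needed(60, [18, 22, 20]))  # -> 3
```

The conversion runs through the team’s own history: the same 60 points would be a different number of sprints for a team with a different velocity.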

What about new teams with no history?

You don’t have a baseline. Three options:

Option A: Borrow. Use the modified-Fibonacci scale, anchored to “a typical small bug fix is a 2.” Point everything relative to that. You’ll calibrate within 3-4 sprints.

Option B: Hours-first. For the first two sprints, estimate in ideal hours. Treat 1 day = 1 point, then transition to relative pointing once you have anchors. This gets you forecasting faster but trains the team to think time-first, which is the wrong mental model long-term.

Option C: Don’t estimate. Use #NoEstimates: track ticket throughput per sprint instead. Works for steady streams of similar-sized tickets. Less useful for forecasting feature delivery.

Option A is what most teams should do. Tell the team explicitly: “we’re recalibrating our scale every retro for the first 5 sprints, then it stabilises.”
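For teams that do go with Option C, a throughput forecast needs nothing more than ticket counts per sprint. A rough sketch, assuming tickets are of broadly similar size; the function name and the history are hypothetical, and using the fastest and slowest past sprint as bounds is a simplification.

```python
import math


def throughput_forecast(open_tickets: int, closed_per_sprint: list[int]) -> tuple[int, int]:
    """Return (best case, worst case) sprints from the fastest and slowest past sprint."""
    if not closed_per_sprint or min(closed_per_sprint) <= 0:
        raise ValueError("need past sprints that each closed at least one ticket")
    best = math.ceil(open_tickets / max(closed_per_sprint))
    worst = math.ceil(open_tickets / min(closed_per_sprint))
    return best, worst


# Hypothetical history: the last four sprints closed 9, 12, 10 and 8 tickets.
print(throughput_forecast(30, [9, 12, 10, 8]))  # -> (3, 4)
```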

What about AI-generated estimates?

LLMs can estimate stories from the description text. They’re surprisingly OK at it — usually within one bucket of what the team would say.

Useful as a sanity check (“the AI says 5, I estimated 8 — what am I missing?”). Not a replacement for the team’s judgement, because the AI doesn’t know your codebase, your team’s velocity history, or what’s “trivial here because we have a helper for it.”

SprintFlint does AI-suggested estimates on every ticket. The team can override; the suggestion is for the conversation.

TL;DR

  • Story points = relative effort, fusing volume + complexity + uncertainty.
  • Use 1, 2, 3, 5, 8, 13, ?. Anything bigger gets split.
  • Set a baseline before you start: pin 5-10 reference tickets to the channel.
  • 30 seconds per ticket: similar to past work? If yes, point it. If no, vote; divergence triggers a 30-second discussion, then re-pick and move on.
  • Planning poker is useful only if your team has a deferring-to-seniority dynamic. Otherwise drop it.
  • Six story-splitting patterns; if none fits, you don’t understand the work yet.
  • Don’t confuse points with time, with difficulty, or with productivity.

Estimating with something other than story points? “Agile estimation techniques compared” puts story points side-by-side with planning poker, t-shirt sizing, ideal hours, and no-estimates: what each surfaces, where each breaks, and how to switch without burning a sprint.
