When My Figure-Skating AI Called a World Champion's Triple Axel "Under-Rotated"

I gave my figure-skating AI the easiest test I could think of: a textbook triple axel from a world champion. A jump you'd put in a coaching manual. If anything on earth was a clean, fully rotated landing, this was it.

The system flagged it as under-rotated. High confidence on the defect.

I build JumpOnion, an app that analyzes a figure-skating jump from a single phone video. This was mid-development: the world champion wasn't a customer or one of my test kids, and the jump was my check on whether the rotation detector was telling the truth or just talking. That one wrong answer is the reason JumpOnion works the way it does today.

Visual thesis map: the easiest test failed → one camera could not prove the landing → thin evidence crossed the old gate → silence became a product rule → signals split into trust tiers.

How I was testing it

This was the rotation calibration phase. I ran eleven real skating videos end to end — pose detection, then rotation analysis, then the full analytics pass — to see where the detector agreed with reality and where it didn't. The world champion's jump was supposed to be the easy "yes." It came back a confident "no."

So I opened up the numbers behind that single call.

Diagram 01: input (a single phone angle flattens a clean jump into one camera view) → evidence (only 14 of 44 landing frames actually showed the blade) → artifact (a roughly 99-degree projection artifact that looked like a rotation defect).

Why a single camera lies

The rotation confidence on that landing was 0.61. Only 14 of the 44 landing frames caught the blade clearly enough to measure. From that one phone angle, the blade and the body at touchdown measured roughly 99 degrees apart: the blade looked like it hadn't caught up to the body, which the detector reads as "still rotating on the landing = under-rotated."

It wasn't under-rotated. It was a projection artifact. One camera, the skater's glide direction, and the angle of the blade edge together faked a number that doesn't exist in three dimensions. The jump was fully rotated. The camera just couldn't prove it, and the detector mistook "can't prove" for "didn't happen."
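To see how one camera can fake a rotation gap, here is a minimal sketch, assuming a simple orthographic camera and made-up blade and body direction vectors. The helpers `angle_deg` and `project_to_image` and all the numbers are hypothetical illustrations, not JumpOnion code or the champion's real data. Two directions that sit only about 11 degrees apart in 3D end up roughly 127 degrees apart on screen when the phone looks straight down the glide path:

```python
import math

Vec = tuple[float, float, float]


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def angle_deg(a: Vec, b: Vec) -> float:
    """Unsigned angle between two non-zero vectors, in degrees."""
    cos = dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))
    # Clamp to guard against tiny floating-point overshoot outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def project_to_image(v: Vec, view: Vec) -> Vec:
    """Orthographic projection onto the image plane perpendicular to `view` (a unit vector)."""
    k = dot(v, view)
    return (v[0] - k * view[0], v[1] - k * view[1], v[2] - k * view[2])


# Hypothetical axes: x = glide direction, y = sideways across the ice, z = up.
blade = (1.0, -0.1, 0.05)       # blade direction at touchdown (made-up numbers)
body = (1.0, 0.1, 0.05)         # body/hip facing direction (made-up numbers)
camera_view = (1.0, 0.0, 0.0)   # phone looking straight down the glide path

true_angle = angle_deg(blade, body)
apparent_angle = angle_deg(
    project_to_image(blade, camera_view),
    project_to_image(body, camera_view),
)
print(f"3D angle: {true_angle:.1f} deg, on-screen angle: {apparent_angle:.1f} deg")
```

The geometry is deliberately simplified; the only point is that flattening 3D motion onto one image plane can inflate an angle, so a single view can't settle under-rotation on its own.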

The cheap fix and the expensive one

The cheap fix was one threshold. The detector only diagnoses rotation when its confidence in the measurement clears a gate, and the champion's jump, at 0.61, landed just barely over that line. So it squeaked through and got "diagnosed." I raised the gate. Now a borderline reading like that one falls short, the diagnosis is suppressed, and the app says nothing instead of saying something false.
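The gate logic itself is small. A minimal sketch, assuming a single scalar confidence per reading; `RotationReading`, `diagnose`, `OLD_GATE`, and `NEW_GATE` are hypothetical names, and both gate values are illustrative, not the app's actual thresholds:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical gate values for illustration only; not JumpOnion's real thresholds.
OLD_GATE = 0.60
NEW_GATE = 0.70


@dataclass
class RotationReading:
    confidence: float    # detector's confidence in its rotation measurement, 0..1
    under_rotated: bool  # what the raw measurement suggests


def diagnose(reading: RotationReading, gate: float) -> Optional[str]:
    """Return a diagnosis only when confidence clears the gate; otherwise stay silent (None)."""
    if reading.confidence < gate:
        return None
    return "under-rotated" if reading.under_rotated else "fully rotated"


champion = RotationReading(confidence=0.61, under_rotated=True)
print(diagnose(champion, OLD_GATE))  # squeaks through: "under-rotated"
print(diagnose(champion, NEW_GATE))  # suppressed: None
```

Returning `None` instead of a default verdict is the design choice that matters here: "no answer" is a first-class result, not an error to paper over.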

The expensive fix was the principle I had to commit to: I would rather the app stay silent than be confidently wrong.

That sounds obvious until you watch a product do the opposite. Most analysis tools are built to always have an answer. I decided mine wasn't allowed to, unless it could back the answer up.

Diagram 02: a borderline measurement of 0.61 can look precise while the camera evidence is still thin. The gate then sets product behavior: strong evidence allows a diagnosis, borderline evidence gets an observation only, and thin evidence gets no verdict.

Why silence wins here

Figure skating is a small world. A coach or a competitive skater tries the app once, it tells them a clean jump is broken, and that verdict travels the whole rink by the next practice. So when the app isn't sure, I'd rather it say nothing than risk being wrong in front of the one coach who never comes back.

So here's the rule I shipped: if the system can misread a world champion's textbook jump from a bad angle, then an ordinary kid's half-rotation or a non-standard attempt deserves more caution, not less. I calibrate against the hardest jump I can verify, and when the camera's evidence is thin, the app keeps quiet.

What that looks like in the product

Diagram 03: three output tiers. Tier 1, strong: a high-confidence rotation signal can become a visible diagnosis. Tier 2, observation: risk language stays hedged when the evidence is useful but not decisive. Tier 3, experimental: raw angles stay internal until they earn trust frame by frame.

Rotation diagnosis only fires above the confidence gate. Below it, the output stays an observation, not a verdict.
I split the signals into tiers: strong (high-confidence rotation), observation (risk levels, hedged), and experimental (raw angles I don't trust yet and don't show as conclusions).
I calibrate with jumps I can actually check frame by frame — including my own kids', who skate, so I know the ground truth before the model says a word.
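One way to express those tiers in code, as a sketch only: the `Tier` enum, the cut-offs `DIAGNOSIS_GATE` and `OBSERVATION_FLOOR`, and the output wording are all hypothetical. It mirrors the behaviors described above: strong evidence becomes a diagnosis, borderline evidence becomes a hedged observation, and thin evidence or experimental signals produce nothing user-facing.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    STRONG = "strong"
    OBSERVATION = "observation"
    EXPERIMENTAL = "experimental"


# Hypothetical cut-offs for illustration; not JumpOnion's real numbers.
DIAGNOSIS_GATE = 0.70
OBSERVATION_FLOOR = 0.40


def rotation_tier(confidence: float) -> Optional[Tier]:
    """Map a rotation confidence to a tier, or None when the evidence is too thin."""
    if confidence >= DIAGNOSIS_GATE:
        return Tier.STRONG
    if confidence >= OBSERVATION_FLOOR:
        return Tier.OBSERVATION
    return None  # thin evidence: no verdict at all


def render(tier: Optional[Tier], finding: str) -> Optional[str]:
    """Turn a tiered finding into user-facing text, or None to stay silent."""
    if tier is Tier.STRONG:
        return f"Diagnosis: {finding}"
    if tier is Tier.OBSERVATION:
        return f"Possible risk (not a verdict): {finding}"
    # Experimental signals and thin evidence never reach the user as conclusions.
    return None


print(render(rotation_tier(0.85), "under-rotated landing"))
print(render(rotation_tier(0.61), "under-rotated landing"))
print(render(Tier.EXPERIMENTAL, "raw blade-to-body angle"))
```

Keeping experimental signals inside the same type, rather than dropping them, means they can still be logged and checked against verified jumps until they earn a higher tier.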

The part of JumpOnion that matters most isn't one you can see in a screenshot. It's the answer the app decided not to give.