The Seven-Petal Flower Test: ChatGPT’s Enduring Symmetry Struggle (and Why It’s Getting Better)

There’s a beloved Russian fairy tale about a magical Tsvetik-Semitsvetik — a little flower with exactly seven petals, each granting a wish.

For years, this innocent story has served as an unexpected benchmark for testing multimodal AI models.

The prompt is brutally simple:

“Draw a flower with exactly seven petals.”

And for the longest time, the answer was almost always… eight.

The Classic Failure Loop

Here’s how the ritual usually goes:

User asks ChatGPT (or similar model) to generate a seven-petal flower.
It produces a beautiful illustration… with 8 petals.
User: “Please count the petals in this image.”
Model confidently replies: “There are exactly seven petals, arranged evenly around the center. Starting from the top and going clockwise: 1, 2, 3, 4, 5, 6, 7.”
User uploads the same image back: “Count again.”
Model has an existential moment.

In one recent exchange, the model openly admitted its mistakes:

“I got carried away by the priming from the original request for seven petals. I performed a quick symmetry-based estimate instead of a careful recount. Upon closer inspection, there are indeed 8 petals. I was wrong and overconfident.”

It then correctly identified two core errors:

Priming bias — the request for seven petals influenced its visual interpretation.
Insufficient verification — it didn’t double-check the actual pixel output.
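Both errors suggest the same remedy: count the petals in a separate step that never sees the original request, and retry when the count is off. Here is a minimal sketch of that loop. The `generate` and `count_petals` callables are hypothetical placeholders for whatever image model and counting step you have available; they are not part of any real API.

```python
from typing import Callable, Optional, Tuple


def generate_with_verification(
    generate: Callable[[str], object],      # hypothetical: prompt -> image
    count_petals: Callable[[object], int],  # hypothetical: image -> petal count
    target: int = 7,
    max_attempts: int = 3,
) -> Tuple[Optional[object], int]:
    """Return (image, attempts_used); image is None if no attempt passed."""
    prompt = f"Draw a flower with exactly {target} petals."
    for attempt in range(1, max_attempts + 1):
        image = generate(prompt)
        # The counter receives only the image, never the target number,
        # so the request cannot prime the count.
        counted = count_petals(image)
        if counted == target:
            return image, attempt
        # Feed the measured count back into the next request.
        prompt = (
            f"The previous image had {counted} petals. "
            f"Draw a flower with exactly {target} petals."
        )
    return None, max_attempts
```

With stand-in functions where the "image" is just its petal count (a generator that yields 8 and then 7, and an identity counter), the loop returns the second image after two attempts. Keeping the counter blind to the target is the design choice that matters here: it addresses the priming problem rather than only the verification one.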

The Redemption Arc

When the user asked it to try again and generate a version with exactly seven petals, the model succeeded. A clean, symmetrical seven-petal flower appeared.

This is the interesting part. Earlier versions of these models would stubbornly keep generating eight (or nine) petals no matter how many times you asked.

Today, they can:

Recognize their own counting mistake when confronted with evidence.
Reflect on the human-like bias that caused it.
Correct the generation in the next attempt.

Why Does This Keep Happening?

The most likely root cause is deceptively simple: training data.

Real flowers in nature and photography rarely have exactly seven petals. Common counts are 5 (wild roses, buttercups), 6, 8, or irregular numbers. Seven-petal flowers exist but are uncommon, so they’re underrepresented in image-text datasets. When the model tries to generate a “nice symmetrical flower,” it defaults to the petal counts it has seen most often.

Other models show similar quirks. One competitor (affectionately called “Banana” in the community) honestly notes: “I drew you a flower, but it ended up with 9 petals instead of 7. The model made a counting error.” Then it fails to fix it.

What This Tiny Test Actually Reveals

The seven-petal flower has become a charming stress test for several capabilities:

Precise visual counting;
Symmetry understanding;
Resistance to prompt priming;
Self-correction and metacognition.
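For a ground-truth reference to compare model output against, the target is easy to build deterministically: seven-fold symmetry means one petal every 360/7 ≈ 51.43 degrees. The sketch below writes such a flower as an SVG string. The function names, colors, and proportions are illustrative choices, not taken from any tool mentioned here.

```python
from typing import List


def petal_angles(n: int = 7) -> List[float]:
    """Angles in degrees for n evenly spaced petals, starting at the top."""
    return [k * 360.0 / n for k in range(n)]


def flower_svg(n: int = 7, size: int = 200) -> str:
    """Build an SVG flower with exactly n petals rotated around the center."""
    c = size / 2
    petals = "\n".join(
        # Each petal is the same ellipse, drawn above the center and then
        # rotated about the center point (clockwise in SVG coordinates).
        f'  <ellipse cx="{c}" cy="{c - size * 0.25}" '
        f'rx="{size * 0.08}" ry="{size * 0.2}" fill="orchid" '
        f'transform="rotate({angle:.3f} {c} {c})"/>'
        for angle in petal_angles(n)
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{size}" height="{size}">\n'
        f"{petals}\n"
        f'  <circle cx="{c}" cy="{c}" r="{size * 0.07}" fill="gold"/>\n'
        f"</svg>"
    )
```

Saving the result of `flower_svg()` to a `.svg` file and opening it in a browser gives a known seven-petal image, which is handy for checking whether a model's counting step is reliable before trusting it on generated art.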

It’s not a serious benchmark like GPQA or SWE-bench, but it’s delightfully human. It reminds us that even as frontier models crush complex reasoning, basic perceptual tasks can still trip them up in surprising ways.

Yet the progress is real. The ability to admit “I was wrong because of bias and sloppy checking” — and then fix it — shows genuine improvement in reliability and humility.

So the next time you feel like testing a new image model, don’t reach for math problems or coding challenges. Just say:

“Draw the seven-petaled flower.”

If it gets exactly seven petals on the first try… you’ll know the models have truly leveled up.
