Benchmarking the 2025 SOTA Text-to-Image Models: A Multi-Domain Quality Evaluation

Over the past few months, the text-to-image market has shifted toward larger, sometimes faster, and surprisingly capable models. To see where things actually stand, I ran a full side-by-side benchmark of the four models most often cited today as leaders in text-to-image generation: Z-Image-Turbo, Flux-2 Pro, Hunyuan Image 3, and Gemini 3 Pro Image (Nano Banana Pro).

Why these specific models? They consistently hold leading positions in two independent rankings: Artificial Analysis, which measures models' ability to handle visual complexity and compositional requirements, and LLM Arena, which signals the real popularity and maturity of models in production scenarios. They are also very popular within the community and are frequently mentioned on civitai.com and in AI groups on reddit.com.

1) Gemini 3 Pro Image (Nano Banana Pro) (top 1 Artificial Analysis / LLM Arena)
2) Flux-2 Pro (top 2 Artificial Analysis)
3) Hunyuan Image 3 (top 5 LLM Arena, open-source)
4) Z-Image-Turbo (top 10 Artificial Analysis, top 1 in citations among open-source)

In this article, I will compare their performance across four key domains:

1) Realism
2) Style
3) Text & Typography
4) Complex and Abstract Prompts: hard prompts, meaning everything that most often breaks models in real life.

For each domain, I show honest side-by-side examples without any post-processing, so it is immediately clear where each model gives stable results, where it starts to slip, and where it unexpectedly surpasses the others.
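The side-by-side procedure can be sketched roughly as follows. This is a minimal illustration, not the tooling actually used for this article: `PromptCase`, `run_side_by_side`, and the `generate` callable are hypothetical names, and `generate` stands in for whatever client each model requires. The only point it makes is that every model receives the identical prompt and that the returned bytes are written to disk untouched.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# The four models compared in this article.
MODELS = ["Gemini 3 Pro Image", "Flux-2 Pro", "Hunyuan Image 3", "Z-Image-Turbo"]


@dataclass(frozen=True)
class PromptCase:
    """One benchmark prompt and the domain it belongs to (hypothetical type)."""
    domain: str
    prompt: str


def run_side_by_side(
    cases: list[PromptCase],
    generate: Callable[[str, str], bytes],
    out_dir: str,
) -> list[Path]:
    """Send every prompt to every model and save the raw image bytes.

    `generate(model, prompt)` is an assumed callable supplied by the caller;
    it is expected to return encoded image bytes. Nothing is resized,
    upscaled, or otherwise post-processed before saving.
    """
    root = Path(out_dir)
    written: list[Path] = []
    for index, case in enumerate(cases):
        for model in MODELS:
            image_bytes = generate(model, case.prompt)
            slug = model.lower().replace(" ", "-")
            target = root / case.domain / f"{index:03d}_{slug}.png"
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(image_bytes)
            written.append(target)
    return written
```

Grouping the files by domain and prefixing them with the prompt index keeps the four outputs for one prompt next to each other, which is convenient for visual comparison.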

Realism

Man photo as if he is slipping off the building edge, the woman is reaching out,  use NYC as the style reference for the scene. High-quality action shot from the 1990s
Generate a scene where the character is hugging the person in the first photo from behind, wrapping their arms around to the front, and they are shown together in the shot.
Zoom in on the bee
A majestic, translucent blue whale, made of seawater with visible fish schools inside, swimming through fluffy white clouds at golden hour, sunlight creating a rainbow through its body, in a surreal and grand scene
Photo taken on an iPhone, without a clear plot and sense of composition, like a random shot. The photo is slightly overexposed due to sun or the uneven lighting. The angles are awkward, the composition is confusing, and the image as a whole has a deliberate banality. First-person GoPro perspective of a high-speed mountain bike descent through a dense, misty forest trail, camera slightly shaking with realistic motion blur and light jitter. Golden hour sunlight filters sharply through tall trees overhead, casting dynamic shadows across a narrow dirt path. Center frame: an ultra-detailed, dramatic close-up of a flying squirrel mid-glide, wings fully outstretched, fur ruffled by rushing air, eyes wide in a tense moment just before impact with the lens. The squirrel appears frozen in a split-second of chaos—hyper-real textures, intense eye contact. Background shows blurred foliage whipping past, airborne leaves swirling in the wind. Cinematic depth of field with razor-sharp focus on the squirrel’s face, everything else subtly trailing into dynamic blur. Shot feels visceral, fast, and heart-pounding—like a still from a 4K action cam at 120fps.
High-angle bird's-eye view shot of a female East Asian idol subject lying on the floor of a cluttered closet, strictly following the upside-down pose and anatomical structure shown in image_0.png. She is wearing a rich blue lace-overlay mini dress with a milkmaid bodice, sweetheart neckline, cap sleeves, and a lettuce hem. She wears heavy, knee-high red leather boots with a vertical front seam. Visible tattoos include a barbed wire band on the thigh and small stick-and-poke heart and key motifs on the chest. The floor is covered in piles of mixed textiles, tulle, and clothing. The background walls are painted yellow, featuring white wire shelving, semi-transparent plastic storage drawers, and a packed clothing rack. Lighting is overhead tungsten, creating a warm sepia, vintage 90s disposable camera filter look. The mood is exhausted, messy, and romantically grunge. Fairy grunge aesthetic, girl in a rich blue dress and red leather boots lying upside down in a cluttered closet with yellow walls, pose from image_0.png, sepia tone, high-angle shot. Negative prompt: minimalism, clean floor, bright daylight, cold lighting, organized, empty space, modern furniture, neon colors, HD digital look, glossy finish, wide angle, fisheye, distorted limbs, missing tattoos, incorrect pose
A 1:1 aspect ratio photorealistic close‑up image of freshly baked naan. On the surface of the fluffy, puffed‑up naan, the words You can do naan-thing appear in char marks. A bowl of curry is placed beside it.
Scene Mirror selfie in an otaku-style computer corner, blue color tone. ### Subject * Gender expression: female * Age: around 25 * Ethnicity: East Asian * Body type: slim, with a defined waist; natural body proportions * Skin tone: light neutral tone * Hairstyle:  * Length: waist-length hair  * Style: straight with slightly curled ends  * Color: medium brown * Pose:  * Stance: standing in a slight contrapposto pose  * Right hand: holding a smartphone in front of her face (identity hidden)  * Left arm: naturally hanging down alongside the torso  * Torso: body leaning slightly back; waist and abdomen exposed * Clothing:  * Top: light blue cropped knit cardigan, top two buttons fastened; a blue French-style bra faintly visible  * Bottom: denim ultra-short shorts, with a blue satin ribbon bow on each side of the hips  * Socks: blue and white horizontal striped over-the-knee socks  * Accessory: a blue cute mascot phone case ### Environment * Description: bedroom computer corner seen through a wall-mounted mirror * Furnishings:  * White desk  * Single monitor showing a soft blue wallpaper (no readable text)  * Mechanical keyboard with white keycaps on a blue desk mat  * Mouse on a small blue mouse pad  * PC tower on the right side with blue case lighting  * Three anime figures on or near the PC tower  * A poster of a pagoda on the wall  * Cat-shaped desk lamp with blue accents  * A transparent glass of water  * A tall green leafy plant by the window (on the left side of the frame) * Color replacement: replace all originally pink elements (clothes and room decor) with blue tones (baby blue to sky blue/periwinkle blue). 
### Lighting * Light source: daylight coming from a large window on the left side of the camera, through sheer curtains * Light quality: soft, diffused light * White balance (K): 5200 ### Camera * Mode: smartphone rear camera shooting via the mirror (no portrait/bokeh mode) * Equivalent focal length (mm): 26 * Distances (m):  * Subject to mirror: 0.6  * Camera to mirror: 0.5 * Exposure:  * Aperture (f): 1.8  * ISO: 100  * Shutter speed (s): 0.01  * Exposure compensation (EV): -0.3 * Focus: focus on the torso and shorts in the mirror image * Depth of field: natural smartphone deep depth of field; background clearly visible with no artificial blur * Composition:  * Aspect ratio: 1:1  * Crop: from the top of the head to mid-thigh; include the desk, monitor, PC tower, and plant in the frame  * Angle: slightly high angle from the mirror’s point of view  * Composition note: keep the subject centered; to avoid wide-angle edge distortion, have her stand a bit further away and crop to a square later. ### Negative prompts * Any appearance of pink/magenta anywhere * Beauty filters/over-smoothed skin; poreless skin look * Exaggerated or distorted anatomy * , see-through fabrics, wardrobe malfunctions * Logos, brand names, or readable user interface text * Fake portrait-mode blur, CGI/illustration feel

Realism Evaluation - Key Observations

Based on a qualitative, side-by-side inspection of the generated images, several clear trends emerge in the realism domain. Gemini 3 Pro Image (Nano Banana Pro) and Z-Image-Turbo consistently stand out, producing images with higher perceptual realism, more natural lighting, and more convincing material and texture rendering.

Of the two, Banana Pro is the more reliable on realistic prompts: it preserves a believable scene composition and avoids the common artifacts that typically disrupt immersion. Z-Image-Turbo also produces few artifacts, and in the realism domain it can be considered a reliable model.

Flux-2 Pro and Hunyuan Image 3 trail slightly behind. While their outputs remain competitive and visually strong overall, there are noticeable cases where they fall short in fine-grained realism, particularly in texture fidelity and photorealistic coherence, when compared to Banana Pro and Z-Image-Turbo.

Z-Image-Turbo deserves special mention given its compact architecture (approximately 6B parameters). Despite its size, it delivers surprisingly strong realism performance, placing it firmly in second position in this category. While it does not fully match Banana Pro at the top, the gap is relatively small, which is impressive given the model scale.

Overall, the subjective realism ranking observed in this benchmark can be summarized as:
Banana Pro (Gemini 3 Pro Image) → Z-Image-Turbo → Flux-2 Pro → Hunyuan Image 3.

Style

Full body Subject toy, attributes/accessories, expression, made of felt, in a place, lighting, friendly and cartoonish appearance, rich and soft textures
Dynamic anime key-visual of a cyberpunk girl jumping between skyscrapers, cel shading, glowing accents, clean stylized lines.
concept art hidden face, mysterious, hooded, full body, dark enigmatic warrior, standing, storm of falling shards, glowing ethereal giant heavy thick sword in hand, warrior, intricate armor, tattered cloak that billows in the wind, atmosphere is charged with an otherworldly energy, fantasy realism, dramatic lighting, detailed textures to bring the scene to life, digital artwork, illustrative, painterly, matte painting, highly detailed
A surreal, high-contrast portrait of an anonymous wanderer etched faintly onto a fractured, rain-slicked concrete wall in an abandoned alley. The face materializes from the crumbling gray texture like a ghost in the stone, eyes hollowed into piercing white voids that stare eternally forward, while the mouth dissolves into an endless negative space chasm a yawning black abyss that swallows light and sound, pulling the viewer's gaze into its silent hunger. Jagged cracks spiderweb across the cheeks and jaw, mimicking veins of forgotten memory, as faint drips of condensation trace ephemeral tears down the uneven surface. Subtle shadows pool in the recesses of the wall's imperfections, amplifying the void's depth, while a single sliver of overcast daylight glances off the edge, casting a brittle gleam on the surrounding grit. Fine motes of urban dust swirl lazily in the air, clinging to the damp stone like whispered secrets. The rest of the alley fades into inky obscurity, rendering the portrait an isolated relic of unspoken loss. Color tension: storm-gray concrete × abyssal black void × brittle pearl gleam. Style: raw urban surrealism, negative space abstraction, textured surface erosion, minimalist yet viscerally unsettling composition. Mood: silent existential dread veiled in indifferent decay.
retro_scifi_90s, retro_artstyle, retro, cyberpunk, The image is a digital illustration in a detailed, vibrant, high contrast, semi-realistic art style with 90s retro futuristic anime-inspired elements. Featuring a young woman with blonde hair and a serene expression. She has long, voluminous, wavy blonde hair cascading around her face and shoulders. Her face is small, cute and soft with full, slightly parted lips. She wears a high-tech, futuristic armor that covers her torso, with intricate metallic patterns and a prominent, circular blue light on her chest, suggesting advanced technology and possibly cyborg elements. Surrounding her, there are vivid, red roses, some in bloom and others in bud, creating a contrast of softness and hardness. The background appears to be a complex, mechanical environment with metallic structures and glowing lights, indicating a futuristic, perhaps post-apocalyptic setting. The composition is dramatic with strong filmic effects, color grading.
Create a digital artwork depicts a surreal, whimsical scene in which a white teacup and saucer sit on a dark surface. The teacup is filled with swirling white cloud-like waves that create a feeling of movement and fluidity. A small, detailed sailboat with a white sail and a dark hull emerges from the waves, creating an eye-catcher in the composition. The background is a rich black, which increases the contrast and clearly highlights the white elements. The lighting appears artificial, casting soft highlights and soft shadows, emphasizing the textures of the clouds and the smooth surface of the teacup. The image uses a shallow depth of field and a blurred background to keep the focus on the teacup and sailboat. The composition follows the rule of thirds, with the sailboat positioned slightly off-center, creating a balanced yet dynamic visual appeal. The overall style is dreamlike and imaginative, combining realistic elements with fantastical concepts.,highly detailed, aesthetic, great lighting, extremely detailed,  perfect composition, a vibrant tissue of hues and textures captured digitally, best quality, realistic, captivating, intricately detailed, sharp focus, high contrast, stylized, clear, whimsical, fantastic, splash art, intricate detailed, hyperdetailed, concept art, sharp focus, harmony, a masterpiece, award winning,pingtu style, illustration-fen, AquarelleIV
Baroque oil painting of a king in golden armor, heavy brushstrokes, warm candlelight, deep shadows.

Style-Focused Prompts — Key Observations

When evaluating performance on style-driven prompts, the ranking shifts noticeably compared to the realism domain. In this category, Flux-2 Pro emerges as the most consistent and expressive model. It demonstrates stronger stylistic diversity and produces visually richer results across a wide range of artistic directions, including anime-inspired, cyberpunk, and illustrative styles.

Gemini 3 Pro Image (Nano Banana Pro) shows weaker performance in this domain. In particular, it tends to underperform on highly stylized prompts, such as anime, cyberpunk, and heavily illustrated aesthetics, which may suggest limited exposure to such styles during training. While Banana Pro can still produce acceptable results in some cases, its outputs are generally less expressive and less visually distinctive than those of Flux-2 Pro.

Both Hunyuan Image 3 and Z-Image-Turbo perform reasonably well on style-oriented prompts. They handle a variety of artistic directions with solid consistency and, in many cases, outperform Banana Pro on more unconventional or heavily stylized prompts. Z-Image-Turbo, in particular, shows surprisingly strong stylistic competence given its compact size, delivering stable results across multiple non-photorealistic domains.

That said, beyond the clear first position held by Flux-2 Pro, assigning a strict second, third, and fourth place becomes challenging. Performance varies depending on the specific style domain being tested, and subjective preference plays a significant role. A more accurate framing is to treat stylistic performance as domain-dependent rather than strictly rank-ordered.

In summary, Flux-2 Pro is the clear leader for style-heavy and creative prompts, while Hunyuan Image 3 and Z-Image-Turbo offer competitive and flexible stylistic performance. Banana Pro remains more reliable in realism-focused scenarios but is less suited for highly stylized or unconventional visual domains.

Text & Typography

A wide quote card featuring a famous person, with a brown background and a light-gold serif font for the quote: “Stay Hungry, Stay Foolish” and smaller text: “—Steve Jobs.” There is a large, subtle quotation mark before the text. The portrait of the person is on the left, the text on the right. The text occupies two-thirds of the image and the portrait one-third, with a slight gradient transition effect on the portrait.
Generate a map of USA in watercolor style, on which all federal states are labeled in ballpoint pen
Interior lifestyle shot of a realistic artificial Christmas tree in a cozy living room setting, daylight illumination, 8K hyperrealistic photography. The tree stands tall and symmetrical on a sturdy metal base, its lush green PVC branches full and naturally layered, awaiting decoration. Scene composition: The tree is positioned beside a white fireplace adorned with evergreen garlands, pinecones, and candles, with a matching wreath above. Warm yellow walls and wooden flooring create a comforting, festive atmosphere. Natural light streams through large French doors, softly highlighting the tree’s texture and subtle shadows. A green velvet pouf and minimalist side table with golden decor pieces complete the composition. Lighting: warm, diffused winter daylight — emphasizing realism, color depth, and natural softness of the needles. Color palette: deep evergreen, warm beige, ivory white, golden accents, and soft natural light tones. Mood: calm anticipation, cozy elegance, pre-holiday serenity — capturing the quiet beauty of seasonal preparation. Optional text elements for layout: – Header: “Bring the Holiday Spirit Home” in elegant serif (e.g., Playfair Display Bold). – Subtext: “Full and lifelike artificial pine for timeless festive charm.” in light sans-serif (e.g., Lato Light). – Tagline: “Ready to decorate. Built to last.” in small caps geometric sans-serif (e.g., Futura PT Medium).
Creative automotive advertisement for the Nissan X-Trail, executed in a surreal, high-concept visual style that blends adventure with humor and cinematic polish.Composition:A khaki-green Nissan X-Trail is parked in an urban environment with a futuristic city skyline in the background. The vehicle is clean and sharply lit, emphasizing its contours and robust design. On its roof, however, an enormous pile of wild jungle gear and a massive coiled snake spill outward — including ropes, camping tools, and a bicycle entangled in the creature’s coils. The juxtaposition transforms the sleek SUV into a symbol of wilderness adventure, visually representing the tagline “ADVENTURE INSIDE.”Typography & branding:– Tagline (bottom left): “NUOVO NISSAN X-TRAIL — ADVENTURE INSIDE” in modern sans-serif typography, balancing Italian elegance with rugged energy.– Nissan logo and slogan “Innovation that excites” placed in the top-right corner on a bright red square, grounding the image in the brand’s visual identity.Lighting & tone:Bright daylight illumination with soft urban reflections enhances realism. The color palette combines metallic greens, grays, and warm urban neutrals, emphasizing contrast between civilization and raw adventure.Mood: playful, daring, and imaginative — suggesting that even in the city, the X-Trail carries the spirit of the wild within.Tone: premium yet adventurous, balancing sophistication with off-road freedom.
Cinematic-style photograph, digital medium — High contrast, meticulous detail. Book excerpt promotional page (magazine-like). Title (top-center): "THE MAP OF SMALL THINGS" — Font: Times New Roman Black, color: midnight navy; Excerpt block (center, multi-line paragraph): "It was one of those late afternoons when the alleys seemed to forget the rest of the world..." — Font: Caslon, full paragraph (approx. 80–120 words) styled as a pull quote; Author bio (bottom-left): "— A. Keller" — small serif; CTA (bottom-right): "READ FIRST CHAPTER"
Café table with a branded takeaway cup. Printed sleeve text says: "BREEZE COFFEE ROASTERS — EST. 2004" in bold. Below smaller text: "Medium Roast • Caramel Notes • Smooth Finish". A crumpled receipt next to it shows printed line items: "Latte 12oz — $4.50", "Blueberry Muffin — $3.20", "Order #28719", "Thank you for supporting local!".
Messy home office desk with sticky notes, screen UI, and printed documents. Sticky notes show: "Call Sarah 3PM", "Finalize draft!", "Meeting moved to Friday". A printed document partially visible: "Quarterly Report — Q1 2025", "Revenue increased by 14%…", "Customer retention stable…". Laptop screen displays a file name: "proposal_final_v3.pdf" and toolbar text.

Text Rendering & Typography — Key Observations

Text-heavy prompts reveal the largest performance gap among the evaluated models. In this domain, Gemini 3 Pro Image (Nano Banana Pro) is the clear leader. It consistently produces clean, readable text with no visible artifacts, spurious symbols, or pseudo-glyphs. This advantage is especially pronounced in scenarios involving small or dense text, where many image models typically fail.

Beyond basic legibility, Banana Pro also demonstrates strong typographic composition. It selects appropriate and visually appealing fonts, maintains consistent letter shapes, and places text naturally within the image layout. This makes it particularly well-suited for use cases such as product cards, posters, and advertisement-like visuals, where text–image integration is critical.

Flux-2 Pro also performs strongly in this category. It generates text with high consistency and shows no visible artifacts. While its typographic composition and layout quality are solid, it lags slightly behind Banana Pro in stylistic variety and overall polish, especially in more design-oriented compositions.

In contrast, Hunyuan Image 3 and Z-Image-Turbo struggle noticeably with text rendering. Both models frequently introduce artifacts in the form of pseudo-characters or glyph-like symbols, particularly when dealing with small text. Between the two, Z-Image-Turbo performs marginally better, but both fall well behind Banana Pro and Flux-2 Pro in this domain.

Overall, the observed ranking for text and typography tasks is:
Banana Pro (Gemini 3 Pro Image) → Flux-2 Pro → Z-Image-Turbo → Hunyuan Image 3.

Complex & Abstract Prompts

King Arthur's sword fable, except it's a greatsword and instead of being stuck in a rock it's stuck between two rocks that look like buttcheeks
Infographic about the Jackson Laboratory en
Create a very high-detail infographic image that describes IT careers or IT jobs related to the SaaS
Create a single cinematic illustration that visually represents the following poem, capturing its emotions, metaphors, and atmosphere Good night mother,Good night father,Kiss your little son.Good night sister,Good night brother,Good night everyone.
An amateur photograph from 1998 of a middle-aged artist copying an image by hand from a computer screen to an oil painting on stretched canvas, but the image is itself the photo of the artist painting the recursive image
Generate a series of six candid, documentary-style photos of this Indonesian president in office, in the rice fields, and partying with other presidents.
Generate a 4‑panel comic about the hardships of an embedded engineer
wave-particle duality

Complex & Abstract Prompts — Key Observations

Performance differences become most pronounced when evaluating complex, abstract, and open-ended prompts. In this category, Gemini 3 Pro Image (Nano Banana Pro) clearly dominates. Its strong world understanding and multimodal reasoning capabilities allow it to interpret prompts that require implicit reasoning, abstract interpretation, and non-literal composition.

Banana Pro consistently handles prompts where the visual outcome is not explicitly specified, successfully constructing coherent and meaningful scenes from high-level or ambiguous descriptions. In abstraction-heavy scenarios, it stands out by a significant margin, operating at a noticeably higher level of conceptual understanding than the other models evaluated.

Flux-2 Pro performs well in this domain and secures a confident second place. While not as consistently strong as Banana Pro, it often manages to reinterpret abstract prompts effectively and generate plausible compositions, particularly when some structural cues are present in the prompt.

In contrast, Z-Image-Turbo and Hunyuan Image 3 struggle with complex and abstract instructions. Both models frequently misinterpret the intent of the prompt or default to generic visual patterns, indicating limited capacity for high-level abstraction. The performance gap between these two is relatively small, making it difficult to assign a definitive third or fourth position. If forced to differentiate, Z-Image-Turbo appears to perform marginally better, but both remain significantly behind the top two models.

Overall, the observed ranking for complex and abstract prompt handling is:
Banana Pro (Gemini 3 Pro Image) → Flux-2 Pro → Z-Image-Turbo ≈ Hunyuan Image 3.
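
For readers who want a single rough number, the four per-domain orderings can be combined by mean rank. The sketch below is not part of the evaluation itself and rests on several assumptions: every domain is weighted equally, the three models behind Flux-2 Pro in the Style domain are treated as tied (no strict order is given for them), tied models share the average of the positions they occupy, and all function and variable names are illustrative.

```python
from statistics import mean

# Per-domain orderings from this article, written as tiers (best first).
# Models in the same tier are treated as tied.
# Assumption: in "style" the three models behind Flux-2 Pro are tied,
# because no strict second-to-fourth order is given for that domain.
DOMAIN_TIERS = {
    "realism": [["Banana Pro"], ["Z-Image-Turbo"], ["Flux-2 Pro"], ["Hunyuan Image 3"]],
    "style": [["Flux-2 Pro"], ["Banana Pro", "Z-Image-Turbo", "Hunyuan Image 3"]],
    "text": [["Banana Pro"], ["Flux-2 Pro"], ["Z-Image-Turbo"], ["Hunyuan Image 3"]],
    "complex": [["Banana Pro"], ["Flux-2 Pro"], ["Z-Image-Turbo", "Hunyuan Image 3"]],
}


def tiers_to_ranks(tiers):
    """Convert ordered tiers into ranks; tied models share the average position."""
    ranks = {}
    next_rank = 1
    for tier in tiers:
        # A tier of size k starting at position p covers p .. p + k - 1,
        # so the shared rank is the midpoint of that range.
        shared = next_rank + (len(tier) - 1) / 2
        for model in tier:
            ranks[model] = shared
        next_rank += len(tier)
    return ranks


def mean_ranks(domain_tiers):
    """Average each model's rank over all domains, with equal domain weights."""
    per_domain = [tiers_to_ranks(tiers) for tiers in domain_tiers.values()]
    models = per_domain[0].keys()
    return {model: mean(ranks[model] for ranks in per_domain) for model in models}


if __name__ == "__main__":
    ordered = sorted(mean_ranks(DOMAIN_TIERS).items(), key=lambda item: item[1])
    for model, score in ordered:
        print(f"{model}: {score:.3f}")
```

Under these assumptions the mean ranks come out as 1.5 for Banana Pro, 2.0 for Flux-2 Pro, 2.875 for Z-Image-Turbo, and 3.625 for Hunyuan Image 3 (lower is better). Changing the domain weights or the handling of the Style tie would shift these numbers, so they are best read as a summary of the qualitative rankings rather than a measurement.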
