AI Website Building Competition: Which AI Model Builds the Best Website?

The Experiment

We gave 12 of the most popular AI models on OpenRouter the exact same prompt: “Build a professional plumbing company website for Sydney Pro Plumbing.” Each model had to create a homepage and an emergency plumbing services page as completely self-contained HTML with inline CSS: no frameworks, and no external dependencies beyond Google Fonts.

Then we compared the results on design quality, content completeness, and cost.

The prompt was identical for all models. No advantages, no tweaking. Pure model capability.
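A setup like this can be scripted against OpenRouter's OpenAI-compatible chat-completions endpoint. The sketch below is a minimal illustration, not the competition's actual harness; the model slug and the `OPENROUTER_API_KEY` environment variable are assumptions, while the prompt, 32K token cap, and 0.7 temperature come from this article's methodology.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
PROMPT = "Build a professional plumbing company website for Sydney Pro Plumbing."

def build_request(model: str, api_key: str, max_tokens: int = 32000) -> urllib.request.Request:
    """Build the identical chat-completion request for one model."""
    body = json.dumps({
        "model": model,
        "max_tokens": max_tokens,   # 32K cap, per the methodology
        "temperature": 0.7,         # default temperature used for all models
        "messages": [{"role": "user", "content": PROMPT}],
    }).encode()
    return urllib.request.Request(
        OPENROUTER_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Hypothetical model slug; substitute the exact OpenRouter IDs you want to test.
req = build_request("deepseek/deepseek-chat", os.environ.get("OPENROUTER_API_KEY", "sk-demo"))
# html = json.loads(urllib.request.urlopen(req).read())["choices"][0]["message"]["content"]
```

Running the same `build_request` across a list of model slugs is what keeps the comparison fair: only the `model` field changes between calls.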


The Winners

Best Overall Design: Claude Sonnet 4.6 (8.8/10)

The most complete and polished website in the competition. Every section (hero, features, services grid, testimonials, service areas, CTA, footer) was fully designed with excellent visual hierarchy. Professional blue/orange color scheme that inspires trust. Cost: $0.48

Best Value: MiniMax M2.7 (8.6/10 for just $0.04)

Nearly tied for first place in design quality at a fraction of the cost. Stunning dark navy/teal/gold palette that stood apart from every other entry. A hidden gem.

Best Free Model: Step 3.5 Flash (7.35/10 for $0.00)

Delivered a solid, professional website for literally zero cost. All required sections present with clean design.


Full Leaderboard

Rank  Model                     Score     Cost     Tokens   Time
1     Claude Sonnet 4.6         8.80/10   $0.482   32,821   313s
2     MiniMax M2.7              8.60/10   $0.039   32,759   405s
3     Gemini 3 Flash            8.35/10   $0.028    9,887    47s
4     MiniMax M2.5              8.00/10   $0.026   23,161   283s
5     DeepSeek V3.2             7.90/10   $0.006   17,069   136s
6     MiMo V2 Pro               7.75/10   $0.070   23,892   262s
7     MiMo V2 Omni              7.70/10   $0.035   18,057   135s
8     Step 3.5 Flash (Free)     7.35/10   FREE     17,243   139s
9     Nemotron 3 Super (Free)   7.00/10   FREE     26,721   615s
10    Grok 4.1 Fast             6.40/10   $0.006   11,976    50s
11    Claude Opus 4.6           5.80/10   $0.804   32,821   320s
12    GLM 5 Turbo               5.40/10   $0.129   32,741   401s

Key Findings

1. Price Does NOT Equal Quality

Claude Opus 4.6 was the most expensive model at $0.80 per website, yet it scored near the bottom (5.8/10) due to rendering issues. Meanwhile, MiniMax M2.7 delivered a near-winning design for just $0.04 (20x cheaper).

2. The Best Value in AI Right Now

DeepSeek V3.2 produced a solid 7.9/10 website for just $0.006 (less than a penny). That’s a production-quality website for the cost of a rounding error.

3. Free Models Are Surprisingly Good

Step 3.5 Flash (free) scored 7.35/10: a perfectly usable professional website. Nemotron 3 Super (also free) scored 7.0/10. You can build legitimate websites with free AI models in 2026.

4. Speed King: Gemini 3 Flash

Finished in 47 seconds and scored 8.35/10 (3rd place). Best combination of speed and quality.

5. The Content Trap

Models that hit their token limit (32K tokens) didn’t always produce better results. Some created verbose CSS while others used those tokens for richer content. Token efficiency matters.

6. Flagship Models Can Disappoint

Both Claude Opus 4.6 and GLM 5 Turbo, premium-priced models, underperformed significantly. Their pages had rendering issues and empty sections. More expensive doesn’t mean better at every task.


Value Rankings (Score per Dollar)

Rank  Model               Score/$        Score   Cost
1     Step 3.5 Flash      FREE           7.35    $0.00
2     Nemotron 3 Super    FREE           7.00    $0.00
3     DeepSeek V3.2       1,317 pts/$    7.90    $0.006
4     Grok 4.1 Fast       1,067 pts/$    6.40    $0.006
5     MiniMax M2.5        308 pts/$      8.00    $0.026
6     Gemini 3 Flash      298 pts/$      8.35    $0.028
7     MiniMax M2.7        221 pts/$      8.60    $0.039
8     MiMo V2 Omni        220 pts/$      7.70    $0.035
9     MiMo V2 Pro         111 pts/$      7.75    $0.070
10    GLM 5 Turbo         42 pts/$       5.40    $0.129
11    Claude Sonnet 4.6   18 pts/$       8.80    $0.482
12    Claude Opus 4.6     7 pts/$        5.80    $0.804
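The pts/$ column is simply score divided by cost, with free models pinned to the top. A short sketch reproduces the ranking from the leaderboard data (every number below is taken directly from the tables in this article):

```python
# (model, score out of 10, cost in USD) from the full leaderboard above
results = [
    ("Claude Sonnet 4.6", 8.80, 0.482),
    ("MiniMax M2.7", 8.60, 0.039),
    ("Gemini 3 Flash", 8.35, 0.028),
    ("MiniMax M2.5", 8.00, 0.026),
    ("DeepSeek V3.2", 7.90, 0.006),
    ("MiMo V2 Pro", 7.75, 0.070),
    ("MiMo V2 Omni", 7.70, 0.035),
    ("Step 3.5 Flash", 7.35, 0.0),
    ("Nemotron 3 Super", 7.00, 0.0),
    ("Grok 4.1 Fast", 6.40, 0.006),
    ("Claude Opus 4.6", 5.80, 0.804),
    ("GLM 5 Turbo", 5.40, 0.129),
]

def score_per_dollar(score: float, cost: float):
    """Points per dollar; free models have no finite ratio."""
    return None if cost == 0 else score / cost

# Free models sort as infinitely good value; everything else by score/cost.
ranked = sorted(
    results,
    key=lambda r: float("inf") if r[2] == 0 else r[1] / r[2],
    reverse=True,
)
for model, score, cost in ranked:
    ratio = score_per_dollar(score, cost)
    print(f"{model:18} {'FREE' if ratio is None else f'{ratio:,.0f} pts/$'}")
```

Note how sensitive the metric is to tiny denominators: DeepSeek V3.2's sub-penny cost makes it the runaway value leader even though four models out-scored it.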

Methodology

  • Prompt: Identical for all 12 models (build a Sydney plumbing company homepage + emergency services page)
  • Requirements: Self-contained HTML, inline CSS, no frameworks, Google Fonts, responsive design
  • Scoring: Visual design (25%), layout (20%), typography (15%), content (15%), UX (10%), responsiveness (10%), code quality (5%)
  • API: All models accessed via OpenRouter API
  • Token limit: 32,000 max tokens per model
  • Total competition cost: ~$1.62 across all 12 models
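The rubric weights above combine into a single mark as a weighted average. A minimal sketch, using the weights from the methodology; the example subscores are made up for illustration, not real judging data:

```python
# Rubric weights from the methodology; they sum to 100%.
WEIGHTS = {
    "visual_design": 0.25,
    "layout": 0.20,
    "typography": 0.15,
    "content": 0.15,
    "ux": 0.10,
    "responsiveness": 0.10,
    "code_quality": 0.05,
}

def overall_score(subscores: dict) -> float:
    """Weighted average of per-criterion scores, each out of 10."""
    missing = set(WEIGHTS) - set(subscores)
    if missing:
        raise ValueError(f"unscored criteria: {missing}")
    return sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS)

# Hypothetical subscores for one entry (illustration only).
example = {"visual_design": 9, "layout": 9, "typography": 8,
           "content": 9, "ux": 8, "responsiveness": 9, "code_quality": 8}
print(round(overall_score(example), 2))  # → 8.7
```

Because visual design and layout together carry 45% of the weight, a model can write flawless code (only 5%) and still land mid-table if the page looks plain.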

Competition run on March 30, 2026 using OpenRouter API. All models used with default temperature (0.7).