BernieBrackets

March Madness Bracket Optimizer

Maximize P(1st place) in small pools using ML upset prediction, contrarian ownership analysis, and Monte Carlo simulation.

2026 Bracket Viewers

Why Perfect Loses

BERNS_CHALK always picks the most probable winner in every game. It is the most accurate bracket. So why doesn’t it have the highest P(1st place)?

The Core Problem: You Score Relative to the Field
When BERNS_CHALK picks Duke as champion (say, 35% win prob) and Duke wins — so do the 55% of pool entrants who also picked Duke. You score 320 points. So do they. Your relative gain is zero. Winning a pool requires maximizing P(your score > everyone else’s), not maximizing expected correct picks. These are fundamentally different objectives.
Proof: The 20% Underdog Can Win More Often
10-person pool • 1 championship game • 320 pts
Team A: 80% win prob, 90% ownership (9 of 10 pick A)
Team B: 20% win prob, 10% ownership (1 of 10 picks B)

Pick A (BERNS_CHALK):
  A wins (80%): you share 320 pts with 8 others → win tiebreak 1/9
  B wins (20%): you score 0, contrarian scores 320 → you lose
  P(1st) = 0.80 × (1/9) + 0.20 × 0 = 8.9%

Pick B (contrarian):
  A wins (80%): you score 0 → you lose
  B wins (20%): you and 1 opponent share 320 pts → win tiebreak 1/2
  P(1st) = 0.80 × 0 + 0.20 × (1/2) = 10.0%
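The arithmetic above can be checked in a few lines. In a one-game pool you only take 1st if your team wins, and you then split the tiebreak evenly with everyone who made the same pick:

```python
def p_first(win_prob: float, n_copickers: int) -> float:
    """P(1st) in a one-game pool: your team must win, then the tiebreak
    is split evenly among everyone who made the same pick (you included)."""
    return win_prob * (1 / (n_copickers + 1))

# 10-person pool: 9 entrants pick Team A (80% to win), 1 picks Team B (20%).
p_chalk = p_first(0.80, 8)       # you + 8 others on Team A
p_contrarian = p_first(0.20, 1)  # you + 1 other on Team B

print(f"Pick A (chalk):      {p_chalk:.1%}")       # 8.9%
print(f"Pick B (contrarian): {p_contrarian:.1%}")  # 10.0%
```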
The General Rule
Pick the underdog when: ownership_A / ownership_B > prob_A / prob_B. In the example: 90/10 = 9 > 80/20 = 4 → pick B. This is exactly what Leverage captures: model_prob / public_ownership. When leverage > 1 for the underdog, picking them increases P(1st) even though it decreases expected score.
The Variance Argument
BERNS_CHALK is a low-variance strategy. It scores consistently near the pool average. But “slightly above average” rarely wins a 25-person pool. The optimizer introduces good variance: EMV-positive upsets chosen so that, when they hit, you leapfrog as many opponents as possible at once. When a 4-seed Final Four pick hits, you score 160 pts while ~75% of the field scores 0 on that slot.
More Simulations Don’t Change This
More Monte Carlo sims reduce measurement noise — they converge to the true P(1st). But the true P(1st) for BERNS_CHALK is structurally limited because it picks the same teams as most opponents. It wins when they win, loses when they lose. No amount of simulation changes the underlying math.
The Analogy
BERNS_CHALK is like a stock portfolio that perfectly tracks the index. You’ll never dramatically underperform. But you’ll never outperform either — because everyone else is also indexed. To beat the field, you need a concentrated position that the field doesn’t have.

Methodology

1. Data Collection
We scrape live data from five sources: NCAA.com for the official 68-team bracket and seedings, KenPom for adjusted efficiency ratings (AdjEM, AdjO, AdjD, tempo, luck, SOS), Barttorvik for Barthag and Wins Above Bubble (WAB), LRMC (Georgia Tech) for top-25 win/loss records, and Yahoo Bracket Mayhem for public pick percentages across all 6 rounds. All data is cached locally so re-runs don't re-scrape.
2. Win Probability Model
A pairwise win probability matrix is built for all 68 teams. The primary engine is a stacked ensemble of Logistic Regression, Random Forest, and Gradient Boosted Trees trained on 738 NCAA tournament games (2011–2025). The model uses 16 features extracted from the team stats above: seed difference, AdjEM gap, offensive/defensive efficiency gaps, Barthag gap, WAB gap, top-25 record, tempo differential, and interaction terms. When the trained model is unavailable, we fall back to historical seed-vs-seed upset rates.
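The seed-based fallback can be sketched as a simple lookup. The matchup rates below are rough approximations of historical first-round frequencies, not the project's actual table:

```python
# Illustrative first-round favorite win rates by seed matchup (approximate
# historical frequencies; the project's actual table may differ).
SEED_MATCHUP_FAV_PROB = {
    (1, 16): 0.99, (2, 15): 0.93, (3, 14): 0.85, (4, 13): 0.79,
    (5, 12): 0.65, (6, 11): 0.62, (7, 10): 0.61, (8, 9): 0.51,
}

def fallback_win_prob(seed_a: int, seed_b: int) -> float:
    """P(team_a beats team_b) from seed history; 0.5 when seeds are equal
    or the matchup is not tabulated."""
    if seed_a == seed_b:
        return 0.5
    hi, lo = sorted((seed_a, seed_b))
    p_fav = SEED_MATCHUP_FAV_PROB.get((hi, lo), 0.5)
    return p_fav if seed_a < seed_b else 1 - p_fav

print(fallback_win_prob(12, 5))  # the 12-seed's upset chance, 1 - 0.65
```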
3. Ownership & Leverage Analysis
Yahoo public pick percentages tell us what the field is doing. For each team at each round, we compute leverage = model probability / public ownership. Leverage >1 means the public is undervaluing a team relative to our model. This is the key contrarian signal: we want picks where we're right and the crowd is wrong, because those picks create separation in the pool standings.
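The leverage computation is a single ratio per team per round. A minimal sketch, with made-up probabilities and ownership numbers standing in for the model output and Yahoo data:

```python
# Hypothetical model title probabilities and Yahoo ownership (illustrative only).
model_prob = {"Duke": 0.35, "Houston": 0.18, "Gonzaga": 0.07}
ownership  = {"Duke": 0.55, "Houston": 0.10, "Gonzaga": 0.03}

# leverage = model probability / public ownership, per team.
leverage = {t: model_prob[t] / ownership[t] for t in model_prob}

# Leverage > 1 → the public is undervaluing the team relative to the model.
undervalued = sorted((t for t in leverage if leverage[t] > 1),
                     key=leverage.get, reverse=True)
print(undervalued)  # ['Gonzaga', 'Houston'] under these illustrative numbers
```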
4. Scenario Generation
We identify the top 8 champion candidates ranked by pool-adjusted value (title probability divided by expected number of opponents picking the same champion). For each candidate, we generate scenarios at multiple chaos levels:
  • Chalk (low chaos) — favorites win most games, upsets are rare
  • Contrarian (medium chaos) — 1–2 upset-heavy regions, a Cinderella run
  • Chaos (high chaos) — upsets across all regions, deep runs by mid-seeds
The top 4 champions get all 3 levels; champions 5–8 get medium and high only. The top 2 champions also get Final Four variant scenarios with different supporting casts. This yields ~24 distinct bracket scenarios.
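The scenario grid above can be sketched as a nested loop. The assumption that Final Four variants come at exactly two chaos levels is ours; it is one way to land on the ~24 total:

```python
# Top 8 champion candidates by pool-adjusted value; names are placeholders.
candidates = [f"champ_{i}" for i in range(1, 9)]

scenarios = []
for rank, champ in enumerate(candidates, start=1):
    # Top 4 candidates get all three chaos levels; 5-8 get medium and high only.
    levels = ["chalk", "contrarian", "chaos"] if rank <= 4 else ["contrarian", "chaos"]
    for level in levels:
        scenarios.append((champ, level, "default_ff"))
    if rank <= 2:  # top 2 also get Final Four variants (two levels assumed here)
        for level in ["chalk", "contrarian"]:
            scenarios.append((champ, level, "alt_ff"))

print(len(scenarios))  # 4*3 + 4*2 + 2*2 = 24 bracket scenarios
```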
5. Top-Down Bracket Construction
Each scenario is converted into a full 63-game bracket using a top-down process:
  1. Champion is locked first (worth 320 points)
  2. Final Four paths are locked for each region
  3. EMV-positive upsets are added in descending order — EMV = P(upset) × ownership_gain − P(chalk) × ownership_cost. Only upsets with positive expected value make the cut.
  4. Remaining slots are filled with chalk (higher-seeded favorite)
This ensures the most valuable picks (champion, Final Four) are chosen for strategic reasons, not left to cascading effects from early-round picks.
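The EMV filter in step 3 can be sketched directly from the formula above. The candidate upsets and their gain/cost numbers are made up for illustration:

```python
def emv(p_upset: float, ownership_gain: float, ownership_cost: float) -> float:
    """EMV = P(upset) * ownership_gain - P(chalk) * ownership_cost,
    where P(chalk) = 1 - P(upset)."""
    return p_upset * ownership_gain - (1 - p_upset) * ownership_cost

candidates = [
    # (label, P(upset), expected gain on the field if it hits, cost if it misses)
    ("12-over-5", 0.35, 8.5, 3.0),
    ("11-over-6", 0.38, 7.0, 3.5),
    ("15-over-2", 0.07, 18.0, 8.0),
]

# Keep only EMV-positive upsets, in descending order of EMV.
keepers = sorted(
    ((label, emv(p, g, c)) for label, p, g, c in candidates if emv(p, g, c) > 0),
    key=lambda kv: kv[1], reverse=True,
)
for label, value in keepers:
    print(f"{label}: EMV {value:+.2f}")  # the 15-over-2 is filtered out
```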
6. Monte Carlo Evaluation
Each of the ~24 brackets is evaluated by simulating thousands of tournaments. In each simulation:
  1. An actual tournament outcome is generated by sampling each game from the win probability matrix
  2. A pool of opponent brackets is generated by sampling picks from Yahoo public ownership distributions
  3. Your bracket and all opponents are scored using ESPN standard scoring [10, 20, 40, 80, 160, 320]
  4. Your finish position is recorded (1st, 2nd, etc.)
Across all simulations, we compute: P(1st place), P(top 3), expected finish, and expected score. The bracket with the highest P(1st) is tagged as “optimal”.
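The simulation loop, reduced to the champion slot only, can be sketched in pure Python. The title probabilities, ownership numbers, and 10-person pool below are illustrative, not project data:

```python
import random
random.seed(42)

# Illustrative title probabilities and public champion ownership (not real data).
title_prob = {"A": 0.45, "B": 0.25, "C": 0.20, "D": 0.10}
ownership  = {"A": 0.60, "B": 0.20, "C": 0.15, "D": 0.05}

def sample(dist):
    """Draw one team from a {team: probability} distribution."""
    r, acc = random.random(), 0.0
    for team, p in dist.items():
        acc += p
        if r < acc:
            return team
    return team  # guard against floating-point round-off

def p_first(my_pick, n_opponents=9, sims=20_000):
    """Steps 1-4 above, reduced to the champion slot: sample the true champion,
    sample opponent picks from ownership, score, and split ties for 1st."""
    firsts = 0.0
    for _ in range(sims):
        champ = sample(title_prob)                              # step 1: outcome
        opps = [sample(ownership) for _ in range(n_opponents)]  # step 2: field
        correct_opps = sum(o == champ for o in opps)            # step 3: scoring
        if my_pick == champ:
            firsts += 1 / (1 + correct_opps)                    # step 4: share 1st
        elif correct_opps == 0:
            firsts += 1 / (1 + n_opponents)                     # everyone ties at 0
    return firsts / sims

for pick in title_prob:
    print(f"{pick}: P(1st) ≈ {p_first(pick):.1%}")
```

Under these numbers the under-owned picks B and C finish 1st more often than the heavily-owned favorite A, mirroring the argument in “Why Perfect Loses”.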
7. Why This Works
In a small pool (10–50 people), you don't win by picking the most correct bracket — you win by picking the bracket that's most different from everyone else's when you happen to be right. A chalk bracket scores well on average but rarely wins the pool because 10 other people picked the same favorites. BernieBrackets finds the picks where the model disagrees with the public and the expected value of being contrarian is positive. It's not about being different for its own sake — it's about being different in spots where the math says the crowd is wrong.

Upset Prediction Model

Performance
Logistic Regression with isotonic calibration. Trained on 738 NCAA tournament games (2011–2025), 216 upsets (29%). 8 features selected via L1 (Lasso) screening. Cross-validated with leave-one-year-out folds (leave-one-group-out, with tournament year as the group).
AUC — Area Under ROC Curve (Leave-One-Year-Out CV)
  Seed Only:               0.665  (baseline)
  Full Model (8 features): 0.751  (+13.0% lift)
Brier score: 0.1723 (calibration quality; lower is better)
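Leave-one-year-out folding can be sketched without any ML library: each tournament year becomes one held-out test fold, so no game informs predictions for its own year. The synthetic rows below stand in for the real 738-game table:

```python
import random
random.seed(0)

# Synthetic stand-in for the training table: (year, 8 features, upset label).
# The 2020 tournament was cancelled, so it contributes no games.
years = [y for y in range(2011, 2026) if y != 2020]
games = [(y, [random.gauss(0, 1) for _ in range(8)], random.random() < 0.29)
         for y in years for _ in range(50)]

def logo_folds(rows):
    """Yield one (held-out year, train, test) split per tournament year."""
    for held_out in sorted({year for year, _, _ in rows}):
        train = [r for r in rows if r[0] != held_out]
        test  = [r for r in rows if r[0] == held_out]
        yield held_out, train, test

# In the real pipeline, fit the model on `train` and score AUC on `test`
# inside this loop, then average across folds.
print(sum(1 for _ in logo_folds(games)))  # 14 folds, one per tournament year
```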
Features
Seed difference, AdjO differential, tempo differential, seed × AdjEM interaction, top-25 win% gap, underdog top-25 win%, Barthag gap, and WAB gap.
Interactive Match Predictor
The bracket viewers include an interactive match predictor where you can select any two tournament teams to see the model's predicted win probability and per-feature breakdown. Try it in the 25-person bracket viewer →