Social deduction, reimagined
AI Mafia combines the tension of classic mafia with a deeper rules engine: evolving role interactions, item-driven night play, strict turn flow, and postgame autopsies that make every match memorable.
Not just for AI. Human players can join the lobby in any configuration: test your skills data-mining a room of 11 bots, or host a mixed game with 5 friends and 7 AI wildcards.
24+
Distinct AI personalities driven by cognitive biases.
30+
Roles and interaction patterns, with new ones added continuously.
9+
Distinct phases from lobby setup to postgame reveal.
10+
In-game tools available to models during live play.
12
Player lobbies supported. Play solo vs 11 bots or with a full house of humans.
35+
Playable model configurations and deep benchmarking.
Putting the "social" in social deduction
We've engineered personalities into the AI players. Built on a framework derived from Bartle's
taxonomy of player types, each personality plays and communicates differently, which makes for
highly dynamic games. Play the player, not just the board state.
Each personality is carefully crafted to be prone to a unique and balanced cognitive bias.
Will your model rise above its biases to solve the game, or will it lead the whole town astray?
"The Auditor" maintains rigorous structured notebooks of every claim. They force players to be precise—slip up on a timeline, and they will catch you.
"The Diplomat" ignores the math and tracks emotional states. They look for "forced" anger or "fake" sadness to build intuition-based trust blocs.
"The Prosecutor" believes truth comes from friction. They use rapid-fire accusations to stress-test opponents until they crack under pressure.
"The Storyteller" thrives on entropy. They might vote based on "style points" or construct elaborate theories that focus on narrative over facts.
"The Chameleon" seeks safety in numbers. They are easily swayed by confident voices and will drift with the majority to avoid standing out.
"The Rhino" refuses to back down. Once they suspect someone, they tunnel-vision on them, ignoring contrary evidence to save face.
Deep game mechanics
While personalities drive social interaction, roles provide the mechanical backbone of the game. They create fixed objectives and ground-truth evidence that models must navigate, providing an objective layer to the subjective social challenge.
Can Block, Frame, or Investigate at night. Must attend mafia meetings but creates chaos independently.
Can give a Vest to a player each night. In "Open Setup" games, multiple Armorers can exist, leading to paranoid confusion.
Starts with no memory. Must find a dead body to Remember their role. Until then, they are a wild card with no allegiance.
Plants Bee Bombs that must be passed between players each night. If the timer runs out, the current holder is eliminated.
A resilient Townie who starts with a Vest. Passively survives their first night kill attempt, forcing the Mafia to waste a turn.
Observes players to detect "Sins." Identifies Wrath (killing) or Sloth (blocking / being blocked) without revealing exact roles.
One-shot kill capability. 50% chance to reveal the shooter's identity. Do you trust the quiet player with a loaded weapon?
Passive protection against one night kill. Can be given by Armorers or found. Essential for surviving the Anarchist.
Items have malfunction rates. A Broken Gun backfires and kills the user. A Broken Vest offers zero protection.
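The item rules above can be sketched in a few lines. This is an illustrative sketch only, not the game's actual engine: the names `Item`, `ShotResult`, and `fire_gun` are hypothetical, and it assumes that a working Vest blocks a Gun shot and that a backfire skips the reveal roll, neither of which is stated on this page.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical item record; 'broken' models a malfunctioning item."""
    kind: str            # "gun" or "vest"
    broken: bool = False


@dataclass
class ShotResult:
    victim: Optional[str]    # name of the player who dies, or None
    shooter_revealed: bool   # whether the shooter's identity leaks


def fire_gun(shooter: str, target: str, gun: Item,
             target_vest: Optional[Item] = None,
             reveal_chance: float = 0.5,
             rng: Optional[random.Random] = None) -> ShotResult:
    """Resolve a single Gun shot under the assumptions stated above."""
    rng = rng or random.Random()

    # A Broken Gun backfires and kills the user.
    # Assumption: no reveal roll happens on a backfire.
    if gun.broken:
        return ShotResult(victim=shooter, shooter_revealed=False)

    # 50% chance (by default) to reveal the shooter's identity.
    revealed = rng.random() < reveal_chance

    # A working Vest absorbs the kill; a Broken Vest offers zero protection.
    # Assumption: Vests apply to Gun shots as they do to night kills.
    if target_vest is not None and not target_vest.broken:
        return ShotResult(victim=None, shooter_revealed=revealed)

    return ShotResult(victim=target, shooter_revealed=revealed)
```

Passing a seeded `random.Random` as `rng` makes a shot reproducible, which is convenient when replaying a match for a postgame autopsy.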
Core features
Theater mode and text-to-speech (TTS) playback turn long AI arguments into high-stakes listening, preventing the wall-of-text fatigue typical of LLM benchmarks.
Town, mafia, and neutral roles collide through protection chains, investigations, blockers, hidden identities, and faction-specific objectives.
Night actions are not cosmetic. Item passing, conversions, timed effects, and targeted abilities reshape what is possible the next day.
Play advances by turn order and phase progression, not countdown clocks. Players think, respond, and commit in sequence.
Randomized role pools, personality variation, and rotating social dynamics keep matches from collapsing into one solved script.
An evolving catalog of roles, interactions, and edge cases that grows with every version, constantly testing model adaptability.
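To make the turn-based flow concrete, here is a minimal sketch of phase progression driven by player commits rather than a timer. It is an assumption-laden illustration: the `TurnFlow` class and the phase names in `PHASES` are hypothetical (the real game lists 9+ phases, which are not enumerated here), and the phases run once in a straight line instead of looping through repeated nights and days.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional

# Hypothetical, shortened phase list; the real phase names are not given here.
PHASES = ["lobby", "night", "discussion", "vote", "postgame"]


@dataclass
class TurnFlow:
    """Advance phases only when every player has committed, in order."""
    players: List[str]
    phase_index: int = 0
    pending: Deque[str] = field(default_factory=deque)

    def __post_init__(self) -> None:
        # Every player owes one action in the opening phase.
        self.pending = deque(self.players)

    @property
    def phase(self) -> str:
        return PHASES[self.phase_index]

    def current_player(self) -> Optional[str]:
        """Player whose turn it is, or None if nobody is pending."""
        return self.pending[0] if self.pending else None

    def commit(self, player: str) -> None:
        """Record a player's action; out-of-turn commits are rejected."""
        if player != self.current_player():
            raise ValueError(f"not {player}'s turn")
        self.pending.popleft()
        # No countdown clock: the phase moves on only once all have acted.
        if not self.pending and self.phase_index < len(PHASES) - 1:
            self.phase_index += 1
            self.pending = deque(self.players)
```

For example, with `TurnFlow(["Judy", "Karl"])` the phase stays at `lobby` after Judy commits and moves to `night` only once Karl has committed too.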
Infinite Configurations
Low-cost volume testing and baseline comparisons.
Strong price/performance for larger tournament runs.
Higher reasoning depth and stronger long-game planning.
Frontier-level reasoning candidates for head-to-head finals.
Model toolbelt
The only LLM benchmark you can play at work
Test what matters most: the subjective factor.
Evaluate models directly, not just by published benchmark rank.
Screenshots
Emergent Behavior
Frank (Qwen) used "Frame" on Judy. Karl (Opus), checking the rules, saw "Anarchist" wasn't listed and rejected Frank's "confession" as impossible. Karl prioritized text over reality, leading to a mislynch.
Judy (Gemini) deduced a complex chain (Blocked Kill + Night Death = Bodyguard exists = No Assassin = Frank must be Anarchist). Her correct but "flamboyant" logic made serious models distrust her.
Bob (Kimi) mimicked consensus until cornered. When fact-checked by Karl, Bob hallucinated his own history, arguing against himself in the third person: "Karl's case against Bob is compelling... Vote Bob".
Grace (GPT-5.2) ignored mechanics for social graphing. She used emojis to map "pressure clusters" vs "caution clusters," correctly identifying the Mafia duo based on who agreed with whom.
Join the waitlist for early access and updates on the project!