Opening Identification
Implementation: Zobrist Hashing via Polyglot
Detecting "Book Moves" (established theory) is computationally non-trivial due to transpositions. A naive string comparison of moves fails because different move orders can reach the exact same board state.
To solve this, I use Polyglot (.bin) opening books. The engine computes a 64-bit Zobrist hash of the current board position and looks it up in the book file with a binary search, which is O(log n) in the number of book entries. This decouples the "state" from the "history," so a position is recognized as theory regardless of the move order that produced it.
def analyze_book_status(board, reader):
    """
    Checks if the current board state exists in the master opening book.
    Uses Zobrist hashing implicitly via the python-chess Polyglot reader.
    """
    try:
        # reader.find() computes the Zobrist hash of 'board' and
        # binary-searches the sorted entries of the .bin file.
        # It raises IndexError when the position is not in the book.
        reader.find(board)
        return True
    except IndexError:
        return False
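To make the transposition argument concrete, here is a minimal, standard-library-only sketch of the idea. It is not the Polyglot scheme itself: the key table is randomly generated for illustration (Polyglot uses a fixed published table and also hashes castling rights, en passant and side to move), and `toy_hash`, `play` and `in_book` are hypothetical helper names.

```python
import bisect
import random

# Toy key table: one pseudo-random 64-bit key per (piece, square) pair.
# Assumption: these are NOT the Polyglot keys, just stand-ins for the sketch.
_rng = random.Random(2024)
PIECES = "PNBRQKpnbrqk"
KEYS = {(piece, sq): _rng.getrandbits(64) for piece in PIECES for sq in range(64)}


def toy_hash(placement):
    """XOR together the keys of every (piece, square) on the board."""
    h = 0
    for sq, piece in placement.items():
        h ^= KEYS[(piece, sq)]
    return h


def play(placement, moves):
    """Apply (from_square, to_square) moves to a copy of the placement."""
    board = dict(placement)
    for src, dst in moves:
        board[dst] = board.pop(src)
    return board


def in_book(sorted_keys, key):
    """Binary search for a hash in a sorted list of book hashes."""
    i = bisect.bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


# Squares are numbered 0..63 with a1 = 0, so b1 = 1, g1 = 6, b8 = 57, g8 = 62.
start = {1: "N", 6: "N", 57: "n", 62: "n"}
line_a = [(6, 21), (62, 45), (1, 18), (57, 42)]  # Nf3 Nf6 Nc3 Nc6
line_b = [(1, 18), (57, 42), (6, 21), (62, 45)]  # Nc3 Nc6 Nf3 Nf6

hash_a = toy_hash(play(start, line_a))
hash_b = toy_hash(play(start, line_b))
book = sorted([hash_a, toy_hash(start)])  # a two-entry toy "book"

print(hash_a == hash_b)        # both move orders give the same hash
print(in_book(book, hash_b))   # so either order finds the same book entry
```

Because XOR is commutative, the hash depends only on which pieces stand on which squares, not on the order in which they got there. That is the property the book lookup relies on.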
# In the main loop:
if in_book and not new_in_book:
    # Critical Transition Point detected
    in_book = False
    left_book_by = "white" if is_white_move else "black"
    # This specific move index marks the end of "Theory"
    # and the start of "Novelty" (or error).
Once in_book becomes False, we can isolate the "Transition Phase": the 5-10 moves immediately following theory. This is often where amateur games are decided.
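The transition detection can be sketched in isolation. The version below assumes the book status of each ply has already been collected into a list of booleans; `find_book_exit` and that input shape are illustrative assumptions, not part of the original code.

```python
def find_book_exit(in_book_flags):
    """Return (ply_index, side) for the first move that leaves the book.

    in_book_flags[i] is True when the position reached after ply i is
    still in the book. Ply 0 is White's first move, so even plies are
    White's and odd plies are Black's. Returns None if the game never
    leaves the book.
    """
    for ply, still_in_book in enumerate(in_book_flags):
        if not still_in_book:
            side = "white" if ply % 2 == 0 else "black"
            return ply, side
    return None


# Three book plies, then Black's second move leaves theory.
flags = [True, True, True, False, False]
print(find_book_exit(flags))
```

Only the first False matters here: once a game has left the book, later positions are not treated as a return to theory, which matches the one-way in_book flag in the loop above.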
The Accuracy Model
Algorithm: Exponential Decay Normalization
Raw engine evaluations use centipawns (cp), where 100 cp ≈ 1 pawn. However, a linear mapping from centipawn loss to accuracy fails to model how humans perceive errors. A blunder of -500 cp (losing a rook) is fatal, but the difference between -5000 cp and -5500 cp is irrelevant: the game is already lost.
The Formula
Accuracy = 103.1668 * exp(-0.01 * cpl) - 3.1669
- Input (cpl): Centipawn Loss
- 0 cpl: 100% Accuracy
- 100 cpl: ~35% Accuracy
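This formula creates a sharp penalty curve. Small inaccuracies (10-30 cp) cost roughly 10 to 27 points, while a significant blunder drops the accuracy to near zero: the raw value reaches zero at about 350 cpl. The constants (103.1668, 3.1669) are chosen so that 0 cpl maps to 100. For large losses the raw value falls slightly below zero (it approaches -3.1669), so the implementation clamps the result to the 0-100 range.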
import math

def calc_accuracy(centipawn_loss, last_eval, current_eval):
    last_score = last_eval["score"].white()
    current_score = current_eval["score"].white()
    # Edge Case: Forced Mates are perfect play
    if last_score.is_mate() or current_score.is_mate():
        return handle_forced_mate_accuracy(...)
    # Core Formula implementation
    # We use math.exp to create the decay curve
    accuracy = 103.1668 * math.exp(-0.01 * centipawn_loss) - 3.1669
    # Clamp results to ensure valid 0-100 range
    return max(0, min(100, accuracy))
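To check the reference points quoted for the formula, the curve can be tabulated on its own. This sketch keeps only the exponential and the clamp; `accuracy_from_cpl` is a hypothetical helper name, and the mate handling is deliberately left out.

```python
import math


def accuracy_from_cpl(cpl):
    """Exponential-decay accuracy for a centipawn loss, clamped to 0-100."""
    raw = 103.1668 * math.exp(-0.01 * cpl) - 3.1669
    return max(0.0, min(100.0, raw))


# Print the curve at a few representative centipawn losses.
for cpl in (0, 10, 30, 100, 300, 500):
    print(f"{cpl:>4} cpl -> {accuracy_from_cpl(cpl):6.2f}%")
```

Without the clamp, the 500 cpl row would come out negative, which is why the clamp is part of the model rather than a cosmetic safeguard.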
def classify_move(cpl, played_best, best_gap):
    # Categorization buckets based on CPL
    if played_best and best_gap > 300: return "Brilliancy"  # !!
    if played_best: return "Best"
    elif cpl < 50: return "Great"  # !
    elif cpl < 100: return "Good"
    elif cpl < 200: return "Inaccuracy"  # ?!
    elif cpl < 1000: return "Mistake"  # ?
    else: return "Blunder"  # ??
Dynamic Phase Detection
Algorithm: Material & Move Count Heuristic
Defining when the "Middlegame" ends is subjective. To automate this, I implemented a heuristic function that evaluates the board state based on move count, piece density, and player rating.
- Opening: depends on move count + rating. Rule: moves <= 8 + (rating // 600) (the implementation also requires pieces > 12)
- Middlegame: queens present + active pieces. Rule: queens >= 1 && pieces > 6
- Endgame: low material density. Rule: else
import chess
from chess import QUEEN, ROOK, BISHOP, KNIGHT, WHITE, BLACK

def game_phase(board: chess.Board, rating, state) -> str:
    if state == "Endgame":
        return state  # Endgames rarely revert
    # Count High-Value Pieces (Queen, Rook, Bishop, Knight) for both sides
    pieces = sum(len(board.pieces(p, c))
                 for p in [QUEEN, ROOK, BISHOP, KNIGHT]
                 for c in [WHITE, BLACK])
    # Parentheses keep the two-line sum a single expression
    queens = (len(board.pieces(QUEEN, WHITE)) +
              len(board.pieces(QUEEN, BLACK)))
    # Dynamic Opening Length
    # Higher rated players play longer theory
    if board.fullmove_number <= 8 + (rating // 600) and pieces > 12:
        return "Opening"
    elif queens >= 1 and pieces > 6:
        return "Middlegame"
    return "Endgame"
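To see how the rating-dependent threshold behaves, the same rules can be restated over plain counts, with no board object involved. `phase_from_counts` is a hypothetical helper written for this illustration; the thresholds are the ones used above.

```python
def phase_from_counts(fullmove_number, pieces, queens, rating, state=None):
    """Same decision rules as game_phase, driven by precomputed counts.

    pieces: queens, rooks, bishops and knights still on the board (both sides)
    queens: queens still on the board (both sides)
    """
    if state == "Endgame":
        return state  # sticky: once in the endgame, stay there
    if fullmove_number <= 8 + (rating // 600) and pieces > 12:
        return "Opening"
    if queens >= 1 and pieces > 6:
        return "Middlegame"
    return "Endgame"


# Same position on move 11, viewed for two different ratings.
print(phase_from_counts(11, pieces=14, queens=2, rating=1200))  # cutoff is move 10
print(phase_from_counts(11, pieces=14, queens=2, rating=2400))  # cutoff is move 12
```

With integer division by 600, the opening window grows by one full move for every 600 rating points: move 10 at 1200, move 12 at 2400.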
Data Aggregation & Insights
Pipeline: MongoDB -> Python -> JSON
Result Normalization
Distinguishes skill wins (checkmate) from clock wins (timeout). A high timeout rate indicates "flagging" rather than positional dominance.
if 'timeout' in termination:
    return 'timeout'
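A minimal sketch of how such a normalization step could feed a timeout-rate metric follows. Only the 'timeout' check comes from the snippet above; the other category labels, the sample termination strings, and the helper names `normalize_result` and `timeout_rate` are assumptions made for illustration.

```python
from collections import Counter


def normalize_result(termination):
    """Map a raw termination string to a coarse category."""
    termination = termination.lower()
    if "timeout" in termination:
        return "timeout"
    if "checkmate" in termination:  # assumed label, also matches "checkmated"
        return "checkmate"
    return "other"


def timeout_rate(terminations):
    """Share of games that ended on the clock, in the range 0.0-1.0."""
    counts = Counter(normalize_result(t) for t in terminations)
    total = sum(counts.values())
    return counts["timeout"] / total if total else 0.0


# Made-up sample of four games.
sample = ["timeout", "checkmated", "resigned", "timeout"]
print(timeout_rate(sample))
```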
Blunder Database
The system serializes the FEN (Forsyth-Edwards Notation) of every move classified as a "Blunder". This creates a personalized puzzle database for tactical training.
blunders.append({ "fen": fen, "eval": eval })
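The collection step might look roughly like the sketch below. It assumes each analysed move is a dict carrying "fen", "eval" and "classification" keys; that record shape, the sample values and the `collect_blunders` name are illustrative assumptions, not the project's actual schema.

```python
import json


def collect_blunders(analysed_moves):
    """Keep the FEN and evaluation of every move classified as a Blunder."""
    blunders = []
    for move in analysed_moves:
        if move["classification"] == "Blunder":
            blunders.append({"fen": move["fen"], "eval": move["eval"]})
    return blunders


# Made-up sample records; the evaluations are placeholders, not engine output.
analysed = [
    {"fen": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1",
     "eval": 30, "classification": "Best"},
    {"fen": "rnbqkbnr/pppp1ppp/8/4p2Q/4P3/8/PPPP1PPP/RNB1KBNR b KQkq - 1 2",
     "eval": -900, "classification": "Blunder"},
]

blunders = collect_blunders(analysed)
payload = json.dumps(blunders, indent=2)  # JSON ready for the front end
print(payload)
```

Storing the FEN rather than the move list keeps each puzzle self-contained: a training view can set up the position without replaying the game.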
Follow-up Accuracy
Isolates the first 5 moves after leaving opening theory. This metric reveals whether the player understands the plan behind the opening or was just memorizing lines.
seq_moves = moves[left_idx : left_idx + 5]
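One way to turn that slice into a single number is to average the per-move accuracy over the window. The sketch assumes the accuracies are already stored in a list aligned with the move list, and that the list holds only the moves being scored; `follow_up_accuracy` is a hypothetical helper name.

```python
def follow_up_accuracy(accuracies, left_idx, window=5):
    """Mean accuracy of the first `window` moves starting at left_idx.

    Returns None when no moves were played after leaving the book.
    """
    seq = accuracies[left_idx:left_idx + window]
    return sum(seq) / len(seq) if seq else None


# Made-up per-move accuracies; the book was left at index 3.
accuracies = [100, 100, 95, 60, 80, 70, 90, 50, 100]
print(follow_up_accuracy(accuracies, left_idx=3))
```

Slicing past the end of the list is safe in Python, so short games simply average over fewer than five moves.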