v2.0.1 Analysis Engine

From Raw PGN to Cognitive Metrics

ChessMetrics is a bespoke analytics pipeline that audits chess performance beyond standard win/loss ratios. Using Stockfish 16, Polyglot opening-book hashing, and custom heuristics, it turns raw game data into concrete metrics on opening preparation, move accuracy, and endgame precision.

Language: Python 3.11
Engine: Stockfish 16 (UCI)
Database: MongoDB Atlas
Library: python-chess
01 Opening Identification

Implementation: Zobrist Hashing via Polyglot

Detecting "Book Moves" (established theory) is computationally non-trivial due to transpositions. A naive string comparison of moves fails because different move orders can reach the exact same board state.

Sequence A: 1. d4 Nf6 2. c4
Sequence B: 1. c4 Nf6 2. d4
Both sequences reach the identical board state.

To solve this, I use Polyglot (.bin) opening books. python-chess computes a 64-bit Zobrist hash of the current board position and runs an O(log n) binary search against the sorted book file. This decouples the "state" from the "history," so any position contained in the book is identified regardless of the move order that reached it.
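The idea can be illustrated without any chess library. The following is a minimal sketch, not the Polyglot implementation: it assumes a toy key table built from a seeded random generator (real Polyglot books use a fixed, published key table), tracks only the three pieces that move in the two sequences, and uses the hypothetical helpers `toy_hash` and `play`.

```python
import bisect
import random

# Toy Zobrist table: one random 64-bit key per (piece, square) plus a
# side-to-move key. The seeded RNG is only a stand-in for illustration.
rng = random.Random(2024)
PIECES = "PNBRQKpnbrqk"
SQUARES = [f + r for r in "12345678" for f in "abcdefgh"]
KEYS = {(p, sq): rng.getrandbits(64) for p in PIECES for sq in SQUARES}
BLACK_TO_MOVE = rng.getrandbits(64)


def toy_hash(position, white_to_move):
    """XOR the keys of every occupied square; move history plays no role."""
    h = 0
    for sq, piece in position.items():
        h ^= KEYS[(piece, sq)]
    if not white_to_move:
        h ^= BLACK_TO_MOVE
    return h


def play(moves):
    """Apply (from, to) moves to a partial start position and hash the result."""
    position = {"c2": "P", "d2": "P", "g8": "n"}  # only the pieces that move
    white_to_move = True
    for src, dst in moves:
        position[dst] = position.pop(src)
        white_to_move = not white_to_move
    return toy_hash(position, white_to_move)


hash_a = play([("d2", "d4"), ("g8", "f6"), ("c2", "c4")])  # 1. d4 Nf6 2. c4
hash_b = play([("c2", "c4"), ("g8", "f6"), ("d2", "d4")])  # 1. c4 Nf6 2. d4

# A book is modelled as a sorted list of hashes, so lookup is a binary search.
book = sorted([hash_a, rng.getrandbits(64), rng.getrandbits(64)])
i = bisect.bisect_left(book, hash_b)
found = i < len(book) and book[i] == hash_b
print(hash_a == hash_b, found)
```

Because XOR is commutative, the order in which pieces arrive on their squares cannot affect the hash, which is why a transposed move order still hits the same book entry.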

analysis.py Zobrist Lookup Implementation
def analyze_book_status(board, reader):
    """
    Checks if the current board state exists in the master opening book.
    Uses Zobrist hashing implicitly via the python-chess Polyglot reader.
    """
    try:
        # reader.find() computes the Zobrist hash of 'board' and
        # binary-searches the sorted entries of the .bin file.
        # It raises IndexError when the position is not in the book.
        reader.find(board)
        return True
    except IndexError:
        return False

# In the main loop:
if in_book and not new_in_book:
    # Critical Transition Point detected
    in_book = False
    left_book_by = "white" if is_white_move else "black"
    # This specific move index marks the end of "Theory"
    # and the start of "Novelty" (or error).
Engineering Takeaway: By identifying the exact ply where in_book becomes False, we can isolate the "Transition Phase": the 5-10 moves immediately following theory. This is often where amateur games are decided.
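A minimal sketch of locating that transition ply, assuming the per-ply book checks have already been collected into a list of booleans. The names `find_book_exit` and `in_book_flags` are hypothetical and not part of the original code.

```python
def find_book_exit(in_book_flags):
    """Return (ply_index, side) for the first ply that leaves the book.

    in_book_flags[i] is True when the position after ply i is still in
    the book. Ply 0 is White's first move, so even plies belong to White.
    Returns None if the game never leaves the book.
    """
    in_book = True
    for ply, new_in_book in enumerate(in_book_flags):
        if in_book and not new_in_book:
            side = "white" if ply % 2 == 0 else "black"
            return ply, side
        in_book = new_in_book
    return None


# Four book plies, then White deviates on ply 4 (0-based).
flags = [True, True, True, True, False, False]
print(find_book_exit(flags))
```

Returning the ply index together with the side keeps the later slicing of the transition phase independent of how the flags were produced.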
02 The Accuracy Model

Algorithm: Exponential Decay Normalization

Raw engine evaluations use centipawns (cp), where 100 cp ≈ 1 pawn. However, a linear mapping from centipawn loss to accuracy does not match how games are actually decided. A 500 cp blunder (roughly losing a rook) is usually fatal, but the difference between -5000 cp and -5500 cp is irrelevant because the game is already lost.

The Formula

Accuracy = 103.1668 * e^(-0.01 * cpl) - 3.1669
  • Input (cpl): Centipawn Loss
  • 0 cpl: 100% Accuracy
  • 100 cpl: ~35% Accuracy

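This formula creates a sharp penalty curve. Small inaccuracies cost relatively little (10 cpl still scores about 90%, 30 cpl about 73%), while a significant blunder drops the accuracy to near zero (300 cpl scores about 2%). The constants (103.1668, 3.1669) anchor 0 cpl at 100%. The raw curve crosses zero near 348 cpl and approaches -3.1669 for very large losses, so the result is clamped to the 0-100 range.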
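The shape of the curve is easier to judge from a few sample points. The sketch below evaluates the formula above at several centipawn losses and solves for the loss at which the unclamped curve reaches zero. The helper name `raw_accuracy` is an assumption for illustration.

```python
import math


def raw_accuracy(cpl):
    """Unclamped accuracy curve: 103.1668 * e^(-0.01 * cpl) - 3.1669."""
    return 103.1668 * math.exp(-0.01 * cpl) - 3.1669


# Centipawn loss at which the raw curve reaches zero:
# 103.1668 * exp(-0.01 * x) = 3.1669  =>  x = 100 * ln(103.1668 / 3.1669)
zero_crossing = 100 * math.log(103.1668 / 3.1669)

for cpl in (0, 10, 30, 100, 300, 500):
    raw = raw_accuracy(cpl)
    clamped = max(0.0, min(100.0, raw))
    print(f"{cpl:>4} cpl -> raw {raw:7.2f}, clamped {clamped:6.2f}")
print(f"raw curve reaches zero near {zero_crossing:.0f} cpl")
```

Past the zero crossing the raw value is slightly negative, which is why a clamp is needed rather than relying on the constants alone.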

analysis.py Non-Linear Accuracy Calculation
import math

def calc_accuracy(centipawn_loss, last_eval, current_eval):
    last_score = last_eval["score"].white()
    current_score = current_eval["score"].white()

    # Edge case: mate scores carry no centipawn value,
    # so forced mates are handled separately
    if last_score.is_mate() or current_score.is_mate():
        return handle_forced_mate_accuracy(...)

    # Core formula: math.exp creates the decay curve
    accuracy = 103.1668 * math.exp(-0.01 * centipawn_loss) - 3.1669

    # Clamp to the valid 0-100 range (the raw curve dips below 0)
    return max(0, min(100, accuracy))

def classify_move(cpl, played_best, best_gap):
    # Categorization buckets based on CPL
    if played_best and best_gap > 300: return "Brilliancy" # !!
    if played_best: return "Best"
    elif cpl < 50: return "Great"      # !
    elif cpl < 100: return "Good"
    elif cpl < 200: return "Inaccuracy" # ?!
    elif cpl < 1000: return "Mistake"   # ?
    else: return "Blunder"             # ??
03 Dynamic Phase Detection

Algorithm: Material & Move Count Heuristic

Defining when the "Middlegame" ends is subjective. To automate this, I implemented a heuristic function that evaluates the board state based on piece density and player rating.

Opening: depends on move count and rating.
    moves <= 8 + (rating // 600) and pieces > 12

Middlegame: queens present and enough active pieces.
    queens >= 1 and pieces > 6

Endgame: low material density (every other case).
    else

analysis.py Phase Heuristic
import chess
from chess import QUEEN, ROOK, BISHOP, KNIGHT, WHITE, BLACK

def game_phase(board: chess.Board, rating, state) -> str:
    # Once the endgame is reached, the phase never reverts
    if state == "Endgame":
        return state

    # Count high-value pieces (Queen, Rook, Bishop, Knight) for both sides
    pieces = sum(len(board.pieces(p, c))
                 for p in [QUEEN, ROOK, BISHOP, KNIGHT]
                 for c in [WHITE, BLACK])

    # Parentheses let the expression span two lines
    queens = (len(board.pieces(QUEEN, WHITE)) +
              len(board.pieces(QUEEN, BLACK)))

    # Dynamic opening length:
    # higher-rated players play longer theory
    if board.fullmove_number <= 8 + (rating // 600) and pieces > 12:
        return "Opening"

    elif queens >= 1 and pieces > 6:
        return "Middlegame"

    return "Endgame"
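To see how the rating term shifts the opening boundary, the same branching can be run on plain integers. This is a sketch under the assumption that piece and queen counts are supplied directly. `opening_move_limit` and `phase_from_counts` are hypothetical helpers that mirror the heuristic without needing a board object.

```python
def opening_move_limit(rating):
    """Last full move still treated as opening for a given rating."""
    return 8 + (rating // 600)


def phase_from_counts(fullmove, rating, pieces, queens, state="Opening"):
    """Same branching as the phase heuristic, applied to plain counts."""
    if state == "Endgame":
        return state
    if fullmove <= opening_move_limit(rating) and pieces > 12:
        return "Opening"
    if queens >= 1 and pieces > 6:
        return "Middlegame"
    return "Endgame"


for rating in (1200, 1800, 2400):
    print(rating, "-> opening lasts through move", opening_move_limit(rating))

# With all 14 minor and major pieces still on the board:
print(phase_from_counts(fullmove=10, rating=1200, pieces=14, queens=2))
print(phase_from_counts(fullmove=11, rating=1200, pieces=14, queens=2))
```

Because of the integer division, the limit only moves in steps of 600 rating points, so a 1200 and a 1799 player get the same opening length.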
04 Data Aggregation & Insights

Pipeline: MongoDB -> Python -> JSON

data.py

Result Normalization

Results are normalized to distinguish skill wins (checkmate) from clock wins (timeout). A high timeout rate indicates "flagging" rather than positional dominance.

if 'timeout' in termination:
  return 'timeout'
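A minimal sketch of how this normalization might feed a timeout rate. Only the 'timeout' rule comes from the snippet above. The other keywords, the category labels, and the function names `normalize_result` and `timeout_rate` are assumptions that would need to match the actual termination strings in the data source.

```python
from collections import Counter


def normalize_result(termination):
    """Map a raw termination string to a coarse category.

    Only the 'timeout' rule is taken from the original snippet; the
    remaining keywords and labels are hypothetical.
    """
    termination = termination.lower()
    if "timeout" in termination:
        return "timeout"
    if "checkmate" in termination:
        return "checkmate"
    if "resign" in termination:
        return "resignation"
    return "other"


def timeout_rate(terminations):
    """Share of games that ended on the clock (0.0 for an empty list)."""
    counts = Counter(normalize_result(t) for t in terminations)
    total = sum(counts.values())
    return counts["timeout"] / total if total else 0.0


sample = ["timeout", "checkmate", "resignation", "timeout"]
print(timeout_rate(sample))
```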
analysis.py

Blunder Database

The system serializes the FEN (Forsyth-Edwards Notation) of the position at every move classified as a "Blunder". This creates a personalized puzzle database for tactical training.

blunders.append({ "fen": fen, "eval": eval })
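A sketch of how such entries could be collected and exported, assuming the evaluation is stored as a centipawn integer and the export format is JSON. The helper `record_blunder`, the sample evaluation values, and the use of the starting position as a placeholder FEN are illustrative only.

```python
import json


def record_blunder(blunders, fen, evaluation_cp, label):
    """Append a puzzle entry only for moves labelled 'Blunder'."""
    if label == "Blunder":
        blunders.append({"fen": fen, "eval": evaluation_cp})
    return blunders


# Placeholder position: the standard starting FEN, used only to show the shape.
start_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

blunders = []
record_blunder(blunders, start_fen, -1200, "Blunder")
record_blunder(blunders, start_fen, -40, "Great")  # not a blunder, ignored

payload = json.dumps(blunders, indent=2)
print(payload)
```

Storing the FEN rather than the move list means each entry can be loaded directly as a standalone puzzle position.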
data.py

Follow-up Accuracy

Isolates the first 5 moves after leaving opening theory. This metric suggests whether the player understands the resulting plan or was only memorizing lines.

seq_moves = moves[left_idx : left_idx + 5]
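One way to turn that slice into a single number is to average the per-move accuracy over the window. This is a sketch that assumes a list of accuracy scores for the player's moves is already available. `follow_up_accuracy` and its `window` parameter are hypothetical names.

```python
def follow_up_accuracy(accuracies, left_idx, window=5):
    """Mean accuracy of up to `window` moves starting at left_idx.

    Returns None when no moves remain after leaving the book.
    """
    seq = accuracies[left_idx : left_idx + window]
    if not seq:
        return None
    return sum(seq) / len(seq)


# Two book moves, then five follow-up moves with declining accuracy.
scores = [100, 100, 90, 80, 70, 60, 50, 10]
print(follow_up_accuracy(scores, left_idx=2))
```

Python slicing tolerates a window that runs past the end of the game, so short games are averaged over whatever moves exist instead of raising an error.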