
Scanner

Corentin Giaufer Saubert edited this page Dec 27, 2024 · 1 revision

PGN Scanner Implementation

Overview

The PGN Scanner reads and tokenizes chess games in PGN (Portable Game Notation) format. It handles multiple games in a single input stream, manages game metadata and moves, and handles PGN-specific syntax such as comments and move variations.

Core Components

Scanner Structure

type Scanner struct {
    scanner   *bufio.Scanner  // Underlying scanner
    nextGame  *GameScanned    // Buffered next game
    lastError error           // Last encountered error
}

type GameScanned struct {
    Raw string  // Raw PGN text
}

Main Operations

  • Game Scanning: Reads complete games from input
  • Tokenization: Converts raw PGN text into tokens
  • State Management: Tracks scanning state and buffers

Scanner Implementation

Creation and Initialization

// Create new scanner
scanner := NewScanner(reader)

The scanner is initialized with:

  • Custom split function for PGN games
  • Buffer for peek operations
  • Error tracking capability

Game Scanning

The scanner provides two main methods for reading games:

// Read next game
game, err := scanner.ScanGame()

// Check if more games exist
hasMore := scanner.HasNext()

Key Features

  1. Buffered reading
  2. Error handling
  3. EOF detection
  4. Game boundary detection

Split Function

The split function (splitPGNGames) is installed on the underlying bufio.Scanner and handles PGN-specific parsing:

  1. Whitespace Handling

    • Skips leading whitespace
    • Preserves significant whitespace
    • Handles line endings
  2. Game Boundary Detection

    • Finds game start markers
    • Handles metadata sections
    • Detects game endings
  3. State Tracking

    • Bracket tracking (for tags)
    • Comment tracking
    • Result detection
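
The three concerns above can be combined in a single bufio.SplitFunc. The sketch below is an assumption-laden illustration, not splitPGNGames itself: splitGamesSketch, isSpace, results and scanAll are hypothetical names, a game is taken to end at the first whitespace-delimited result marker (1-0, 0-1, 1/2-1/2 or *) found outside tags and comments, and brace comments are treated as non-nesting.

```go
package main

import (
	"bufio"
	"bytes"
	"strings"
)

// results holds the four game termination markers defined by the PGN standard.
var results = [][]byte{[]byte("1/2-1/2"), []byte("1-0"), []byte("0-1"), []byte("*")}

func isSpace(b byte) bool { return b == ' ' || b == '\t' || b == '\n' || b == '\r' }

// splitGamesSketch is a hypothetical bufio.SplitFunc that emits one game per token.
func splitGamesSketch(data []byte, atEOF bool) (advance int, token []byte, err error) {
	// Skip leading whitespace between games.
	start := 0
	for start < len(data) && isSpace(data[start]) {
		start++
	}
	if start == len(data) {
		return start, nil, nil // nothing but whitespace so far
	}
	inTag, inString, inComment := false, false, false
	for i := start; i < len(data); i++ {
		c := data[i]
		switch {
		case inString: // quoted tag value; a backslash escapes the next byte
			if c == '\\' {
				i++
			} else if c == '"' {
				inString = false
			}
		case inComment:
			inComment = c != '}'
		case inTag:
			if c == '"' {
				inString = true
			} else if c == ']' {
				inTag = false
			}
		case c == '{':
			inComment = true
		case c == '[':
			inTag = true
		case c == ';': // rest-of-line comment
			for i < len(data) && data[i] != '\n' {
				i++
			}
		case i == start || isSpace(data[i-1]):
			// A result marker only counts as a whole, whitespace-delimited token.
			for _, r := range results {
				end := i + len(r)
				if !bytes.HasPrefix(data[i:], r) {
					continue
				}
				if (end < len(data) && isSpace(data[end])) || (end == len(data) && atEOF) {
					return end, data[start:end], nil
				}
			}
		}
	}
	if atEOF {
		// No result marker: hand back the remainder as a final, possibly incomplete game.
		return len(data), bytes.TrimSpace(data[start:]), nil
	}
	return start, nil, nil // ask for more data, dropping the skipped whitespace
}

// scanAll is a small demo helper that collects every game found in input.
func scanAll(input string) ([]string, error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(splitGamesSketch)
	var games []string
	for sc.Scan() {
		games = append(games, sc.Text())
	}
	return games, sc.Err()
}
```

The function keeps no state between calls: whenever it asks for more data it rescans from the start of the current game. That is simple to reason about but quadratic in the worst case, so a production split function may prefer to carry state across calls.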

Tokenization

Token Generation

The TokenizeGame function processes raw PGN text:

// Convert game text to tokens
tokens, err := TokenizeGame(game)

The tokenizer handles:

  • Move notation
  • Comments
  • Annotations
  • Game metadata
  • Special characters

State Management

The tokenizer tracks multiple states:

  1. Bracket State

    • Inside/outside brackets
    • Nested bracket handling
  2. Comment State

    • Block comments
    • Line comments
    • Nested comment handling
  3. Game Content

    • Move text
    • Move numbers
    • Game results
    • Annotations

Processing Components

Whitespace Management

func skipLeadingWhitespace(data []byte) int

  • Skips insignificant whitespace
  • Preserves structural whitespace
  • Handles multiple whitespace types

Game Boundary Detection

func findGameStart(data []byte, start int, atEOF bool) int

  • Finds start of games
  • Handles multiple game formats
  • Manages partial reads

Content Processing

func processGameContent(data []byte, start int, atEOF bool) (int, []byte, error)

  • Processes game content
  • Manages state transitions
  • Handles special cases

Best Practices

  1. Scanner Usage

    scanner := NewScanner(reader)
    for scanner.HasNext() {
        game, err := scanner.ScanGame()
        if err != nil {
            // Stop on the first error; log or return it as appropriate
            break
        }
        // Process game (for example, pass it to TokenizeGame)
    }
  2. Error Handling

    • Check scanner errors
    • Handle EOF conditions
    • Manage partial reads
  3. Memory Management

    • Process games incrementally
    • Avoid loading entire file
    • Clean up resources

Performance Considerations

  1. Buffering

    • Uses bufio.Scanner for efficiency
    • Maintains minimal buffer state
    • Handles large files effectively
  2. State Tracking

    • Minimal state maintenance
    • Efficient string operations
    • Optimized boundary detection
  3. Memory Usage

    • Streaming processing
    • No unnecessary allocations
    • Efficient buffer management

Common Use Cases

  1. Single Game Reading

    game, err := scanner.ScanGame()
    if err != nil {
        // Handle error
    }
    // Process single game
  2. Multi-Game Processing

    for scanner.HasNext() {
        game, err := scanner.ScanGame()
        if err != nil {
            break // Stop on the first error
        }
        // Process each game
    }
  3. Game Tokenization

    tokens, err := TokenizeGame(game)
    // Process tokens

Limitations

  1. Input Format

    • Requires well-formed PGN
    • Limited error recovery
    • No partial game support
  2. Memory Usage

    • Game-at-a-time processing
    • Complete game buffering
    • Token list generation
  3. Error Handling

    • Basic error reporting
    • No detailed error context
    • Limited recovery options

Future Improvements

  1. Enhanced Error Handling

    • Detailed error messages
    • Error recovery options
    • Context preservation
  2. Performance Optimization

    • Reduced allocations
    • Streaming tokenization
    • Better buffer management
  3. Feature Extensions

    • Partial game support
    • Better variation handling
    • Enhanced comment processing
