CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Build and Test Commands

# Using Makefile (recommended)
make build            # Build the application
make test             # Run all tests
make test-race        # Run tests with race detector
make coverage         # Generate HTML coverage report
make lint             # Run golangci-lint
make security         # Run gosec security scan
make sbom             # Generate SBOM (CycloneDX format)
make all              # Lint, test, and build
make install-tools    # Install dev tools (golangci-lint, gosec, cyclonedx-gomod)

# Direct commands (for specific cases)
go test -v -run TestFunctionName ./path/to/package  # Single test
go test -v -coverprofile=coverage.txt ./...         # Coverage profile

Module

  • Module: fjacquet/camt-csv, Go 1.24+
  • PDF support requires poppler-utils (pdftotext CLI)

Architecture Overview

This is a Go CLI tool that converts financial statement formats (CAMT.053 XML, PDF, Revolut CSV, Selma CSV) into standardized CSV with AI-powered categorization.

CAMT File Format (ISO 20022)

The camt parser handles CAMT.053 (Bank to Customer Statement) files:

  • Namespace: urn:iso:std:iso:20022:tech:xsd:camt.053.001.02
  • Standard: ISO 20022
  • Structure defined in: internal/models/iso20022.go

Supported CAMT Types:

graph TD
    A["<b>CAMT Types</b>"]
    B["<b>CAMT.052</b><br/>Bank to Customer Account Report<br/>❌ No"]
    C["<b>CAMT.053</b><br/>Bank to Customer Statement<br/>✓ Yes v001.02"]
    D["<b>CAMT.054</b><br/>Bank to Customer Debit/Credit Notification<br/>❌ No"]
    A --> B
    A --> C
    A --> D

Known Limitations:

  • Only version 001.02 tested (newer versions may have additional fields)
  • No strict namespace validation (will attempt to parse any XML with matching structure)
  • Swiss bank-specific extensions may not be fully supported

Key Design Patterns

Parser Factory Pattern: Parsers implement segregated interfaces in internal/parser/parser.go:

type Parser interface {
    Parse(r io.Reader) ([]Transaction, error)
}

type Validator interface {
    ValidateFormat(filePath string) (bool, error)
}

type CSVConverter interface {
    ConvertToCSV(inputFile, outputFile string) error
}

type LoggerConfigurable interface {
    SetLogger(logger logging.Logger)
}

type CategorizerConfigurable interface {
    SetCategorizer(categorizer models.TransactionCategorizer)
}

type BatchConverter interface {
    BatchConvert(inputDir, outputDir string) (int, error)
}

// FullParser combines all capabilities
type FullParser interface {
    Parser
    Validator
    CSVConverter
    LoggerConfigurable
    CategorizerConfigurable
    BatchConverter
}

New parsers are registered in internal/factory/factory.go. Important: CLI commands should get parsers from the DI Container (root.GetContainer().GetParser()), not directly from the factory, to ensure categorizers are properly wired.

Four-Tier Categorization (internal/categorizer/):

  1. Direct mapping - exact match from database/creditors.yaml / database/debitors.yaml
  2. Keyword matching - rules from database/categories.yaml
  3. Semantic search - vector embedding similarity via Gemini embeddings (semantic_strategy.go)
  4. AI fallback - Gemini API via AIClient interface (testable abstraction)

When --auto-learn is enabled, AI results save directly to YAML files. When disabled (default), suggestions go to staging files (database/staging_creditors.yaml, database/staging_debtors.yaml) for manual review.
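The tiered lookup can be pictured as an ordered list of strategies where the first match wins. The sketch below is an assumption-laden illustration, not the code in internal/categorizer/: the `strategy` type, `directMapping`, `keywordMatch`, the sample data, and the "Uncategorized" fallback are all hypothetical. Only the first two tiers are shown; semantic search and the AI fallback would be appended to the slice in the same way.

```go
package main

import (
	"fmt"
	"strings"
)

// strategy is a hypothetical tier: it returns a category and whether it matched.
type strategy func(partyName string) (category string, matched bool)

// directMapping stands in for the exact-match lookup (tier 1).
func directMapping(partyName string) (string, bool) {
	known := map[string]string{"ACME GROCERY": "Food"} // invented sample data
	category, ok := known[partyName]
	return category, ok
}

// keywordMatch stands in for keyword rules (tier 2).
func keywordMatch(partyName string) (string, bool) {
	if strings.Contains(strings.ToLower(partyName), "rail") {
		return "Transport", true
	}
	return "", false
}

// categorize tries each tier in order and returns the first match.
func categorize(partyName string, tiers []strategy) string {
	for _, tier := range tiers {
		if category, ok := tier(partyName); ok {
			return category
		}
	}
	return "Uncategorized" // assumed fallback label
}

func main() {
	tiers := []strategy{directMapping, keywordMatch}
	for _, name := range []string{"ACME GROCERY", "City Rail Pass", "Unknown Shop"} {
		fmt.Printf("%s -> %s\n", name, categorize(name, tiers))
	}
}
```

Ordering the cheap, deterministic tiers first means the paid AI call is reached only when everything else has failed.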

Output Formatter Registry (internal/formatter/formatter.go):

  • "standard" - 29-column, comma-delimited (backward-compatible)
  • "icompta" - 10-column, semicolon-delimited, dd.MM.yyyy dates

CLI usage: --format standard or --format icompta. New formatters: implement OutputFormatter interface, register via registry.Register("name", formatter).
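A name-keyed registry of this kind might look roughly like the following. The method set of `OutputFormatter` is a guess made for illustration (the real interface lives in internal/formatter/formatter.go and may differ), and `Get`, `NewRegistry`, and `semicolonFormatter` are hypothetical; only the `Register("name", formatter)` call shape comes from the text above.

```go
package main

import (
	"fmt"
	"strings"
)

// OutputFormatter is an assumed shape, not the repository's interface.
type OutputFormatter interface {
	Delimiter() rune
	Header() []string
}

// Registry maps a format name to its formatter.
type Registry struct {
	formatters map[string]OutputFormatter
}

func NewRegistry() *Registry {
	return &Registry{formatters: make(map[string]OutputFormatter)}
}

// Register adds or replaces the formatter stored under name.
func (r *Registry) Register(name string, f OutputFormatter) {
	r.formatters[name] = f
}

// Get returns the formatter for name, or an error for unknown names.
func (r *Registry) Get(name string) (OutputFormatter, error) {
	f, ok := r.formatters[name]
	if !ok {
		return nil, fmt.Errorf("unknown format %q", name)
	}
	return f, nil
}

// semicolonFormatter is a hypothetical formatter with a shortened header.
type semicolonFormatter struct{}

func (semicolonFormatter) Delimiter() rune  { return ';' }
func (semicolonFormatter) Header() []string { return []string{"Date", "Name", "Amount"} }

func main() {
	registry := NewRegistry()
	registry.Register("demo", semicolonFormatter{})

	f, err := registry.Get("demo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(f.Header(), string(f.Delimiter())))
}
```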

Command Lifecycle (Cobra hooks in cmd/root/root.go):

  1. PersistentPreRun - Loads config, creates DI container
  2. Command RunE - Gets parser via root.GetContainer().GetParser(type)
  3. PersistentPostRun - Saves learned creditor/debtor mappings to YAML

Directory Structure

  • cmd/ - Cobra CLI commands (camt, pdf, batch, categorize, revolut, revolut-crypto, selma, debit, revolut-investment)
  • internal/ - Core application logic:
    • *parser/ packages - Format-specific parsers with adapter.go implementing the interface
    • categorizer/ - Transaction categorization with AI integration
    • models/ - Core data structures (Transaction, Category, Parser interface)
    • config/ - Viper-based hierarchical configuration
    • store/ - YAML category database management
    • common/ - Shared CSV utilities
  • database/ - YAML configuration files for categorization rules

Configuration Hierarchy

Configuration loads in order (later overrides earlier):

  1. Config file: ~/.camt-csv/camt-csv.yaml or .camt-csv/config.yaml
  2. Environment variables (see mapping below)
  3. CLI flags: --log-level, --ai-enabled, etc.

Environment Variable Mapping:

graph LR
    A["<b>Config Key</b>"]
    B["log.level<br/>→ CAMT_LOG_LEVEL<br/>→ --log-level"]
    C["ai.enabled<br/>→ CAMT_AI_ENABLED<br/>→ --ai-enabled"]
    D["ai.model<br/>→ CAMT_AI_MODEL"]
    E["ai.api_key<br/>→ GEMINI_API_KEY"]
    A --> B
    A --> C
    A --> D
    A --> E
Loading


Note: The .env file is auto-loaded from the current directory.

Testing Conventions

  • Use t.TempDir() for file system tests
  • Set TEST_MODE=true to disable real AI API calls
  • Use SetTestCategoryStore() to inject mock stores in categorizer tests
  • Each parser has _test.go with table-driven tests

Adding a New Parser

  1. Create package in internal/{name}parser/
  2. Implement core parsing in {name}parser.go
  3. Create adapter implementing parser.FullParser in adapter.go
  4. Register in internal/factory/factory.go
  5. Add CLI command in cmd/{name}/convert.go
  6. Wire command in main.go

Coding Principles

Detailed patterns and examples: See .claude/skills/golang-expert/ for comprehensive Go patterns including functional programming, interface design, testing, concurrency, error handling, and performance optimization.

Core Principles

  1. KISS - Prefer the simplest solution. No abstraction until needed (Rule of Three).
  2. DRY - Single source of truth. Extract after 3 repetitions.
  3. No Global Mutable State - Use dependency injection via Container.
  4. Immutability - Private fields with getters, return new values.
  5. Pure Functions - Same input = same output, no side effects.
  6. Interface Segregation - Small, focused interfaces composed when needed.

Dependency Injection

All dependencies flow through the Container (internal/container/):

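// Name the variable c so it does not shadow the container package.
c, err := container.NewContainer(cfg)
if err != nil {
    return err
}
logger := c.GetLogger()
parser, err := c.GetParser(container.CAMT) // check err before using parser
categorizer := c.GetCategorizer()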

Changelog Management

IMPORTANT: Update CHANGELOG.md for every significant change.

When to Update

Update the changelog when you:

  • Add new features or commands
  • Fix bugs
  • Make breaking changes
  • Change configuration options
  • Modify public APIs or interfaces
  • Add/remove dependencies
  • Make security-related changes

How to Update

  1. Add entries under ## [Unreleased] section

  2. Use the appropriate category:

    • Added - new features
    • Changed - changes in existing functionality
    • Deprecated - soon-to-be removed features
    • Removed - removed features
    • Fixed - bug fixes
    • Security - vulnerability fixes
  3. Write entries in imperative mood: "Add feature" not "Added feature"

  4. Reference issues/PRs when relevant: "Fix parsing error (#123)"

Release Process

When creating a release:

  1. Change ## [Unreleased] to ## [X.Y.Z] - YYYY-MM-DD
  2. Add new empty ## [Unreleased] section above
  3. Update comparison links at bottom of file
  4. Follow semver: breaking=major, features=minor, fixes=patch

AI Provider Configuration

  • OpenRouter model IDs have NO openrouter/ prefix: use mistralai/mistral-small-2603 not openrouter/mistralai/...
  • The default ai.requests_per_minute: 5 is intentional: it keeps costs under control for personal finance batch workloads
  • cleanCategory handles verbose AI responses: for multi-line responses it takes the last line, and it handles **bold** markers anywhere in the string
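For orientation, the clean-up described in the last bullet could be approximated as below. This is a guess at the behavior, not the repository's cleanCategory: the name `cleanCategorySketch` is hypothetical, and it assumes that "handling" bold means removing the `**` markers.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanCategorySketch approximates the described behavior:
// keep the last line of the response and drop ** markers.
func cleanCategorySketch(response string) string {
	lines := strings.Split(strings.TrimSpace(response), "\n")
	last := lines[len(lines)-1]
	last = strings.ReplaceAll(last, "**", "") // assumption: bold markers are stripped
	return strings.TrimSpace(last)
}

func main() {
	verbose := "Let me think about this transaction.\n**Groceries**\n"
	fmt.Println(cleanCategorySketch(verbose))
}
```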

Categorization Architecture

  • Parser-internal categories (e.g., Selma investment types) are preserved — categorization_helper skips external categorizer when tx.Category is already set and non-empty
  • Selma trade transactions: PartyName = Fund ISIN, Description = Buy/Sell <ISIN>
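The skip rule in the first bullet amounts to a guard in front of the external categorizer. A minimal sketch follows, assuming a simplified `Transaction` and a hypothetical `applyCategory` helper; the actual categorization_helper may be structured differently, and the sample values are invented.

```go
package main

import "fmt"

// Transaction is a simplified stand-in for the repository's model (assumption).
type Transaction struct {
	PartyName string
	Category  string
}

// categorizeFunc is a hypothetical signature for the external categorizer.
type categorizeFunc func(tx Transaction) string

// applyCategory keeps a parser-assigned category and only calls the
// external categorizer when none is set.
func applyCategory(tx Transaction, external categorizeFunc) Transaction {
	if tx.Category != "" {
		return tx // preserve the parser-internal category
	}
	tx.Category = external(tx)
	return tx
}

func main() {
	external := func(Transaction) string { return "Shopping" }

	preset := Transaction{PartyName: "ISIN-PLACEHOLDER", Category: "Investment"}
	blank := Transaction{PartyName: "Some Shop"}

	fmt.Println(applyCategory(preset, external).Category)
	fmt.Println(applyCategory(blank, external).Category)
}
```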

Security / Linting

  • Never use math/rand — Semgrep blocks it (CWE-338); use time.Now().UnixNano() for non-security jitter

Testing Gotchas

  • Config error message strings in tests go stale after refactors — use assert.Contains with short substrings