K-Means Analysis Minimal

A minimal project to run k-means split analysis with metadata.

Files

  • explore_splits.py - Main analysis script
  • splits/ - Split data files
    • k_means_split.json - K-means clustering splits
    • intersect_attributes.json - Available attributes list
  • data/ - Required data files
    • all_filepath.txt - Image file paths
    • labels_images.txt - Image labels
    • mcrae-norms-grouped-with-concepts.json - Feature to concepts mapping
    • things/ - Concept metadata
      • concepts-and-categories.json - Concept definitions and categories
      • Categories_final_20200131.tsv - Supercategory mappings
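
Before running the analysis, it can help to confirm that the files listed above are in place. The following is a minimal sketch, not part of the project: the helper name `find_missing_files` is hypothetical, and it assumes the paths are laid out relative to the project root exactly as in the list.

```python
from pathlib import Path

# Relative paths taken from the file list above; the layout under the
# project root is assumed to match that list exactly.
REQUIRED_FILES = [
    "splits/k_means_split.json",
    "splits/intersect_attributes.json",
    "data/all_filepath.txt",
    "data/labels_images.txt",
    "data/mcrae-norms-grouped-with-concepts.json",
    "data/things/concepts-and-categories.json",
    "data/things/Categories_final_20200131.tsv",
]


def find_missing_files(root="."):
    """Return the required relative paths that are not files under `root`."""
    root = Path(root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = find_missing_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All required files are present.")
```

Run it from the project root, or pass another directory as `root`.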

Installation

pip install -r requirements.txt

Usage

python explore_splits.py

Running the script analyzes the "has_wheels" attribute in the k-means splits and shows:

  • Split statistics (train/test distribution)
  • Positive/negative sample counts
  • Sample examples with metadata
  • Supercategory distribution
  • Summary statistics
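
The kind of counting behind the statistics listed above can be sketched with `collections.Counter`. This is an illustration only: the internals of explore_splits.py and the schema of k_means_split.json are not shown here, so the record keys (`split`, `label`, `supercategory`), the function name `summarize_split`, and the toy records are all assumptions.

```python
from collections import Counter


def summarize_split(samples):
    """Count samples per split, per (split, label) pair, and per supercategory.

    `samples` is a list of dicts with hypothetical keys: "split"
    ("train" or "test"), "label" (1 = positive, 0 = negative) and
    "supercategory". The real split file may be structured differently.
    """
    per_split = Counter(s["split"] for s in samples)
    per_label = Counter((s["split"], s["label"]) for s in samples)
    per_supercategory = Counter(s["supercategory"] for s in samples)
    return per_split, per_label, per_supercategory


# Made-up records for illustration only; not taken from the project data.
toy_samples = [
    {"split": "train", "label": 1, "supercategory": "vehicle"},
    {"split": "train", "label": 0, "supercategory": "animal"},
    {"split": "train", "label": 0, "supercategory": "animal"},
    {"split": "test", "label": 1, "supercategory": "vehicle"},
    {"split": "test", "label": 0, "supercategory": "furniture"},
]

per_split, per_label, per_supercategory = summarize_split(toy_samples)
print("Samples per split:", dict(per_split))
print("Train positives/negatives:", per_label[("train", 1)], per_label[("train", 0)])
print("Supercategories:", dict(per_supercategory))
```

A `Counter` returns 0 for keys it has never seen, so a split with no positive samples needs no special handling.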

Dependencies

  • numpy
  • pandas
  • Python standard library modules (json, csv, pathlib, collections, typing), which need no separate installation