
phinnphace/eugenics-footnotes


eugenics-footnotes

The Iris Dataset: An Accidental Case Study in Pedagogy, Methodology, and Real-World Implications

The Iris dataset is ubiquitous in introductory statistics and machine learning, where it is typically presented as a parable of clean categorical separation. Yet its own documentation explicitly states that two of the three species overlap, even as the dataset serves as the canonical classroom example of perfect classification.

That contradiction is not accidental. It reveals how statistical pedagogy systematically erases empirical complexity to preserve the myth of clean categorical boundaries. And it opens a longer question: what else gets erased, and whose interests does that erasure serve?

This project began as a course assignment and became something else. Two analysts worked independently on the same dataset and arrived at divergent findings, not because one was wrong, but because their training shaped what each looked for. That divergence became the case study itself.

Working paper: https://tinyurl.com/IrisConsulting (redacted for collaborator privacy)

Read the Analysis

Data source: UCI ML Iris Dataset, https://archive.ics.uci.edu/dataset/53/iris
Exploratory Data Analysis: RPubs, https://rpubs.com/marksonp/1376907
Consultant's Analysis (redacted): RPubs, https://rpubs.com/marksonp/Consultants_redacted
Reanalysis: RPubs, https://rpubs.com/marksonp/Iris_reanalysis
Full sequence: tinyurl.com/IrisConsulting
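The overlap claim in the introduction can be checked without fitting any model. The following is a minimal sketch in Python, written for illustration and not taken from the linked analyses. The per-species petal ranges are an assumption: they were transcribed by hand from summary statistics of the commonly distributed 150-row copy of the data and should be verified against the UCI file. The names `RANGES`, `intervals_overlap`, and `overlapping_pairs` are hypothetical.

```python
from itertools import combinations

# Assumed per-species (min, max) ranges in cm, transcribed by hand from the
# commonly distributed 150-row Iris data. Verify against the UCI file.
RANGES = {
    "petal_length": {
        "setosa": (1.0, 1.9),
        "versicolor": (3.0, 5.1),
        "virginica": (4.5, 6.9),
    },
    "petal_width": {
        "setosa": (0.1, 0.6),
        "versicolor": (1.0, 1.8),
        "virginica": (1.4, 2.5),
    },
}


def intervals_overlap(a, b):
    """Return True if closed intervals a=(lo, hi) and b=(lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlapping_pairs(species_ranges):
    """List the species pairs whose value ranges overlap on one feature."""
    pairs = []
    for first, second in combinations(sorted(species_ranges), 2):
        if intervals_overlap(species_ranges[first], species_ranges[second]):
            pairs.append((first, second))
    return pairs


for feature, species_ranges in RANGES.items():
    print(feature, overlapping_pairs(species_ranges))
```

Under the assumed ranges, setosa shares no petal values with either other species, while versicolor and virginica overlap on both petal measurements. A range check on single features is cruder than a test of linear separability across all four measurements, so it illustrates the overlap rather than proving it.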

What This Is

This is a methodological critique, not a machine learning tutorial. The Iris dataset is the entry point: Fisher's dataset, Fisher's legacy, and the statistical frameworks forged in an era when eugenics, typological thinking, and colonial science were dominant paradigms.

The core argument: the tools we use to observe and interpret the world are not neutral. When we prioritize algorithmic separation over diagnostic complexity, we risk producing models that are mathematically accurate but structurally false. The logic underneath, measuring deviation from a presumed standard, did not stay in the lab.

This paper is submitted in its working state. Critique is welcome. That is the point.

Author's Note

The consultant's name has been redacted for privacy and professional courtesy. This case study examines two parallel analytical workflows as expressions of distinct STEM training lineages, not as a judgment of the individuals who performed them. The consultant reviewed and approved broader dissemination of this work in anonymized form.

About

How the Iris Dataset is still "canonical" in ML, intro stats, and CS without any discussion of how R.A. Fisher's beliefs influence methodology.
