GitHub - phinnphace/eugenics-footnotes: How the Iris Dataset is still "canonical" in ML, intro stats and CS without any discussion of how R.A. Fisher's beliefs influenced his methodology.

eugenics-footnotes

The Iris Dataset: An Accidental Case Study in Pedagogy, Methodology, and Real-World Implications

The Iris dataset is ubiquitous in introductory statistics and machine learning, where it is typically presented as a parable of clean categorical separation. Its own documentation explicitly states that two of the three species overlap (they are not linearly separable from each other), yet it still serves as the canonical teaching example of perfect classification.

That contradiction is not accidental. It reveals how statistical pedagogy systematically erases empirical complexity to preserve the myth of clean categorical boundaries. And it opens a longer question: what else gets erased, and whose interests does that erasure serve?

This project began as a course assignment and became something else. Two analysts worked independently on the same dataset and arrived at divergent findings, not because one was wrong but because their training shaped what each looked for. That divergence became the case study itself.

Working paper: https://tinyurl.com/IrisConsulting (redacted for collaborator privacy)

Read the Analysis

Data source: UCI ML Iris Dataset, https://archive.ics.uci.edu/dataset/53/iris
Exploratory Data Analysis: RPubs, https://rpubs.com/marksonp/1376907
Consultant's Analysis (redacted): RPubs, https://rpubs.com/marksonp/Consultants_redacted
Reanalysis: RPubs, https://rpubs.com/marksonp/Iris_reanalysis
Full sequence: tinyurl.com/IrisConsulting

What This Is

This is a methodological critique, not a machine learning tutorial. The Iris dataset is the entry point: Fisher's dataset, Fisher's legacy, and the statistical frameworks forged in an era when eugenics, typological thinking, and colonial science were dominant paradigms.

The core argument: the tools we use to observe and interpret the world are not neutral. When we prioritize algorithmic separation over diagnostic complexity, we risk producing models that are mathematically accurate but structurally false. That logic — deviation from a presumed standard — did not stay in the lab.
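As a small illustration of what checking for "diagnostic complexity" can look like before any classifier is fit, the sketch below compares the per-class ranges of a single measurement and reports which classes overlap. It is not taken from the working paper or the linked analyses. The function names `class_ranges` and `overlapping_pairs` are hypothetical, and the sample rows are placeholder values rather than Iris measurements.

```python
from collections import defaultdict


def class_ranges(rows):
    """Return {label: (min, max)} for an iterable of (value, label) pairs."""
    grouped = defaultdict(list)
    for value, label in rows:
        grouped[label].append(value)
    return {label: (min(vals), max(vals)) for label, vals in grouped.items()}


def overlapping_pairs(ranges):
    """List the label pairs whose [min, max] intervals intersect."""
    labels = sorted(ranges)
    pairs = []
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            lo = max(ranges[a][0], ranges[b][0])
            hi = min(ranges[a][1], ranges[b][1])
            if lo <= hi:  # the two intervals share at least one point
                pairs.append((a, b))
    return pairs


# Placeholder values for illustration only; NOT measurements from the Iris data.
toy_rows = [
    (1.0, "A"), (2.0, "A"),
    (3.0, "B"), (5.0, "B"),
    (4.5, "C"), (7.0, "C"),
]
toy_ranges = class_ranges(toy_rows)
print(overlapping_pairs(toy_ranges))  # [('B', 'C')]
```

Run on real data, a check like this would be applied to each measurement in turn. Range overlap on one variable is only a rough diagnostic and says nothing by itself about separability in several dimensions.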

This paper is submitted in its working state. Critique is welcome. That is the point.

Author's Note

The consultant's name has been redacted for privacy and professional courtesy. This case study examines two parallel analytical workflows as expressions of distinct STEM training lineages, not the individuals who performed them. The consultant reviewed and approved broader dissemination of this work in anonymized form.

