RFC: 008 Environment auto validation #778

@burtenshaw

RFC008 Environment auto validation

Contributors: @zkwentz, @jspisak, @Darktex

Context & Framing

OpenEnv launched in October 2025 as a joint project between Hugging Face and Meta. It has since grown to dozens of supporters and over 4k environments on the hub.

This unprecedented growth makes it increasingly hard to discern which environments are high quality and worth adding to a model training collection. Now is the time to implement auto-validation for environments submitted to the OpenEnv hub, to:

  1. Define a clear bar for what a high-quality environment is
  2. Drive up quality broadly
  3. Give the community a scalable way to evaluate their environments (think hackathons).

Summary

Taking a frontier lab view, an environment is only useful for improving models and generalizing capabilities if the following are true:

  1. The environment scales on infrastructure, and how well it scales is measurable.
  2. The environment is learnable, e.g. a model can hill-climb on it within O(100s) of steps.
  3. The environment is secure.
  4. The environment isn’t prone to reward hacking.
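
To make the learnability property (item 2) concrete, here is a minimal sketch of one way "hill-climbing" could be detected from a reward log. It is an illustration only: the function name `shows_learning_signal`, the window size, and the `min_gain` threshold are assumptions not specified by this RFC, and the reward series below are synthetic.

```python
from statistics import mean


def shows_learning_signal(rewards, window=50, min_gain=0.05):
    """Return True if the late-window mean reward beats the early-window mean.

    Hypothetical heuristic: compares the mean of the first `window` steps
    with the mean of the last `window` steps and requires an improvement
    of at least `min_gain`. Both defaults are placeholder values.
    """
    if len(rewards) < 2 * window:
        raise ValueError("need at least two full windows of rewards")
    early = mean(rewards[:window])
    late = mean(rewards[-window:])
    return (late - early) >= min_gain


# Synthetic reward logs, for illustration only.
improving = [step / 500 for step in range(500)]  # steadily rising reward
flat = [0.3] * 500                               # no improvement at all

print(shows_learning_signal(improving))
print(shows_learning_signal(flat))
```

A real check would likely need several seeds and a noise-aware comparison, since a single run can rise or fall by chance.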

Criteria

The following acceptance tests validate the image build and delivery pipeline against its source requirements. Each test pairs a concrete pass condition with the requirement it traces back to, spanning build determinism, image composition, format compatibility, runtime startup behavior, and delivery/provenance guarantees. A run is considered conformant only when every test below meets its pass condition.

  • Reproducible build - Two builds from the same source produce an identical digest. (Build determinism)
  • Layer-change isolation - An app-file change moves only the top 1–2 layers. (Layer ordering)
  • Multi-stage hygiene - No build tools, .git, or fixtures appear in the final image. (What goes in / stays out)
  • Archive-free layout - No archive blobs above the threshold, and no single oversize binary. (What goes in / stays out + Hot-path locality)
  • Conversion clean - Both nydusify and ctr-remote estargz conversions succeed. (Format and compatibility)
  • Time-to-first-useful-work - The container starts before the full image pull completes. (Hot-path locality)
  • Composition inspection - dive / nydus-image inspect shows no duplicated large files and reasonable chunk boundaries. (Layer structure)
  • Signature + SBOM - cosign verification passes; an SBOM is attached and parseable. (Delivery)
  • OCI labels - All required labels are populated. (Delivery)
  • Resource declarations - Task metadata includes a CPU / memory / storage budget within sanity bounds. (Per-task resources)
  • Periodic learnability - A short (~500-step) training run produces signal on a known model.
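
As a rough illustration of how the pass conditions above might be aggregated, the sketch below implements three of the simpler checks and the "conformant only when every test passes" rule. Everything named here is an assumption: the RFC does not define `CheckResult`, the list of required labels, or the resource bounds, so the label keys (standard OCI annotation keys) and numeric limits are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str          # test name from the criteria list
    requirement: str   # requirement the test traces back to
    passed: bool
    detail: str = ""


def check_reproducible_build(digest_a, digest_b):
    # Pass condition: two builds from the same source share one digest.
    ok = digest_a == digest_b
    detail = "" if ok else f"{digest_a} != {digest_b}"
    return CheckResult("Reproducible build", "Build determinism", ok, detail)


# Placeholder set: the RFC does not enumerate the required labels.
REQUIRED_LABELS = (
    "org.opencontainers.image.source",
    "org.opencontainers.image.revision",
    "org.opencontainers.image.licenses",
)


def check_oci_labels(labels, required=REQUIRED_LABELS):
    # A label counts as populated only if it is present and non-blank.
    missing = [key for key in required if not labels.get(key, "").strip()]
    detail = f"missing: {missing}" if missing else ""
    return CheckResult("OCI labels", "Delivery", not missing, detail)


# Hypothetical sanity bounds as (min, max) per declared resource.
RESOURCE_BOUNDS = {
    "cpu_cores": (0.1, 64),
    "memory_gib": (0.1, 512),
    "storage_gib": (0.1, 1024),
}


def check_resources(declared, bounds=RESOURCE_BOUNDS):
    problems = []
    for key, (low, high) in bounds.items():
        value = declared.get(key)
        if value is None:
            problems.append(f"{key} not declared")
        elif not low <= value <= high:
            problems.append(f"{key}={value} outside [{low}, {high}]")
    return CheckResult("Resource declarations", "Per-task resources",
                       not problems, "; ".join(problems))


def is_conformant(results):
    # Conformant only when at least one check ran and every check passed.
    return bool(results) and all(result.passed for result in results)


results = [
    check_reproducible_build("sha256:abc", "sha256:abc"),
    check_oci_labels({key: "example-value" for key in REQUIRED_LABELS}),
    check_resources({"cpu_cores": 2, "memory_gib": 4, "storage_gib": 10}),
]
for result in results:
    print(result.name, "PASS" if result.passed else f"FAIL ({result.detail})")
print("conformant:", is_conformant(results))
```

The remaining tests (format conversion, signature verification, layer inspection, learnability) depend on external tools or training runs and would plug into the same `CheckResult` shape.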
