Feature List: TOSS4: 2020 Q2

Stephen Herbein edited this page Jun 14, 2019 · 3 revisions

What We Will Provide

System Instance

  • Limit KVS Content Growth
    • Garbage Collect after Restart
  • Tolerate Compute Nodes Down
  • Drain Nodes
  • Detect/Monitor Nodes/Resources Up/Down
  • User Management
    • Admin role
    • Dynamically add/remove users (watch /etc/passwd?)
  • Configuration files

Execution System

  • I/O to/from files with file per process
  • Multi-prog support (MPMD)
  • pty support
  • affinity/mapping
  • Jobspec + R -> local
  • Environment
  • Debugger support
    • MPIR
    • Distributed Sync
    • Co-locating processes
  • Launch OpenMPI 3.1+
  • PMI
  • job completion log
    • simple append interface
    • offline & online query (x-post w/ porcelain)
  • real job shell
  • signal jobs (x-post w/ porcelain)

Job Submission

  • Job Priorities (x-post w/ bank/accounting)
  • Job Dependencies
  • Job Feasibility
    • Ingest plugin to ensure job request is not larger than cluster can provide
    • Job request abides by QoS limits
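
A minimal sketch of the two feasibility checks above, written as plain Python rather than against any real Flux ingest plugin interface. The function name `check_feasibility`, the dictionary-based request format, and the `qos_limits` parameter are all hypothetical and only illustrate the idea of rejecting a request at submission time.

```python
def check_feasibility(request, cluster_total, qos_limits=None):
    """Return a list of reasons the request is infeasible (empty if it is fine).

    All three arguments are hypothetical dicts mapping a resource name
    (e.g. "nodes", "cores", "gpus") to an integer count.
    """
    errors = []

    # Check 1: the request must not exceed what the cluster can ever provide.
    for resource, amount in request.items():
        available = cluster_total.get(resource, 0)
        if amount > available:
            errors.append(
                f"{resource}: requested {amount}, cluster provides {available}"
            )

    # Check 2: the request must stay within the (assumed) per-job QoS limits.
    for resource, limit in (qos_limits or {}).items():
        amount = request.get(resource, 0)
        if amount > limit:
            errors.append(
                f"{resource}: requested {amount}, QoS limit is {limit}"
            )

    return errors


if __name__ == "__main__":
    cluster = {"nodes": 8, "cores": 288, "gpus": 32}
    for reason in check_feasibility({"nodes": 16, "cores": 64}, cluster):
        print("rejected:", reason)
```

Returning a list of reasons rather than a boolean lets the submission tool tell the user exactly which limit was exceeded.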

Resource Management

  • Query available/allocated/down resources (x-post w/ porcelain)
  • Resource configuration language
  • Resource discovery vs config file
  • Connect to WhatsUp
    • Provide kvs key with idset of "up" nodes
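
As an illustration of what the value of such a key might look like, the sketch below collapses a list of "up" node ranks into a compact range string such as `0-3,7`. The helper name `encode_idset` is hypothetical, and the exact string format is an assumption; the sketch does not touch the KVS or any Flux API.

```python
def encode_idset(ranks):
    """Collapse integer ranks into a compact string, e.g. [0, 1, 2, 3, 7] -> "0-3,7".

    Duplicates are dropped and input order does not matter.
    """
    ids = sorted(set(ranks))
    parts = []
    i = 0
    while i < len(ids):
        start = end = ids[i]
        # Extend the current range while the next id is consecutive.
        while i + 1 < len(ids) and ids[i + 1] == end + 1:
            i += 1
            end = ids[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


if __name__ == "__main__":
    up_nodes = [0, 1, 2, 3, 7]  # hypothetical ranks reported as "up"
    print(encode_idset(up_nodes))
```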

Porcelain

  • List jobs in queue order with filtering
  • Run/submit
  • scheduler front-end work
  • alter job priorities
    • hold
    • cancel
    • expedite
  • query completed jobs (x-post w/ execution system)
  • Transition Tools
    • flux srun
  • signal jobs (x-post w/ execution system)
  • Resource status summary tool (x-post w/ resource management)
  • User guides for transitions to Flux commands

Bank/Accounting

  • Specify bank on submission
  • Tools/storage for EOY analysis
  • User permissions
  • Fair-Share Scheduling
  • Job Priorities (x-post w/ job submission)
  • Slurm Database

Resource Matching Integration w/ Exec System

  • Resource matching interfaces w/ new exec system
  • Scheduler ? support

Sched Optimization and Resiliency

  • Scheduler performance optimization
  • Scheduler resiliency improvements
    • Support unload/load via job manager
  • Scheduler memory optimization
  • Planner optimization

Support for Queues & Partitions

  • Queue Equivalent (e.g., job tags)
    • w/ policy support (e.g., wall time limit)

ATDM L2 Milestones

  • Power Monitoring
    • monitoring support for job-level power/perf data from various databases
  • Tools Interface
  • Storage ???
    • Burst Buffer support w/in simulator
    • Add stage-in/out support in jobspec
    • Data staging flux module
  • GPU

Security

  • IMP + Contain
  • IMP PAM Support
  • IMP Prolog/Epilog support

What We Will NOT Initially Provide

  • Fully-baked, bulletproof resiliency
    • Node loss within a job allocation will result in job failure
    • Crash/loss of management node will result in loss of running jobs (i.e., they will be killed)
  • Scheduling
    • Resources besides nodes/cores/gpus
    • Standby Jobs
    • Pre-emption
    • Email Notification
    • Job Requeue
    • Modifying job properties post-submission (e.g., walltime, num nodes/cores, queue)
    • Providing "reasons" for job not currently running