API Reference

Packages

kubeflow.org/v1

kubeflow.org/v1

Package v1 is the v1 version of the API.

Package v1 contains API Schema definitions for the kubeflow.org v1 API group

Resource Types

MPIJob
MPIJobList
MXJob
MXJobList
PaddleJob
PaddleJobList
PyTorchJob
PyTorchJobList
TFJob
TFJobList
XGBoostJob
XGBoostJobList

Definitions

ElasticPolicy

Appears In:

PyTorchJobSpec

Field	Description
`minReplicas` integer	minReplicas is the lower limit for the number of replicas to which the training job can scale down. It defaults to null.
`maxReplicas` integer	maxReplicas is the upper limit for the number of pods that can be set by the autoscaler. It cannot be smaller than minReplicas. It defaults to null.
`rdzvBackend` RDZVBackend
`rdzvPort` integer
`rdzvHost` string
`rdzvId` string
`rdzvConf` RDZVConf array	RDZVConf contains additional rendezvous configuration (<key1>=<value1>,<key2>=<value2>,…).
`standalone` boolean	Start a local standalone rendezvous backend that is represented by a C10d TCP store on port 29400. Useful when launching a single-node, multi-worker job. If specified, --rdzv_backend, --rdzv_endpoint, and --rdzv_id are auto-assigned; any explicitly set values are ignored.
`nProcPerNode` integer	Number of workers per node; supported values: [auto, cpu, gpu, int]. Deprecated: this field is deprecated in v1.7+. Use .spec.nprocPerNode instead.
`maxRestarts` integer
`metrics` MetricSpec array	Metrics contains the specifications which are used to calculate the desired replica count (the maximum replica count across all metrics will be used). The desired replica count is calculated by multiplying the ratio between the target value and the current value by the current number of pods. Ergo, metrics used must decrease as the pod count is increased, and vice versa. See the individual metric source types for more information about how each type of metric must respond. If not set, the HPA will not be created.
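
The replica calculation described for `metrics` can be sketched as follows. This is a minimal illustration, assuming the standard Kubernetes HorizontalPodAutoscaler rule (desired = ceil(currentReplicas * currentValue / targetValue)) and ignoring HPA tolerance and stabilization windows. The function name, the `(current, target)` pair representation, and the clamping step are assumptions made for this example, not the operator's code.

```python
import math
from typing import Iterable, Optional, Tuple


def desired_replicas(
    current_replicas: int,
    metrics: Iterable[Tuple[float, float]],
    min_replicas: Optional[int] = None,
    max_replicas: Optional[int] = None,
) -> int:
    """Estimate the desired replica count.

    Each metric is a (current_value, target_value) pair (hypothetical shape).
    """
    # One proposal per metric, following the HPA ratio rule.
    proposals = [
        math.ceil(current_replicas * current / target)
        for current, target in metrics
    ]
    # The maximum replica count across all metrics is used.
    desired = max(proposals, default=current_replicas)
    # Clamp to the elastic policy bounds when they are set.
    if min_replicas is not None:
        desired = max(desired, min_replicas)
    if max_replicas is not None:
        desired = min(desired, max_replicas)
    return desired
```

For instance, with 4 replicas and a metric at twice its target, the sketch proposes 8 replicas before clamping.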

JobCondition

JobCondition describes the state of the job at a certain point.

Appears In:

JobStatus

Field	Description
`type` JobConditionType	Type of job condition.
`status` ConditionStatus	Status of the condition, one of True, False, Unknown.
`reason` string	The reason for the condition’s last transition.
`message` string	A human readable message indicating details about the transition.
`lastUpdateTime` Time	The last time this condition was updated.
`lastTransitionTime` Time	Last time the condition transitioned from one status to another.
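
A client usually reads these fields by looking up one condition type in `status.conditions`. The sketch below assumes the job status has already been fetched as a plain dict. The helper names are hypothetical, and the condition type strings (`Created`, `Running`, `Succeeded`) and the `reason` value are assumptions for illustration, since this reference does not list the JobConditionType values.

```python
from typing import Any, Dict, Optional


def find_condition(status: Dict[str, Any], cond_type: str) -> Optional[Dict[str, Any]]:
    """Return the first condition with the given type, or None if absent."""
    for cond in status.get("conditions") or []:
        if cond.get("type") == cond_type:
            return cond
    return None


def is_condition_true(status: Dict[str, Any], cond_type: str) -> bool:
    """True only when the condition exists and its status is the string "True"."""
    cond = find_condition(status, cond_type)
    return cond is not None and cond.get("status") == "True"


# Hypothetical status payload used only to exercise the helpers.
example_status = {
    "conditions": [
        {"type": "Created", "status": "True", "reason": "ExampleReason"},
        {"type": "Running", "status": "False", "reason": "ExampleReason"},
        {"type": "Succeeded", "status": "True", "reason": "ExampleReason"},
    ]
}
```

Note that `status` is one of the strings True, False, or Unknown, not a boolean, so the comparison is against `"True"`.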

JobConditionType (string)

JobConditionType defines the possible condition types of a JobStatus.

Appears In:

JobCondition

JobModeType (string)

JobModeType is the type for JobMode.

Appears In:

MXJobSpec

JobStatus

JobStatus represents the current observed state of the training Job.

Appears In:

MPIJob
MXJob
PaddleJob
PyTorchJob
TFJob
XGBoostJob

Field	Description
`conditions` JobCondition array	Conditions is an array of current observed job conditions.
`replicaStatuses` object (keys:ReplicaType, values:ReplicaStatus)	ReplicaStatuses is a map from ReplicaType to ReplicaStatus that specifies the status of each replica.
`startTime` Time	Represents time when the job was acknowledged by the job controller. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.
`completionTime` Time	Represents time when the job was completed. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.
`lastReconcileTime` Time	Represents last time when the job was reconciled. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.
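
Because the time fields are RFC3339 timestamps in UTC, a job's wall-clock duration can be derived from `startTime` and `completionTime`. A minimal sketch, assuming whole-second timestamps with a trailing `Z` (the usual Kubernetes serialization) and a status already loaded as a dict; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

# Assumed serialization: whole seconds, UTC, trailing "Z".
_RFC3339_UTC = "%Y-%m-%dT%H:%M:%SZ"


def parse_time(value: str) -> datetime:
    """Parse an RFC3339 UTC timestamp into an aware datetime."""
    return datetime.strptime(value, _RFC3339_UTC).replace(tzinfo=timezone.utc)


def job_duration(status: Dict[str, Any]) -> Optional[timedelta]:
    """Return completionTime - startTime, or None if either field is unset."""
    start = status.get("startTime")
    end = status.get("completionTime")
    if not start or not end:
        return None
    return parse_time(end) - parse_time(start)
```

Since the fields are not guaranteed to be set in happens-before order, a caller may want to treat a negative duration as "unknown" rather than as an error.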

MPIJob

Appears In:

MPIJobList

Field	Description
`apiVersion` string	`kubeflow.org/v1`
`kind` string	`MPIJob`
`TypeMeta` TypeMeta
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` MPIJobSpec
`status` JobStatus

MPIJobList

Field	Description
`apiVersion` string	`kubeflow.org/v1`
`kind` string	`MPIJobList`
`TypeMeta` TypeMeta
`metadata` ListMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`items` MPIJob array

MPIJobSpec

Appears In:

MPIJob

Field	Description
`slotsPerWorker` integer	Specifies the number of slots per worker used in hostfile. Defaults to 1.
`cleanPodPolicy` CleanPodPolicy	CleanPodPolicy defines whether to kill pods after the job completes. Defaults to None.
`mpiReplicaSpecs` object (keys:ReplicaType, values:ReplicaSpec)	`MPIReplicaSpecs` contains maps from `MPIReplicaType` to `ReplicaSpec` that specify the MPI replicas to run.
`mainContainer` string	MainContainer specifies the name of the main container which executes the MPI code.
`runPolicy` RunPolicy	`RunPolicy` encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
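
To show how these fields fit together, the sketch below assembles an MPIJob manifest as a plain dict. The replica type names `Launcher` and `Worker`, the container name `mpi`, and the helper names are assumptions for illustration; this reference does not enumerate the MPI replica types.

```python
from typing import Any, Dict


def build_mpijob(name: str, image: str, workers: int, slots_per_worker: int = 1) -> Dict[str, Any]:
    """Build a minimal MPIJob manifest (illustrative, not validated by a cluster)."""

    def replica(count: int) -> Dict[str, Any]:
        return {
            "replicas": count,
            "restartPolicy": "Never",
            "template": {
                "spec": {"containers": [{"name": "mpi", "image": image}]}
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "MPIJob",
        "metadata": {"name": name},
        "spec": {
            "slotsPerWorker": slots_per_worker,
            # Must match the container name used in the templates above.
            "mainContainer": "mpi",
            # Replica type names are assumed for this sketch.
            "mpiReplicaSpecs": {"Launcher": replica(1), "Worker": replica(workers)},
        },
    }


def total_slots(manifest: Dict[str, Any]) -> int:
    """Slots available to MPI: worker replicas times slotsPerWorker."""
    spec = manifest["spec"]
    workers = spec["mpiReplicaSpecs"]["Worker"].get("replicas", 1)
    return workers * spec.get("slotsPerWorker", 1)
```

With 4 workers and `slotsPerWorker` set to 2, `total_slots` reports 8, which is the process count a launcher command would typically target.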

MXJob

MXJob is the Schema for the mxjobs API

Appears In:

MXJobList

Field	Description
`apiVersion` string	`kubeflow.org/v1`
`kind` string	`MXJob`
`TypeMeta` TypeMeta
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` MXJobSpec
`status` JobStatus

MXJobList

MXJobList contains a list of MXJob

Field	Description
`apiVersion` string	`kubeflow.org/v1`
`kind` string	`MXJobList`
`TypeMeta` TypeMeta
`metadata` ListMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`items` MXJob array

MXJobSpec

MXJobSpec defines the desired state of MXJob

Appears In:

MXJob

Field	Description
`runPolicy` RunPolicy	RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
`jobMode` JobModeType	JobMode specifies the kind of MXJob to run. Different modes may require different MXReplicaSpecs.
`mxReplicaSpecs` object (keys:ReplicaType, values:ReplicaSpec)	MXReplicaSpecs is a map from ReplicaType to ReplicaSpec that specifies the MX replicas to run. For example, { "Scheduler": ReplicaSpec, "Server": ReplicaSpec, "Worker": ReplicaSpec, }

PaddleElasticPolicy

Appears In:

PaddleJobSpec

Field	Description
`minReplicas` integer	minReplicas is the lower limit for the number of replicas to which the training job can scale down. It defaults to null.
`maxReplicas` integer	maxReplicas is the upper limit for the number of pods that can be set by the autoscaler. It cannot be smaller than minReplicas. It defaults to null.
`maxRestarts` integer	MaxRestarts is the limit for restart times of pods in elastic mode.
`metrics` MetricSpec array	Metrics contains the specifications which are used to calculate the desired replica count (the maximum replica count across all metrics will be used). The desired replica count is calculated by multiplying the ratio between the target value and the current value by the current number of pods. Ergo, metrics used must decrease as the pod count is increased, and vice versa. See the individual metric source types for more information about how each type of metric must respond. If not set, the HPA will not be created.

PaddleJob

PaddleJob represents a PaddleJob resource.

Appears In:

PaddleJobList

Field	Description
`apiVersion` string	`kubeflow.org/v1`
`kind` string	`PaddleJob`
`TypeMeta` TypeMeta	Standard Kubernetes type metadata.
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` PaddleJobSpec	Specification of the desired state of the PaddleJob.
`status` JobStatus	Most recently observed status of the PaddleJob. Read-only (modified by the system).

PaddleJobList

PaddleJobList is a list of PaddleJobs.

Field	Description
`apiVersion` string	`kubeflow.org/v1`
`kind` string	`PaddleJobList`
`TypeMeta` TypeMeta	Standard type metadata.
`metadata` ListMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`items` PaddleJob array	List of PaddleJobs.

PaddleJobSpec

PaddleJobSpec is a desired state description of the PaddleJob.

Appears In:

PaddleJob

Field	Description
`runPolicy` RunPolicy	RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
`elasticPolicy` PaddleElasticPolicy	ElasticPolicy holds the elastic policy for paddle job.
`paddleReplicaSpecs` object (keys:ReplicaType, values:ReplicaSpec)	A map of PaddleReplicaType (type) to ReplicaSpec (value). Specifies the Paddle cluster configuration. For example, { "Master": PaddleReplicaSpec, "Worker": PaddleReplicaSpec, }

PyTorchJob

PyTorchJob represents a PyTorchJob resource.

Appears In:

PyTorchJobList

Field	Description
`apiVersion` string	`kubeflow.org/v1`
`kind` string	`PyTorchJob`
`TypeMeta` TypeMeta	Standard Kubernetes type metadata.
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` PyTorchJobSpec	Specification of the desired state of the PyTorchJob.
`status` JobStatus	Most recently observed status of the PyTorchJob. Read-only (modified by the system).

PyTorchJobList

PyTorchJobList is a list of PyTorchJobs.

Field	Description
`apiVersion` string	`kubeflow.org/v1`
`kind` string	`PyTorchJobList`
`TypeMeta` TypeMeta	Standard type metadata.
`metadata` ListMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`items` PyTorchJob array	List of PyTorchJobs.

PyTorchJobSpec

PyTorchJobSpec is a desired state description of the PyTorchJob.

Appears In:

PyTorchJob

Field	Description
`runPolicy` RunPolicy	RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
`elasticPolicy` ElasticPolicy
`pytorchReplicaSpecs` object (keys:ReplicaType, values:ReplicaSpec)	A map of PyTorchReplicaType (type) to ReplicaSpec (value). Specifies the PyTorch cluster configuration. For example, { "Master": PyTorchReplicaSpec, "Worker": PyTorchReplicaSpec, }
`nprocPerNode` string	Number of workers per node; supported values: [auto, cpu, gpu, int]. For more information, see https://github.com/pytorch/pytorch/blob/26f7f470df64d90e092081e39507e4ac751f55d6/torch/distributed/run.py#L629-L658. Defaults to auto.
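
The following sketch builds a PyTorchJob manifest from the fields above and enforces the documented constraint that `maxReplicas` cannot be smaller than `minReplicas`. The builder function, the container name `pytorch`, and the choice of `OnFailure` are assumptions for illustration; only the field names and the `Master`/`Worker` keys come from this reference.

```python
from typing import Any, Dict, Optional


def build_pytorchjob(
    name: str,
    image: str,
    workers: int,
    nproc_per_node: str = "auto",
    min_replicas: Optional[int] = None,
    max_replicas: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a minimal PyTorchJob manifest as a plain dict."""
    if (
        min_replicas is not None
        and max_replicas is not None
        and max_replicas < min_replicas
    ):
        raise ValueError("maxReplicas cannot be smaller than minReplicas")

    def replica(count: int) -> Dict[str, Any]:
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                # Container name is an assumption made for this sketch.
                "spec": {"containers": [{"name": "pytorch", "image": image}]}
            },
        }

    spec: Dict[str, Any] = {
        "nprocPerNode": nproc_per_node,
        "pytorchReplicaSpecs": {"Master": replica(1), "Worker": replica(workers)},
    }
    # Only emit elasticPolicy when at least one bound is given.
    elastic = {
        key: value
        for key, value in (("minReplicas", min_replicas), ("maxReplicas", max_replicas))
        if value is not None
    }
    if elastic:
        spec["elasticPolicy"] = elastic

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name},
        "spec": spec,
    }
```

The resulting dict can be serialized to JSON or YAML and submitted with any Kubernetes client. Note that `nprocPerNode` is a string even when it carries an integer count.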

RDZVBackend (string)

Appears In:

ElasticPolicy

RDZVConf

Appears In:

ElasticPolicy

Field	Description
`key` string
`value` string
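
The `rdzvConf` entries correspond to the `<key1>=<value1>,<key2>=<value2>` form mentioned under ElasticPolicy. A minimal sketch of that flattening, assuming the entries arrive as a list of dicts with `key` and `value`; the function name and the sample keys are hypothetical, and how the operator actually passes the string to the launcher is not covered here.

```python
from typing import Dict, List


def format_rdzv_conf(conf: List[Dict[str, str]]) -> str:
    """Join RDZVConf entries into a comma-separated key=value string."""
    return ",".join(f"{item['key']}={item['value']}" for item in conf)


# Sample entries; the key names are illustrative only.
sample = [
    {"key": "timeout", "value": "600"},
    {"key": "read_timeout", "value": "60"},
]
```

An empty list yields an empty string, so callers can skip the setting entirely when no extra configuration is given.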

ReplicaSpec

ReplicaSpec is a description of the replica

Appears In:

MPIJobSpec
MXJobSpec
PaddleJobSpec
PyTorchJobSpec
TFJobSpec
XGBoostJobSpec

Field	Description
`replicas` integer	Replicas is the desired number of replicas of the given template. If unspecified, defaults to 1.
`template` PodTemplateSpec	Template is the object that describes the pod that will be created for this replica. RestartPolicy in PodTemplateSpec will be overridden by RestartPolicy in ReplicaSpec.
`restartPolicy` RestartPolicy	Restart policy for all replicas within the job. One of Always, OnFailure, Never, and ExitCode. Defaults to Never.
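
The defaults stated in this table can be expressed as a small normalization step. This is a sketch under the assumption that the table's defaults apply uniformly (`replicas` to 1, `restartPolicy` to Never); the operator's real defaulting may differ per job kind, and the function name is hypothetical.

```python
import copy
from typing import Any, Dict

# Values listed for restartPolicy in the table above.
VALID_RESTART_POLICIES = {"Always", "OnFailure", "Never", "ExitCode"}


def apply_replica_defaults(replica_spec: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the ReplicaSpec with the documented defaults filled in."""
    spec = copy.deepcopy(replica_spec)
    if spec.get("replicas") is None:
        spec["replicas"] = 1
    if not spec.get("restartPolicy"):
        spec["restartPolicy"] = "Never"
    if spec["restartPolicy"] not in VALID_RESTART_POLICIES:
        raise ValueError(f"unsupported restartPolicy: {spec['restartPolicy']!r}")
    return spec
```

The input is copied rather than mutated, so the same template dict can be reused across replica types.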

ReplicaStatus

ReplicaStatus represents the current observed state of the replica.

Appears In:

JobStatus

Field	Description
`active` integer	The number of actively running pods.
`succeeded` integer	The number of pods which reached phase Succeeded.
`failed` integer	The number of pods which reached phase Failed.
`labelSelector` LabelSelector	Deprecated: Use Selector instead
`selector` string	A Selector is a label query over a set of resources. The result of matchLabels and matchExpressions are ANDed. An empty Selector matches all objects. A null Selector matches no objects.
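
Since `replicaStatuses` in JobStatus maps each ReplicaType to one of these objects, per-job totals are a simple sum over the map. A minimal sketch, assuming the status is available as a dict and that missing counters mean zero; the function name and the sample replica type names are illustrative.

```python
from typing import Any, Dict


def summarize_replicas(status: Dict[str, Any]) -> Dict[str, int]:
    """Sum active, succeeded, and failed pod counts across all replica types."""
    totals = {"active": 0, "succeeded": 0, "failed": 0}
    for replica_status in (status.get("replicaStatuses") or {}).values():
        for key in totals:
            # Unset or null counters are treated as zero.
            totals[key] += replica_status.get(key) or 0
    return totals


# Hypothetical status payload.
sample_status = {
    "replicaStatuses": {
        "Master": {"active": 1},
        "Worker": {"active": 2, "succeeded": 1, "failed": 1},
    }
}
```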

ReplicaType (string)

ReplicaType represents the type of the replica. Each operator needs to define its own set of ReplicaTypes.

Appears In:

JobStatus
MPIJobSpec
MXJobSpec
PaddleJobSpec
PyTorchJobSpec
TFJobSpec
XGBoostJobSpec

RestartPolicy (string)

RestartPolicy describes how the replicas should be restarted. Only one of the following restart policies may be specified. If none of the following policies is specified, the default one is RestartPolicyAlways.

Appears In:

ReplicaSpec

RunPolicy

RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.

Appears In:

MPIJobSpec
MXJobSpec
PaddleJobSpec
PyTorchJobSpec
TFJobSpec
XGBoostJobSpec

Field	Description
`cleanPodPolicy` CleanPodPolicy	CleanPodPolicy defines the policy to kill pods after the job completes. Defaults to None.
`ttlSecondsAfterFinished` integer	TTLSecondsAfterFinished is the TTL to clean up jobs. It may take extra ReconcilePeriod seconds for the cleanup, since reconcile gets called periodically. Defaults to infinite.
`activeDeadlineSeconds` integer	Specifies the duration in seconds relative to the startTime that the job may be active before the system tries to terminate it; the value must be a positive integer.
`backoffLimit` integer	Optional number of retries before marking this job failed.
`schedulingPolicy` SchedulingPolicy	SchedulingPolicy defines the policy related to scheduling, e.g. gang-scheduling
`suspend` boolean	suspend specifies whether the Job controller should create Pods or not. If a Job is created with suspend set to true, no Pods are created by the Job controller. If a Job is suspended after creation (i.e. the flag goes from false to true), the Job controller will delete all active Pods and PodGroups associated with this Job. Users must design their workload to gracefully handle this. Suspending a Job will reset the StartTime field of the Job. Defaults to false.
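
Two RunPolicy rules from this table lend themselves to a short sketch: toggling `suspend` and checking that `activeDeadlineSeconds` is positive. The helper names are hypothetical, and the patch body assumes a JSON merge patch against the job object (for example one sent with `kubectl patch --type merge`).

```python
import json
from typing import Any, Dict, List


def suspend_patch(suspend: bool) -> Dict[str, Any]:
    """Merge-patch body that sets .spec.runPolicy.suspend."""
    return {"spec": {"runPolicy": {"suspend": suspend}}}


def validate_run_policy(run_policy: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means no issue detected."""
    errors: List[str] = []
    deadline = run_policy.get("activeDeadlineSeconds")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if deadline is not None and (
        isinstance(deadline, bool) or not isinstance(deadline, int) or deadline <= 0
    ):
        errors.append("activeDeadlineSeconds must be a positive integer")
    return errors


# JSON form of the patch, ready to pass to a Kubernetes client.
patch_json = json.dumps(suspend_patch(True))
```

Keep in mind that suspending a running job deletes its active Pods and resets `startTime`, so the workload should checkpoint before such a patch is applied.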

SchedulingPolicy

SchedulingPolicy encapsulates various scheduling policies of the distributed training job, for example minAvailable for gang-scheduling.

Appears In:

RunPolicy

Field	Description
`minAvailable` integer
`queue` string
`minResources` Quantity
`priorityClass` string
`scheduleTimeoutSeconds` integer

SuccessPolicy (string)

SuccessPolicy is the success policy.

Appears In:

TFJobSpec

TFJob

TFJob represents a TFJob resource.

Appears In:

TFJobList

Field	Description
`apiVersion` string	`kubeflow.org/v1`
`kind` string	`TFJob`
`TypeMeta` TypeMeta	Standard Kubernetes type metadata.
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` TFJobSpec	Specification of the desired state of the TFJob.
`status` JobStatus	Most recently observed status of the TFJob. Populated by the system. Read-only.

TFJobList

TFJobList is a list of TFJobs.

Field	Description
`apiVersion` string	`kubeflow.org/v1`
`kind` string	`TFJobList`
`TypeMeta` TypeMeta	Standard type metadata.
`metadata` ListMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`items` TFJob array	List of TFJobs.

TFJobSpec

TFJobSpec is a desired state description of the TFJob.

Appears In:

TFJob

Field	Description
`runPolicy` RunPolicy	RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
`successPolicy` SuccessPolicy	SuccessPolicy defines the policy to mark the TFJob as succeeded. Defaults to "", which uses the default rules.
`tfReplicaSpecs` object (keys:ReplicaType, values:ReplicaSpec)	A map of TFReplicaType (type) to ReplicaSpec (value). Specifies the TF cluster configuration. For example, { "PS": ReplicaSpec, "Worker": ReplicaSpec, }
`enableDynamicWorker` boolean	A switch to enable dynamic worker

XGBoostJob

XGBoostJob is the Schema for the xgboostjobs API

Appears In:

XGBoostJobList

Field	Description
`apiVersion` string	`kubeflow.org/v1`
`kind` string	`XGBoostJob`
`TypeMeta` TypeMeta
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` XGBoostJobSpec
`status` JobStatus

XGBoostJobList

XGBoostJobList contains a list of XGBoostJob

Field	Description
`apiVersion` string	`kubeflow.org/v1`
`kind` string	`XGBoostJobList`
`TypeMeta` TypeMeta
`metadata` ListMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`items` XGBoostJob array

XGBoostJobSpec

XGBoostJobSpec defines the desired state of XGBoostJob

Appears In:

XGBoostJob

Field	Description
`runPolicy` RunPolicy	RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.
`xgbReplicaSpecs` object (keys:ReplicaType, values:ReplicaSpec)
