# TensorFusionWorkload

TensorFusionWorkload is the Schema for the tensorfusionworkloads API.
## Kubernetes Resource Information
| Field | Value |
|---|---|
| API Version | tensor-fusion.ai/v1 |
| Kind | TensorFusionWorkload |
| Scope | Namespaced |
## Spec

WorkloadProfileSpec defines the desired state of WorkloadProfile.
| Property | Type | Constraints | Description |
|---|---|---|---|
| autoScalingConfig | object | | AutoScalingConfig set here overrides the Pool's schedulingConfig. This field cannot be fully expressed via annotations; to enable auto-scaling through annotations, set `tensor-fusion.ai/auto-limits\|requests\|replicas: 'true'`. |
| gpuCount | integer&lt;int32&gt; | | The number of GPUs used by the workload; defaults to 1. |
| gpuModel | string | | GPUModel specifies the required GPU model (e.g., "A100", "H100"). |
| isLocalGPU | boolean | | Schedule the workload onto the same GPU server that runs the vGPU worker for best performance; defaults to false. |
| nodeAffinity | object | | NodeAffinity specifies the node affinity requirements for the workload. |
| poolName | string | | |
| qos | string | low, medium, high, critical | Qos defines the quality-of-service level for the client. |
| replicas | integer&lt;int32&gt; | | If replicas is not set, it scales dynamically based on pending Pods. If isLocalGPU is true, replicas must be dynamic and this field is ignored. |
| resources | object | | |
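A minimal manifest exercising these spec fields might look like the sketch below. The metadata names, pool name, and the shape of `resources` (a requests/limits pair) are illustrative assumptions, not values defined by this reference.

```yaml
apiVersion: tensor-fusion.ai/v1
kind: TensorFusionWorkload
metadata:
  name: example-workload        # illustrative name
  namespace: default            # the resource is namespaced
spec:
  poolName: shared-gpu-pool     # assumed pool name
  gpuCount: 1                   # defaults to 1 when omitted
  gpuModel: "A100"
  qos: medium                   # one of: low, medium, high, critical
  replicas: 2                   # omit to let replicas scale dynamically from pending Pods
  isLocalGPU: false             # must stay false when replicas is set explicitly
  resources:                    # field shape assumed; not detailed in this reference
    requests:
      vram: "8Gi"
    limits:
      vram: "16Gi"
```

Note that setting `isLocalGPU: true` together with an explicit `replicas` would be contradictory: per the table above, replicas must remain dynamic in that mode and the field is ignored.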
## Status

TensorFusionWorkloadStatus defines the observed state of TensorFusionWorkload.
| Property | Type | Constraints | Description |
|---|---|---|---|
| conditions | array | | Represents the latest available observations of the workload's current state. |
| phase | string | Pending, Running, Failed, Unknown | Default: `Pending`. |
| podTemplateHash | string | | Hash of the pod template used to create worker pods. |
| readyWorkers | integer&lt;int32&gt; | | readyWorkers is the number of vGPU workers that are ready. |
| workerCount | integer&lt;int32&gt; | | workerCount is the total number of vGPU workers. |
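A status block reported by the controller could look like the following sketch. All concrete values, and the condition `type`/`reason` strings, are illustrative assumptions; the conditions entry follows the standard Kubernetes condition shape.

```yaml
status:
  phase: Running                        # one of: Pending, Running, Failed, Unknown
  workerCount: 2                        # total vGPU workers
  readyWorkers: 2                       # vGPU workers that are ready
  podTemplateHash: "5d8f9c7b4"          # illustrative hash of the pod template
  conditions:
    - type: Ready                       # condition type assumed for illustration
      status: "True"
      lastTransitionTime: "2024-01-01T00:00:00Z"
      reason: WorkloadReady             # reason string assumed
      message: All vGPU workers are ready
```

You can inspect the live status with `kubectl get tensorfusionworkloads -n <namespace>` or `kubectl describe tensorfusionworkload <name>`.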