Currently, the Edge AI features depend on the object dataset
and model
.
This proposal provides the definitions of dataset and model as the first class of k8s resources.
dataset
and model
objects.The truly format of the AI dataset
, such as imagenet
, coco
or tf-record
etc.
The truly format of the AI model
, such as ckpt
, saved_model
of tensorflow etc.
The truly operations of the AI dataset
, such as shuffle
, crop
etc.
The truly operations of the AI model
, such as train
, inference
etc.
We propose using Kubernetes Custom Resource Definitions (CRDs) to describe
the dataset/model specification/status and a controller to synchronize these updates between edge and cloud.
dataset url
, format
and the nodeName
which owns the dataset.model url
and format
.The Dataset
and Model
CRDs will be namespace-scoped.
The tables below summarize the group, kind and API version details for the CRDs.
Field | Description |
---|---|
Group | sedna.io |
APIVersion | v1alpha1 |
Kind | Dataset |
Field | Description |
---|---|
Group | sedna.io |
APIVersion | v1alpha1 |
Kind | Model |
Dataset
CRDapiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: datasets.sedna.io
spec:
group: sedna.io
names:
kind: Dataset
plural: datasets
scope: Namespaced
versions:
- name: v1alpha1
subresources:
# status enables the status subresource.
status: {}
served: true
storage: true
schema:
openAPIV3Schema:
type: object
properties:
spec:
type: object
required:
- url
- format
properties:
url:
type: string
format:
type: string
nodeName:
type: string
status:
type: object
properties:
numberOfSamples:
type: integer
updateTime:
type: string
format: datatime
additionalPrinterColumns:
- name: NumberOfSamples
type: integer
description: The number of samples in the dataset
jsonPath: ".status.numberOfSamples"
- name: Node
type: string
description: The node name of the dataset
jsonPath: ".spec.nodeName"
- name: spec
type: string
description: The spec of the dataset
jsonPath: ".spec"
format
of datasetWe use this field to report the number of samples for the dataset and do dataset splitting.
Current we support these below formats:
Model
CRDapiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: models.sedna.io
spec:
group: sedna.io
names:
kind: Model
plural: models
scope: Namespaced
versions:
- name: v1alpha1
subresources:
# status enables the status subresource.
status: {}
served: true
storage: true
schema:
openAPIV3Schema:
type: object
properties:
spec:
type: object
required:
- url
- format
properties:
url:
type: string
format:
type: string
status:
type: object
properties:
updateTime:
type: string
format: datetime
metrics:
type: array
items:
type: object
properties:
key:
type: string
value:
type: string
additionalPrinterColumns:
- name: updateAGE
type: date
description: The update age
jsonPath: ".status.updateTime"
- name: metrics
type: string
description: The metrics
jsonPath: ".status.metrics"
Dataset
package v1alpha1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// +genclient
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// Dataset describes the data that a dataset resource should have
type Dataset struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec DatasetSpec `json:"spec"`
Status DatasetStatus `json:"status"`
}
// DatasetSpec is a description of a dataset
type DatasetSpec struct {
URL string `json:"url"`
Format string `json:"format"`
NodeName string `json:"nodeName"`
}
// DatasetStatus represents information about the status of a dataset
// including the time a dataset updated, and number of samples in a dataset
type DatasetStatus struct {
UpdateTime *metav1.Time `json:"updateTime,omitempty" protobuf:"bytes,1,opt,name=updateTime"`
NumberOfSamples int `json:"numberOfSamples"`
}
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// DatasetList is a list of Datasets
type DatasetList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata"`
Items []Dataset `json:"items"`
}
Model
package v1alpha1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// +genclient
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// Model describes the data that a model resource should have
type Model struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec ModelSpec `json:"spec"`
Status ModelStatus `json:"status"`
}
// ModelSpec is a description of a model
type ModelSpec struct {
URL string `json:"url"`
Format string `json:"format"`
}
// ModelStatus represents information about the status of a model
// including the time a model updated, and metrics in a model
type ModelStatus struct {
UpdateTime *metav1.Time `json:"updateTime,omitempty" protobuf:"bytes,1,opt,name=updateTime"`
Metrics []Metric `json:"metrics,omitempty" protobuf:"bytes,2,rep,name=metrics"`
}
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// ModelList is a list of Models
type ModelList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata"`
Items []Model `json:"items"`
}
Dataset
apiVersion: sedna.io/v1alpha1
kind: Dataset
metadata:
name: "dataset-examp"
spec:
url: "/code/data"
format: "txt"
nodeName: "edge0"
Model
apiVersion: sedna.io/v1alpha1
kind: Model
metadata:
name: model-examp
spec:
url: "/model/frozen.pb"
format: pb
In the current design there is downstream/upstream controller for dataset
, no downstream/upstream controller for model
.
The dataset controller synchronizes the dataset between the cloud and edge.
Here is the flow of the dataset creation:
For the model:
Dear OpenI User
Thank you for your continuous support to the Openl Qizhi Community AI Collaboration Platform. In order to protect your usage rights and ensure network security, we updated the Openl Qizhi Community AI Collaboration Platform Usage Agreement in January 2024. The updated agreement specifies that users are prohibited from using intranet penetration tools. After you click "Agree and continue", you can continue to use our services. Thank you for your cooperation and understanding.
For more agreement content, please refer to the《Openl Qizhi Community AI Collaboration Platform Usage Agreement》