@@ -0,0 +1,19 @@
# Auto detect text files and perform LF normalization
* text=auto
*.cs text diff=csharp
*.java text diff=java
*.html text diff=html
*.py text diff=python
*.pl text diff=perl
*.pm text diff=perl
*.css text eol=lf
*.js text eol=lf
*.sql text
*.sh text eol=lf
*.mustache text eol=lf
*.bat text eol=crlf
*.cmd text eol=crlf
*.vcxproj text merge=union eol=crlf
*.csproj text merge=union eol=crlf
*.sln text merge=union eol=crlf
*.tar.gz filter=lfs diff=lfs merge=lfs -text
@@ -0,0 +1,21 @@
*.iml
*.gv
*.ipr
*.iws
*.orig
*.rej
*.sdf
*.suo
*.vcxproj.user
*.log
.idea
.svn
.classpath
.project
.settings
.vscode
target/
build/
out/
tmp/
dist/
@@ -18,9 +18,11 @@ OPENI supports GPU scheduling, a key requirement of deep learning jobs.
For better performance, OPENI supports fine-grained topology-aware job placement that can request GPUs at a specific location (e.g., under the same PCI-E switch).
OPENI embraces a [microservices](https://en.wikipedia.org/wiki/Microservices) architecture: every component runs in a container.
The system leverages [Kubernetes](https://kubernetes.io/) to deploy and manage static components in the system.
The more dynamic deep learning jobs are scheduled and managed by [Hadoop](http://hadoop.apache.org/) YARN with our [GPU enhancement](https://issues.apache.org/jira/browse/YARN-7481).
The training data and training results are stored in Hadoop HDFS.
The system leverages [Kubernetes](https://kubernetes.io/) to deploy and manage system services.
In the latest version of OPENI, the scheduling engine for the more dynamic deep learning jobs also uses Kubernetes,
which enables both system services and deep learning jobs to be scheduled and managed by Kubernetes.
The storage of training data and results can be customized according to platform/equipment requirements.
Job logs are collected by [Filebeat](https://www.elastic.co/cn/products/beats/filebeat) and stored in an [Elasticsearch](https://www.elastic.co/cn/products/elasticsearch) cluster.
## An Open AI Platform for R&D and Education
@@ -44,7 +46,7 @@ OPENI operates in an open model: contributions from academia and industry are all welcome.
### Prerequisite
The system runs on a cluster of machines, each equipped with one or multiple GPUs.
Each machine in the cluster runs Ubuntu 16.04 LTS and has a statically assigned IP address.
Each machine in the cluster runs Ubuntu 18.04 LTS and has a statically assigned IP address.
To deploy services, the system further relies on a Docker registry service (e.g., [Docker hub](https://docs.docker.com/docker-hub/))
to store the Docker images for the services to be deployed.
The system also requires a dev machine that runs in the same environment and has full access to the cluster.
@@ -53,11 +55,10 @@ And the system needs an [NTP](http://www.ntp.org/) service for clock synchronization.
### Deployment process
To deploy and use the system, the process consists of the following steps.
1. Build the binary for [Hadoop AI](./hadoop-ai/README.md) and place it in the specified path*
2. [Deploy kubernetes and system services](./openi-management/README.md)
1. [Deploy Kubernetes 1.13 and system services](./openi-management/README.md)
2. Use Kubernetes to deploy [FrameworkController](https://github.com/microsoft/frameworkcontroller)
3. Access the [web portal](./webportal/README.md) for job submission and cluster management
\* If step 1 is skipped, a standard Hadoop 2.9.0 will be installed instead.
#### Kubernetes deployment
@@ -72,7 +73,7 @@ Please refer to the service deployment [readme](./openi-management/README.md) for details.
#### Job management
After system services have been deployed, users can access the web portal, a Web UI, for cluster management and job management.
Please refer to this [tutorial](job-tutorial/README.md) for details about job submission.
Please refer to this [tutorial](./user%20manual.pdf) for details about job submission.
#### Cluster management
@@ -88,12 +89,10 @@ The system architecture is illustrated above.
Users submit jobs or monitor cluster status through the [Web Portal](./webportal/README.md),
which calls APIs provided by the [REST server](./rest-server/README.md).
Third-party tools can also call the REST server directly for job management.
Upon receiving API calls, the REST server coordinates with [FrameworkLauncher](./frameworklauncher/README.md) (Launcher for short)
to perform job management.
The Launcher server handles requests from the REST server and submits jobs to Hadoop YARN.
The job, scheduled by YARN with [GPU enhancement](https://issues.apache.org/jira/browse/YARN-7481),
can leverage GPUs in the cluster for deep learning computation. Other types of CPU-based AI workloads or traditional big data jobs
Upon receiving API calls, the REST server submits the job to the k8s ApiServer; the k8s Scheduler then schedules the job onto a k8s node with CPU, GPU, and other resources.
[FrameworkController](https://github.com/microsoft/frameworkcontroller) monitors the job life cycle in the k8s cluster.
The REST server retrieves job status from the k8s ApiServer, and the status can be displayed on the web portal.
Other types of CPU-based AI workloads or traditional big data jobs
can also run on the platform, coexisting with those GPU-based jobs.
The platform leverages HDFS to store data. All jobs are assumed to support HDFS.
All the static services (blue-lined box) are managed by Kubernetes, while jobs (purple-lined box) are managed by Hadoop YARN.
The storage of training data and results can be customized according to platform/equipment requirements.
@@ -18,9 +18,10 @@ OPENI supports running AI jobs (e.g., deep learning jobs) in a GPU cluster.
For better performance, OPENI supports fine-grained topology-aware job placement that can request GPUs at a specific location (e.g., under the same PCI-E switch).
OPENI embraces a [microservices](https://en.wikipedia.org/wiki/Microservices) architecture: every component runs in a container.
The platform leverages [Kubernetes](https://kubernetes.io/) to deploy and manage the static components in the system.
The more dynamic deep learning jobs are scheduled and managed by [Hadoop](http://hadoop.apache.org/) YARN with [GPU enhancement](https://issues.apache.org/jira/browse/YARN-7481).
Training data and training results are stored in Hadoop HDFS.
The platform leverages [Kubernetes](https://kubernetes.io/) to deploy and manage system services.
In the latest version of the platform, the scheduling engine for the dynamic deep learning jobs also uses Kubernetes, so that both system services and deep learning jobs are scheduled and managed by Kubernetes.
The storage of training data and results can be customized according to platform/equipment requirements. Job logs are collected by [Filebeat](https://www.elastic.co/cn/products/beats/filebeat)
and stored in an [Elasticsearch](https://www.elastic.co/cn/products/elasticsearch) cluster.
## An Open AI Platform for R&D and Education
@@ -44,18 +45,16 @@ OPENI operates in an open model: contributions from academia and industry are all welcome.
### Prerequisites
The system runs on a cluster of machines, each equipped with one or multiple GPUs.
Each machine in the cluster runs Ubuntu 16.04 LTS and has a statically assigned IP address. To deploy services, the system further relies on a Docker registry service (e.g., [Docker hub](https://docs.docker.com/docker-hub/)) to store the Docker images for the services to be deployed. The system also requires a dev machine that runs the same environment and has full access to the cluster. The system also needs an [NTP](http://www.ntp.org/) service for clock synchronization.
Each machine in the cluster runs Ubuntu 18.04 LTS and has a statically assigned IP address. To deploy services, the system further relies on a Docker registry service (e.g., [Docker hub](https://docs.docker.com/docker-hub/)) to store the Docker images for the services to be deployed. The system also requires a dev machine that runs the same environment and has full access to the cluster. The system also needs an [NTP](http://www.ntp.org/) service for clock synchronization.
### Deployment process
To deploy and use the system, follow the steps below.
1. Build the binary for [Hadoop AI](./hadoop-ai/README.md) and place it in the specified path*
2. [Deploy kubernetes and system services](./openi-management/README.md)
1. [Deploy kubernetes 1.13 and system services](./openi-management/README.md)
2. Use kubernetes to deploy the [FrameworkController service](https://github.com/microsoft/frameworkcontroller)
3. Access the [web portal](./webportal/README.md) for job submission and cluster management
\* If step 1 is skipped, a standard Hadoop 2.9.0 will be installed instead.
#### Kubernetes deployment
The platform uses Kubernetes (k8s) to deploy and manage system services.
@@ -69,7 +68,7 @@ OPENI operates in an open model: contributions from academia and industry are all welcome.
#### Job management
After system services have been deployed, users can access the web portal (a Web UI) for cluster and job management.
For details about job submission, please refer to the [tutorial](job-tutorial/README.md).
For details about job submission, please refer to the [tutorial](./user%20manual.pdf).
#### Cluster management
@@ -78,12 +77,12 @@ The web portal also provides a Web UI for cluster management.
## System architecture
<p style="text-align: left;">
<img src="./sysarch-zh.png" title="System Architecture" alt="System Architecture" />
<img src="./sysarch.png" title="System Architecture" alt="System Architecture" />
</p>
The overall system architecture is illustrated above.
Users submit jobs, or requests to monitor cluster status, through the [Web Portal](./webportal/README.md); this calls the APIs provided by the [REST server](./rest-server/README.md).
Third-party tools can also call the REST server directly for job management. Upon receiving API calls, the REST server works with [FrameworkLauncher](./frameworklauncher/README.md) (Launcher for short) to perform job management. The Launcher server handles requests from the REST server and submits jobs to Hadoop YARN. Jobs scheduled by YARN with [GPU enhancement](https://issues.apache.org/jira/browse/YARN-7481) can use the cluster's GPU resources for deep learning computation. Other CPU-based AI workloads or traditional big data jobs can also run on the platform, coexisting with the GPU-based jobs.
The platform uses HDFS to store data. All jobs are assumed to support HDFS. All static services (blue-lined boxes) are managed by Kubernetes, while jobs (purple-lined boxes) are managed by Hadoop YARN.
Users submit jobs, or requests to monitor cluster status, through the [Web Portal](./webportal/README.md); this calls the APIs provided by the [Restserver service](./rest-server/README.md).
Third-party tools can also call the Restserver service directly for job management. Upon receiving API calls, the Restserver service submits the job to the k8s ApiServer; the k8s scheduling engine then schedules the job, after which the job can use the GPU resources on the cluster nodes for deep learning computation.
The [FrameworkController service](https://github.com/microsoft/frameworkcontroller) monitors the job life cycle in the k8s cluster. The Restserver service retrieves job status from the k8s ApiServer, and the status can be displayed on the web portal.
Other CPU-based AI workloads or traditional big data jobs can also run on the platform, coexisting with the GPU-based jobs. The storage of training data and results can be customized according to platform/equipment requirements.
@@ -0,0 +1,6 @@
Upload images to harbor
- docker login -u openi -p OpenI 192.168.202.102:5000
- docker tag cambricon/test/ubuntu:v4.1 192.168.202.74:5000/openi/cambricon-office-ubuntu:v0.4
- docker push 192.168.202.74:5000/openi/cambricon-office-ubuntu:v0.4
- docker tag cambricon-test2:v0.4 192.168.202.74:5000/openi/cambricon-neuware:v0.4
- docker push 192.168.202.74:5000/openi/cambricon-neuware:v0.4
@@ -0,0 +1,26 @@
Local k8s environment setup (using kubeadm)
- Disable the firewall
  - systemctl stop firewalld
  - systemctl disable firewalld
- Disable SELinux
  - apt install selinux-utils
  - setenforce 0
- Install the specified version of docker
- Start the docker service
  - systemctl enable docker
  - systemctl start docker
  - systemctl status docker
- Set up the apt source for kubectl, kubelet, and kubeadm
  - cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
  - deb http://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial main
  - EOF
- Install
  - apt-get update && apt-get install -y kubelet kubeadm kubectl
  - systemctl enable kubelet
- Configure the master
  - export KUBECONFIG=/etc/kubernetes/admin.conf
- Restart kubelet
  - systemctl daemon-reload
  - systemctl restart kubelet
- Run on the master node
  - kubeadm init --pod-network-cidr=192.168.202.102/16 --apiserver-advertise-address=192.168.202.102 --kubernetes-version=v1.14.1 --ignore-preflight-errors=Swap
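The `kubeadm init` flags above can equivalently be kept in a config file and passed with `kubeadm init --config`. A minimal sketch for the `v1beta1` kubeadm config API (the one matching v1.14), reusing the same address and version; the pod CIDR value is copied verbatim from the command above, and the file name `kubeadm-config.yaml` is an assumption:

```yaml
# Sketch only: kubeadm v1beta1 config equivalent to the init flags above.
apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
localAPIEndpoint:
  # Same value as --apiserver-advertise-address above
  advertiseAddress: 192.168.202.102
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.14.1
networking:
  # Same value as --pod-network-cidr above
  podSubnet: 192.168.202.102/16
```

Usage would be `kubeadm init --config kubeadm-config.yaml --ignore-preflight-errors=Swap`; the preflight-errors switch stays on the command line, as it is not part of `ClusterConfiguration`.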
@@ -0,0 +1,6 @@
1. Deploy the FrameworkController environment
   1. Go to https://github.com/Microsoft/frameworkcontroller and click "Run Controller".
   2. Choose "Run By Kubernetes StatefulSet" to deploy frameworkcontroller; for the concrete steps, see https://github.com/microsoft/frameworkcontroller/tree/master/example/run.
2. Fetch the FrameworkController images
   1. From https://hub.docker.com/r/yyrdl/frameworkcontroller, fetch the image with docker pull yyrdl/frameworkcontroller.
   2. From https://hub.docker.com/r/frameworkcontroller/frameworkbarrier, fetch the image with docker pull.
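The "Run By Kubernetes StatefulSet" deployment referenced above essentially amounts to a service account with cluster permissions plus a single-replica StatefulSet. The following is only a hedged sketch; the authoritative manifests live in the frameworkcontroller `example/run` directory linked above, and the names and labels here are assumptions modeled on that example:

```yaml
# Sketch based on the upstream example; see example/run for the full manifests.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: frameworkcontroller
  namespace: default
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: frameworkcontroller
  namespace: default
spec:
  serviceName: frameworkcontroller
  replicas: 1
  selector:
    matchLabels:
      app: frameworkcontroller
  template:
    metadata:
      labels:
        app: frameworkcontroller
    spec:
      serviceAccountName: frameworkcontroller
      containers:
      - name: frameworkcontroller
        image: frameworkcontroller/frameworkcontroller
```

The upstream example additionally binds the service account to a cluster role (e.g., via a ClusterRoleBinding) so the controller can manage Frameworks and Pods cluster-wide; consult the linked example/run manifests for the exact RBAC objects.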
@@ -0,0 +1,65 @@
# Use frameworkcontroller to manage pods; the generated yaml file is as follows:
apiVersion: frameworkcontroller.microsoft.com/v1
kind: Framework
metadata:
  name: nputest
spec:
  executionType: Start
  retryPolicy:
    fancyRetryPolicy: true
    maxRetryCount: 2
  taskRoles:
  - name: ps
    taskNumber: 1
    frameworkAttemptCompletionPolicy:
      minFailedTaskCount: 1
      minSucceededTaskCount: -1
    task:
      retryPolicy:
        fancyRetryPolicy: false
        maxRetryCount: 0
      pod:
        spec:
          restartPolicy: Never
          hostNetwork: false
          containers:
          - name: nvidiatest
            image: cambricon-test2:v0.4
            command: [
              "sh", "-c",
              "/mnt/frameworkbarrier/injector.sh && sleep 10d"]
            resources:
              limits:
                cambricon.com/mlu: 1
            volumeMounts:
            - name: cambricon-datasets
              mountPath: /Cambricon-MLU100/datasets
            - name: model
              mountPath: /Cambricon-MLU100/models
            - name: frameworkbarrier-volume
              mountPath: /mnt/frameworkbarrier
            - name: nvidia-runfile
              mountPath: /home
          serviceAccountName: frameworkbarrier
          initContainers:
          - name: frameworkbarrier
            image: frameworkcontroller/frameworkbarrier
            volumeMounts:
            - name: frameworkbarrier-volume
              mountPath: /mnt/frameworkbarrier
          volumes:
          - name: frameworkbarrier-volume
            emptyDir: {}
          - name: nvidia-runfile
            hostPath:
              path: /home/amax
          - name: cambricon-datasets
            hostPath:
              path: /home/cambricon/V7.3.2/Cambricon-MLU100/datasets
          - name: model
            hostPath:
              path: /home/cambricon/V7.3.2/Cambricon-MLU100/models
# Create the pod with kubectl create -f ...yaml
# Enter the pod and run ./run_all.sh
@@ -0,0 +1,67 @@
FROM cambricon/test/ubuntu:v4.1
WORKDIR /home/Cambricon-Test
COPY Cambricon-MLU100.tar.gz /home/Cambricon-Test/Cambricon-MLU100.tar.gz
RUN tar zvxf /home/Cambricon-Test/Cambricon-MLU100.tar.gz -C /home/Cambricon-Test \
    && rm /home/Cambricon-Test/Cambricon-MLU100.tar.gz \
    && mv /home/Cambricon-Test/Cambricon-MLU100 ../ \
    && rm -rf /home/Cambricon-Test \
    && mv /home/Cambricon-MLU100 /home/Cambricon-Test
# GLOG_minloglevel sets the log level that is output to stderr. 0: INFO/WARNING/ERROR/FATAL, 1: WARNING/ERROR/FATAL, 2: ERROR/FATAL, 3: FATAL
ENV ROOT_HOME="/home/Cambricon-Test"
ENV NEUWARE_HOME=${ROOT_HOME} \
    NEUWARE_PATH=${ROOT_HOME} \
    CAMBRICON_HOME=${ROOT_HOME} \
    TENSORFLOW_HOME=${ROOT_HOME}/tensorflow
ENV TENSORFLOW_MODEL_HOME=${TENSORFLOW_HOME}/models/online \
    TENSORFLOW_OFFLINE_MODEL_HOME=${TENSORFLOW_HOME}/models/offline \
    TENSORFLOW_MODELS_MODEL_HOME=${ROOT_HOME}/models/tensorflow_models/ \
    TENSORFLOW_MODELS_DATA_HOME=${ROOT_HOME}/datasets/tensorflow_models/ \
    PB_TO_CAMBRICON_PATH=${TENSORFLOW_HOME}/tools/pb_to_cambricon/ \
    tensorflow=${TENSORFLOW_HOME}/src/tensorflow-v1.10 \
    TF_SET_ANDROID_WORKSPACE=0 \
    MXNET_HOME=${ROOT_HOME}/mxnet \
    CAFFE_HOME=${ROOT_HOME}/caffe \
    ONNX_HOME=${ROOT_HOME}/onnx \
    CNRT_HOME=${CAMBRICON_HOME} \
    CNML_HOME=${CAMBRICON_HOME} \
    CNPERF_HOME=${CAMBRICON_HOME}/cnperf \
    CNMON_HOME=${CAMBRICON_HOME}/cnmon
ENV CNDEV_HOME=${CNMON_HOME}/sdk \
    CNSTREAM_HOME=${CAMBRICON_HOME}/cnstream \
    CNCODEC_HOME=${CAMBRICON_HOME}/cncodec \
    DRV_HOME=${CAMBRICON_HOME}/driver \
    DATASET_HOME=${ROOT_HOME}/datasets \
    PYTHONPATH=${PYTHONPATH}:${CAFFE_HOME}/src/caffe/python:${MXNET_HOME}/src/cambricon_mxnet/python \
    PATH=${PATH}:${CAMBRICON_HOME}/bin
ENV LD_LIBRARY_PATH=${CNRT_HOME}/lib:${CNML_HOME}/lib:${CNDEV_HOME}/lib:${CAFFE_HOME}/lib:${CNSTREAM_HOME}/lib:${CNCODEC_HOME}/lib:${MXNET_HOME}/lib:${LD_LIBRARY_PATH} \
    MXNET_ENGINE_TYPE="NaiveEngine" \
    MXNET_EXEC_FUSE_MLU_OPS=true \
    GLOG_alsologtostderr=true \
    GLOG_minloglevel=0 \
    MXNET_MODELS_DIR=${MXNET_HOME}/models \
    MXNET_DATA_DIR=${DATASET_HOME} \
    ONNX_MODELS_DIR=${ONNX_HOME}/models \
    ONNX_SRC_DIR=${ONNX_HOME}/src/onnx \
    ONNX_DATA_DIR=${DATASET_HOME}/imagenet \
    OS_VERSION="ubuntu16.04"
RUN UNAME_V=`cat /etc/issue | head -n 1`
WORKDIR "${ROOT_HOME}/bin"
RUN if [ ! -L "${ROOT_HOME}/bin/cnmon" ]; then find ${OS_VERSION} -type f -exec ln -s {} \; ; fi
WORKDIR "${ROOT_HOME}/lib"
RUN if [ ! -L "${ROOT_HOME}/lib/libcnrt.so" ]; then \
    find ${OS_VERSION} -name '*.so' -exec ln -s {} \; ; \
    ln -s ${OS_VERSION}/libcnrt.so* libcnrt.so ; \
    ln -s ${OS_VERSION}/libcnml.so* libcnml.so ; fi
WORKDIR ${ROOT_HOME}
RUN if [ ! -L lib64 ]; then ln -s lib lib64; fi
ADD configure.sh ${ROOT_HOME}/configure.sh
RUN chmod +x ${ROOT_HOME}/configure.sh && ${ROOT_HOME}/configure.sh
WORKDIR /home/Cambricon-Test
CMD ["/bin/bash"]
@@ -0,0 +1,33 @@
1. Run the Cambricon software stack in a docker environment:
- Enter the docker environment.
  ./run-cambricon-test-docker.sh
- Initialize the environment variables.
  source env.sh
- Build and run the Caffe examples
  - online example:
    - Note: runs the classification models among the online example models; the data type is float16
    - cd Cambricon-Test/caffe/examples/online/c++/classification
      ./run_fp16.sh
  - offline example:
    - Note: runs the classification models among the offline example models; the data type is float16
    - cd Cambricon-Test/caffe/examples/offline/c++/classification
      ./run_fp16.sh
- Build and run the Tensorflow examples
  - online example:
    - cd Cambricon-Test/tensorflow/examples/online/c++/classification
    - ./tensorflow-v1.10_online_block.sh alexnet mlu float16 dense 2 4 1 0 1000
  - offline example:
    - cd Cambricon-Test/tensorflow/examples/offline/classification
    - ./tensorflow-v1.10_online_block.sh alexnet mlu float16 dense 2 4 1 0 1000
- Build and run the MXNet examples
  - online example:
    - ./run_all.sh
  - offline example:
    - ./run_all_pipe.sh
- Build and run the ONNX examples
  - online example:
    - cd Cambricon-Test/onnx/examples/online/classification
    - ./run_online.sh model is_sparse device_option other_option
  - offline example:
    - cd Cambricon-Test/onnx/examples/offline/classification
    - ./run_offline.sh model option channel_num
@@ -1,130 +0,0 @@
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#
#
# Copyright (c) Peking University 2018
#
# The software is released under the Open-Intelligence Open Source License V1.0.
# The copyright owner promises to follow "Open-Intelligence Open Source Platform
# Management Regulation V1.0", which is provided by The New Generation of
# Artificial Intelligence Technology Innovation Strategic Alliance (the AITISA).

# If corresponding values aren't set in the machine list, the default values will be filled in.
default-machine-properties:
  # Account with root permission
  username: username
  password: password
  sshport: port

machine-sku:
  NC24R:
    mem: 224
    gpu:
      # type: gpu{type}
      type: teslak80
      count: 4
    cpu:
      vcore: 24
    #dataFolder: "/mnt"
    # Note: Up to now, the only supported OS version is Ubuntu 16.04. Please do not change it here.
    os: ubuntu16.04
  D8SV3:
    mem: 32
    cpu:
      vcore: 8
    #dataFolder: "/mnt"
    # Note: Up to now, the only supported OS version is Ubuntu 16.04. Please do not change it here.
    os: ubuntu16.04

machine-list:
  - hostname: hostname (echo `hostname`)
    hostip: IP
    machine-type: D8SV3
    etcdid: etcdid1
    #sshport: PORT (Optional)
    #username: username (Optional)
    #password: password (Optional)
    k8s-role: master
    dashboard: "true"
    zkid: "1"
    openi-master: "true"
  - hostname: hostname
    hostip: IP
    machine-type: D8SV3
    etcdid: etcdid2
    #sshport: PORT (Optional)
    #username: username (Optional)
    #password: password (Optional)
    k8s-role: master
    node-exporter: "true"
  - hostname: hostname
    hostip: IP
    machine-type: D8SV3
    etcdid: etcdid3
    #sshport: PORT (Optional)
    #username: username (Optional)
    #password: password (Optional)
    k8s-role: master
    node-exporter: "true"
  - hostname: hostname
    hostip: IP
    machine-type: NC24R
    #sshport: PORT (Optional)
    #username: username (Optional)
    #password: password (Optional)
    k8s-role: worker
    openi-worker: "true"
  - hostname: hostname
    hostip: IP
    machine-type: NC24R
    #sshport: PORT (Optional)
    #username: username (Optional)
    #password: password (Optional)
    k8s-role: worker
    openi-worker: "true"
  - hostname: hostname
    hostip: IP
    machine-type: NC24R
    #sshport: PORT (Optional)
    #username: username (Optional)
    #password: password (Optional)
    k8s-role: worker
    openi-worker: "true"
@@ -1,83 +0,0 @@
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#
#
# Copyright (c) Peking University 2018
#
# The software is released under the Open-Intelligence Open Source License V1.0.
# The copyright owner promises to follow "Open-Intelligence Open Source Platform
# Management Regulation V1.0", which is provided by The New Generation of
# Artificial Intelligence Technology Innovation Strategic Alliance (the AITISA).

# If corresponding values aren't set in the machine list, the default values will be filled in.
default-machine-properties:
{%- for key in root['default-machine-properties'] %}
  {{key}}: {{root['default-machine-properties'][key]}}
{%- endfor %}

machine-sku:
{% for machine in root['machine-sku'] %}
  {{machine}}:
    mem: {{root['machine-sku'][machine]['mem']}}
    {% if 'gpu' in root['machine-sku'][machine] -%}
    gpu:
      type: {{root['machine-sku'][machine]['gpu']['type']}}
      count: {{root['machine-sku'][machine]['gpu']['count']}}
    {% endif -%}
    {% if 'cpu' in root['machine-sku'][machine] -%}
    cpu:
      vcore: {{root['machine-sku'][machine]['cpu']['vcore']}}
    {% endif -%}
    os: {{root['machine-sku'][machine]['os']}}
{% endfor %}

machine-list:
{% for host in root['machine-list'] %}
  - hostname: {{ host['hostname'] }}
    hostip: {{ host['hostip'] }}
    machine-type: {{ host['machine-type']}}
    {% if 'etcdid' in host -%}
    etcdid: {{ host['etcdid'] }}
    {% endif -%}
    {% if 'username' in host -%}
    username: {{ host['username'] }}
    {% endif -%}
    {% if 'password' in host -%}
    password: {{ host['password'] }}
    {% endif -%}
    {% if 'sshport' in host -%}
    sshport: {{ host['sshport'] }}
    {% endif -%}
    k8s-role: {{ host['k8s-role'] }}
    {% if 'dashboard' in host -%}
    dashboard: "{{ host['dashboard'] }}"
    {% endif -%}
    {% if 'zkid' in host -%}
    zkid: "{{ host['zkid'] }}"
    {% endif -%}
    {% if 'openi-master' in host -%}
    openi-master: "{{ host['openi-master'] }}"
    {% endif -%}
    {% if 'openi-worker' in host -%}
    openi-worker: "{{ host['openi-worker'] }}"
    {% endif -%}
    {% if 'watchdog' in host -%}
    watchdog: "{{ host['watchdog'] }}"
    {% endif -%}
{% endfor %}
@@ -1,85 +0,0 @@
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#
#
# Copyright (c) Peking University 2018
#
# The software is released under the Open-Intelligence Open Source License V1.0.
# The copyright owner promises to follow "Open-Intelligence Open Source Platform
# Management Regulation V1.0", which is provided by The New Generation of
# Artificial Intelligence Technology Innovation Strategic Alliance (the AITISA).

# the components that should be bootstrapped remotely
component-list:
  apiserver:
    - src: apiserver.yaml
      # the full dst path will be "template/generated/${hostip}/ .... "
      dst: src/etc/kubernetes/manifests
  controller-manager:
    - src: controller-manager.yaml
      dst: src/etc/kubernetes/manifests
  etcd:
    - src: etcd.yaml
      dst: src/etc/kubernetes/manifests
  scheduler:
    - src: scheduler.yaml
      dst: src/etc/kubernetes/manifests
  kubelet:
    - src: kubelet.sh
      dst: src/
  kubeconfig:
    - src: config
      dst: src/etc/kubernetes
  haproxy:
    - src: haproxy.yaml
      dst: src/etc/kubernetes/manifests
    - src: haproxy.cfg
      dst: src/haproxy

k8s-role:
  master:
    component:
      - name: apiserver
      - name: controller-manager
      - name: etcd
      - name: scheduler
      - name: kubelet
      - name: kubeconfig
  worker:
    component:
      - name: kubelet
      - name: kubeconfig
  proxy:
    component:
      - name: kubelet
      - name: haproxy
      - name: kubeconfig
@@ -1,55 +0,0 @@
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#
#
# Copyright (c) Peking University 2018
#
# The software is released under the Open-Intelligence Open Source License V1.0.
# The copyright owner promises to follow "Open-Intelligence Open Source Platform
# Management Regulation V1.0", which is provided by The New Generation of
# Artificial Intelligence Technology Innovation Strategic Alliance (the AITISA).

kubernetes:
  # Find the nameserver in /etc/resolv.conf
  cluster-dns: IP
  # To support k8s HA, you should set a load-balancer address here.
  # If deploying k8s with a single master node, please set the master IP address here
  load-balance-ip: IP
  # Specify an IP range not in the same network segment as the host machine.
  service-cluster-ip-range: 169.254.0.0/16
  # According to the etcd version, you should fill in a corresponding backend name.
  # If you are not familiar with etcd, please don't change it.
  storage-backend: etcd3
  # The docker registry used in the k8s deployment. If you can access gcr, we suggest using gcr.
  docker-registry: gcr.io/google_containers
  # http://gcr.io/google_containers/hyperkube. Or the tag in your registry.
  hyperkube-version: v1.9.4
  # http://gcr.io/google_containers/etcd. Or the tag in your registry.
  # If you are not familiar with etcd, please don't change it.
  etcd-version: 3.2.17
  # http://gcr.io/google_containers/kube-apiserver. Or the tag in your registry.
  apiserver-version: v1.9.4
  # http://gcr.io/google_containers/kube-scheduler. Or the tag in your registry.
  kube-scheduler-version: v1.9.4
  # http://gcr.io/google_containers/kube-controller-manager
  kube-controller-manager-version: v1.9.4
  # http://gcr.io/google_containers/kubernetes-dashboard-amd64
  dashboard-version: v1.8.3
@@ -1,182 +0,0 @@ | |||
# Copyright (c) Microsoft Corporation | |||
# All rights reserved. | |||
# | |||
# MIT License | |||
# | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated | |||
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation | |||
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and | |||
# to permit persons to whom the Software is furnished to do so, subject to the following conditions: | |||
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. | |||
# | |||
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING | |||
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND | |||
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, | |||
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |||
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. | |||
# | |||
# | |||
# Copyright (c) Peking University 2018 | |||
# | |||
# The software is released under the Open-Intelligence Open Source License V1.0. | |||
# The copyright owner promises to follow "Open-Intelligence Open Source Platform | |||
# Management Regulation V1.0", which is provided by The New Generation of | |||
# Artificial Intelligence Technology Innovation Strategic Alliance (the AITISA). | |||
cluster: | |||
clusterid: openi-example | |||
# Choose proper nvidia driver version from this url http://www.nvidia.com/object/linux-amd64-display-archive.html | |||
nvidia-drivers-version: 384.111 | |||
# static docker-version | |||
# https://download.docker.com/linux/static/stable/x86_64/docker-17.06.2-ce.tgz | |||
# Docker client used by hadoop NM (node manager) to launch Docker containers (e.g., of a deep learning job) in the host env. | |||
docker-verison: 17.06.2 | |||
# HDFS, zookeeper data path on your cluster machine. | |||
data-path: "/datastorage" | |||
# the docker registry to store docker images that contain system services like frameworklauncher, hadoop, etc. | |||
docker-registry-info: | |||
# If the registry is public, set docker-namespace to your username
docker-namespace: your_registry_namespace
# E.g., gcr.io. If the registry is public, set docker-registry-domain to the word "public"
# docker-registry-domain: public
docker-registry-domain: your_registry_domain
# If the docker registry doesn't require authentication, leave docker-username and docker-password empty
docker-username: your_registry_username | |||
docker-password: your_registry_password | |||
docker-tag: your_image_tag | |||
# The name of the kubernetes secret that will be created in your cluster
# Must be lower case, e.g., regsecret. | |||
secret-name: your_secret_name | |||
hadoop: | |||
# If custom-hadoop-binary-path is None, the script will download a standard Hadoop binary for you
# hadoop-version | |||
# http://archive.apache.org/dist/hadoop/common/hadoop-2.9.0/hadoop-2.9.0.tar.gz | |||
custom-hadoop-binary-path: None | |||
hadoop-version: 2.9.0 | |||
# Step 1 of 4 to set up Hadoop queues. | |||
# Define all virtual clusters, equivalent concept of Hadoop queues. | |||
# The capacity of each virtual cluster is specified as the percentage of the whole resources in the system. | |||
# All un-configured resources will go to an auto-generated virtual cluster called 'default'. | |||
virtualClusters: | |||
vc1: | |||
description: VC for Alice's team. | |||
capacity: 20 | |||
vc2: | |||
description: VC for Bob's team. | |||
capacity: 20 | |||
vc3: | |||
description: VC for Charlie's team. | |||
capacity: 20 | |||
volumeMounts: | |||
- mountPath: /gpai | |||
name: scriptdir | |||
- mountPath: /ghome | |||
name: userhome | |||
- mountPath: /gshare | |||
name: share | |||
- mountPath: /gmodel | |||
name: model | |||
volumes: | |||
- name: scriptdir | |||
hostPath: | |||
path: /gpai | |||
- name: userhome | |||
hostPath: | |||
path: /ghome | |||
- name: share | |||
hostPath: | |||
path: /gshare | |||
- name: model | |||
hostPath: | |||
path: /gmodel | |||
frameworklauncher: | |||
frameworklauncher-port: 9086 | |||
restserver: | |||
# port for rest api server | |||
server-port: 9186 | |||
# secret for signing authentication tokens, e.g., "Hello OPENI!" | |||
jwt-secret: your_jwt_secret | |||
# database admin username | |||
default-openi-admin-username: your_default_openi_admin_username | |||
# database admin password | |||
default-openi-admin-password: your_default_openi_admin_password | |||
# openi database | |||
openi_db_host: "db-host-ip"
openi_db_port: 3308
openi_db_user: "db-user"
openi_db_pwd: "db-user-password"
openi_db_database: "db-database"
templates_store_path: "/var/openi/rest-server/templates" | |||
# iptables NAT config file path
nat-path: "/var/pai/rest-server/natconfig.json" | |||
volumeMounts: | |||
- mountPath: /gpai | |||
name: scriptdir | |||
volumes: | |||
- name: scriptdir | |||
hostPath: | |||
path: /gpai | |||
webportal: | |||
# port for webportal | |||
server-port: 9286 | |||
grafana: | |||
# port for grafana | |||
grafana-port: 3000 | |||
prometheus: | |||
# port for prometheus port | |||
prometheus-port: 9091 | |||
# port for node exporter | |||
node-exporter-port: 9100 | |||
pylon: | |||
# port of pylon | |||
port: 80 | |||
model-exchange: | |||
port: 6023 | |||
volumeMounts: | |||
- mountPath: /gmodel | |||
name: scriptdir | |||
volumes: | |||
- name: scriptdir | |||
hostPath: | |||
path: /gmodel | |||
model-hub: | |||
server_port: 6024 | |||
mysql: root:root@tcp(192.168.113.221:3308)/modelhub | |||
file_storage_path: /gmodel | |||
volumeMounts: | |||
- mountPath: /gmodel | |||
name: scriptdir | |||
volumes: | |||
- name: scriptdir | |||
hostPath: | |||
path: /gmodel |
@@ -0,0 +1,4 @@ | |||
utils/__pycache__ | |||
services/__pycache__ |
@@ -0,0 +1,97 @@ | |||
# -*- coding: UTF-8 -*- | |||
import os | |||
import argparse | |||
import yaml | |||
import utils.docker | |||
import utils.dir | |||
import utils.setting | |||
import services.rest_server | |||
import services.image_factory_agent | |||
import services.image_factory_shield | |||
import services.log_service_bee | |||
import services.log_service_queen | |||
workdir_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) | |||
def load_config(env):
    # argparse yields None when -e/--env is omitted, so test falsiness, not ""
    if not env:
        env = "dev"
    return utils.setting.load(env, os.path.join(workdir_root, "deploy-script/config"))
SERVICE_MODULES = {
    "rest-server": ("rest-server/", services.rest_server),
    "image-factory-agent": ("image-factory/agent", services.image_factory_agent),
    "image-factory-shield": ("image-factory/shield", services.image_factory_shield),
    "log-service-bee": ("log-service/bee", services.log_service_bee),
    "log-service-queen": ("log-service/queen", services.log_service_queen),
}
def get_service_ctx(service_name, config):
    # Map a service name to its working directory, image tag, and build hooks
    ctx = dict()
    ctx["workdir"] = ""
    if service_name in SERVICE_MODULES:
        subdir, module = SERVICE_MODULES[service_name]
        ctx["workdir"] = os.path.join(workdir_root, subdir)
        ctx["tag"] = module.getTag(config)
        ctx["buildPrepare"] = module.buildPrepare
        ctx["buildEnd"] = module.buildEnd
    return ctx
def build_and_push_docker_image():
    parser = argparse.ArgumentParser()
    parser.add_argument('-s', '--service', required=True, help="the service to build and push")
    parser.add_argument('-e', '--env', required=False, help="deployment environment, e.g. dev or prod")
    args = parser.parse_args()
    config = load_config(args.env)
    service_name = args.service
    service = get_service_ctx(service_name, config)
    # Use .get so an unknown service name reports an error instead of raising KeyError
    if service["workdir"] == "" or service.get("tag", "") == "":
        print("Unknown service: {}".format(service_name))
        return 1
    if service.get("buildPrepare") is not None:
        service["buildPrepare"](workdir_root, config)
    print("build", service_name)
    utils.docker.build(service["tag"], service["workdir"])
    utils.docker.push(service["tag"], service["workdir"])
    if service.get("buildEnd") is not None:
        service["buildEnd"](workdir_root, config)
    print("Successfully built and pushed {}".format(service_name))

if __name__ == "__main__":
    build_and_push_docker_image()
@@ -0,0 +1,64 @@ | |||
env: "dev" | |||
cluster: "openi-test" | |||
common: | |||
mysql: | |||
host: '192.168.202.73' | |||
port: 3308 | |||
user: "root" | |||
pwd: "root" | |||
dockerRegistry: | |||
host: "192.168.202.74" | |||
port: "5000" | |||
user: "admin" | |||
pwd: "harboradmin" | |||
prometheus: "http://192.168.202.73:9091" | |||
restServer: | |||
jwtSecret: "helloworld" | |||
serverPort: 9186 | |||
logService: "/log-service" | |||
volumes: | |||
- name: ghome | |||
mountPath: /ghome | |||
hostPath: /ghome | |||
- name: gmodel | |||
mountPath: /gmodel | |||
hostPath: /gmodel | |||
- name: kube-config | |||
mountPath: /kube | |||
hostPath: /home/amax/.kube | |||
k8sApiServer: | |||
host: "https://192.168.202.71:6443" | |||
kubeConfigPath: "/kube/config" | |||
imageFactory: | |||
shield: | |||
port: 9001 | |||
agent: | |||
port: 9002 | |||
shield: "http://192.168.202.71:9001" | |||
volumes: | |||
- name: docker-run | |||
mountPath: /var/run | |||
hostPath: /var/run | |||
- name: docker | |||
mountPath: /var/lib/docker | |||
hostPath: /var/lib/docker | |||
logService: | |||
bee: | |||
port: 9003 | |||
containers: "/var/lib/docker/containers" | |||
volumes: | |||
- name: "container" | |||
mountPath: "/var/lib/docker/containers" | |||
hostPath: "/var/lib/docker/containers" | |||
queen: | |||
port: 9004 | |||
restServer: | |||
host: "http://192.168.202.71:9186" | |||
user: "test123" | |||
pwd: "123456" |
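The service modules later in this repo read this config by chaining `.get()` calls (e.g. `config.get("common").get("dockerRegistry").get("host")`) to build image tags. A minimal sketch over a plain dict shaped like the parsed YAML above (values illustrative):

```python
# Dict mirroring the parsed structure of config/dev.yaml (illustrative values)
config = {
    "common": {
        "dockerRegistry": {"host": "192.168.202.74", "port": "5000"},
    },
}

registry = config.get("common").get("dockerRegistry")
# Same tag format the service modules use: <host>:<port>/openi/<service>:v1
tag = "{}:{}/openi/{}:v1".format(registry.get("host"), registry.get("port"), "rest-server")
print(tag)  # -> 192.168.202.74:5000/openi/rest-server:v1
```

Note that chained `.get()` raises `AttributeError` as soon as one level is missing; the config files must therefore contain every nested key the services read.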
@@ -0,0 +1,35 @@ | |||
env: "prod" | |||
cluster: "openi-test" | |||
common: | |||
mysql: | |||
host: "" | |||
port: "" | |||
user: "" | |||
pwd: "" | |||
influxdb: | |||
host: "" | |||
port: "" | |||
user: "" | |||
pwd: "" | |||
dockerRegistry: | |||
host: "" | |||
port: "" | |||
user: "" | |||
pwd: "" | |||
prometheus: "" | |||
restServer: | |||
jwtSecret: "" | |||
volumes: | |||
- name: "ghome" | |||
mountPath: "/ghome" | |||
hostPath: "/ghome" | |||
- name: "gpai" | |||
mountPath: "/gpai" | |||
hostPath: "/gpai" | |||
k8sApiServer: | |||
host: "" | |||
kubeConfigPath: "" |
@@ -0,0 +1,97 @@ | |||
# -*- coding: UTF-8 -*- | |||
import os | |||
import codecs | |||
import argparse | |||
import yaml | |||
import utils.k8s | |||
import utils.dir | |||
import utils.setting | |||
import services.rest_server | |||
import services.image_factory_agent | |||
import services.image_factory_shield | |||
import services.log_service_bee | |||
import services.log_service_queen | |||
from jinja2 import Template | |||
workdir_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) | |||
def load_config(env):
    # argparse yields None when -e/--env is omitted, so test falsiness, not ""
    if not env:
        env = "dev"
    return utils.setting.load(env, os.path.join(workdir_root, "deploy-script/config"))
SERVICE_MODULES = {
    "rest-server": ("rest-server/", services.rest_server),
    "image-factory-agent": ("image-factory/agent", services.image_factory_agent),
    "image-factory-shield": ("image-factory/shield", services.image_factory_shield),
    "log-service-bee": ("log-service/bee", services.log_service_bee),
    "log-service-queen": ("log-service/queen", services.log_service_queen),
}
def get_service_ctx(service_name, config):
    # Map a service name to its working directory, deploy template, and deploy hooks
    ctx = dict()
    ctx["workdir"] = ""
    if service_name in SERVICE_MODULES:
        subdir, module = SERVICE_MODULES[service_name]
        ctx["workdir"] = os.path.join(workdir_root, subdir)
        ctx["deployTemplateName"] = module.deployTemplateName
        ctx["historyClean"] = module.historyClean
        ctx["getDeployConfig"] = module.getDeployConfig
    return ctx
def deploy_service():
    parser = argparse.ArgumentParser()
    parser.add_argument('-s', '--service', required=True, help="the service to deploy")
    parser.add_argument('-e', '--env', required=False, help="deployment environment, e.g. dev or prod")
    args = parser.parse_args()
    config = load_config(args.env)
    service_name = args.service
    service = get_service_ctx(service_name, config)
    # Use .get so an unknown service name reports an error instead of raising KeyError
    if service["workdir"] == "" or service.get("deployTemplateName", "") == "" or service.get("historyClean") is None:
        print("Unknown service: {}".format(service_name))
        return 1
    template_path = os.path.join(workdir_root, "deploy-script", "template", service["deployTemplateName"])
    with codecs.open(template_path, "r", "utf-8") as f:
        deploy_template = f.read()
    deploy_config = service["getDeployConfig"](config)
    deploy_yaml = Template(deploy_template).render(deploy_config)
    with codecs.open(os.path.join(service["workdir"], "deploy.yaml"), "w", "utf-8") as f:
        f.write(deploy_yaml)
    service["historyClean"]()
    utils.k8s.deploy("deploy.yaml", service["workdir"])
    print("Successfully deployed {}".format(service_name))

if __name__ == "__main__":
    deploy_service()
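The render step above can be sketched in isolation: Jinja2 fills the `{{...}}` placeholders and `{% for %}` loops in the deployment templates with the values returned by `getDeployConfig` (the template and values below are illustrative, not one of the real templates):

```python
from jinja2 import Template

# Tiny stand-in for a deployment template with a placeholder and a volume loop
tmpl = Template(
    "metadata:\n"
    "  name: {{DAEMONSET_NAME}}\n"
    "{% for v in VOLUME_MOUNTS %}"
    "# mount {{ v['hostPath'] }} at {{ v['mountPath'] }}\n"
    "{% endfor %}"
)
rendered = tmpl.render({
    "DAEMONSET_NAME": "rest-server-ds",
    "VOLUME_MOUNTS": [{"name": "ghome", "mountPath": "/ghome", "hostPath": "/ghome"}],
})
print(rendered)
```

Keys missing from the render context simply render as empty strings rather than raising, which is why the `{% if VOLUME_MOUNTS %}` guards in the real templates matter.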
@@ -0,0 +1,39 @@ | |||
# -*- coding: UTF-8 -*- | |||
import utils.k8s | |||
service_name = "image-factory-agent" | |||
deployTemplateName = "image-factory-agent.yaml" | |||
def getTag(config): | |||
docker_registry_host = config.get("common").get("dockerRegistry").get("host") | |||
docker_registry_port = config.get("common").get("dockerRegistry").get("port") | |||
return "{}:{}/openi/{}:v1".format(docker_registry_host,docker_registry_port,service_name) | |||
def getDaemonsetName(): | |||
return "{}-ds".format(service_name) | |||
def getDeployConfig(config): | |||
agent_config = config.get("imageFactory").get("agent") | |||
return { | |||
"ENV":config.get("env"), | |||
"DAEMONSET_NAME":getDaemonsetName(), | |||
"IMAGE_ADDRESS":getTag(config), | |||
"VOLUME_MOUNTS":agent_config.get("volumes"), | |||
"PORT": agent_config.get("port"), | |||
"SHIELD_ADDRESS": agent_config.get("shield") | |||
} | |||
def historyClean(): | |||
ds_name = getDaemonsetName() | |||
utils.k8s.removeDaemonset(ds_name) | |||
def buildPrepare(root,config): | |||
pass | |||
def buildEnd(root,config): | |||
pass |
@@ -0,0 +1,36 @@ | |||
# -*- coding: UTF-8 -*- | |||
import utils.k8s | |||
service_name = "image-factory-shield" | |||
deployTemplateName = "image-factory-shield.yaml" | |||
def getTag(config): | |||
docker_registry_host = config.get("common").get("dockerRegistry").get("host") | |||
docker_registry_port = config.get("common").get("dockerRegistry").get("port") | |||
return "{}:{}/openi/{}:v1".format(docker_registry_host,docker_registry_port,service_name) | |||
def getDaemonsetName(): | |||
return "{}-ds".format(service_name) | |||
def getDeployConfig(config): | |||
agent_config = config.get("imageFactory").get("shield") | |||
return { | |||
"ENV":config.get("env"), | |||
"DAEMONSET_NAME":getDaemonsetName(), | |||
"IMAGE_ADDRESS":getTag(config), | |||
"PORT": agent_config.get("port") | |||
} | |||
def historyClean(): | |||
ds_name = getDaemonsetName() | |||
utils.k8s.removeDaemonset(ds_name) | |||
def buildPrepare(root,config): | |||
pass | |||
def buildEnd(root,config): | |||
pass |
@@ -0,0 +1,39 @@ | |||
# -*- coding: UTF-8 -*- | |||
import utils.k8s | |||
service_name = "log-service-bee" | |||
deployTemplateName = "log-service-bee.yaml" | |||
def getTag(config): | |||
docker_registry_host = config.get("common").get("dockerRegistry").get("host") | |||
docker_registry_port = config.get("common").get("dockerRegistry").get("port") | |||
return "{}:{}/openi/{}:v1".format(docker_registry_host,docker_registry_port,service_name) | |||
def getDaemonsetName(): | |||
return "{}-ds".format(service_name) | |||
def getDeployConfig(config): | |||
bee_config = config.get("logService").get("bee") | |||
return { | |||
"ENV":config.get("env"), | |||
"DAEMONSET_NAME":getDaemonsetName(), | |||
"IMAGE_ADDRESS":getTag(config), | |||
"VOLUME_MOUNTS":bee_config.get("volumes"), | |||
"PORT": bee_config.get("port"), | |||
"CONTAINERS": bee_config.get("containers") | |||
} | |||
def historyClean(): | |||
ds_name = getDaemonsetName() | |||
utils.k8s.removeDaemonset(ds_name) | |||
def buildPrepare(root,config): | |||
pass | |||
def buildEnd(root,config): | |||
pass |
@@ -0,0 +1,40 @@ | |||
# -*- coding: UTF-8 -*- | |||
import utils.k8s | |||
service_name = "log-service-queen" | |||
deployTemplateName = "log-service-queen.yaml" | |||
def getTag(config): | |||
docker_registry_host = config.get("common").get("dockerRegistry").get("host") | |||
docker_registry_port = config.get("common").get("dockerRegistry").get("port") | |||
return "{}:{}/openi/{}:v1".format(docker_registry_host,docker_registry_port,service_name) | |||
def getDaemonsetName(): | |||
return "{}-ds".format(service_name) | |||
def getDeployConfig(config): | |||
queen_config = config.get("logService").get("queen") | |||
return { | |||
"ENV":config.get("env"), | |||
"DAEMONSET_NAME":getDaemonsetName(), | |||
"IMAGE_ADDRESS":getTag(config), | |||
"PORT": queen_config.get("port"), | |||
"REST_SERVER": queen_config.get("restServer").get("host"), | |||
"REST_SERVER_USER": queen_config.get("restServer").get("user"), | |||
"REST_SERVER_PWD": queen_config.get("restServer").get("pwd") | |||
} | |||
def historyClean(): | |||
ds_name = getDaemonsetName() | |||
utils.k8s.removeDaemonset(ds_name) | |||
def buildPrepare(root,config): | |||
pass | |||
def buildEnd(root,config): | |||
pass |
@@ -0,0 +1,47 @@ | |||
# -*- coding: UTF-8 -*- | |||
import utils.k8s | |||
service_name = "rest-server" | |||
deployTemplateName = "rest-server.yaml" | |||
def getTag(config): | |||
docker_registry_host = config.get("common").get("dockerRegistry").get("host") | |||
docker_registry_port = config.get("common").get("dockerRegistry").get("port") | |||
return "{}:{}/openi/{}:v1".format(docker_registry_host,docker_registry_port,service_name) | |||
def getDaemonsetName(): | |||
return "{}-ds".format(service_name) | |||
def getDeployConfig(config): | |||
rest_server = config.get("restServer") | |||
common_config = config.get("common") | |||
return { | |||
"ENV":config.get("env"), | |||
"DAEMONSET_NAME":getDaemonsetName(), | |||
"IMAGE_ADDRESS":getTag(config), | |||
"VOLUME_MOUNTS":rest_server.get("volumes"), | |||
"SERVER_PORT": rest_server.get("serverPort"), | |||
"JWT_SECRET": rest_server.get("jwtSecret"), | |||
"MYSQL_HOST": common_config.get("mysql").get("host"), | |||
"MYSQL_PORT": common_config.get("mysql").get("port"), | |||
"MYSQL_USER": common_config.get("mysql").get("user"), | |||
"MYSQL_PWD": common_config.get("mysql").get("pwd"), | |||
"K8S_API_SERVER":rest_server.get("k8sApiServer").get("host"), | |||
"K8S_CONFIG": rest_server.get("k8sApiServer").get("kubeConfigPath"), | |||
"LOG_SERVICE": rest_server.get("logService") | |||
} | |||
def historyClean(): | |||
ds_name = getDaemonsetName() | |||
utils.k8s.removeDaemonset(ds_name) | |||
def buildPrepare(root,config): | |||
pass | |||
def buildEnd(root,config): | |||
pass |
@@ -0,0 +1,57 @@ | |||
# Copyright (c) PCL | |||
# All rights reserved. | |||
# | |||
# MIT License | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated | |||
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation | |||
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and | |||
# to permit persons to whom the Software is furnished to do so, subject to the following conditions: | |||
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. | |||
# | |||
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING | |||
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND | |||
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, | |||
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |||
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. | |||
apiVersion: apps/v1 | |||
kind: DaemonSet | |||
metadata: | |||
name: {{DAEMONSET_NAME}} | |||
spec: | |||
selector: | |||
matchLabels: | |||
app: image-factory-agent | |||
template: | |||
metadata: | |||
labels: | |||
app: image-factory-agent | |||
name: image-factory-agent | |||
spec: | |||
hostNetwork: true | |||
hostPID: true | |||
containers: | |||
- name: image-factory-agent | |||
image: {{IMAGE_ADDRESS}} | |||
env: | |||
- name: SHIELD_ADDRESS | |||
value: {{SHIELD_ADDRESS}} | |||
{% if VOLUME_MOUNTS %} | |||
volumeMounts: | |||
{% for volumeinfo in VOLUME_MOUNTS %} | |||
- mountPath: {{ volumeinfo['mountPath'] }} | |||
name: {{ volumeinfo['name'] }} | |||
{% endfor %} | |||
{% endif %} | |||
ports: | |||
- name: agent-port | |||
containerPort: {{PORT}} | |||
hostPort: {{PORT}} | |||
{% if VOLUME_MOUNTS %} | |||
volumes: | |||
{% for volumeinfo in VOLUME_MOUNTS %} | |||
- name: {{ volumeinfo['name'] }} | |||
hostPath: | |||
path: {{ volumeinfo['hostPath'] }} | |||
{% endfor %} | |||
{% endif %} |
@@ -0,0 +1,44 @@ | |||
# Copyright (c) PCL | |||
# All rights reserved. | |||
# | |||
# MIT License | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated | |||
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation | |||
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and | |||
# to permit persons to whom the Software is furnished to do so, subject to the following conditions: | |||
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. | |||
# | |||
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING | |||
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND | |||
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, | |||
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |||
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. | |||
# | |||
apiVersion: apps/v1 | |||
kind: DaemonSet | |||
metadata: | |||
name: {{DAEMONSET_NAME}} | |||
spec: | |||
selector: | |||
matchLabels: | |||
app: image-factory-shield | |||
template: | |||
metadata: | |||
name: image-factory-shield | |||
labels: | |||
app: image-factory-shield | |||
spec: | |||
hostNetwork: false | |||
hostPID: false | |||
nodeSelector: | |||
noderole: "master" | |||
containers: | |||
- name: image-factory-shield | |||
image: {{IMAGE_ADDRESS}} | |||
imagePullPolicy: Always | |||
ports: | |||
- name: shield-port | |||
containerPort: {{PORT}} | |||
hostPort: {{PORT}} | |||
@@ -0,0 +1,57 @@ | |||
# Copyright (c) PCL | |||
# All rights reserved. | |||
# | |||
# MIT License | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated | |||
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation | |||
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and | |||
# to permit persons to whom the Software is furnished to do so, subject to the following conditions: | |||
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. | |||
# | |||
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING | |||
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND | |||
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, | |||
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |||
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. | |||
apiVersion: apps/v1 | |||
kind: DaemonSet | |||
metadata: | |||
name: {{DAEMONSET_NAME}} | |||
spec: | |||
selector: | |||
matchLabels: | |||
app: log-service-bee | |||
template: | |||
metadata: | |||
labels: | |||
app: log-service-bee | |||
name: log-service-bee | |||
spec: | |||
hostNetwork: false | |||
hostPID: false | |||
containers: | |||
- name: log-service-bee | |||
image: {{IMAGE_ADDRESS}} | |||
env: | |||
- name: CONTAINERS | |||
value: {{CONTAINERS}} | |||
{% if VOLUME_MOUNTS %} | |||
volumeMounts: | |||
{% for volumeinfo in VOLUME_MOUNTS %} | |||
- mountPath: {{ volumeinfo['mountPath'] }} | |||
name: {{ volumeinfo['name'] }} | |||
{% endfor %} | |||
{% endif %} | |||
ports: | |||
- name: bee-port | |||
containerPort: {{PORT}} | |||
hostPort: {{PORT}} | |||
{% if VOLUME_MOUNTS %} | |||
volumes: | |||
{% for volumeinfo in VOLUME_MOUNTS %} | |||
- name: {{ volumeinfo['name'] }} | |||
hostPath: | |||
path: {{ volumeinfo['hostPath'] }} | |||
{% endfor %} | |||
{% endif %} |
@@ -0,0 +1,54 @@ | |||
# Copyright (c) PCL | |||
# All rights reserved. | |||
# | |||
# MIT License | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated | |||
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation | |||
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and | |||
# to permit persons to whom the Software is furnished to do so, subject to the following conditions: | |||
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. | |||
# | |||
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING | |||
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND | |||
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, | |||
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |||
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. | |||
# | |||
apiVersion: apps/v1 | |||
kind: DaemonSet | |||
metadata: | |||
name: {{DAEMONSET_NAME}} | |||
spec: | |||
selector: | |||
matchLabels: | |||
app: log-service-queen | |||
template: | |||
metadata: | |||
name: log-service-queen | |||
labels: | |||
app: log-service-queen | |||
spec: | |||
hostNetwork: false | |||
hostPID: false | |||
nodeSelector: | |||
noderole: "master" | |||
containers: | |||
- name: log-service-queen | |||
image: {{IMAGE_ADDRESS}} | |||
imagePullPolicy: Always | |||
ports: | |||
- name: port | |||
containerPort: {{PORT}} | |||
env: | |||
- name: ENV | |||
value: prod | |||
- name: REST_SERVER | |||
value: "{{REST_SERVER}}" | |||
- name: REST_SERVER_USER | |||
value: {{REST_SERVER_USER}} | |||
- name: REST_SERVER_PWD | |||
value: "{{REST_SERVER_PWD}}" | |||
- name: PORT | |||
value: "{{PORT}}" | |||
@@ -0,0 +1,84 @@ | |||
# Copyright (c) PCL | |||
# All rights reserved. | |||
# | |||
# MIT License | |||
# | |||
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated | |||
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation | |||
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and | |||
# to permit persons to whom the Software is furnished to do so, subject to the following conditions: | |||
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. | |||
# | |||
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING | |||
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND | |||
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, | |||
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |||
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. | |||
apiVersion: apps/v1 | |||
kind: DaemonSet | |||
metadata: | |||
name: {{DAEMONSET_NAME}} | |||
spec: | |||
selector: | |||
matchLabels: | |||
app: rest-server | |||
template: | |||
metadata: | |||
name: rest-server | |||
labels: | |||
app: rest-server | |||
spec: | |||
hostNetwork: false | |||
hostPID: false | |||
nodeSelector: | |||
noderole: "master" | |||
containers: | |||
- name: rest-server | |||
image: {{ IMAGE_ADDRESS }} | |||
imagePullPolicy: Always | |||
securityContext: | |||
privileged: true | |||
{% if VOLUME_MOUNTS %} | |||
volumeMounts: | |||
{% for volumeinfo in VOLUME_MOUNTS %} | |||
- mountPath: {{ volumeinfo['mountPath'] }} | |||
name: {{ volumeinfo['name'] }} | |||
{% endfor %} | |||
{% endif %} | |||
env: | |||
- name: EGG_SERVER_ENV | |||
value: prod | |||
- name: ENV | |||
value: {{ENV}} | |||
- name: SERVER_PORT | |||
value: "{{SERVER_PORT}}" | |||
- name: JWT_SECRET | |||
value: {{JWT_SECRET}} | |||
- name: MYSQL_HOST | |||
value: "{{ MYSQL_HOST }}" | |||
- name: MYSQL_PORT | |||
value: "{{ MYSQL_PORT }}" | |||
- name: MYSQL_USER | |||
value: {{ MYSQL_USER }} | |||
- name: MYSQL_PWD | |||
value: {{ MYSQL_PWD }} | |||
- name: K8S_API_SERVER | |||
value: "{{K8S_API_SERVER}}" | |||
- name: K8S_CONFIG | |||
value: {{K8S_CONFIG}} | |||
- name: LOG_SERVICE | |||
value: {{LOG_SERVICE}} | |||
ports: | |||
- name: rest-server | |||
containerPort: {{SERVER_PORT}} | |||
hostPort: {{SERVER_PORT}} | |||
{% if VOLUME_MOUNTS %} | |||
volumes: | |||
{% for volumeinfo in VOLUME_MOUNTS %} | |||
- name: {{ volumeinfo['name'] }} | |||
hostPath: | |||
path: {{ volumeinfo['hostPath'] }} | |||
{% endfor %} | |||
{% endif %} |
@@ -0,0 +1,14 @@ | |||
# -*- coding: UTF-8 -*- | |||
import subprocess | |||
def copy(src,target,workdir): | |||
cmd = "cp -r {} {}".format(src,target) | |||
subprocess.check_call(cmd,shell=True,cwd=workdir) | |||
def rm(target,workdir): | |||
cmd = "rm -rf {}".format(target) | |||
subprocess.check_call(cmd,shell=True,cwd=workdir) |
@@ -0,0 +1,14 @@ | |||
# -*- coding: UTF-8 -*- | |||
import subprocess | |||
def build(tag,workdir): | |||
cmd = "docker build -t {} ./".format(tag) | |||
subprocess.check_call(cmd,shell=True,cwd=workdir) | |||
def push(tag,workdir): | |||
cmd = "docker push {}".format(tag) | |||
subprocess.check_call(cmd,shell=True,cwd=workdir) | |||
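`subprocess.check_call` raises `CalledProcessError` on a non-zero exit status, so a failed `docker build` or `docker push` aborts the calling script rather than failing silently; a sketch of that behavior using a harmless command in place of docker:

```python
import subprocess

try:
    # stand-in for a failing "docker build" invocation
    subprocess.check_call("exit 3", shell=True)
except subprocess.CalledProcessError as e:
    print("command failed with exit code", e.returncode)  # -> command failed with exit code 3
```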
@@ -0,0 +1,19 @@ | |||
# -*- coding: UTF-8 -*- | |||
import subprocess | |||
def isDaemonsetExist(name):
    cmd = "kubectl get daemonset"
    # check_output returns bytes on Python 3; decode before searching
    output = subprocess.check_output(cmd, shell=True).decode("utf-8")
    return output.find(name) > -1
def removeDaemonset(name):
    if isDaemonsetExist(name):
        cmd = "kubectl delete daemonset {}".format(name)
        subprocess.check_call(cmd, shell=True)
def deploy(yaml,workdir): | |||
cmd = "kubectl create -f {}".format(yaml) | |||
subprocess.check_call(cmd,shell=True,cwd=workdir) |
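A redeploy with these helpers is `removeDaemonset()` followed by `deploy()`. The command sequence can be sketched as a pure function so the flow is inspectable without a cluster; `redeploy_commands` is a hypothetical illustration, not part of the original script:

```python
# -*- coding: UTF-8 -*-
# Hypothetical helper: builds the kubectl command sequence that the
# removeDaemonset() + deploy() flow above would execute.
def redeploy_commands(name, yaml_file, exists=True):
    cmds = []
    if exists:
        # removeDaemonset() only deletes when the daemonset is present
        cmds.append("kubectl delete daemonset {}".format(name))
    cmds.append("kubectl create -f {}".format(yaml_file))
    return cmds

print(redeploy_commands("filebeat", "filebeat-kubernetes.yaml"))
```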
@@ -0,0 +1,29 @@ | |||
# -*- coding: UTF-8 -*- | |||
import os | |||
from jinja2 import Template | |||
import socket | |||
import yaml | |||
def getHostIp():
    # Connecting a UDP socket to a public address reveals the host's
    # outbound IP via getsockname(); no packets are sent by a UDP connect.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(('8.8.8.8', 80))
        ip = s.getsockname()[0]
    finally:
        s.close()
    return ip
def load(env, config_template_path):
    filename = env + ".yaml"
    path = os.path.join(config_template_path, filename)
    # Use a context manager so the file handle is closed, and safe_load
    # to avoid constructing arbitrary objects from the config file.
    with open(path, "r") as f:
        return yaml.safe_load(f)
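The dict returned by `load()` is presumably what fills the Jinja2 templates such as the deployment manifest above. A minimal sketch of that rendering step; the template fragment and config keys here are illustrative, not the real manifest:

```python
# -*- coding: UTF-8 -*-
from jinja2 import Template

# Illustrative template fragment and config values; the real templates and
# keys live in the per-environment YAML files read by load().
fragment = Template('- name: SERVER_PORT\n  value: "{{SERVER_PORT}}"')
config = {"SERVER_PORT": 8080}
rendered = fragment.render(**config)
print(rendered)
```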
@@ -0,0 +1,89 @@ | |||
# Introduction
OPENI Octopus collects the logs of each sub-task's container with a Filebeat + Elasticsearch pipeline.
# Solution
1. Deploy the Elasticsearch distributed document store on every node, forming an ES cluster.
2. Configure an ingress for the ES service so the cluster can be reached through the gateway service.
3. Deploy Filebeat on each node to collect container logs, with the ES cluster configured as Filebeat's output backend.
4. The webportal queries the ES cluster; given a container ID, it can search out that container's logs.
Request example:
POST http://$GatewayIP/es/_search
```
{
  method: 'POST',
  body: {
    query: {
      match: {
        "log.file.path": "/var/lib/docker/containers/$containerId/$containerId-json.log"
      }
    },
    size: pageSize,
    from: logIndex,
    sort: "log.offset"
  }
}
```
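The request body above can also be built programmatically. A sketch in Python; the field names mirror the example, while the `build_log_query` helper itself and the sample container ID are hypothetical:

```python
# -*- coding: UTF-8 -*-
import json

def build_log_query(container_id, page_size, log_index):
    # Match the docker json-file log path for the given container and
    # page through the hits ordered by byte offset within the log file.
    log_path = "/var/lib/docker/containers/{0}/{0}-json.log".format(container_id)
    return {
        "query": {"match": {"log.file.path": log_path}},
        "size": page_size,
        "from": log_index,
        "sort": "log.offset",
    }

body = build_log_query("abc123", 20, 0)
print(json.dumps(body))
```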
# Images
1. es-$nodename-statefulset.yml => elastic/elasticsearch:7.1.0
2. filebeat-kubernetes.yaml => docker.elastic.co/beats/filebeat:7.1.0
# Prerequisites
1. Kubernetes version >= 1.13
2. Set the hostname of each node. This document assumes a two-node cluster.
* Set the hostname of the master node
[root@host1 ~]# hostname xp001
* Set the hostname of the second (joining) node
[root@host1 ~]# hostname v001
* Restart kubelet
[root@host1 ~]# systemctl restart kubelet
# [Deployment](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html)
1. On each node, prepare the group/user and the externally mapped ES data directory
``` | |||
# sudo su | |||
# mkdir /usr/share/elasticsearch | |||
# chmod 0775 /usr/share/elasticsearch | |||
# chown -R 1000:0 /usr/share/elasticsearch | |||
``` | |||
If no user with UID 1000 exists, create one:
``` | |||
# adduser -u 1000 -G 0 -d /usr/share/elasticsearch elasticsearch | |||
# chown -R 1000:0 /usr/share/elasticsearch | |||
``` | |||
2. `cd openi`
3. `kubectl apply -f ./efk`
@@ -0,0 +1,19 @@ | |||
apiVersion: v1 | |||
kind: Service | |||
metadata: | |||
name: es-external-service | |||
namespace: kube-system | |||
labels: | |||
k8s-app: elasticsearch-logging | |||
spec: | |||
ports: | |||
- name: es-db | |||
port: 9200 | |||