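Standardizing per-stage timing statistics for collaborative computing tasks + implementing the pre-execution regression-modeling simulation scheme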

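Standardizing per-stage timing statistics for collaborative computing tasks:

Standardize the rules for recording the key time points of a collaborative computing task, and output representative timing data for each major stage.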
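As an illustration of what a standardized recording rule could look like, here is a minimal sketch of a timer that records named time points and derives the duration between consecutive ones. The class name `StageTimer`, its methods, and the time-point names used below are hypothetical and are not part of any existing platform API.

```python
import time


class StageTimer:
    """Records named time points for one task and derives per-stage durations."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock   # injectable, so a fake clock can be used in tests
        self._marks = []      # ordered list of (time point name, timestamp)

    def mark(self, name):
        """Record that the time point `name` has been reached."""
        self._marks.append((name, self._clock()))

    def durations(self):
        """Return {"a->b": seconds} for each pair of consecutive time points."""
        out = {}
        for (prev, t0), (cur, t1) in zip(self._marks, self._marks[1:]):
            out[f"{prev}->{cur}"] = t1 - t0
        return out
```

A task would call `mark("submit")`, `mark("start")`, `mark("end")` and so on at its key points; `durations()` then gives one number per stage, which keeps the statistics comparable across tasks as long as every task uses the same time-point names.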

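Implementing the pre-execution regression-modeling simulation scheme:

Pre-execute the user's computing task for several rounds and record the actual time spent in each stage of every round.
Combine the per-stage regression models to compute task-efficiency metrics such as the total task time and each stage's share of it.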
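The following is a minimal sketch of this idea, under stated assumptions: each stage's time is modeled as a straight line in a single workload feature `x` (for example batch size or data volume), fitted by ordinary least squares on the pre-executed rounds. The choice of feature, the linear form, and the function names `fit_line` and `estimate_task` are assumptions for illustration; the issue does not specify the actual regression model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y, 0.0  # all x values equal: fall back to the mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def estimate_task(samples, workload, num_rounds):
    """Estimate total time and per-stage shares for the full task.

    samples: {stage: [(x, seconds), ...]} measured in the pre-executed rounds.
    workload: value of the feature x for the real task (assumed constant per round).
    Returns (total_seconds, {stage: share_of_total}).
    """
    per_stage = {}
    for stage, points in samples.items():
        a, b = fit_line([p[0] for p in points], [p[1] for p in points])
        # Clamp at zero so an extrapolated line cannot predict negative time.
        per_stage[stage] = max(0.0, a + b * workload) * num_rounds
    total = sum(per_stage.values())
    shares = {s: (t / total if total else 0.0) for s, t in per_stage.items()}
    return total, shares
```

With only a few pre-executed rounds the fit is rough, so the result is best read as an estimate of where the time goes rather than a precise prediction.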

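5-20: Comparison of DistIR simulation and Colossal-AI real training efficiency
(1) DistIR simulation of MLP-small (4 GPUs, data parallel)

(2) Colossal-AI real execution of MLP-small (4 GPUs, data parallel, forward and backward only)

Colossal-AI Benchmark (MLP-small)

(3) Colossal-AI GPT2 real training efficiency (4 GPUs, data parallel)

Comparative analysis:

The way the real execution efficiency of MLP-small changes with batch size is inconsistent with the DistIR simulation results. A preliminary analysis points to a problem in the DistIR simulation method; the cause is still being investigated.
On 4 GPUs, Colossal-AI gets slower as the parallelism dimension increases, but it can support a larger batch size.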


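Simulation method for model training efficiency in the Colossal-AI framework (based on event-timeline modeling). Main steps:
1. Simulate a multi-node, multi-GPU distributed process environment on a single machine and load the model (done);
2. Obtain each process's computation graph from the loaded model (in progress; the jit.fx/jit.script methods have been tried);
3. Detect events, both computation and communication, from the computation graph representation;
4. Build an event-timeline model to compute the overall efficiency.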
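Steps 3 and 4 can be sketched as follows. This is a simplified illustration, not the method actually implemented in this work: the `Event` fields, the `simulate` function, and the rule that each (rank, kind) pair forms one serial stream are all assumptions, and event durations are taken as given (for example from profiling or a cost model).

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str
    kind: str        # "compute" or "comm"
    rank: int        # process (GPU) that issues the event
    duration: float  # seconds, assumed known in advance
    deps: list = field(default_factory=list)  # names of events that must finish first


def simulate(events):
    """Return ({event name: finish time}, makespan).

    `events` must be listed so that every dependency appears before the
    event that needs it. Each (rank, kind) pair is treated as one serial
    stream, so compute and communication on the same rank may overlap.
    """
    finish = {}
    stream_free = {}
    for ev in events:
        # An event starts once its dependencies are done and its stream is idle.
        ready = max((finish[d] for d in ev.deps), default=0.0)
        start = max(ready, stream_free.get((ev.rank, ev.kind), 0.0))
        finish[ev.name] = start + ev.duration
        stream_free[(ev.rank, ev.kind)] = finish[ev.name]
    return finish, max(finish.values(), default=0.0)
```

For data parallelism, a gradient all-reduce can be approximated as one `comm` event that depends on the backward events of all ranks; the makespan then reflects the slowest rank plus the communication time.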

Learned something new.


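#7 Standardizing per-stage timing statistics for collaborative computing tasks + implementing the pre-execution regression-modeling simulation scheme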