#408 Request: support time-sharing scheduling to improve cluster resource utilization

Open
created 1 year ago by WuxinWang · 0 comments
WuxinWang commented 1 year ago
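### Problem description

Training tasks in the GPU environment can only use a single card. This hinders training open-source large models, and the wait for resources is long.

### Relevant environment (GPU/NPU)

GPU

### Relevant cluster (启智 OpenI / 智算)

### Expected solution or suggestion

We suggest referring to the [time-sharing scheduling policy](https://www.high-flyer.cn/blog/hfai/) of the High-Flyer (幻方) Fire-Flyer (萤火) cluster and designing a time-sharing scheduling system suited to our collaboration platform, to make large-model development and training tasks easier.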
tanglj was assigned by zeizei 1 year ago
tanglj added the "need review" label 1 year ago