#5358 Import external model - large-file migration hangs midway, model migration pauses

Open
created 4 weeks ago by liwei03 · 5 comments
liwei03 added this to the V20240402 milestone 4 weeks ago
liwei03 added the bug label 4 weeks ago
chenzh was assigned by liwei03 4 weeks ago
chenzh added this to the hf_model branch 4 weeks ago
chenzh commented 4 weeks ago
Owner
The code on the hf_model branch has been updated.
liwei03 was assigned by chenzh 4 weeks ago
chenzh added the test label 4 weeks ago
liwei03 commented 2 weeks ago
Owner
From 4.9 to 4.10, while migrating large models, the migration service was found to have paused migration twice.
chenzh commented 2 weeks ago
Owner
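Investigation found the following problems:
1. The service had not paused; resending the file migration request lets the migration continue.
2. The hf sdk and the file download requests have a memory leak, which causes the download process to hang.

Solution: the migration service was modified as follows:
1. The read rate of the download request was changed to 1 MB; it was previously 5 MB.
2. The hf sdk has a memory leak, so it was replaced with the requests library.
3. Garbage memory is forcibly collected whenever a new download request is started.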
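The three changes above can be sketched roughly as follows. This is a minimal illustration, not the actual migration service code: `download_file`, `CHUNK_SIZE`, and the `timeout` value are hypothetical names and settings, and it assumes the "read rate" refers to the chunk size passed to `iter_content` in requests.

```python
import gc

CHUNK_SIZE = 1024 * 1024  # assumed meaning of the 1 MB read rate (previously 5 MB)


def download_file(url, dest_path, session=None, chunk_size=CHUNK_SIZE, timeout=30):
    """Stream url to dest_path in fixed-size chunks and return the bytes written."""
    if session is None:
        # Third-party requests library; imported lazily so a stub session can be passed in.
        import requests
        session = requests.Session()

    # Force a garbage collection pass before each new download request starts.
    gc.collect()

    written = 0
    # stream=True keeps the body out of memory; it is read chunk by chunk below.
    with session.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()
        with open(dest_path, "wb") as out:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                out.write(chunk)
                written += len(chunk)
    return written
```

A smaller chunk size lowers the peak memory held per concurrent download, at the cost of more read calls per file.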
liwei03 commented 2 weeks ago
Owner
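On the morning of 4.10 a revised version of the migration service code was deployed. It fixed the migration service's memory leak, and large-file migration resumed. In the afternoon, however, large-file migration appears to have stalled again: with 5 files running concurrently, only one file is still downloading, at a few hundred KB/s.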
chenzh commented 2 weeks ago
Owner
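4.12 update: the cause turned out to be an unstable data source, which interrupts the file stream in the middle of a connection. This error is now caught and the thread is closed, so it no longer affects the service state or the migration files waiting in the queue.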
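One way to contain this kind of failure is to catch the error inside each worker so that an interrupted stream ends only that file's task. The sketch below is an assumption about the approach, not the service's real code: `migrate_all` and its `download` callback are hypothetical, and the pool size of 5 mirrors the concurrency mentioned earlier in the thread. It catches `OSError`, which also covers the exceptions raised by requests, since `requests.exceptions.RequestException` derives from it.

```python
from concurrent.futures import ThreadPoolExecutor


def migrate_all(files, download, max_workers=5):
    """Run download(name) for every file; return {name: None or an error string}."""
    files = list(files)

    def worker(name):
        try:
            download(name)
            return None
        except OSError as exc:
            # A broken stream ends this task only; the pool and the
            # remaining queued files are unaffected.
            return f"{type(exc).__name__}: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(files, pool.map(worker, files)))
```

Files whose entry is not `None` can then be re-queued, which matches the earlier observation that resending the migration request lets the migration continue.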
liwei03 modified the milestone from V20240402 to V20240423 1 week ago
liwei03 modified the milestone from V20240423 to V20240423.patch 4 days ago
liwei03 modified the milestone from V20240423.patch to V20240520 4 days ago