#1899 Dataset upload speed optimization

Closed
created 2 years ago by lewis · 8 comments
lewis commented 2 years ago
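Dataset upload currently consists of two main steps: loading the dataset and uploading it. The loading step is mostly spent computing the file's MD5, which is very slow for large files. Consider identifying a dataset file uniquely in some other way, for example by generating a hash from the file size, file name, last-modified time, and so on.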
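A minimal sketch of the metadata-based idea above, in Python for illustration only. The function name `metadata_fingerprint` and the choice to hash a `name|size|mtime` string are assumptions, not something the issue specifies.

```python
import hashlib
import os


def metadata_fingerprint(path: str) -> str:
    """Return a hex digest derived from file metadata only (hypothetical helper).

    The file content is never read, so the cost does not grow with file size.
    """
    st = os.stat(path)
    # Combine file name, size in bytes, and last-modified time in nanoseconds.
    key = f"{os.path.basename(path)}|{st.st_size}|{st.st_mtime_ns}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()
```

The trade-off is that such an identifier says nothing about the content: a renamed or re-saved copy of the same file gets a different value, and two different files that happen to share name, size, and modification time get the same one.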
lewis added the enhancement label 2 years ago
tanglj added this to the V20220428 milestone 2 years ago
lewis was assigned by tanglj 2 years ago
lewis commented 2 years ago
Owner
The MD5 is currently computed over the entire file content. Consider computing it from the first MB of every chunk instead.
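The proposal above can be sketched roughly as follows, in Python for illustration. The chunk size is not stated in this issue, so `CHUNK_SIZE` is an assumed value, and `sampled_md5` is a hypothetical name rather than the project's actual function.

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024 * 1024  # assumed upload chunk size; not given in the issue
SAMPLE_SIZE = 1024 * 1024      # the "first MB" of each chunk


def sampled_md5(path: str, chunk_size: int = CHUNK_SIZE,
                sample_size: int = SAMPLE_SIZE) -> str:
    """MD5 over the first sample_size bytes of every chunk_size-sized chunk."""
    digest = hashlib.md5()
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        for offset in range(0, size, chunk_size):
            # Jump to the start of the chunk and read only its leading bytes.
            f.seek(offset)
            digest.update(f.read(min(sample_size, chunk_size)))
    return digest.hexdigest()
```

With these assumed sizes only about 1/64 of the file is read, which is where the time saving comes from. Bytes outside the sampled regions do not influence the result, so the value is an identifier, not the file's real MD5.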
lewis commented 2 years ago
Owner
Resolved; ready for testing.
lewis added the test label 2 years ago
wangj was assigned by lewis 2 years ago
lewis commented 2 years ago
Owner
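> The MD5 is currently computed over the entire file content. Consider computing it from the first MB of every chunk instead.

The optimization was implemented this way. With it, the file-loading step is much faster.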
lewis commented 2 years ago
Owner
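Since the real MD5 of a file can no longer be computed after this optimization, the "Copy MD5" button has been removed from the dataset list.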
lewis removed the test label 2 years ago
wangj was unassigned by lewis 2 years ago
lewis added the test label 2 years ago
wangj was assigned by lewis 2 years ago
wangj commented 2 years ago
Owner
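I noticed something odd: after a zip file is uploaded successfully to Cloudbrain 1 (云脑1), it can be downloaded in a browser but not with Xunlei (迅雷). Cloudbrain 2 (云脑2) does not have this problem. In production, the file can be downloaded with Xunlei.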
wangj removed the test label 2 years ago
lewis commented 2 years ago
Owner
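> I noticed something odd: after a zip file is uploaded successfully to Cloudbrain 1 (云脑1), it can be downloaded in a browser but not with Xunlei (迅雷). Cloudbrain 2 (云脑2) does not have this problem.
> In production, the file can be downloaded with Xunlei.

The download URL in the test environment is an intranet address, so Xunlei cannot download from it.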
lewis commented 2 years ago
Owner
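This change has one more side effect: if dataset file A already exists and file A is uploaded again, a second copy is uploaded and no instant upload (秒传) takes place, because the MD5 is now computed differently.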
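A toy illustration of that side effect, in Python. It assumes instant upload works by looking up the file's identifier among those stored for existing files; the in-memory `index` dict, `find_existing`, and the tiny chunk and sample sizes are all hypothetical and chosen only to keep the example small.

```python
import hashlib
from typing import Dict, Optional


def full_md5(data: bytes) -> str:
    """Old scheme: MD5 over the whole content."""
    return hashlib.md5(data).hexdigest()


def sampled_md5(data: bytes, chunk_size: int, sample_size: int) -> str:
    """New scheme: MD5 over the leading bytes of every chunk."""
    digest = hashlib.md5()
    for offset in range(0, len(data), chunk_size):
        digest.update(data[offset:offset + min(sample_size, chunk_size)])
    return digest.hexdigest()


def find_existing(index: Dict[str, str], identifier: str) -> Optional[str]:
    """Return the stored file for this identifier, or None (no instant upload)."""
    return index.get(identifier)


file_a = b"a" * 1000
# File A was stored before the change, keyed by its full MD5.
index = {full_md5(file_a): "dataset/file-A"}

old_hit = find_existing(index, full_md5(file_a))
new_hit = find_existing(index, sampled_md5(file_a, chunk_size=100, sample_size=10))
```

Here `old_hit` finds the stored file while `new_hit` is `None`, so under this assumption the same content is uploaded again.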
wangj added the test label 2 years ago
wangj commented 2 years ago
Owner
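The MD5 computation time is now shorter.
Uploaded one 12 GB file to Cloudbrain 1 (云脑1): MD5 computation time dropped from 3 minutes to 6 seconds.
Uploaded one 152 GB file to Cloudbrain 1 (云脑1): MD5 computation time dropped from 36 minutes to 1 minute.
Both uploads eventually succeeded, and the files could be downloaded locally.
Testing passed; closing this issue.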
wangj closed this issue 2 years ago