cftang

Joined on May 06, 2022
Organization
- PengChengMind
- 76
- 5

cftang created GCU type debugging task cftan202401042072650(deleted)

3 months ago

cftang created repository cftang/GCU_PaddlePaddle_ModelZoo

3 months ago

cftang created CPU/GPU type debugging task cftan202310281784281(deleted)

5 months ago

cftang created repository cftang/openi-notebook

5 months ago

cftang commented on issue zeizei/OpenI_Learning#1015

GCU leaderboard warm-up event (4.10~4.16): complete tasks to earn 20-40 points

Task 1 (PyTorch): bert_base

| Scenario | Cards | epochs | train_batch_size | predict_batch_size | train_fps_mean | exact_match | f1 | Linearity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| a | 1 | 1 | 48 | 48 | 67.94706955 | 79.61210974 | 87.08313104 | |
| b | 1 | 10 | 48 | 48 | 68.29758169 | 79.02554399 | 87.23861865 | |
| c | 8 | 1 | 48 | 48 | 64.43498743 | 78.94985809 | 86.47563451 | 0.9483115 |
| d | 8 | 10 | 48 | 48 | 63.775345 | 78.11731315 | 86.64708563 | 0.933786284 |
| e | 8 | 100 | 24 | 24 | 23.92116933 | 77.61589404 | 85.68731233 | |

Linearity from scenarios c/a = 0.9483115 (64.43498743/67.94706955).
Linearity from scenarios d/b = 0.933786284 (63.775345/68.29758169).

Issue: on a single card with 100 epochs and train_batch_size=96, the run fails immediately with "line 46: 42 Illegal instruction (core dumped)". It may have run out of memory; it would be better if an error message were returned.

./TopsRider_t2x_2.1.52_samples/samples/model/torch/single_card/run_pytorch_bert_base_convergence_test.sh: line 46: 42 Illegal instruction (core dumped) python3 -u ./run_squad.py --device=dtu --do_train --do_predict --do_eval --train_batch_size=96 --predict_batch_size=96 --learning_rate=3e-5 --num_train_epochs=100 --max_steps=-1 --max_seq_length=384 --doc_stride=128 --do_lower_case --bert_model=bert-base-uncased --print_freq=20 --skip_steps=5 --init_checkpoint=${DATASET_DIR}/pytorch_bert_base/bert_base_init/bert_base.pt --train_file=${DATASET_DIR}/pytorch_bert_base/squad/v1.1/train-v1.1.json --predict_file=${DATASET_DIR}/pytorch_bert_base/squad/v1.1/dev-v1.1.json --vocab_file=${DATASET_DIR}/pytorch_bert_base/bert_base_init/vocab.txt --config_file=${DATASET_DIR}/pytorch_bert_base/bert_base_init/bert_config.json --eval_script=${DATASET_DIR}/pytorch_bert_base/squad/v1.1/evaluate-v1.1.py --output_dir=./output > ${LOG_FILE} 2>&1
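
The linearity figures in this comment can be rechecked with a few lines of Python. This is only a sketch: the dictionary keys and the `linearity` helper are hypothetical names, and the ratio is taken as 8-card `train_fps_mean` divided by single-card `train_fps_mean`, which is how the comment's own numbers work out.

```python
# train_fps_mean values copied from scenarios a-d of the comment.
train_fps_mean = {
    "a": 67.94706955,  # 1 card, 1 epoch
    "b": 68.29758169,  # 1 card, 10 epochs
    "c": 64.43498743,  # 8 cards, 1 epoch
    "d": 63.775345,    # 8 cards, 10 epochs
}


def linearity(multi_card_fps: float, single_card_fps: float) -> float:
    """Ratio of multi-card to single-card train_fps_mean (hypothetical helper)."""
    return multi_card_fps / single_card_fps


lin_ca = linearity(train_fps_mean["c"], train_fps_mean["a"])
lin_db = linearity(train_fps_mean["d"], train_fps_mean["b"])

print(f"c/a linearity: {lin_ca:.7f}")
print(f"d/b linearity: {lin_db:.9f}")
```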

1 year ago

cftang commented on issue zeizei/OpenI_Learning#1015

GCU leaderboard warm-up event (4.10~4.16): complete tasks to earn 20-40 points

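Task 2: Run a model on PaddlePaddle + GCU and test GCU performance (resnet50)

1. At least one model supported on a single GCU card or on 8 cards
1.1 Single card, epoch=1 ![image](/attachments/5e747962-eead-422f-825b-283bc1297fda)
1.2 8 cards, epoch=1 ![image](/attachments/c80f4aec-945d-43d6-a10e-07a22a90ec6c)
2. GCU single-card/8-card linearity: 8-card FPS/(single-card FPS*8)
8 cards: "train_fps_mean": 140.81122839865833
Single card: "train_fps_mean": 174.87976037517075
Linearity = 0.80518882286077608658489404384979 (140.81122839865833/174.87976037517075)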
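
The formula quoted in this comment divides by single-card FPS times 8, while the reported linearity matches the plain ratio of the two `train_fps_mean` values. A small sketch comparing the two readings follows. The variable names are hypothetical, and the idea that `train_fps_mean` is already a per-card average is an assumption that the source does not confirm.

```python
# Values copied from the comment.
fps_8_cards = 140.81122839865833
fps_1_card = 174.87976037517075
num_cards = 8

# Reading 1: plain ratio, appropriate if train_fps_mean is a per-card
# average (an assumption, not stated in the comment).
plain_ratio = fps_8_cards / fps_1_card

# Reading 2: the formula as written, 8-card FPS / (single-card FPS * 8),
# appropriate if train_fps_mean is the aggregate throughput of all cards.
scaled_ratio = fps_8_cards / (fps_1_card * num_cards)

print(f"plain ratio:  {plain_ratio:.6f}")
print(f"scaled ratio: {scaled_ratio:.6f}")
```

Only the plain ratio reproduces the reported figure of roughly 0.805.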

1 year ago

cftang pushed to master at cftang/ResNet18-Pytorch