cftang

cftang created GCU type debugging task cftan202401042072650(deleted)

3 months ago

cftang created repository cftang/GCU_PaddlePaddle_ModelZoo

3 months ago

cftang created CPU/GPU type debugging task cftan202310281784281(deleted)

5 months ago

cftang created repository cftang/openi-notebook

5 months ago

cftang commented on issue zeizei/OpenI_Learning#1015

GCU leaderboard warm-up event (Apr 10 to Apr 16): complete tasks to earn 20-40 points

Task 1 (PyTorch): bert_base

| Scenario | Cards | Epochs | train_batch_size | predict_batch_size | train_fps_mean | exact_match | f1 | Linearity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| a | 1 | 1 | 48 | 48 | 67.94706955 | 79.61210974 | 87.08313104 | |
| b | 1 | 10 | 48 | 48 | 68.29758169 | 79.02554399 | 87.23861865 | |
| c | 8 | 1 | 48 | 48 | 64.43498743 | 78.94985809 | 86.47563451 | 0.9483115 |
| d | 8 | 10 | 48 | 48 | 63.775345 | 78.11731315 | 86.64708563 | 0.933786284 |
| e | 8 | 100 | 24 | 24 | 23.92116933 | 77.61589404 | 85.68731233 | |

Linearity from scenarios c/a = 0.9483115 (64.43498743 / 67.94706955)
Linearity from scenarios d/b = 0.933786284 (63.775345 / 68.29758169)

Problem: on a single card with 100 epochs and train_batch_size=96, the run fails immediately with "line 46: 42 Illegal instruction (core dumped)". The card may have run out of memory; it would be better if the script returned a proper error message.

./TopsRider_t2x_2.1.52_samples/samples/model/torch/single_card/run_pytorch_bert_base_convergence_test.sh: line 46: 42 Illegal instruction (core dumped) python3 -u ./run_squad.py --device=dtu --do_train --do_predict --do_eval --train_batch_size=96 --predict_batch_size=96 --learning_rate=3e-5 --num_train_epochs=100 --max_steps=-1 --max_seq_length=384 --doc_stride=128 --do_lower_case --bert_model=bert-base-uncased --print_freq=20 --skip_steps=5 --init_checkpoint=${DATASET_DIR}/pytorch_bert_base/bert_base_init/bert_base.pt --train_file=${DATASET_DIR}/pytorch_bert_base/squad/v1.1/train-v1.1.json --predict_file=${DATASET_DIR}/pytorch_bert_base/squad/v1.1/dev-v1.1.json --vocab_file=${DATASET_DIR}/pytorch_bert_base/bert_base_init/vocab.txt --config_file=${DATASET_DIR}/pytorch_bert_base/bert_base_init/bert_config.json --eval_script=${DATASET_DIR}/pytorch_bert_base/squad/v1.1/evaluate-v1.1.py --output_dir=./output > ${LOG_FILE} 2>&1
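On the request for a clearer error message: "Illegal instruction" is the shell's description of a process killed by SIGILL (signal 4), and the crash leaves no Python traceback. The following is a minimal sketch of how a launcher could turn such an exit status into a readable message. The helper names `describe_exit` and `run_and_report` are hypothetical and are not part of the TopsRider samples; the sketch only reports how the process ended and cannot tell whether memory was the cause.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a process return code into a short human-readable message."""
    if returncode == 0:
        return "exited normally"
    # subprocess reports death-by-signal as -N; shells report it as 128 + N.
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return f"failed with exit code {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Not a signal number known on this platform.
        name = f"signal {signum}"
    return f"killed by {name}"


def run_and_report(cmd: list[str]) -> str:
    """Run a command (hypothetical wrapper) and describe how it ended."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return describe_exit(result.returncode)


if __name__ == "__main__":
    # Stand-in child process; a real launcher would pass the training command.
    print(run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

A wrapper like this could append the message to the log file, so that a run ending in SIGILL is distinguishable from an ordinary non-zero exit.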

1 year ago

cftang commented on issue zeizei/OpenI_Learning#1015

GCU leaderboard warm-up event (Apr 10 to Apr 16): complete tasks to earn 20-40 points

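Task 2: Run a model with PaddlePaddle + GCU and test GCU performance (resnet50)

1. At least one model runs on a single GCU card or on 8 cards
   - 1.1 Single card, epoch=1 ![image](/attachments/5e747962-eead-422f-825b-283bc1297fda)
   - 1.2 8 cards, epoch=1 ![image](/attachments/c80f4aec-945d-43d6-a10e-07a22a90ec6c)
2. GCU single-card/8-card linearity: 8-card FPS / (single-card FPS * 8)
   - 8 cards: "train_fps_mean": 140.81122839865833
   - Single card: "train_fps_mean": 174.87976037517075
   - Linearity = 140.81122839865833 / 174.87976037517075 = 0.80518882286077608658489404384979

The two values are divided directly, which agrees with the formula above only if the 8-card train_fps_mean is a per-card figure.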
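The linearity figures in these comments can be reproduced with a short calculation. The sketch below assumes that `train_fps_mean` is reported per card, which is the only reading under which dividing the 8-card value by the single-card value matches the stated formula; the function name `linearity` and its `per_card` flag are illustrative, not part of any benchmark tooling.

```python
import math


def linearity(multi_fps: float, single_fps: float, cards: int,
              per_card: bool = True) -> float:
    """Scaling efficiency of a multi-card run relative to a single card.

    per_card=True  (assumption): multi_fps is already a per-card mean,
                   so the ratio is taken directly.
    per_card=False: multi_fps is the aggregate throughput of all cards,
                   so it is divided by single_fps * cards.
    """
    if cards < 1 or single_fps <= 0:
        raise ValueError("cards must be >= 1 and single_fps must be positive")
    if per_card:
        return multi_fps / single_fps
    return multi_fps / (single_fps * cards)


if __name__ == "__main__":
    # Task 2 (resnet50) values quoted in the comment.
    r = linearity(140.81122839865833, 174.87976037517075, cards=8)
    print(f"resnet50 8-card linearity: {r:.7f}")

    # Same result when the 8-card number is expressed as an aggregate.
    total = 140.81122839865833 * 8
    r_total = linearity(total, 174.87976037517075, cards=8, per_card=False)
    assert math.isclose(r, r_total)
```

Both branches give the same answer for the same run; the flag only records which kind of FPS number is being passed in, which is the ambiguity worth making explicit when comparing results across submissions.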

1 year ago

cftang pushed to master at cftang/ResNet18-Pytorch

1 year ago

cftang created CPU/GPU type debugging task cftan202208141045914(deleted)

1 year ago

cftang created CPU/GPU type debugging task cftan202208112022761(deleted)

1 year ago

cftang created CPU/GPU type debugging task cftan202208112022609(deleted)

1 year ago

cftang created repository cftang/Paddle

1 year ago

cftang created CPU/GPU type debugging task cftan202208071351994(deleted)

1 year ago

cftang created repository cftang/modelbox

1 year ago

cftang created inference task mnist_inference

1 year ago

cftang created new model MNIST

1 year ago

cftang created NPU training task cftan202207021132873

1 year ago

cftang created NPU training task cftan202207021132242

1 year ago

cftang created repository cftang/MNIST_Example_Mindspore_NPU

1 year ago