leadovahs uploaded dataset pangu-alpha-evolution_small.7z (1 year ago)

leadovahs pushed to master at leadovahs/pangu_evolution_distillation (1 year ago)
- b501ab904f update readme & output

leadovahs created repository leadovahs/pangu_evolution_distillation (1 year ago)

leadovahs commented on issue PCL-Platform.Inte.../pcl_pangu#4:
The enhanced model's inference sample does not work

Following the inference example in the "预测" (Inference) section of pcl_pangu/docs/README_PGE.md, the sample still fails to run, with the following problems:

(1) In `config = evolution.model_config_gpu(model='2B6', save='2B6/mode/path')`, the `save` argument appears to be meant as `load`.

(2) `load` cannot read the PanGu enhanced model pangu-alpha-evolution_2.6b_fp16 (model from https://git.openi.org.cn/PCL-Platform.Intelligence/PanGu-Alpha-Applications/src/branch/master/model/pangu_evolution and https://git.openi.org.cn/PCL-Platform.Intelligence/PanGu-Alpha-Applications/datasets?type=0 ).

The test code is as follows:

```
from pcl_pangu.context import set_context
from pcl_pangu.model import evolution

set_context(backend='pytorch')
config = evolution.model_config_gpu(model='2B6', load='/workspace/models/pangu-alpha-evolution_2.6b_fp16')
evolution.inference(config, input="文本分类:\n基本上可以说是诈骗\n选项:积极,消极\n答案:")
```

As tested, this code loads the Pangu-alpha_2.6B_fp16_mgt model (from https://git.openi.org.cn/PCL-Platform.Intelligence/PanGu-Alpha-GPU/src/branch/master/panguAlpha_pytorch/README.md ) without issue.

How does the PanGu enhanced model differ structurally from the original model? Why can't both be loaded with the same code, and how can this be resolved? Thanks!

Detailed error log when loading the enhanced model fails (reproducible):

> --------------------------- inference config --------------------------
> > Base Model: [alpha]
> > Model Size: [2B6]
> > data_path: data
> > global batch_size: 8
> > save to path: /workspace/models/pangu-alpha-evolution_2.6b_fp16
> ---------------------------- end of config ----------------------------
> 2022-08-03 08:21:17.764 | INFO | pcl_pangu.model.evolution.evolution:run_pt:86 - > Running /opt/conda/lib/python3.6/site-packages/pcl_pangu/model/panguAlpha_pytorch/tools/generate_samples_Pangu.py with args: ['--num-layers=31', '--hidden-size=2560', '--num-attention-heads=32', '--seq-length=1024', '--max-position-embeddings=1024', '--model-parallel-size=1', '--batch-size=1', '--train-iters=10000', '--lr-decay-iters=6400', '--save=/workspace/models/pangu-alpha-evolution_2.6b_fp16', '--load=/workspace/models/pangu-alpha-evolution_2.6b_fp16', '--data-path=data', '--vocab-file=/opt/conda/lib/python3.6/site-packages/pcl_pangu/tokenizer/bpe_4w_pcl/vocab', '--merge-file=gpt2-merges.txt', '--data-impl=mmap',
'--split=949,50,1', '--distributed-backend=nccl', '--lr=0.00015', '--lr-decay-style=cosine', '--min-lr=1e-05', '--weight-decay=0.01', '--clip-grad=1.0', '--warmup=0.01', '--checkpoint-activations', '--log-interval=100', '--save-interval=1000', '--eval-interval=1000', '--eval-iters=10', '--fp16', '--finetune', '--tokenizer-type=GPT2BPETokenizer', '--top-k=1', '--top-p=0.9', '--sample-input=e69687e69cace58886e7b1bbefbc9a0ae59fbae69cace4b88ae58fafe4bba5e8afb4e698afe8af88e9aa970ae98089e9a1b9efbc9ae7a7afe69e81efbc8ce6b688e69e810ae7ad94e6a188efbc9a']
> 2022-08-03 08:21:17.764 | DEBUG | pcl_pangu.model.launcher_torch:launch:60 - Spawning process 0 with command: ['/opt/conda/bin/python', '-u', '/opt/conda/lib/python3.6/site-packages/pcl_pangu/model/panguAlpha_pytorch/tools/generate_samples_Pangu.py', '--local_rank=0', '--num-layers=31', '--hidden-size=2560', '--num-attention-heads=32', '--seq-length=1024', '--max-position-embeddings=1024', '--model-parallel-size=1', '--batch-size=1', '--train-iters=10000', '--lr-decay-iters=6400', '--save=/workspace/models/pangu-alpha-evolution_2.6b_fp16', '--load=/workspace/models/pangu-alpha-evolution_2.6b_fp16', '--data-path=data', '--vocab-file=/opt/conda/lib/python3.6/site-packages/pcl_pangu/tokenizer/bpe_4w_pcl/vocab', '--merge-file=gpt2-merges.txt', '--data-impl=mmap', '--split=949,50,1', '--distributed-backend=nccl', '--lr=0.00015', '--lr-decay-style=cosine', '--min-lr=1e-05', '--weight-decay=0.01', '--clip-grad=1.0', '--warmup=0.01', '--checkpoint-activations', '--log-interval=100', '--save-interval=1000', '--eval-interval=1000', '--eval-iters=10', '--fp16', '--finetune', '--tokenizer-type=GPT2BPETokenizer', '--top-k=1', '--top-p=0.9', '--sample-input=e69687e69cace58886e7b1bbefbc9a0ae59fbae69cace4b88ae58fafe4bba5e8afb4e698afe8af88e9aa970ae98089e9a1b9efbc9ae7a7afe69e81efbc8ce6b688e69e810ae7ad94e6a188efbc9a']
> /opt/conda/lib/python3.6/site-packages/pcl_pangu/model/panguAlpha_pytorch
> using world size: 1 and model-parallel size: 1
> using torch.float16 for parameters ...
> WARNING: overriding default arguments for tokenizer_type:GPT2BPETokenizer with tokenizer_type:GPT2BPETokenizer
> -------------------- arguments --------------------
> adlr_autoresume ................. False
> adlr_autoresume_interval ........ 1000
> apply_query_key_layer_scaling ... False
> apply_residual_connection_post_layernorm False
> attention_dropout ............... 0.1
> attention_softmax_in_fp32 ....... False
> batch_size ...................... 1
> bert_load ....................... None
> bias_dropout_fusion ............. False
> bias_gelu_fusion ................ False
> block_data_path ................. None
> checkpoint_activations .......... True
> checkpoint_num_layers ........... 1
> clip_grad ....................... 1.0
> data_impl ....................... mmap
> data_path ....................... data
> DDP_impl ........................ local
> distribute_checkpointed_activations False
> distributed_backend ............. nccl
> dynamic_loss_scale .............. True
> eod_mask_loss ................... False
> eval_interval ................... 1000
> eval_iters ...................... 10
> exit_interval ................... None
> faiss_use_gpu ................... False
> finetune ........................ True
> fp16 ............................ True
> fp16_lm_cross_entropy ........... False
> fp32_allreduce .................. False
> genfile ......................... None
> greedy .......................... False
> hidden_dropout .................. 0.1
> hidden_size ..................... 2560
> hysteresis ...................... 2
> ict_head_size ................... None
> ict_load ........................ None
> indexer_batch_size .............. 128
> indexer_log_interval ............ 1000
> init_method_std ................. 0.02
> layernorm_epsilon ............... 1e-05
> lazy_mpu_init ................... None
> load ............................ /workspace/models/pangu-alpha-evolution_2.6b_fp16
> local_rank ...................... 0
> log_interval .................... 100
> loss_scale ...................... None
> loss_scale_window ............... 1000
> lr .............................. 0.00015
> lr_decay_iters .................. 6400
> lr_decay_style .................. cosine
> make_vocab_size_divisible_by .... 1
> mask_prob ....................... 0.15
> max_position_embeddings ......... 1024
> merge_file ...................... gpt2-merges.txt
> min_lr .......................... 1e-05
> min_scale ....................... 1
> mmap_warmup ..................... False
> model_parallel_size ............. 1
> no_load_optim ................... False
> no_load_rng ..................... False
> no_save_optim ................... False
> no_save_rng ..................... False
> num_attention_heads ............. 32
> num_layers ...................... 31
> num_samples ..................... 0
> num_unique_layers ............... None
> num_workers ..................... 2
> onnx_safe ....................... None
> openai_gelu ..................... False
> out_seq_length .................. 1024
> override_lr_scheduler ........... False
> param_sharing_style ............. grouped
> params_dtype .................... torch.float16
> query_in_block_prob ............. 0.1
> rank ............................ 0
> recompute ....................... False
> report_topk_accuracies .......... []
> reset_attention_mask ............ False
> reset_position_ids .............. False
> sample_input .................... e69687e69cace58886e7b1bbefbc9a0ae59fbae69cace4b88ae58fafe4bba5e8afb4e698afe8af88e9aa970ae98089e9a1b9efbc9ae7a7afe69e81efbc8ce6b688e69e810ae7ad94e6a188efbc9a
> sample_input_file ............... None
> sample_output_file .............. None
> save ............................ /workspace/models/pangu-alpha-evolution_2.6b_fp16
> save_interval ................... 1000
> scaled_upper_triang_masked_softmax_fusion False
> seed ............................ 1234
> seq_length ...................... 1024
> short_seq_prob .................. 0.1
> split ........................... 949,50,1
> temperature ..................... 1.0
> tensorboard_dir ................. None
> titles_data_path ................ None
> tokenizer_type .................. GPT2BPETokenizer
> top_k ........................... 1
> top_p ........................... 0.9
> train_iters ..................... 10000
> use_checkpoint_lr_scheduler ..... False
> use_cpu_initialization .......... False
> use_one_sent_docs ............... False
> vocab_file ...................... /opt/conda/lib/python3.6/site-packages/pcl_pangu/tokenizer/bpe_4w_pcl/vocab
> warmup .......................... 0.01
> weight_decay .................... 0.01
> world_size ...................... 1
> ---------------- end of arguments ----------------
> > building GPT2BPETokenizer tokenizer ...
> > padded vocab (size: 40000) with 0 dummy tokens (new size: 40000)
> > initializing torch distributed ...
> > initializing model parallel with size 1
> > setting random seeds to 1234 ...
> > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> building GPT2 model ...
> > number of parameters on model parallel rank 0: 2625295360
> global rank 0 is loading checkpoint /workspace/models/pangu-alpha-evolution_2.6b_fp16/iter_0055000/mp_rank_00/model_optim_rng.pt
> could not find arguments in the checkpoint ...
> Traceback (most recent call last):
>   File "/opt/conda/lib/python3.6/site-packages/pcl_pangu/model/panguAlpha_pytorch/tools/generate_samples_Pangu.py", line 241, in <module>
>     main()
>   File "/opt/conda/lib/python3.6/site-packages/pcl_pangu/model/panguAlpha_pytorch/tools/generate_samples_Pangu.py", line 209, in main
>     _ = load_checkpoint(model, None, None)
>   File "/opt/conda/lib/python3.6/site-packages/pcl_pangu/model/panguAlpha_pytorch/megatron/checkpointing.py", line 210, in load_checkpoint
>     model.load_state_dict(state_dict['model'])
>   File "/opt/conda/lib/python3.6/site-packages/pcl_pangu/model/panguAlpha_pytorch/megatron/model/distributed.py", line 93, in load_state_dict
>     self.module.load_state_dict(state_dict, strict=strict)
>   File "/opt/conda/lib/python3.6/site-packages/pcl_pangu/model/panguAlpha_pytorch/megatron/fp16/fp16.py", line 85, in load_state_dict
>     self.module.load_state_dict(state_dict, strict=strict)
>   File "/opt/conda/lib/python3.6/site-packages/pcl_pangu/model/panguAlpha_pytorch/megatron/model/gpt2_model.py", line 145, in load_state_dict
>     self.language_model.load_state_dict(state_dict, strict=strict)
>   File "/opt/conda/lib/python3.6/site-packages/pcl_pangu/model/panguAlpha_pytorch/megatron/model/language_model.py", line 635, in load_state_dict
>     self.topQueryEmbedding.load_state_dict(state_dict_, strict=strict)
>   File "/opt/conda/lib/python3.6/site-packages/pcl_pangu/model/panguAlpha_pytorch/megatron/model/language_model.py", line 354, in load_state_dict
>     self.top_query_embeddings.load_state_dict(state_dict_, strict=strict)
>   File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
>     self.__class__.__name__, "\n\t".join(error_msgs)))
> RuntimeError: Error(s) in loading state_dict for Embedding:
> Missing key(s) in state_dict: "weight".
> Traceback (most recent call last):
>   File "/workspace/sdk-test/main.py", line 7, in <module>
>     evolution.inference(config, input="文本分类:\n基本上可以说是诈骗\n选项:积极,消极\n答案:")
>   File "/opt/conda/lib/python3.6/site-packages/pcl_pangu/model/evolution/evolution.py", line 74, in inference
>     run_pt(script_args, py_script)
>   File "/opt/conda/lib/python3.6/site-packages/pcl_pangu/model/evolution/evolution.py", line 91, in run_pt
>     **kwargs)
>   File "/opt/conda/lib/python3.6/site-packages/pcl_pangu/model/launcher_torch.py", line 69, in launch
>     cmd=cmd)
> subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', '/opt/conda/lib/python3.6/site-packages/pcl_pangu/model/panguAlpha_pytorch/tools/generate_samples_Pangu.py', '--local_rank=0', '--num-layers=31', '--hidden-size=2560', '--num-attention-heads=32', '--seq-length=1024', '--max-position-embeddings=1024', '--model-parallel-size=1', '--batch-size=1', '--train-iters=10000', '--lr-decay-iters=6400', '--save=/workspace/models/pangu-alpha-evolution_2.6b_fp16', '--load=/workspace/models/pangu-alpha-evolution_2.6b_fp16', '--data-path=data', '--vocab-file=/opt/conda/lib/python3.6/site-packages/pcl_pangu/tokenizer/bpe_4w_pcl/vocab', '--merge-file=gpt2-merges.txt', '--data-impl=mmap', '--split=949,50,1', '--distributed-backend=nccl', '--lr=0.00015', '--lr-decay-style=cosine', '--min-lr=1e-05', '--weight-decay=0.01', '--clip-grad=1.0', '--warmup=0.01', '--checkpoint-activations', '--log-interval=100', '--save-interval=1000', '--eval-interval=1000', '--eval-iters=10', '--fp16', '--finetune', '--tokenizer-type=GPT2BPETokenizer', '--top-k=1', '--top-p=0.9', '--sample-input=e69687e69cace58886e7b1bbefbc9a0ae59fbae69cace4b88ae58fafe4bba5e8afb4e698afe8af88e9aa970ae98089e9a1b9efbc9ae7a7afe69e81efbc8ce6b688e69e810ae7ad94e6a188efbc9a']' returned non-zero exit status 1.
>
> Process finished with exit code 1
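The final `RuntimeError` (missing `"weight"` key for the top-query `Embedding`) typically means the two checkpoints store the same tensor under different key names. Pending an answer on the actual structural difference, one common workaround is to open the checkpoint on CPU (e.g. `torch.load(path, map_location='cpu')`), inspect the keys in `state_dict['model']`, and rename any mismatched ones before calling `load_state_dict`. Below is a minimal sketch of such a key remap using plain dicts; the evolution-side key name is purely hypothetical, chosen only to illustrate the shape of the fix:

```python
def remap_state_dict(state_dict, key_map):
    """Return a copy of state_dict with keys renamed per key_map.

    Keys not listed in key_map are kept unchanged. This mirrors the
    usual workaround when two Megatron-style checkpoints store the
    same tensor under different nested names.
    """
    return {key_map.get(k, k): v for k, v in state_dict.items()}


# Hypothetical example: suppose the evolution checkpoint stored the top
# query embedding as 'top_query_embeddings.word_embeddings.weight' while
# the alpha loader expects 'top_query_embeddings.weight' (names assumed,
# not taken from the actual checkpoints).
old = {
    "word_embeddings.weight": "tensor_a",
    "top_query_embeddings.word_embeddings.weight": "tensor_b",
}
fixed = remap_state_dict(
    old,
    {"top_query_embeddings.word_embeddings.weight": "top_query_embeddings.weight"},
)
print(sorted(fixed))
```

After remapping, the corrected dict would be saved back with `torch.save` (preserving the surrounding `{'model': ...}` structure) so the existing loader can read it unchanged.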
1 year ago