xju_gwz a8c90d4252 Update 'speech_transformer/defalut_config.yaml' 1 year ago
speech_transformer Update 'speech_transformer/defalut_config.yaml' 1 year ago
README.md Add 'README.md' 1 year ago

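The overall model follows the Transformer encoder-decoder architecture, with three main improvements: 1) a reduced frame rate, which shortens the temporal length of the acoustic features and improves computational efficiency when training on large-scale speech data; 2) a decoder input sampling strategy: when training the decoder, a sampling probability determines whether the decoder input at a given time step is the output predicted at the previous time step; 3) Focal Loss: when computing the loss, samples with a high classification probability are down-weighted and samples with a low classification probability are up-weighted, so that the model pays more attention to misclassified hard samples.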
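
The three improvements can be illustrated with a minimal sketch in plain Python. This is not the repository's implementation: the function names, the `stack`/`skip` frame-stacking scheme, the `sos` start token, and the `gamma` parameter are all assumptions made for illustration, and the focal loss uses the commonly cited form `-(1 - p)^gamma * log(p)`, which the README does not spell out. A real model would do the same operations on tensors.

```python
import math
import random


def reduce_frame_rate(frames, stack=4, skip=3):
    """Assumed scheme: concatenate `stack` consecutive frames and keep
    one stacked frame every `skip` input frames, shortening the sequence."""
    reduced = []
    for start in range(0, len(frames) - stack + 1, skip):
        stacked = []
        for frame in frames[start:start + stack]:
            stacked.extend(frame)
        reduced.append(stacked)
    return reduced


def sample_decoder_inputs(targets, predictions, sampling_prob, sos, rng):
    """Build decoder inputs for training.

    Step 0 always receives the start token. At step t > 0 the input is the
    model's prediction from step t-1 with probability `sampling_prob`,
    otherwise the ground-truth token from step t-1 (teacher forcing).
    """
    inputs = [sos]
    for t in range(1, len(targets)):
        use_prediction = rng.random() < sampling_prob
        inputs.append(predictions[t - 1] if use_prediction else targets[t - 1])
    return inputs


def focal_loss(p_true, gamma=2.0):
    """Focal loss for one sample, where `p_true` is the probability the
    model assigns to the correct class. With gamma=0 this is cross-entropy."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


if __name__ == "__main__":
    # 10 frames of 2-dimensional features become fewer, wider frames.
    frames = [[float(i), float(i) + 0.5] for i in range(10)]
    reduced = reduce_frame_rate(frames)
    print(len(frames), "->", len(reduced), "frames of size", len(reduced[0]))

    # Mix ground-truth and predicted tokens as decoder inputs.
    rng = random.Random(0)
    print(sample_decoder_inputs([5, 6, 7, 8], [5, 9, 7, 8], 0.5, sos=0, rng=rng))

    # An easy sample (p=0.9) is scaled down far more than a hard one (p=0.1).
    for p in (0.9, 0.1):
        print(p, focal_loss(p), -math.log(p))
```

Setting `sampling_prob` to 0 recovers pure teacher forcing, and setting `gamma` to 0 recovers ordinary cross-entropy, which makes both changes easy to switch off when comparing against a baseline.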
