Question about understanding the cross-entropy loss function

Task number the question belongs to (Task 01-05)

Task 03

Runtime environment (OS version, Python version) / non-programming question

Non-programming question

Full error message (screenshot or copied text) / detailed description of the problem

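Let $H(p, q)$ denote the cross-entropy and $D_{KL}(p \| q)$ the KL divergence, so that $H(p, q) = H(p) + D_{KL}(p \| q)$. The KL divergence can reach 0 when $q = p$, whereas the cross-entropy can only go as low as $H(p)$. Is there actually any difference between optimizing the cross-entropy and optimizing the KL divergence here? And wouldn't a loss that can be optimized all the way down to 0 be better? Could the teaching assistants please explain why optimizing the cross-entropy loss is a good choice?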
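To make the identity in the question concrete, here is a minimal numerical sketch in plain Python. The distributions `p` and `q` are arbitrary illustrative values, not data from the course, and the helper names are hypothetical. It checks that the cross-entropy equals the entropy of `p` plus the KL divergence, and that at `q = p` the KL divergence is 0 while the cross-entropy bottoms out at the entropy of `p`.

```python
import math

def entropy(p):
    # H(p) = -sum_i p_i * log(p_i), skipping zero-probability terms
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # D_KL(p || q) = sum_i p_i * log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative distributions (assumed values, chosen only for the demo)
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

lhs = cross_entropy(p, q)
rhs = entropy(p) + kl_divergence(p, q)
print(f"H(p, q)          = {lhs:.6f}")
print(f"H(p) + KL(p||q)  = {rhs:.6f}")

# At q = p the KL term vanishes and the cross-entropy equals H(p)
print(f"KL(p||p)         = {kl_divergence(p, p):.6f}")
print(f"H(p, p) - H(p)   = {cross_entropy(p, p) - entropy(p):.6f}")
```

The two losses differ only by $H(p)$, which does not depend on $q$, so they share the same minimizer.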

Describe the result you expect to see

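For details, see this question on stackexchange: https://stats.stackexchange.com/questions/357963/what-is-the-difference-between-cross-entropy-and-kl-divergence?newreg=f9c50acac96b4b019c10abfc4e0a6b95
The goal of training a classification model is to learn $P_{truth}(y|x) := P(truth)$, but the true distribution is unknown and we can only approximate it through the dataset distribution $P(D)$, so we have $P(model) \approx P(D) \approx P(truth)$. In addition, gradient updates do not use the whole dataset but minibatches, so in engineering practice cross-entropy tends to be more stable than KL divergence. (That said, I recall that tasks such as t-SNE or saliency also use KL divergence as the loss function.)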
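As a supplementary sketch (not part of the original answer), the snippet below assumes a softmax classifier with hypothetical logits `z` and compares the two losses by finite differences. Since $H(p)$ does not depend on the model, the gradients with respect to the logits should coincide, and for a one-hot label $H(p) = 0$, so the two loss values themselves should match. All names and numbers are illustrative assumptions.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of logits
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def numeric_grad(loss, p, z, eps=1e-6):
    # Central finite-difference gradient of loss(p, softmax(z)) w.r.t. z
    grad = []
    for i in range(len(z)):
        z_plus = list(z)
        z_plus[i] += eps
        z_minus = list(z)
        z_minus[i] -= eps
        diff = loss(p, softmax(z_plus)) - loss(p, softmax(z_minus))
        grad.append(diff / (2 * eps))
    return grad

p_soft = [0.7, 0.2, 0.1]    # assumed soft target distribution
p_onehot = [1.0, 0.0, 0.0]  # assumed one-hot label
z = [0.3, -1.2, 0.8]        # hypothetical model logits

grad_ce = numeric_grad(cross_entropy, p_soft, z)
grad_kl = numeric_grad(kl_divergence, p_soft, z)
# Known closed form for softmax + cross-entropy: q - p
analytic = [qi - pi for qi, pi in zip(softmax(z), p_soft)]

print("grad of CE :", [round(g, 6) for g in grad_ce])
print("grad of KL :", [round(g, 6) for g in grad_kl])
print("q - p      :", [round(g, 6) for g in analytic])

# With a one-hot label, H(p) = 0, so both losses reduce to -log q_true
ce_onehot = cross_entropy(p_onehot, softmax(z))
kl_onehot = kl_divergence(p_onehot, softmax(z))
print("one-hot CE :", round(ce_onehot, 6), " one-hot KL :", round(kl_onehot, 6))
```

Under these assumptions, minimizing either loss drives the same parameter updates; cross-entropy simply skips computing the constant $H(p)$ term.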
