#5 Add FedDropout, a model-parameter sparsification algorithm, for communication optimization

Closed
created 1 year ago by zhangyh02 · 6 comments
zhangyh02 commented 1 year ago

Add FedDropout test examples and test results.
zhangyh02 added the enhancement label 1 year ago
jiaqi was assigned by zhangyh02 1 year ago
jiaqi commented 1 year ago
Collaborator

Reference example added: AISynergy/examples/Fed-MAE
Added the FedDropout federated strategy, configured via
```
strategy = AISyncore.server.strategy.FedDropout(
    ...
)
```
FedDropout: applies random Dropout to the large Server-side model and distributes the resulting sub-models to the Clients for training.

To do:
related documentation and experimental test results
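To make the mechanism concrete, here is a minimal NumPy sketch of the idea described above. `sample_submodel`, `merge_update`, and representing the model as a name-to-array dict are illustrative assumptions, not the AISyncore implementation:

```python
import numpy as np

def sample_submodel(weights, dropout_rate=0.2, seed=None):
    """Randomly zero out a fraction of each Server weight tensor and
    record the mask so the Client's update can be merged back later.

    `weights` maps parameter names to numpy arrays (an illustrative
    convention, not the AISyncore data model).
    """
    rng = np.random.default_rng(seed)
    masks, sub_weights = {}, {}
    for name, w in weights.items():
        mask = rng.random(w.shape) >= dropout_rate  # keep ~(1 - dropout_rate)
        masks[name] = mask
        sub_weights[name] = w * mask                # dropped entries are zeroed
    return sub_weights, masks

def merge_update(server_w, client_w, masks):
    """Write the Client's trained values back only at the kept positions;
    dropped positions retain the Server's original values."""
    return {name: np.where(masks[name], client_w[name], server_w[name])
            for name in server_w}
```

The communication saving would come from transmitting only the kept entries plus the mask (or its random seed); practical federated-dropout variants typically drop whole neurons or attention heads rather than individual scalars, so the sub-model is genuinely smaller.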
jiaqi commented 1 year ago
Collaborator
[MAE with FedDropout Readme](https://git.openi.org.cn/PCL-Platform.Intelligence/AISynergy/src/branch/fedDropout/examples/Fed-MAE/mae)
jiaqi commented 1 year ago
Collaborator
##### Fed-Dropout test experiment:
###### Experimental setup:

Data size: Client1: Training: 1750, Testing: 750; Client2: Training: 1712, Testing: 734
Dropout rate: 20%
Model: MAE-base

###### Experimental results

Figure 1: Fed-Dropout results. The train loss is the per-epoch training loss of the dropped-out sub-model on each Client; the test loss is each Client evaluating the merged full Server model on its local data (400 epochs).

![Fed-Dropout.png](https://git.openi.org.cn/PCL-Platform.Intelligence/AISynergy/src/branch/fedDropout/examples/Fed-MAE/mae/output/Fed-Dropout.png)

Figure 2: Baseline for comparison, aggregated with Fed-Avg instead of Fed-Dropout (250 epochs).

![Result of Fed-Avg](https://git.openi.org.cn/PCL-Platform.Intelligence/AISynergy/src/branch/fedDropout/examples/Fed-MAE/mae/output/Fed-Avg.png)

###### Results analysis

With Fed-Dropout aggregation, Client-side training converges steadily, but the Server-side model loss first decreases and then rises as the number of training rounds grows, eventually settling around 0.6. This is substantially worse than the Fed-Avg baseline, and convergence is also slower.

###### Improvement plan

1. Every fixed number of rounds, push one full-model update to the Clients for synchronization;
2. Try different parameter-selection strategies, e.g. Head Importance;
3. Explore parameter sparsification from other dimensions, such as pipeline parallelism.
jiaqi commented 11 months ago
Collaborator
##### Fed-Dropout interval-round update:
###### Improvement strategy:

Every fixed number of rounds, the Server sends its full model parameters to the Clients for one round of training, re-synchronizing the Clients' parameter-update directions (a scheduling sketch follows below).

###### Experimental results:

Client 1:
![Result-1](https://git.openi.org.cn/PCL-Platform.Intelligence/AISynergy/src/branch/fedDropout/examples/Fed-MAE/mae/output/server-log-1.png)

Client 2:
![Result-2](https://git.openi.org.cn/PCL-Platform.Intelligence/AISynergy/src/branch/fedDropout/examples/Fed-MAE/mae/output/server-log-2.png)

###### Results analysis

Updating the global model at fixed round intervals improves the converged accuracy of the Client-side models, but accuracy fluctuates considerably during convergence, and the degree of fluctuation depends on each Client's local data distribution.
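A minimal sketch of how such interval scheduling could look on the Server side, reusing the hypothetical `sample_submodel` helper from the earlier sketch. `sync_every` is an illustrative knob; the interval actually used in this experiment is not stated:

```python
def server_broadcast(server_weights, round_idx, sync_every=10, dropout_rate=0.2):
    """Decide what to broadcast this round (a sketch, not the AISyncore API).

    Every `sync_every` rounds the full model is sent, so all Clients train
    from identical weights and their update directions are re-aligned; in
    all other rounds each Client receives a freshly sampled dropped-out
    sub-model as usual.
    """
    if round_idx % sync_every == 0:
        return server_weights, None  # full-model sync round: no mask needed
    return sample_submodel(server_weights, dropout_rate)
```

The trade-off this implies, a periodic jump in each Client's local trajectory in exchange for re-aligned updates, is consistent with the fluctuation reported in the results above.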
jiaqi commented 11 months ago
Collaborator
##### Head-Importance pruning:
###### Algorithm idea:

To avoid the convergence problems caused by randomly selected sub-networks, compute an importance score for each attention-head structure, rank the model structure by these scores, and select only the important parts for transmission and update, improving convergence speed and efficiency (see the sketch after the log below).

###### Experimental results:

Effect of pruning heads in order of importance, evaluated on a BERT model:

```
Total acc: 0.835048395313296
18:05:19-INFO: Evaluating following pruning strategy
18:05:19-INFO: 12:12,8 11:2,6,7 8:8 6:2
18:06:22-INFO: ***** Running evaluation *****
18:06:22-INFO:   Num examples = 9815
18:06:22-INFO:   Batch size = 32
18:06:22-INFO: ***** Pruning eval results *****
18:06:22-INFO: 7        0.834538970962812
18:06:22-INFO: Evaluating following pruning strategy
18:06:22-INFO: 12:4,12,6,8 11:2,6,7 8:12,8 6:2 7:4 10:10,2 9:7
18:07:24-INFO: ***** Pruning eval results *****
18:07:24-INFO: 14       0.8370860927152318
18:08:30-INFO: ***** Pruning eval results *****
18:08:30-INFO: 21       0.8351502801833928
18:09:42-INFO: ***** Pruning eval results *****
18:09:42-INFO: 28       0.8366785532348446
18:10:46-INFO: ***** Pruning eval results *****
18:10:46-INFO: 36       0.8336220071319409
18:11:49-INFO: ***** Pruning eval results *****
18:11:49-INFO: 43       0.8313805399898115
18:12:52-INFO: ***** Pruning eval results *****
18:12:52-INFO: 50       0.8299541518084564
18:13:56-INFO: ***** Pruning eval results *****
18:13:56-INFO: 57       0.8254712175241976
18:15:00-INFO: ***** Pruning eval results *****
18:15:00-INFO: 64       0.8249617931737137
18:16:03-INFO: ***** Pruning eval results *****
18:16:03-INFO: 72       0.8196637799286806
18:17:07-INFO: ***** Pruning eval results *****
18:17:07-INFO: 79       0.8053998981151299
18:18:11-INFO: ***** Pruning eval results *****
18:18:11-INFO: 86       0.7944982170147733
18:19:15-INFO: ***** Pruning eval results *****
18:19:15-INFO: 93       0.7757514009169638
18:20:20-INFO: ***** Pruning eval results *****
18:20:20-INFO: 100      0.7506877228731533
18:21:24-INFO: ***** Pruning eval results *****
18:21:24-INFO: 108      0.6330106979113601
18:22:28-INFO: ***** Pruning eval results *****
18:22:28-INFO: 115      0.5125827814569537
18:23:33-INFO: ***** Pruning eval results *****
18:23:33-INFO: 122      0.5014773306164034
18:24:37-INFO: ***** Pruning eval results *****
18:24:37-INFO: 129      0.3681100356597045
18:25:42-INFO: ***** Pruning eval results *****
18:25:42-INFO: 136      0.36566479877738156
18:26:46-INFO: ***** Pruning eval results *****
18:26:46-INFO: 144      0.31818644931227713

```

In the log, an entry such as `12:12,8 11:2,6,7 8:8 6:2` lists, for each layer, the head indices pruned in that layer; the first column of each result line is the cumulative number of pruned heads (7, 14, 21, ... out of BERT-base's 144), followed by the resulting accuracy. Accuracy stays close to the unpruned 0.835 until roughly 50 heads are removed, then degrades sharply.

Pruning ratio vs. accuracy-loss results:
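For reference, a sketch of the gradient-based head-importance score in the spirit of Michel et al., "Are Sixteen Heads Really Better than One?" (2019), which this log format resembles. It assumes a Hugging Face style BERT whose forward accepts a `head_mask` and returns a `.loss` when given labels; it is not necessarily the script that produced the log above:

```python
import torch

def head_importance(model, data_loader, n_layers=12, n_heads=12, device="cpu"):
    """Gradient-based head importance: I_h = E_x | m_h * dL/dm_h |,
    where m_h is a per-head mask held at 1 during the forward pass.

    `data_loader` is assumed to yield dict batches (input_ids, attention_mask,
    labels) compatible with the model's forward signature.
    """
    model.to(device).eval()
    mask = torch.ones(n_layers, n_heads, device=device, requires_grad=True)
    importance = torch.zeros(n_layers, n_heads, device=device)
    for batch in data_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch, head_mask=mask).loss
        loss.backward()
        importance += (mask * mask.grad).abs().detach()
        mask.grad = None  # reset accumulated gradient between batches
    return importance / max(len(data_loader), 1)
```

Heads would then be pruned in ascending order of importance, re-evaluating accuracy after each batch of removals, which would produce exactly the kind of cumulative-count/accuracy table shown in the log.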
jiaqi closed this issue 11 months ago
jiaqi commented 11 months ago
Collaborator

Follow-up progress on this work has been moved to [issue 21](https://git.openi.org.cn/PCL-Platform.Intelligence/AISynergy/issues/21).