Hi, I am interested in this work. I want to try this algorithm to accelerate the training procedure of NLP models. Can I use this library directly on NLP models? Thanks!
Yes, it should work without any problem. Just follow the usual steps for wrapping the transformer in the Masking class. In the background, all weights in the module (and all its sub-modules) are multiplied by a binary mask before each forward pass.
If you apply this to transformers, though, make sure that you keep the layer norm parameters dense. You can achieve this by using the remove_type(torch.nn.LayerNorm) method of the Masking class.
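To make the idea concrete, here is a minimal sketch in plain PyTorch of what "multiply the weights by a binary mask, but leave LayerNorm dense" could look like. This is not the library's implementation: build_masks and apply_masks are hypothetical helper names invented for illustration, the masks are random rather than learned, and the toy model and the 50% density are assumptions.

```python
import torch
import torch.nn as nn


def build_masks(model, density, skip_types=(nn.LayerNorm,)):
    """Hypothetical helper: one random binary mask per weight tensor.

    Modules whose type is in skip_types get no mask, so they stay dense.
    """
    masks = {}
    for module_name, module in model.named_modules():
        if isinstance(module, skip_types):
            continue  # e.g. LayerNorm: keep its parameters dense
        for param_name, param in module.named_parameters(recurse=False):
            if not param_name.endswith("weight"):
                continue  # biases are left dense in this sketch
            full_name = f"{module_name}.{param_name}" if module_name else param_name
            # 1.0 with probability `density`, otherwise 0.0
            masks[full_name] = (torch.rand_like(param) < density).float()
    return masks


def apply_masks(model, masks):
    """Hypothetical helper: zero out the masked weights in place."""
    params = dict(model.named_parameters())
    with torch.no_grad():
        for name, mask in masks.items():
            params[name].mul_(mask)


torch.manual_seed(0)
# Toy stand-in for an NLP model: a single transformer encoder layer.
layer = nn.TransformerEncoderLayer(d_model=16, nhead=2, dim_feedforward=32)

masks = build_masks(layer, density=0.5)
apply_masks(layer, masks)  # call this before each forward pass

out = layer(torch.randn(5, 1, 16))  # input shape: (sequence, batch, d_model)
```

In a real training loop the masks would have to be applied again after every optimizer step, since the update can move masked weights away from zero. With the library itself you would rely on the Masking class and remove_type(torch.nn.LayerNorm) instead of hand-written helpers like these.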