Update Mixtral BS #345
base: master
Conversation
A higher batch_size is supposed to be faster given the same number of samples. How much slower is the current run, and how much faster is the run with batch_size = 128?
It takes about 12 min for batch size 128 on bf16. I haven't tested int8 yet, and I'm not sure why the int8 version is hitting the 3-hour timeout. Let me try running int8 at 128.
@@ -457,7 +457,7 @@
"quant_mode": W_INT8_KV_INT8,
"quantization": "int8",
"quantize_kvcache": "true",
"per_device_batch_size": 258,
Wait! We have already quantized for inference on MaxText?
Yes!
Which implementation is this one? I don't think I've enabled quantization for either matmul or megablox yet (I mean the MoE block; the other blocks are enabled). Or is it some other implementation we're talking about here?
I think this configuration is using the old for
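To make the distinction above concrete, here is a purely illustrative sketch of a config that makes explicit which MoE path is in use and whether that block is quantized. The option names `moe_implementation` and `quantize_moe` are invented for this example and are not MaxText's real config keys:

```python
# Illustrative only: the field names below are hypothetical, not MaxText's actual config.
from dataclasses import dataclass

@dataclass
class MoEQuantConfig:
    # Which MoE kernel the model uses: dense "matmul" or grouped "megablox".
    moe_implementation: str = "matmul"
    # Whether int8 quantization is applied inside the MoE block itself;
    # per the discussion above, only the non-MoE blocks are quantized so far.
    quantize_moe: bool = False
    # Quantization applied to the rest of the model (attention, embeddings, ...).
    quantization: str = "int8"

cfg = MoEQuantConfig()
assert cfg.moe_implementation in ("matmul", "megablox")
print(f"MoE path={cfg.moe_implementation}, MoE quantized={cfg.quantize_moe}")
```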
Description
Update the Mixtral batch size, since 256 is taking a very long time.
Tests
Checklist
Before submitting this PR, please make sure (put X in square brackets):