Offline MLPerf Benchmarking #483
base: master
Conversation
Commits 057879c to dc9ace7: Fix; Working except for metrics upload; Working; Formatting fixes.
Thanks - looks great. Left some comments.
run_model_cmds = (
    "source .env/bin/activate",
    "cd maxtext/MaxText/inference_mlperf/trillium",
    "gsutil cp gs://cloud-tpu-inference-public/mlcommons/inference/language/llama2-70b/data/processed-openorca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl /tmp/processed-data.pkl",
Move this to set_up_cmds after installing loadgen
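A minimal sketch of that reordering, assuming the set_up_cmds tuple and the git_clone_maxtext helper already used in this DAG (the neighboring commands are taken from the diff quoted below, not verified against the final code):

set_up_cmds = (
    # Setup MaxText
    git_clone_maxtext,
    f"cd maxtext && bash setup.sh MODE={test_mode.value} && cd ..",
    # Install loadgen first, then fetch the tokenized dataset once at
    # setup time instead of on every model run.
    "cd inference/loadgen && pip install . && cd ../..",
    "gsutil cp gs://cloud-tpu-inference-public/mlcommons/inference/language/llama2-70b/data/processed-openorca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl /tmp/processed-data.pkl",
)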
    # Setup MaxText
    git_clone_maxtext,
    f"cd maxtext && bash setup.sh MODE={test_mode.value} && cd ..",
    "pip install torch --index-url https://download.pytorch.org/whl/cpu",
Also add pip install -r maxtext/MaxText/inference_mlperf/requirements.txt
This is already done within MaxText's bash setup.sh
No, these are inference-specific packages, hence kept in the inference_mlperf dir (e.g. nltk for the accuracy script). setup.sh only installs the requirements in the maxtext dir.
Ah I see.
    # Setup MaxText
    git_clone_maxtext,
    f"cd maxtext && bash setup.sh MODE={test_mode.value} && cd ..",
    "pip install torch --index-url https://download.pytorch.org/whl/cpu",
I have not needed to install torch separately on my runs.
This was being used in other MaxText-based tests. I will see what happens with it removed.
"cd inference/loadgen && pip install . && cd ../..", | ||
) | ||
|
||
run_model_cmds = ( |
Can we also dump the following info as part of the logs: the maxtext commit id, and the jax and libtpu versions?
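One possible way to capture that, sketched as extra commands appended to run_model_cmds (the libtpu distribution name is an assumption, so the sketch greps for it rather than naming it):

version_log_cmds = (
    # Record the exact MaxText commit the benchmark ran against.
    "cd maxtext && git rev-parse HEAD && cd ..",
    # Print the jax version from the active environment.
    "python3 -c 'import jax; print(\"jax:\", jax.__version__)'",
    # libtpu is typically installed via pip, but the exact package name
    # (e.g. libtpu-nightly) varies, so list whatever matches.
    "pip list | grep -i libtpu",
)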
Description
Adds an offline MLPerf benchmark that runs Llama2-70B on v6e-8. Requires the changes in the related MaxText PR to be merged in order to work correctly.
Tests
Ran e2e using the dev environment. Confirmed completion and correct uploading of metrics after the run.
Instructions and/or command lines to reproduce your tests:
Run the maxtext_inference_offline_benchmark DAG in the dev environment (a sketch of one way to trigger it follows the link below).

List links for your tests (use go/shortn-gen for any internal link):
Test history in dev environment: http://shortn/_QYByqVphuT
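Assuming the dev environment exposes the standard Airflow CLI (an assumption; only the DAG id comes from this PR), triggering the run might look like:

airflow dags trigger maxtext_inference_offline_benchmark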
Checklist
Before submitting this PR, please make sure (put X in square brackets):