
Offline MLPerf Benchmarking #483

Open · wants to merge 2 commits into master
Conversation

@patemotter (Collaborator) commented Nov 21, 2024

Description

Adds offline MLPerf benchmarking to run Llama2-70B on v6e-8. Requires the changes in the related MaxText PR to be merged in order to work correctly.

Tests

Ran end-to-end in the dev environment. Confirmed completion and correct upload of metrics after the run.

Instructions and/or command lines to reproduce your tests:

Running the maxtext_inference_offline_benchmark DAG in the dev environment.

List links for your tests (use go/shortn-gen for any internal link):

Test history in dev environment: http://shortn/_QYByqVphuT

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run one-shot tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

Commit history:

  • Fix.
  • Working except for metrics upload.
  • Working.
  • Formatting fixes.
@singh-mitali (Collaborator) left a comment:

Thanks - looks great. Left some comments.

run_model_cmds = (
    "source .env/bin/activate",
    "cd maxtext/MaxText/inference_mlperf/trillium",
    "gsutil cp gs://cloud-tpu-inference-public/mlcommons/inference/language/llama2-70b/data/processed-openorca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl /tmp/processed-data.pkl",
Collaborator: Move this to set_up_cmds, after installing loadgen.
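A minimal sketch of the suggested change, assuming the set_up_cmds tuple from this PR (elided entries unchanged):

set_up_cmds = (
    # ... MaxText and loadgen setup as above ...
    "cd inference/loadgen && pip install . && cd ../..",
    # Dataset download moved here from run_model_cmds, per the suggestion above:
    "gsutil cp gs://cloud-tpu-inference-public/mlcommons/inference/language/llama2-70b/data/processed-openorca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl /tmp/processed-data.pkl",
)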

    # Setup MaxText
    git_clone_maxtext,
    f"cd maxtext && bash setup.sh MODE={test_mode.value} && cd ..",
    "pip install torch --index-url https://download.pytorch.org/whl/cpu",
Collaborator: Also add pip install -r maxtext/MaxText/inference_mlperf/requirements.txt.

Collaborator (author): This is already done within MaxText's bash setup.sh.

@singh-mitali (Collaborator) commented Nov 21, 2024: No, these are inference-specific packages, hence in the inference_mlperf dir (e.g. nltk for the accuracy script). setup.sh only installs the requirements in the maxtext dir.

@patemotter (Collaborator, author) commented Nov 21, 2024: Ah, I see.
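As a sketch, the suggested addition would slot into the setup sequence after setup.sh (assuming the file is spelled requirements.txt in the MaxText tree):

set_up_cmds = (
    git_clone_maxtext,
    f"cd maxtext && bash setup.sh MODE={test_mode.value} && cd ..",
    # Inference-specific packages (e.g. nltk for the accuracy script)
    # that setup.sh does not install:
    "pip install -r maxtext/MaxText/inference_mlperf/requirements.txt",
    # ...
)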

    # Setup MaxText
    git_clone_maxtext,
    f"cd maxtext && bash setup.sh MODE={test_mode.value} && cd ..",
    "pip install torch --index-url https://download.pytorch.org/whl/cpu",
Collaborator: I have not needed to install torch separately on my runs.

Collaborator (author): This was being used in other MaxText-based tests. I will see what happens with it removed.

"cd inference/loadgen && pip install . && cd ../..",
)

run_model_cmds = (
Collaborator: Can we also dump the following info as part of the logs: MaxText commit id, and JAX and libtpu versions?
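A hedged sketch of how these could be logged at the start of run_model_cmds; the exact libtpu package name depends on the install, so the grep below is an assumption:

run_model_cmds = (
    # Log environment info for reproducibility:
    "cd maxtext && git rev-parse HEAD && cd ..",  # MaxText commit id
    'python3 -c "import jax; print(jax.__version__)"',  # JAX version
    "pip list | grep -i libtpu",  # libtpu version (package name is an assumption)
    # ... benchmark commands as above ...
)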
