Releases: GoogleCloudPlatform/cluster-toolkit

v1.31.1: Updated provisioning guide for A3 VM family

01 Apr 21:23
b830db4

What's Changed

The A3 provisioning guide was updated by @tpdownes to support 2 use cases:

  • user-created reservations without compact placement policies that are automatically consumed by matching VMs
  • Google Cloud-created reservations that must be specifically identified by the Slurm cluster for consumption

See #2420 and the reservation consumption documentation for details.

Full Changelog: v1.31.0...v1.31.1

v1.31.0: Improved Local File Management

28 Mar 19:06
fe6b653

What's Changed

Key New Features 🎉

  • Implement ghpc_stage function to stage files into deployment by @mr0re1 in #2339

Module Improvements 🔨

  • Support http_proxy in HTCondor Windows installation by @tpdownes in #2368
  • Slurm6. Add support for dynamic nodeset. by @mr0re1 in #1986

Deprecations 💤

  • Deprecate schedmd-slurm-gcp-v6-partition.network_storage by @mr0re1 in #2379
  • Remove quota validator by @mr0re1 in #2382

Bug fixes 🐞

  • Packer service account fix and alignment with Toolkit naming convention by @tpdownes in #2367

Full Changelog: v1.30.0...v1.31.0

v1.30.0 - Cloud HPC Toolkit A3 VM + NeMo Framework Solution

18 Mar 21:51
08ae77e

What's Changed

Key New Features 🎉

  • Introduction of the Cloud HPC Toolkit A3 VM family blueprint featuring
    • A Slurm cluster composed of A3 VMs each with 8 NVIDIA H100 GPUs
    • An example for running the NVIDIA NeMo framework
    • An example for running the common nccl-tests benchmark

Improvements 🛠

  • Add TPU v4 blueprint and tutorial to demonstrate running TPU workload by @harshthakkar01 in #2287
  • Update parameters for TPU nodeset module and add precondition checks and bump TPU to v3 by @harshthakkar01 in #2293
  • Add Slurm v6 version for image builder blueprint by @harshthakkar01 in #2297
  • Allow ghpc deploy blueprint.yaml by @mr0re1 in #2323
  • Slurm GCP version update; will cooldown before deleting orphan nodes by @nick-stroud in #2322
  • Add SlurmGCP v6 example of slurm compatible with startup scripts and integration test by @harshthakkar01 in #2346

Bug fixes 🐞

  • Added enable_devel for packer build to fix issue with bp by @cdunbar13 in #2334

Full Changelog: v1.29.0...v1.30.0

v1.29.0: New Firewall Rules module & Slurm-GCP v6 Improvements

07 Mar 21:27
c024e72

What's Changed

New Modules 🧱

  • Split service account creation from htcondor-setup by @tpdownes in #2250

Module Improvements 🔨

  • Set http_proxy, https_proxy variables for user login and during startup-script by @tpdownes in #2237
  • Update documentation for Packer to include minimum operational requirements by @tpdownes in #2241
  • Modify cloud-storage-bucket to include ability to set bucket viewers by @tpdownes in #2247
  • Add "submit" option to batch-job-template module by @aaronegolden in #2210
  • Prevent usage of placement with static and auto-scale nodes in same nodeset by @nick-stroud in #2279

Full Changelog: v1.28.1...v1.29.0

v1.28.1: Slurm-GCP v4 reaches End-of-Life, improved Slurm-GCP v6 support

15 Feb 23:13
75a04d4

What's Changed

Module Improvements 🔨

  • Slurm6. Make subnetwork_self_link required, don't pass subnetwork_project by @mr0re1 in #2067
  • Slurm6. Automagically set nodeset.name from module id. by @mr0re1 in #2068
  • Slurm6. Add support for additional_networks, access_config & reservation_name by @mr0re1 in #2062
  • Reduce default maximum number of HTCondor execute points by @tpdownes in #2127
  • Startup stackdriver option by @nick-stroud in #2120
  • HTCondor: variable MIG behavior by @tpdownes in #2140
  • Extending GKE Scheduler module by @ek-nag in #2137
  • Copies python binaries instead of symlink for more isolated venv by @nick-stroud in #2151
  • Increase dynamic node count to a more reasonable default value by @nick-stroud in #2153
  • Update Chrome Remote Desktop to Debian 12 by default by @tpdownes in #2180
  • Update startup-script module to latest release by @tpdownes in #2183
  • Updates to HTCondor autoscaler by @tpdownes in #2204
  • Change batch-job-base template from json to YAML by @aaronegolden in #2199
  • Add Slurm configuration template for long Prolog/Epilog scripts by @tpdownes in #2218

Bug fixes 🐞

  • Update spack openfoam example to use /opt/apps directory by @harshthakkar01 in #2131
  • Fix HTCondor Windows URI for latest 23.0 LTS release by @tpdownes in #2141
  • Validation added to Slurm v5 login_startup_scripts_timeout by @cdunbar13 in #2148
  • Ensure Windows VMs start HTCondor only after successful secret download by @tpdownes in #2174

Full Changelog: v1.27.0...v1.28.0

v1.27.0: Spack support for non-root users

10 Jan 01:12
fcdc5e5

What's Changed

Module Improvements 🔨

  • Make CloudSQL use an internal IP address instead of an external one for the Slurm Accounting DB. by @ek-nag in #1795
  • OFE: Various new features and fixes. by @ek-nag in #2040
  • Disable firewall rule logging by default by @tpdownes in #2057
  • Slurm6. Add support for enable_slurm_gcp_plugins by @mr0re1 in #2066
  • Support explicit reserved_ip_range for Filestore instances by @tpdownes in #2072
  • Adopt gcloud storage over gsutil by default by @tpdownes in #2075
  • Skip upgrade of wheel/setuptools if already installed by @tpdownes in #2074
  • Support use of http/https proxy for pip/apt/yum package managers by @tpdownes in #2079

Full Changelog: v1.26.1...v1.27.0

v1.26.1: Fix regression in wait-for-startup module

14 Dec 16:11
a7aaccf

What's Changed

GoogleCloudPlatform/guest-agent@5c85572 introduced a change in bootup logging that prevented the community/modules/scripts/wait-for-startup module from detecting the end of a failed startup-script. The module has been patched to detect failure on both new and old releases of the guest-agent.

Full Changelog: v1.26.0...v1.26.1

v1.26.0: GKE support for GCS, colorized output, improved "ghpc create" output

04 Dec 14:42
e8a7e20

What's Changed

Module Improvements 🔨

  • Slurm6. Fix race condition between GCS config files and instances by @mr0re1 in #1932
  • Script run warning stage 2 by @cdunbar13 in #1956

Deprecations 💤

  • Do not set role-label in expand, rely on module embedded labels by @mr0re1 in #1904

Version Updates ⏫

  • Bump django from 4.2.3 to 4.2.7 in /community/front-end/ofe by @dependabot in #1926

Other changes

  • Eliminate startup-script hasn't started message by @tpdownes in #2001

Full Changelog: v1.25.0...v1.26.0

v1.25.0: CAE solution

07 Nov 23:34
3abddcf

What's Changed

Improvements 🛠

  • Add support for reading metadata.yaml in CFT-format, fallback to har… by @mr0re1 in #1841
  • Enable usage of GCS URL as Module.Source by @mr0re1 in #1523

Bug fixes 🐞

  • Fix HCLS and gcs fuse installation by @cdunbar13 in #1845
  • Fix HTCondor Windows download URI by @tpdownes in #1847
  • Update ansible's usage of virtualenv to venv by @cdunbar13 in #1877
  • Check for stockouts during bulkInsert in integration tests. by @cdunbar13 in #1880
  • Applied recommended changes to gcsfuse and nfs scripts to fix apt-get update by @cdunbar13 in #1901
  • Hotfix for vm-instance to allow image names to be used correctly by @cdunbar13 in #1930

Full Changelog: v1.24.0...rc1.25.0

v1.24.0: Support for ephemeral storage on GKE, Slurm-on-GCP update to 5.9.1

19 Oct 06:42
e64f027

What's Changed

Full Changelog: v1.23.0...1.24.0