BLD: Dockerfile for cpu and cuda #831

Merged
merged 23 commits into from
Nov 18, 2024
27 changes: 18 additions & 9 deletions .github/workflows/docker-cd.yaml
@@ -17,7 +17,8 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [ "3.9", "3.10", "3.11" ]
python-version: [ "3.10", "3.11", "3.12" ]
cuda-version: [ "none", "12.0", "12.5" ]
steps:
- name: Check out code
uses: actions/checkout@v3
@@ -36,7 +37,8 @@ jobs:
if: ${{ github.repository == 'xorbitsai/xorbits' }}
env:
DOCKER_ORG: ${{ secrets.DOCKERHUB_USERNAME }}
PY_VERSION: ${{ matrix.python-version }}
PYTHON_VERSION: ${{ matrix.python-version }}
CUDA_VERSION: ${{ matrix.cuda-version }}
run: |
if [[ "$GITHUB_REF" =~ ^"refs/tags/" ]]; then
export GIT_TAG=$(echo "$GITHUB_REF" | sed -e "s/refs\/tags\///g")
@@ -65,21 +67,28 @@ jobs:
git checkout $branch
export IMAGE_TAG="nightly-$branch"
fi
docker build -t "xprobe/xorbits:base-py$PY_VERSION" --progress=plain -f python/xorbits/deploy/docker/Dockerfile.base . --build-arg PYTHON_VERSION=$PY_VERSION
docker build -t "$DOCKER_ORG/xorbits:${IMAGE_TAG}-py${PY_VERSION}" --progress=plain -f python/xorbits/deploy/docker/Dockerfile . --build-arg PYTHON_VERSION=$PY_VERSION
docker push "xprobe/xorbits:base-py$PY_VERSION"
docker push "$DOCKER_ORG/xorbits:${IMAGE_TAG}-py${PY_VERSION}"
if [[ "$CUDA_VERSION" == "none" ]]; then
# Build CPU image
docker build -t "$DOCKER_ORG/xorbits:base-py${PYTHON_VERSION}" --progress=plain -f python/xorbits/deploy/docker/Dockerfile.cpu.base . --build-arg PYTHON_VERSION=$PYTHON_VERSION
docker push "$DOCKER_ORG/xorbits:base-py${PYTHON_VERSION}"
docker build -t "$DOCKER_ORG/xorbits:${IMAGE_TAG}-py${PYTHON_VERSION}" --progress=plain -f python/xorbits/deploy/docker/Dockerfile.cpu . --build-arg PYTHON_VERSION=$PYTHON_VERSION
docker push "$DOCKER_ORG/xorbits:${IMAGE_TAG}-py${PYTHON_VERSION}"
else
# Build GPU image
docker build -t "$DOCKER_ORG/xorbits:${IMAGE_TAG}-cuda${CUDA_VERSION}-py${PYTHON_VERSION}" --progress=plain -f python/xorbits/deploy/docker/Dockerfile.cuda . --build-arg PYTHON_VERSION=$PYTHON_VERSION --build-arg CUDA_VERSION=$CUDA_VERSION
docker push "$DOCKER_ORG/xorbits:${IMAGE_TAG}-cuda${CUDA_VERSION}-py${PYTHON_VERSION}"
fi
done

- name: Set default image
shell: bash
if: matrix.python-version == '3.10'
if: matrix.python-version == '3.11'
env:
DOCKER_ORG: ${{ secrets.DOCKERHUB_USERNAME }}
PY_VERSION: ${{ matrix.python-version }}
PYTHON_VERSION: ${{ matrix.python-version }}
run: |
if [[ "$GITHUB_REF" =~ ^"refs/tags/" ]]; then
export GIT_TAG=$(echo "$GITHUB_REF" | sed -e "s/refs\/tags\///g")
docker tag "$DOCKER_ORG/xorbits:${GIT_TAG}-py${PY_VERSION}" "$DOCKER_ORG/xorbits:${GIT_TAG}"
docker tag "$DOCKER_ORG/xorbits:${GIT_TAG}-py${PYTHON_VERSION}" "$DOCKER_ORG/xorbits:${GIT_TAG}"
docker push "$DOCKER_ORG/xorbits:${GIT_TAG}"
fi
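The tag scheme this workflow change introduces can be sketched in Python to show which images each matrix job pushes. This is only an illustration of the `if`/`else` above, not part of the PR: the helper names are hypothetical, and the `xprobe` org used in the usage note is an assumption taken from the docs examples (the workflow itself reads it from the `DOCKERHUB_USERNAME` secret).

```python
from itertools import product

# Values copied from the updated workflow matrix.
PYTHON_VERSIONS = ["3.10", "3.11", "3.12"]
CUDA_VERSIONS = ["none", "12.0", "12.5"]


def tags_for_job(org, image_tag, python_version, cuda_version):
    """Return the tags one matrix job pushes (hypothetical helper)."""
    if cuda_version == "none":
        # CPU job: a base image plus the release/nightly image.
        return [
            f"{org}/xorbits:base-py{python_version}",
            f"{org}/xorbits:{image_tag}-py{python_version}",
        ]
    # GPU job: one image whose tag also carries the CUDA version.
    return [f"{org}/xorbits:{image_tag}-cuda{cuda_version}-py{python_version}"]


def all_tags(org, image_tag):
    """Expand the python-version x cuda-version matrix into pushed tags."""
    tags = []
    for python_version, cuda_version in product(PYTHON_VERSIONS, CUDA_VERSIONS):
        tags.extend(tags_for_job(org, image_tag, python_version, cuda_version))
    return tags
```

Calling `all_tags("xprobe", "nightly-main")` lists every tag one branch build would push under these assumptions: two per Python version from the CPU job and one per Python/CUDA pair from the GPU jobs.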
8 changes: 4 additions & 4 deletions .github/workflows/python.yaml
@@ -92,16 +92,16 @@ jobs:
- { os: ubuntu-20.04, module: hadoop, python-version: 3.9 }
- { os: ubuntu-latest, module: vineyard, python-version: 3.11 }
- { os: ubuntu-latest, module: external-storage, python-version: 3.11 }
# always test compatibility with the latest version
# - { os: ubuntu-latest, module: compatibility, python-version: 3.9 }
- { os: ubuntu-latest, module: doc-build, python-version: 3.9 }
- { os: self-hosted, module: gpu, python-version: 3.11}
- { os: ubuntu-latest, module: jax, python-version: 3.9 }
- { os: ubuntu-latest, module: datasets, python-version: 3.9 }
- { os: ubuntu-latest, module: kubernetes, python-version: 3.11 }
# a self-hosted runner which needs computing resources, activate when necessary
# - { os: juicefs-ci, module: kubernetes-juicefs, python-version: 3.9 }
# - { os: ubuntu-latest, module: slurm, python-version: 3.9 }
- { os: ubuntu-latest, module: datasets, python-version: 3.9 }
- { os: ubuntu-latest, module: kubernetes, python-version: 3.11 }
# always test compatibility with the latest version
# - { os: ubuntu-latest, module: compatibility, python-version: 3.9 }
steps:
- name: Check out code
uses: actions/checkout@v3
6 changes: 3 additions & 3 deletions doc/source/development/contributing_environment.rst
@@ -106,7 +106,7 @@ For the image tag prefixes, ``nightly-main`` represents the latest code from `Xo
while ``v<release_version>`` represents version numbers for each release.
You can choose to pull the image based on your specific requirements.

If you indeed need to manually build Xorbits image, Xorbits provides a ``DockerFile`` in the ``python/xorbits/deploy/docker`` directory to build a Docker image
If you indeed need to manually build Xorbits image, Xorbits provides a ``Dockerfile`` in the ``python/xorbits/deploy/docker`` directory to build a Docker image
with a full Xorbits development environment.

**Docker Commands**
@@ -117,7 +117,7 @@ Build the Docker image::
$ cd xorbits

# Build the image
docker build -t xorbits-dev --progress=plain -f python/xorbits/deploy/docker/Dockerfile . --build-arg PYTHON_VERSION=<your_python_version>
docker build -t xorbits-dev --progress=plain -f python/xorbits/deploy/docker/Dockerfile.cpu . --build-arg PYTHON_VERSION=<your_python_version>

Run Container::

@@ -130,7 +130,7 @@ Run Container::

**Visual Studio Code**

You can use the DockerFile to launch a remote session with Visual Studio Code,
You can use the Dockerfile to launch a remote session with Visual Studio Code,
a popular free IDE, using the ``.devcontainer.json`` file.
See https://code.visualstudio.com/docs/remote/containers for details.

53 changes: 49 additions & 4 deletions doc/source/getting_started/installation.rst
@@ -10,19 +10,23 @@ Xorbits can be installed via pip from `PyPI <https://pypi.org/project/xorbits>`_

pip install xorbits


.. _install.version:
This installs the latest version of Xorbits along with dependencies such as ``pandas`` and ``numpy``.
We recommend using an environment management tool such as ``conda`` or ``venv`` to create
a new environment. ``conda`` installs pre-compiled packages, while ``pip``
installs a wheel (which is pre-compiled) or, if no wheel is available, compiles the package
from source.

Python version support
----------------------

Officially Python 3.9, 3.10, 3.11, and 3.12.
Xorbits officially supports Python 3.9, 3.10, 3.11, and 3.12.

Packages support
----------------

Xorbits partitions large datasets into chunks and processes each individual
chunk using single-node packages (such as pandas). Currently, our latest version strives
chunk using single-node packages (such as pandas).
Currently, our latest version strives
to be compatible with the latest single-node packages. The table below lists the highest
versions of the single-node packages that Xorbits are compatible with. If you are using
an older version of pandas, you should either upgrade your pandas or downgrade Xorbits.
@@ -40,6 +44,37 @@ Xorbits Python `NumPy`_ `pandas`_ `xgboost`_ `lightgbm`_ `datasets`
.. _`lightgbm`: https://lightgbm.readthedocs.io
.. _`datasets`: https://huggingface.co/docs/datasets/index

GPU support
-----------

Xorbits can also scale GPU-accelerated data science tools such as `CuPy`_ and `cuDF`_. To enable GPU support, you need to install
these GPU-accelerated packages. Because the GPU software stack (GPU driver, CUDA toolkit, etc.)
is more complicated than the CPU one, make sure the NVIDIA driver and CUDA toolkit are properly installed.
We recommend using ``conda`` to install ``cuDF`` first, which installs both ``cudf`` and ``cupy``,
and then installing ``xorbits`` with ``pip``.
``conda`` helps resolve the dependencies of ``cuDF`` and provides supporting software such as CUDA.
Refer to `RAPIDS_INSTALL_DOCS`_ for more details on installing ``cuDF``.

When using Xorbits with GPU, you need to add the :code:`gpu=True` parameter to the data loading method.
For example:

.. code-block:: python

import xorbits.pandas as pd
df = pd.read_parquet(path, gpu=True)

======= =================== ======== =========
Xorbits Python `CuPy`_ `cuDF`_
======= =================== ======== =========
0.8.1 3.10,3.11,3.12 13.3.0 24.10
======= =================== ======== =========

If you find installing GPU-accelerated packages too complicated, you can use our docker images
with pre-installed GPU drivers and CUDA toolkit. Please refer to :ref:`docker` for more details.

.. _`CuPy`: https://cupy.dev
.. _`cuDF`: https://docs.rapids.ai/api/cudf/stable/
.. _`RAPIDS_INSTALL_DOCS`: https://docs.rapids.ai/install/
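Before passing ``gpu=True``, it can help to confirm that the GPU packages named in this section are installed at all. The following is a minimal sketch using only the standard library; the helper names ``gpu_stack_available`` and ``gpu_read_kwargs`` are hypothetical and not part of Xorbits, and finding a package does not prove that the NVIDIA driver or CUDA toolkit actually works.

```python
import importlib.util


def gpu_stack_available(packages=("cupy", "cudf")):
    """Map each package name to whether it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def gpu_read_kwargs(packages=("cupy", "cudf")):
    """Return {'gpu': True} only if every listed package is installed."""
    status = gpu_stack_available(packages)
    return {"gpu": True} if all(status.values()) else {}
```

Under these assumptions, a call such as ``pd.read_parquet(path, **gpu_read_kwargs())`` would fall back to the CPU path when ``cupy`` or ``cudf`` is missing.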

Dependencies
------------
@@ -89,3 +124,13 @@ The following extra dependencies will be installed.
library.

* `fsspec <https://github.com/fsspec/filesystem_spec>`__: for cloud data accessing.

.. _docker:

Docker image
------------

To simplify the installation of Xorbits, we provide docker images with pre-installed
Xorbits and its dependencies.

* CPU image: ``xprobe/xorbits:v{version}-py{python_version}``, e.g., ``xprobe/xorbits:v0.8.0-py3.12``
* GPU image: ``xprobe/xorbits:v{version}-cuda{cuda_version}-py{python_version}``, e.g., ``xprobe/xorbits:v0.8.0-cuda12.0-py3.12``
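The two tag patterns above can be read back into their parts with a small regular expression. This is a sketch for illustration only; ``parse_image`` is a hypothetical helper, and it assumes CUDA and Python versions are always written as ``major.minor``, as in the examples.

```python
import re

# repo:version[-cuda<major.minor>]-py<major.minor>
_IMAGE_RE = re.compile(
    r"^(?P<repo>[^:]+):"
    r"(?P<version>.+?)"
    r"(?:-cuda(?P<cuda>\d+\.\d+))?"
    r"-py(?P<python>\d+\.\d+)$"
)


def parse_image(ref):
    """Split an image reference into repo, version, cuda and python fields."""
    match = _IMAGE_RE.match(ref)
    if match is None:
        raise ValueError(f"unrecognized image reference: {ref!r}")
    # 'cuda' is None for CPU images.
    return match.groupdict()
```

For a CPU image the ``cuda`` field is ``None``, which gives a simple way to tell the two image kinds apart.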
@@ -223,7 +223,7 @@ msgstr ""
#: ../../source/development/contributing_environment.rst:112
msgid ""
"If you indeed need to manually build Xorbits image, Xorbits provides a "
"``DockerFile`` in the ``python/xorbits/deploy/docker`` directory to build"
"``Dockerfile`` in the ``python/xorbits/deploy/docker`` directory to build"
" a Docker image with a full Xorbits development environment."
msgstr ""
"如果你真的需要手动构建镜像,Xorbits 在 ``python/xorbits/deploy/docker`` 目录下提供了一个 Dockerfile 用以构建包含完备开发环境的镜像。"
@@ -254,7 +254,7 @@ msgstr ""

#: ../../source/development/contributing_environment.rst:136
msgid ""
"You can use the DockerFile to launch a remote session with Visual Studio "
"You can use the Dockerfile to launch a remote session with Visual Studio "
"Code, a popular free IDE, using the ``.devcontainer.json`` file. See "
"https://code.visualstudio.com/docs/remote/containers for details."
msgstr ""
@@ -161,14 +161,14 @@ msgstr "自 ``v0.2.1`` 起,Xorbits 镜像不再支持 Python ``3.7``,同时
#: ../../source/user_guide/deployment_kubernetes.rst:82
msgid ""
"If you need to build an image from source, the related Dockerfiles exists"
" at `this position <https://github.com/xprobe-"
"inc/xorbits/tree/main/python/xorbits/deploy/docker>`_ for reference. You "
" at `this position <https://github.com/xorbitsai"
"/xorbits/tree/main/python/xorbits/deploy/docker>`_ for reference. You "
"can follow the `Docker document "
"<https://docs.docker.com/engine/reference/commandline/build/>`_ to build "
"your own Xorbits image."
msgstr ""
"如果你希望从源码制作一个镜像,可以参考我们的 `Dockerfile <https://github.com/xprobe-"
"inc/xorbits/tree/main/python/xorbits/deploy/docker>`_ 和 `Docker 构建文档 "
"如果你希望从源码制作一个镜像,可以参考我们的 `Dockerfile <https://github.com/xorbitsai"
"/xorbits/tree/main/python/xorbits/deploy/docker>`_ 和 `Docker 构建文档 "
"<https://docs.docker.com/engine/reference/commandline/build/>`_ 进行制作。"

#: ../../source/user_guide/deployment_kubernetes.rst:85
@@ -192,15 +192,15 @@ msgstr "安装 Python 包"

#: ../../source/user_guide/deployment_kubernetes.rst:94
msgid ""
"Refer `DockerFile <https://github.com/xprobe-"
"inc/xorbits/blob/main/python/xorbits/deploy/docker/Dockerfile.base>`_ for"
"Refer `Dockerfile <https://github.com/xorbitsai"
"/xorbits/blob/main/python/xorbits/deploy/docker/Dockerfile.cpu>`_ for"
" the python packages included in the Xorbits image. If you want to "
"install additional python packages in your Xorbits K8s cluster, use "
"``pip`` and ``conda`` options of the "
":meth:`xorbits.deploy.kubernetes.client.new_cluster` api."
msgstr ""
"Xorbits 的发布镜像中已经包含了一些 Python 包,参考 `DockerFile <https://github.com/xprobe-"
"inc/xorbits/blob/main/python/xorbits/deploy/docker/Dockerfile.base>`_ "
"Xorbits 的发布镜像中已经包含了一些 Python 包,参考 `Dockerfile <https://github.com/xorbitsai"
"/xorbits/blob/main/python/xorbits/deploy/docker/Dockerfile.cpu>`_ "
"中安装内容。如果你想安装额外的 Python 包或者改变其中某些包的版本,使用 "
":meth:`xorbits.deploy.kubernetes.client.new_cluster` 接口中的 ``pip`` 和 "
"``conda`` 选项即可。"
2 changes: 1 addition & 1 deletion doc/source/user_guide/deployment_kubernetes.rst
@@ -91,7 +91,7 @@ Finally, specify your own image during the deployment process through the ``imag

Install Python Packages
-----------------------
Refer `DockerFile <https://github.com/xorbitsai/xorbits/blob/main/python/xorbits/deploy/docker/Dockerfile.base>`_ for the python packages included in the Xorbits image.
Refer `Dockerfile <https://github.com/xorbitsai/xorbits/blob/main/python/xorbits/deploy/docker/Dockerfile.cpu>`_ for the python packages included in the Xorbits image.
If you want to install additional python packages in your Xorbits K8s cluster, use ``pip`` and ``conda`` options of the :meth:`xorbits.deploy.kubernetes.client.new_cluster` api.

Please make sure your K8s cluster can access the corresponding `channel of conda <https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html>`_ or `PyPi <https://pypi.org/>`_, when using ``pip`` and ``conda`` options.
10 changes: 9 additions & 1 deletion doc/source/user_guide/storage_backend.rst
@@ -99,7 +99,15 @@ If you want to run tasks on GPUs, add the :code:`gpu=True` parameter to the data
.. code-block:: python

import xorbits.pandas as pd
pd.read_parquet(path, gpu=True)
import xorbits.numpy as np

df = pd.read_parquet(path, gpu=True)
...

a = np.ones((1000, 1000), gpu=True)
b = np.ones((1000, 1000), gpu=True)
c = np.matmul(a, b)
...


All subsequent operations will run on GPUs.
69 changes: 0 additions & 69 deletions python/xorbits/deploy/docker/Dockerfile.base

This file was deleted.

46 changes: 46 additions & 0 deletions python/xorbits/deploy/docker/Dockerfile.cpu.base
@@ -0,0 +1,46 @@
FROM continuumio/miniconda3:24.9.2-0 AS base

ARG PYTHON_VERSION=3.9
SHELL ["/bin/bash", "-c"]

RUN apt-get -y update \
&& apt-get install -y \
curl \
procps \
gcc \
g++ \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

RUN conda install python=${PYTHON_VERSION} \
nodejs=20.17 \
conda-forge::mkl \
conda-forge::libnuma \
&& conda clean --all -f -y
# TODO: UCXX is not mature enough for production, add it back when it's ready

RUN pip install -U \
numpy \
scipy \
pandas \
numexpr \
psutil \
scikit-learn \
sqlalchemy \
tornado \
xoscar \
pyarrow \
cloudpickle \
azure-storage-blob \
adlfs \
fsspec \
s3fs \
pyopenssl \
datasets \
python-kubernetes \
jax \
uvloop \
Cython

RUN if [ "$PYTHON_VERSION" == "3.9" ] ; then \
pip install -U 'setuptools<64' ; fi