SYCL Timings for CCO‐surface30
Eric Bylaska edited this page Oct 7, 2023
- This example is FFT-dominated.
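As a quick check of this claim, the ncpus = 1 and ncpus = 12 rows of the timing table give the following FFT share of the total time (values copied from the fft and cputime columns):

```math
\begin{aligned}
\text{ncpus}=1: &\quad \frac{t_{\text{fft}}}{t_{\text{cputime}}} = \frac{12.91\ \text{s}}{15.19\ \text{s}} \approx 0.85\\
\text{ncpus}=12: &\quad \frac{t_{\text{fft}}}{t_{\text{cputime}}} = \frac{1.340\ \text{s}}{1.731\ \text{s}} \approx 0.77
\end{aligned}
```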
Directory: `/home/bylaska/PWDFT3/QA/CCO-Cu_surface30`
The table lists timings for this run on the SYCL build with varying numbers of CPU cores (ncpus). All times are in seconds: cputime is the total time, and it is broken down into the following components:
- non-local: time spent in non-local operations.
- ffm: time spent in ffm operations.
- fmf: time spent in fmf operations.
- fft: time spent in FFT (Fast Fourier Transform) operations.
- diagonalize: time spent in matrix diagonalization.
In the SYCL binary, FFT operations are performed exclusively with MPI, while BLAS3 operations are executed on the GPU. Note also that the GPUs become overloaded once ncpus exceeds 6.
machine | ncpus | cputime (s) | non-local (s) | ffm (s) | fmf (s) | fft (s) | diagonalize (s) |
---|---|---|---|---|---|---|---|
SYCL | 1 | 1.519e+01 | 1.951e+00 | 8.861e-02 | 7.209e-02 | 1.291e+01 | 4.546e-02 |
SYCL | 2 | 8.975e+00 | 6.720e-01 | 7.058e-02 | 4.063e-02 | 7.508e+00 | 7.711e-03 |
SYCL | 4 | 4.726e+00 | 3.806e-01 | 4.169e-02 | 2.210e-02 | 3.930e+00 | 3.337e-03 |
SYCL | 6 | 3.156e+00 | 2.836e-01 | 3.112e-02 | 1.453e-02 | 2.591e+00 | 3.330e-03 |
SYCL | 12 | 1.731e+00 | 2.260e-01 | 2.491e-02 | 9.346e-03 | 1.340e+00 | 3.280e-03 |
SYCL | 24 | 1.162e+00 | 2.552e-01 | 2.844e-02 | 8.030e-03 | 7.770e-01 | 3.534e-03 |
SYCL | 48 | 9.357e-01 | 2.749e-01 | 4.026e-02 | 8.584e-03 | 5.168e-01 | 4.718e-03 |
SYCL | 64 | 8.649e-01 | 3.305e-01 | 4.270e-02 | 1.512e-02 | 4.137e-01 | 5.813e-03 |
SYCL | 96 | 9.893e-01 | 4.708e-01 | 5.608e-02 | 8.433e-03 | 3.352e-01 | 4.620e-03 |
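The table presents the total and component times for each number of CPU cores (ncpus). The best total time, 8.649e-01 s, is reached at ncpus = 64. Among the components, non-local, ffm, and diagonalize are fastest at ncpus = 12, fmf at ncpus = 24, and fft at ncpus = 96.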
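The scaling in the table can be summarized with the usual speedup $S_p = T_1 / T_p$ and parallel efficiency $E_p = S_p / p$. The short Python sketch below is not part of PWDFT; it is a hypothetical helper that simply copies the cputime and fft columns from the table and assumes the ncpus = 1 run is the baseline.

```python
# cputime and fft columns (seconds) copied from the SYCL table above.
CPUTIME = {
    1: 1.519e+01, 2: 8.975e+00, 4: 4.726e+00, 6: 3.156e+00, 12: 1.731e+00,
    24: 1.162e+00, 48: 9.357e-01, 64: 8.649e-01, 96: 9.893e-01,
}
FFT = {
    1: 1.291e+01, 2: 7.508e+00, 4: 3.930e+00, 6: 2.591e+00, 12: 1.340e+00,
    24: 7.770e-01, 48: 5.168e-01, 64: 4.137e-01, 96: 3.352e-01,
}


def scaling(times):
    """Return {p: (speedup, efficiency)} relative to the ncpus = 1 run."""
    t1 = times[1]
    return {p: (t1 / t, t1 / t / p) for p, t in sorted(times.items())}


if __name__ == "__main__":
    for p, (speedup, eff) in scaling(CPUTIME).items():
        fft_share = FFT[p] / CPUTIME[p]  # fraction of total time spent in FFTs
        print(f"ncpus={p:3d}  speedup={speedup:6.2f}  "
              f"efficiency={eff:5.2f}  fft share={fft_share:5.2f}")
    best = min(CPUTIME, key=CPUTIME.get)  # ncpus with the smallest total time
    print(f"fastest total time at ncpus={best}")
```

Run on its own, it shows parallel efficiency falling from about 0.85 at 2 cores to about 0.16 at 96 cores, with the total time bottoming out at ncpus = 64.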