PIO error when using gnu (> v10.1.0) and MPT #359
Comments
I do have some GNU tests that work in the latest... ERI_Mmpi-serial.5x5_amazon_r05.I2000Clm50SpMizGs.izumi_gnu.mizuroute-default. But it also seems that the error only shows up after running for at least 10 years. This has: gnu/10.1.0
More updates. @ekluzek, do you think this is enough information for someone to identify the root cause of the error? This test was run on derecho with gcc and cray-mpich. The modules loaded for compilation and runs are:
Note that intel/cray-mpich and gcc/openmpi5.0.0 work fine. The run died after several time iterations at the pio_synch call. Using DDT, I was able to trace it back to the PIO function where it stopped.
Hi @ekluzek, I heard about some issues with pnetcdf in CESM I/O during the CESM workshop (I believe at the CSEG working group and at the ultra-high-resolution modeling session). Coincidentally, I noticed that the output error in mizuRoute happens with PIO built with pnetcdf support. When PIO is built without pnetcdf (using only netcdf), mizuRoute PIO output is stable. Note that this happens only for PIO built with gnu and cray-mpich.
@nmizukami, in looking at both the ParallelIO and pnetcdf GitHub pages, I don't see an issue that might explain this. Can you figure out which talks mentioned this? Then we could watch the videos and find where they discuss it. That might give us more context for figuring out where this is being discussed.
When using the gnu compiler with MPT, PIO sync fails (seemingly at random) with a segmentation fault (invalid memory reference).
Using the intel compiler with MPT works fine.
Using gnu with openmpi works fine (or seems to).
This error happens with mizuRoute when using large, high-resolution river network data (MERIT-Hydro).
I have been running into this problem for a long time (several years now).
The specific configuration is:
gnu v12.1.0
netcdf v4.8.1
pnetcdf v1.12.3
mpt v2.25
The traceback looks like this (run in debug mode; the flags are -g -Wall -fmax-errors=0 -fbacktrace -fcheck=all). Entries 14 through 25 are not displayed; they would be in C code. piolib_mod.F90 line 1372 is just PIOc_sync(file%fh).
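To make the pattern across the reported combinations easier to see, here is a small sketch that tabulates the outcomes mentioned in this thread and checks which attributes all failing cases have in common. The Report type and the function names are hypothetical helpers written for this summary, not part of PIO or mizuRoute. Whether PIO was built with pnetcdf is recorded as None (unknown) wherever the thread does not state it, and unknown values are ignored, so the result only shows what the failures are consistent with, not a confirmed root cause.

```python
from typing import NamedTuple, Optional


class Report(NamedTuple):
    compiler: str
    mpi: str
    pio_with_pnetcdf: Optional[bool]  # None = not stated in the thread
    ok: bool


# Outcomes as reported in this thread (hypothetical encoding).
REPORTS = [
    Report("gnu", "mpt", None, False),         # segfault in PIO sync
    Report("intel", "mpt", None, True),
    Report("gnu", "openmpi", None, True),
    Report("gnu", "cray-mpich", True, False),  # fails with pnetcdf-enabled PIO
    Report("gnu", "cray-mpich", False, True),  # stable with netcdf-only PIO
    Report("intel", "cray-mpich", None, True),
]


def failing(reports):
    """Return only the reports that ended in an error."""
    return [r for r in reports if not r.ok]


def shared_by_all_failures(reports):
    """Return the attributes whose known values agree across every failure."""
    bad = failing(reports)
    shared = {}
    for field in ("compiler", "mpi", "pio_with_pnetcdf"):
        # Ignore unknown (None) entries; they neither confirm nor refute.
        values = {getattr(r, field) for r in bad if getattr(r, field) is not None}
        if len(values) == 1:
            shared[field] = values.pop()
    return shared


if __name__ == "__main__":
    for r in failing(REPORTS):
        print(r)
    print(shared_by_all_failures(REPORTS))
```

Under these assumptions the failing cases agree on the gnu compiler and, where known, on a pnetcdf-enabled PIO build, while the MPI library differs (MPT and cray-mpich).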