[mpich-devel] MPICH with SYCL on Aurora
Wozniak, Justin M.
woz at anl.gov
Tue Mar 31 12:02:33 CDT 2026
This is mpich-5.0.0rc3. I will try that, thanks.
--
Justin M Wozniak
________________________________
From: Raffenetti, Ken <raffenet at anl.gov>
Sent: Tuesday, March 31, 2026 11:30
To: devel at mpich.org <devel at mpich.org>; Harms, Kevin <harms at alcf.anl.gov>
Cc: Wozniak, Justin M. <woz at anl.gov>
Subject: Re: MPICH with SYCL on Aurora
Which version of MPI is this? This might be a known issue in CMA support (fixed here: https://github.com/pmodels/mpich/pull/7743). You can try disabling CMA with MPIR_CVAR_CH4_CMA_ENABLE=0 to avoid that path, or pull the fix into your copy and rebuild.
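For context: CMA (cross-memory attach) refers to the Linux process_vm_readv/process_vm_writev system calls, which copy data directly between two processes' address spaces. The failing frame in the error stack later in this thread is copy_iovs reporting that process_vm_readv failed. Below is a minimal C sketch of a CMA-style read, assuming Linux with glibc 2.15 or later. cma_read and cma_read_selftest are hypothetical names, not MPICH's internal copy_iovs, and reading from our own PID stands in for reading a peer rank's buffer.

```c
#define _GNU_SOURCE /* process_vm_readv is a Linux/GNU extension */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper (not MPICH code): copy len bytes starting at
 * `remote` in process `pid` into `local` via process_vm_readv.
 * Returns the number of bytes copied, or -errno on failure. */
ssize_t cma_read(pid_t pid, void *local, const void *remote, size_t len)
{
    size_t done = 0;
    while (done < len) {
        struct iovec liov = { .iov_base = (char *)local + done,
                              .iov_len = len - done };
        struct iovec riov = { .iov_base = (void *)((const char *)remote + done),
                              .iov_len = len - done };
        /* One syscall copies straight from the other address space;
         * the kernel may transfer fewer bytes than requested. */
        ssize_t n = process_vm_readv(pid, &liov, 1, &riov, 1, 0);
        if (n < 0)
            return -errno; /* e.g. -EFAULT (14) if the remote range is unreadable */
        if (n == 0)
            break;         /* no progress; avoid looping forever */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Self-check: reading our own process stands in for reading a peer rank. */
void cma_read_selftest(void)
{
    char src[] = "hello from the peer";
    char dst[sizeof src];
    memset(dst, 0, sizeof dst);
    assert(cma_read(getpid(), dst, src, sizeof src) == (ssize_t)sizeof src);
    assert(memcmp(dst, src, sizeof src) == 0);
}
```

The loop tolerates short transfers, and errors come back as negative errno values so a caller can tell EFAULT (unreadable address) apart from EPERM (no permission to access the target process). Setting MPIR_CVAR_CH4_CMA_ENABLE=0, as suggested above, should keep MPICH from taking this path at all.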
Ken
From: Wozniak, Justin M. via devel <devel at mpich.org>
Date: Tuesday, March 31, 2026 at 11:27 AM
To: Harms, Kevin <harms at alcf.anl.gov>, devel at mpich.org <devel at mpich.org>
Cc: Wozniak, Justin M. <woz at anl.gov>
Subject: Re: [mpich-devel] MPICH with SYCL on Aurora
With MPIR_CVAR_REQUEST_ERR_FATAL=1 in a 2-process run, this looks like:
Abort(270742287) on node 0: Fatal error in internal_Waitall: Other MPI error, error stack:
internal_Waitall(126)..: MPI_Waitall(count=1, array_of_requests=0x797cdb0, array_of_statuses=0x7ca3fb0) failed
MPIR_Waitall(916)......:
MPIDI_IPC_rndv_cb(172).:
MPIDI_CMA_copy_data(54):
copy_iovs(202).........: process_vm_readv failed (errno 14)
Abort(270742287) on node 1: Fatal error in internal_Waitall: Other MPI error, error stack:
(same)
This succeeds with 1 process when SYCL is enabled, or with 2 processes when SYCL is disabled in the app at configure time.
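On Linux, errno 14 is EFAULT ("Bad address"); the process_vm_readv(2) man page lists EFAULT for a remote iovec that lies outside the target process's accessible address space. The C sketch below reproduces that errno locally, assuming Linux and glibc; the munmapped page is only a stand-in for whatever address the receiving rank was handed, and cma_read_errno and read_unmapped_page are hypothetical names. One plausible reading, not confirmed in this thread, is that with SYCL enabled the buffer is a GPU allocation that the peer cannot read through host page tables.

```c
#define _GNU_SOURCE /* process_vm_readv, MAP_ANONYMOUS */
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: one process_vm_readv call; 0 on success, -errno on failure. */
int cma_read_errno(pid_t pid, void *local, void *remote, size_t len)
{
    struct iovec liov = { .iov_base = local, .iov_len = len };
    struct iovec riov = { .iov_base = remote, .iov_len = len };
    return process_vm_readv(pid, &liov, 1, &riov, 1, 0) < 0 ? -errno : 0;
}

/* Reproduce errno 14: aim the "remote" side at a page that is no longer mapped. */
int read_unmapped_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(p != MAP_FAILED);
    munmap(p, (size_t)page); /* p is now a dangling address in this process */

    char buf[16];
    return cma_read_errno(getpid(), buf, p, sizeof buf); /* expected: -EFAULT */
}
```

This would also be consistent with the two passing configurations: a single rank has no intranode peer to read from, and with SYCL disabled the buffers are presumably ordinary host memory.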
The app looks like:
$ ldd =agent
libmpicxx.so.0 => /lus/flare/projects/EpiCalib/sfw/mpich-5.0.0rc3/lib/libmpicxx.so.0 (0x0000146726aed000)
libmpi.so.0 => /lus/flare/projects/EpiCalib/sfw/mpich-5.0.0rc3/lib/libmpi.so.0 (0x0000146725265000)
libmkl_sycl_blas.so.5 => /opt/aurora/26.26.0/oneapi/mkl/latest/lib/libmkl_sycl_blas.so.5 (0x0000146721063000)
(etc.)
libstdc++.so.6 => /opt/aurora/26.26.0/spack/unified/1.1.1/install/linux-x86_64/gcc-13.4.0-hgnyg4p/lib64/libstdc++.so.6 (0x0000146708cd9000)
libm.so.6 => /lib64/libm.so.6 (0x0000146708b77000)
libgcc_s.so.1 => /opt/aurora/26.26.0/spack/unified/1.1.1/install/linux-x86_64/gcc-13.4.0-hgnyg4p/lib64/libgcc_s.so.1 (0x0000146708b53000)
libsycl.so.8 => /opt/aurora/26.26.0/oneapi/compiler/latest/lib/libsycl.so.8 (0x0000146708758000)
libOpenCL.so.1 => /opt/aurora/26.26.0/support/libraries/khronos/default/lib64/libOpenCL.so.1 (0x0000146708743000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x000014670871f000)
libc.so.6 => /lib64/libc.so.6 (0x000014670852a000)
libhwloc.so.15 => /opt/aurora/26.26.0/oneapi/tcm/latest/lib/libhwloc.so.15 (0x00001467082cc000)
(etc.)
Thanks
--
Justin M Wozniak
________________________________
From: Harms, Kevin <harms at alcf.anl.gov>
Sent: Monday, March 30, 2026 14:50
To: devel at mpich.org <devel at mpich.org>
Cc: Wozniak, Justin M. <woz at anl.gov>
Subject: Re: MPICH with SYCL on Aurora
Justin,
Can you provide the specific error?
kevin
________________________________________
From: Wozniak, Justin M. via devel <devel at mpich.org>
Sent: Monday, March 30, 2026 2:24 PM
To: devel at mpich.org
Cc: Wozniak, Justin M.
Subject: [mpich-devel] MPICH with SYCL on Aurora
Hi
I am trying to port a simulation ensemble workflow that runs ExaEpi/AMReX/SYCL to Aurora. The outer workflow uses the system MPI, and we run the app with node-local parallelism using a hand-built MPICH. On Aurora, I get errors in early MPI calls that I think are due to SYCL. This approach works on NVIDIA systems such as Perlmutter. Is there a simple way to make MPICH aware of SYCL?
Thanks
--
Justin M Wozniak