[mpich-devel] MPICH with SYCL on Aurora

Wozniak, Justin M. woz at anl.gov
Tue Mar 31 11:27:29 CDT 2026


With MPIR_CVAR_REQUEST_ERR_FATAL=1 in a 2-process run, this looks like:

Abort(270742287) on node 0: Fatal error in internal_Waitall: Other MPI error, error stack:
internal_Waitall(126)..: MPI_Waitall(count=1, array_of_requests=0x797cdb0, array_of_statuses=0x7ca3fb0) failed
MPIR_Waitall(916)......:
MPIDI_IPC_rndv_cb(172).:
MPIDI_CMA_copy_data(54):
copy_iovs(202).........: process_vm_readv failed (errno 14)

Abort(270742287) on node 1: Fatal error in internal_Waitall: Other MPI error, error stack:
(same)
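
For readers decoding the last line of the stack: on Linux, errno 14 is EFAULT ("Bad address"). The process_vm_readv man page lists EFAULT for the case where the memory described by the local or remote iovec is outside the accessible address space of the corresponding process. A minimal sketch for looking up the code follows; the helper name describe_errno is made up for illustration and is not part of MPICH. Whether the bad address here is a SYCL device or USM pointer is only a guess and is not confirmed anywhere in this thread.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and the libc message for an errno value.

    Hypothetical helper for reading error stacks such as
    'process_vm_readv failed (errno 14)'.
    """
    # errno.errorcode maps numeric codes to names such as 'EFAULT'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message text for the code.
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # The value reported by copy_iovs in the stack above.
    print(describe_errno(14))
```

The message text comes from the platform's C library, so it can differ between systems; the symbolic name EFAULT is the part to search for in the man page.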

This succeeds for a 1-process run with SYCL enabled, or for a 2-process run when SYCL is disabled in the app at configure time.

The app looks like:

$ ldd =agent
        libmpicxx.so.0 => /lus/flare/projects/EpiCalib/sfw/mpich-5.0.0rc3/lib/libmpicxx.so.0 (0x0000146726aed000)
        libmpi.so.0 => /lus/flare/projects/EpiCalib/sfw/mpich-5.0.0rc3/lib/libmpi.so.0 (0x0000146725265000)
        libmkl_sycl_blas.so.5 => /opt/aurora/26.26.0/oneapi/mkl/latest/lib/libmkl_sycl_blas.so.5 (0x0000146721063000)
   (etc.)
        libstdc++.so.6 => /opt/aurora/26.26.0/spack/unified/1.1.1/install/linux-x86_64/gcc-13.4.0-hgnyg4p/lib64/libstdc++.so.6 (0x0000146708cd9000)
        libm.so.6 => /lib64/libm.so.6 (0x0000146708b77000)
        libgcc_s.so.1 => /opt/aurora/26.26.0/spack/unified/1.1.1/install/linux-x86_64/gcc-13.4.0-hgnyg4p/lib64/libgcc_s.so.1 (0x0000146708b53000)
        libsycl.so.8 => /opt/aurora/26.26.0/oneapi/compiler/latest/lib/libsycl.so.8 (0x0000146708758000)
        libOpenCL.so.1 => /opt/aurora/26.26.0/support/libraries/khronos/default/lib64/libOpenCL.so.1 (0x0000146708743000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x000014670871f000)
        libc.so.6 => /lib64/libc.so.6 (0x000014670852a000)
        libhwloc.so.15 => /opt/aurora/26.26.0/oneapi/tcm/latest/lib/libhwloc.so.15 (0x00001467082cc000)
(etc.)

Thanks


--

Justin M Wozniak


________________________________
From: Harms, Kevin <harms at alcf.anl.gov>
Sent: Monday, March 30, 2026 14:50
To: devel at mpich.org <devel at mpich.org>
Cc: Wozniak, Justin M. <woz at anl.gov>
Subject: Re: MPICH with SYCL on Aurora

Justin,

  can you provide the specific error?

kevin

________________________________________
From: Wozniak, Justin M. via devel <devel at mpich.org>
Sent: Monday, March 30, 2026 2:24 PM
To: devel at mpich.org
Cc: Wozniak, Justin M.
Subject: [mpich-devel] MPICH with SYCL on Aurora

Hi
    I am trying to port a simulation ensemble workflow that runs ExaEpi/AMReX/SYCL to Aurora.  The outer workflow uses the system MPI, and we use a hand-built MPICH to run the app with node-local parallelism.  On Aurora, I get errors in early MPI calls that I think are due to SYCL.  This approach works on NVIDIA systems like Perlmutter.  Is there some simple way to make MPICH aware of SYCL?
    Thanks

--

Justin M Wozniak


