[mpich-discuss] using mpi4py in a Singularity container run at a large computing center with Slurm installed
Kandes, Martin
mkandes at sdsc.edu
Fri Aug 31 14:46:46 CDT 2018
Hi Heather,
Can you copy and paste your Slurm batch job script here to give us an overview of what the job looks like? It'd also be helpful if you could provide the definition (or recipe) file for this container and a list of the software modules available at the site where you are running the container.
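For reference, a minimal sketch of the kind of batch script that would help; the partition, module name, image path, and script path below are placeholders, not what I expect yours to be:

    #!/usr/bin/env bash
    #SBATCH --job-name=mpi4py-test
    #SBATCH --partition=compute        # placeholder partition name
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=4
    #SBATCH --time=00:10:00

    module load singularity            # module name varies by site

    # Launch the containerized Python script; image and script paths are placeholders
    srun singularity exec my_container.simg python /opt/app/script.py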
Marty
P.S. In general, you want the same MPI implementation and version installed within the Singularity container as the one available on the host system where the container will run.
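One quick way to compare the two, assuming the image sits at ./my_container.simg (a placeholder path):

    # MPI launcher on the host (outside the container)
    which mpiexec
    mpiexec --version

    # MPI launcher inside the container
    singularity exec my_container.simg which mpiexec
    singularity exec my_container.simg mpiexec --version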
________________________________
From: Heather Kelly <heather999kelly at gmail.com>
Sent: Friday, August 31, 2018 12:34:25 PM
To: discuss at mpich.org
Subject: [mpich-discuss] using mpi4py in a Singularity container run at a large computing center with Slurm installed
Hi,
Complete newbie here.
I have a Singularity container (created by someone else) that includes a Python script using mpi4py; both mpi4py and MPI are installed in the image. I'm trying to run this at a large computing center where Slurm is installed. The code in the container wants to use its own installation of mpi4py and MPI. The code provides flags that let a user specify whether it should use Slurm, SMP, or nothing at all; the default is SMP.
When I attempt to run this code in the image on a compute node at this computing center, I receive an error:
HYDU_create_process (utils/launch/launch.c:75): execvp error on file srun (No such file or directory)
even though I have specified to the program that I want to use SMP; it appears mpiexec is trying to submit a job to Slurm. Checking the environment, mpiexec points to the version installed in the container, not the one available at the computing center.
Is there an env variable or some way to set the process management system to avoid using Slurm?
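I'm imagining something along these lines, assuming the container's mpiexec is MPICH's Hydra process manager (these are guesses on my part, and my_script.py is a stand-in for the actual script):

    # Ask Hydra to fork processes locally instead of calling srun
    mpiexec -launcher fork -n 4 python my_script.py

    # or set the equivalent environment variable before running
    export HYDRA_LAUNCHER=fork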
In other contexts this code works just fine; my problem seems specific to running it in a container at these large computing centers where Slurm is available. It's as if the local computing center's MPI install is taking precedence... and perhaps that's just how it works, but I'd like to find a way around that.
Take care,
Heather