[mpich-discuss] MPICH v3.2 and SLURM

Balaji, Pavan balaji at anl.gov
Tue Dec 1 12:41:10 CST 2015


You can also use the environment variable HYDRA_LAUNCHER_EXTRA_ARGS to tell mpiexec to add more arguments to srun while launching the applications.
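
For example, a minimal sketch of how this could look in a job script. The frequency value, the process count, and the ./my_app binary are placeholders made up for illustration, not taken from this thread; --cpu-freq is the srun option discussed in the quoted messages below. Whether the requested frequency is actually applied depends on how Slurm is configured at your site.

```shell
#!/bin/sh
# Extra arguments that mpiexec (Hydra) adds to the srun command it uses
# when launching. The value 2400000 (kHz) is a hypothetical example.
export HYDRA_LAUNCHER_EXTRA_ARGS="--cpu-freq=2400000"

# Alternative suggested further down in this thread: let srun read the
# request from its own environment variable instead.
# export SLURM_CPU_FREQ_REQ=2400000

# ./my_app and -n 10 are placeholders; only launch if both pieces exist.
if command -v mpiexec >/dev/null 2>&1 && [ -x ./my_app ]; then
    mpiexec -n 10 ./my_app
else
    echo "skipping launch: mpiexec or ./my_app not available"
fi
```

Run this inside an sbatch or salloc allocation so that mpiexec detects the Slurm environment and uses srun as its launcher.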

  -- Pavan

> On Dec 1, 2015, at 12:39 PM, Guo, Yanfei <yguo at anl.gov> wrote:
> 
> Hi Andreas,
> 
> I am guessing that you are using the "--cpu-freq" option of srun. One way to go is to manually set the SLURM_CPU_FREQ_REQ environment variable; srun is supposed to pick that up.
> 
> Yanfei Guo
> Postdoctoral Appointee
> MCS Division, ANL
> 
> On 12/1/15, 11:22 AM, "Andreas Gocht" <andreas.gocht at tu-dresden.de> wrote:
> 
>> Hey
>> 
>> Yes, mpiexec is working quite well. I would just like to use Slurm, as our 
>> implementation allows setting the CPU frequency on a node. Is there a way 
>> to pass flags to srun using mpiexec?
>> 
>> Thanks for your help.
>> 
>> Kind Regards
>> 
>> Andreas
>> 
>> On 01.12.2015 at 17:41, Balaji, Pavan wrote:
>>> Looks like SLURM is telling MPICH that two processes are on the same node, even though they are on different nodes.  It looks like a bug in the SLURM PMI implementation.  Did you try simply using mpiexec instead?  You'll need to remove the --with-pmi, --with-pm, and LDFLAGS/LIBS options and rebuild mpich for that.  Note that mpiexec will internally use srun on slurm environments.
>>> 
>>>   -- Pavan
>>> 
>>>> On Dec 1, 2015, at 5:56 AM, Andreas Gocht <andreas.gocht at tu-dresden.de> wrote:
>>>> 
>>>> Hey
>>>> 
>>>> I tried to build and use MPICH with Slurm, sbatch, and srun. Unfortunately, it looks like MPI_Init doesn't work with srun.
>>>> 
>>>> I got the following error:
>>>> 
>>>> Fatal error in MPI_Init: Other MPI error, error stack:
>>>> MPIR_Init_thread(474).................:
>>>> MPID_Init(190)........................: channel initialization failed
>>>> MPIDI_CH3_Init(89)....................:
>>>> MPID_nem_init(272)....................:
>>>> MPIDI_CH3I_Seg_commit(366)............:
>>>> MPIU_SHMW_Hnd_deserialize(324)........:
>>>> MPIU_SHMW_Seg_open(865)...............:
>>>> MPIU_SHMW_Seg_create_attach_templ(637): open failed - No such file or directory
>>>> In: PMI_Abort(4239887, Fatal error in MPI_Init: Other MPI error, error stack:
>>>> MPIR_Init_thread(474).................:
>>>> MPID_Init(190)........................: channel initialization failed
>>>> MPIDI_CH3_Init(89)....................:
>>>> MPID_nem_init(272)....................:
>>>> MPIDI_CH3I_Seg_commit(366)............:
>>>> MPIU_SHMW_Hnd_deserialize(324)........:
>>>> MPIU_SHMW_Seg_open(865)...............:
>>>> MPIU_SHMW_Seg_create_attach_templ(637): open failed - No such file or directory)
>>>> 
>>>> I configured MPICH with "./configure --prefix=<some/prefix> --with-pmi=slurm --with-pm=none --with-slurm=<path/to/slurm>" and compiled my application with the "-L<path_to_slurm_lib> -lpmi" linker options.
>>>> 
>>>> (as described in
>>>> 
>>>> https://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Note_that_the_default_build_of_MPICH_will_work_fine_in_SLURM_environments._No_extra_steps_are_needed.
>>>> 
>>>> and
>>>> 
>>>> https://computing.llnl.gov/linux/slurm/mpi_guide.html#mpich2
>>>> 
>>>> )
>>>> 
>>>> I am running with 10 nodes and one task per node. Is there something I am missing during the configuration of MPICH?
>>>> 
>>>> Best,
>>>> 
>>>> Andreas
>>>> 
>>>> -- 
>>>> M.Sc. Andreas Gocht
>>>> 
>>>> Technische Universität Dresden
>>>> Center for Information Services and
>>>> High Performance Computing (ZIH)
>>>> D-01062 Dresden
>>>> Germany
>>>> 
>>>> Contact:
>>>> Willersbau, Room A 104
>>>> Phone:  (+49) 351 463-36415
>>>> Fax:    (+49) 351 463-3773
>>>> e-mail: andreas.gocht at tu-dresden.de
>>>> 
>>>> 
>>>> _______________________________________________
>>>> discuss mailing list     discuss at mpich.org
>>>> To manage subscription options or unsubscribe:
>>>> https://lists.mpich.org/mailman/listinfo/discuss