[mpich-discuss] MPICH v3.2 and SLURM

Andreas Gocht andreas.gocht at tu-dresden.de
Wed Dec 2 02:40:38 CST 2015


Thanks :-)

Andreas

On 01.12.2015 at 19:41, Balaji, Pavan wrote:
> You can also use the environment variable HYDRA_LAUNCHER_EXTRA_ARGS to tell mpiexec to add more arguments to srun while launching the applications.
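>
> For example, to forward a CPU-frequency request to srun through mpiexec (a sketch: the application name is a placeholder and the frequency keyword is just an illustration):
>
>     # Hydra appends these arguments to its internal srun invocation
>     export HYDRA_LAUNCHER_EXTRA_ARGS="--cpu-freq=High"
>     mpiexec -n 10 ./my_app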
>
>    -- Pavan
>
>> On Dec 1, 2015, at 12:39 PM, Guo, Yanfei <yguo at anl.gov> wrote:
>>
>> Hi Andreas,
>>
>> I am guessing that you are using the "--cpu-freq" option of srun. One way to go is to set the SLURM_CPU_FREQ_REQ environment variable manually; srun is supposed to pick that up.
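>>
>> An untested sketch (the value takes the same syntax as --cpu-freq, e.g. a kHz number or a keyword such as High; the application name is a placeholder):
>>
>>     # srun reads SLURM_CPU_FREQ_REQ as if --cpu-freq had been passed
>>     export SLURM_CPU_FREQ_REQ=High
>>     # the srun that mpiexec invokes internally should inherit the variable
>>     mpiexec -n 10 ./my_app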
>>
>> Yanfei Guo
>> Postdoctoral Appointee
>> MCS Division, ANL
>>
>> On 12/1/15, 11:22 AM, "Andreas Gocht" <andreas.gocht at tu-dresden.de> wrote:
>>
>>> Hey
>>>
>>> Yeah, mpiexec is working quite well. I'd just like to use SLURM, as our
>>> installation allows setting the CPU frequency on a node. Is there a way
>>> to pass flags to srun using mpiexec?
>>>
>>> Thanks for your help.
>>>
>>> Kind Regards
>>>
>>> Andreas
>>>
>>> On 01.12.2015 at 17:41, Balaji, Pavan wrote:
>>>> Looks like SLURM is telling MPICH that two processes are on the same node, even though they are on different nodes. It looks like a bug in the SLURM PMI implementation. Did you try simply using mpiexec instead? You'll need to remove the --with-pmi, --with-pm, and LDFLAGS/LIBS options and rebuild MPICH for that. Note that mpiexec will internally use srun in SLURM environments.
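>>>>
>>>> Roughly (a sketch; <some/prefix> as in your original configure line, ./my_app a placeholder):
>>>>
>>>>     # default build: Hydra process manager, no external PMI
>>>>     ./configure --prefix=<some/prefix>
>>>>     make && make install
>>>>     # mpiexec will detect SLURM and launch via srun internally
>>>>     mpiexec -n 10 ./my_app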
>>>>
>>>>    -- Pavan
>>>>
>>>>> On Dec 1, 2015, at 5:56 AM, Andreas Gocht <andreas.gocht at tu-dresden.de> wrote:
>>>>>
>>>>> Hey
>>>>>
>>>>> I tried to build and use MPICH with SLURM, sbatch, and srun. Unfortunately, it looks like MPI_Init doesn't work with srun.
>>>>>
>>>>> I got the following error:
>>>>>
>>>>> Fatal error in MPI_Init: Other MPI error, error stack:
>>>>> MPIR_Init_thread(474).................:
>>>>> MPID_Init(190)........................: channel initialization failed
>>>>> MPIDI_CH3_Init(89)....................:
>>>>> MPID_nem_init(272)....................:
>>>>> MPIDI_CH3I_Seg_commit(366)............:
>>>>> MPIU_SHMW_Hnd_deserialize(324)........:
>>>>> MPIU_SHMW_Seg_open(865)...............:
>>>>> MPIU_SHMW_Seg_create_attach_templ(637): open failed - No such file or directory
>>>>> In: PMI_Abort(4239887, Fatal error in MPI_Init: Other MPI error, error stack:
>>>>> MPIR_Init_thread(474).................:
>>>>> MPID_Init(190)........................: channel initialization failed
>>>>> MPIDI_CH3_Init(89)....................:
>>>>> MPID_nem_init(272)....................:
>>>>> MPIDI_CH3I_Seg_commit(366)............:
>>>>> MPIU_SHMW_Hnd_deserialize(324)........:
>>>>> MPIU_SHMW_Seg_open(865)...............:
>>>>> MPIU_SHMW_Seg_create_attach_templ(637): open failed - No such file or directory)
>>>>>
>>>>> I configured MPICH with "./configure --prefix=<some/prefix> --with-pmi=slurm --with-pm=none --with-slurm=<path/to/slurm>" and linked my application with the "-L<path_to_slurm_lib> -lpmi" flags (full sequence sketched below).
>>>>>
>>>>> (as described in
>>>>>
>>>>> https://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Note_that_the_default_build_of_MPICH_will_work_fine_in_SLURM_environments._No_extra_steps_are_needed.
>>>>>
>>>>> and
>>>>>
>>>>> https://computing.llnl.gov/linux/slurm/mpi_guide.html#mpich2
>>>>>
>>>>> )
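>>>>>
>>>>> In full, the sequence looks roughly like this (a sketch: mpicc stands in for the actual compiler wrapper, <app> for the application):
>>>>>
>>>>>     ./configure --prefix=<some/prefix> --with-pmi=slurm --with-pm=none --with-slurm=<path/to/slurm>
>>>>>     make && make install
>>>>>     # link against SLURM's PMI library (placeholder paths as above)
>>>>>     mpicc -o <app> <app>.c -L<path_to_slurm_lib> -lpmi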
>>>>>
>>>>> I am running on 10 nodes with one task per node (see the sketch below). Is there something I am missing during the configuration of MPICH?
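>>>>>
>>>>> A sketch of the launch (<app> is a placeholder):
>>>>>
>>>>>     # 10 nodes, one task per node
>>>>>     srun -N 10 --ntasks-per-node=1 <app>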
>>>>>
>>>>> Best,
>>>>>
>>>>> Andreas
>>>>>

-- 
M.Sc. Andreas Gocht

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 104
Phone:  (+49) 351 463-36415
Fax:    (+49) 351 463-3773
e-mail: andreas.gocht at tu-dresden.de


_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

