[mpich-discuss] error spawning processes in mpich-3.2rc1

Min Si msi at il.is.s.u-tokyo.ac.jp
Mon Oct 12 12:49:04 CDT 2015


Sure. Please send to msi at il.is.s.u-tokyo.ac.jp.

Min

On 10/12/15 9:24 AM, Siegmar Gross wrote:
> Hi Min,
>
>> It seems you have already enabled the most detailed error output. We
>> cannot think of any clue for now. If you can give us access to your
>> machine, we would be glad to help you debug on it.
>
> Can you send me your email address? I don't want to send
> login data to this list.
>
>
> Kind regards
>
> Siegmar
>
>
>>
>> Min
>>
>> On 10/8/15 12:02 AM, Siegmar Gross wrote:
>>> Hi Min,
>>>
>>> thank you very much for your answer.
>>>
>>>> We cannot reproduce this error on our test machines (Solaris i386,
>>>> Ubuntu x86_64) using your programs. Unfortunately, we do not have a
>>>> Solaris Sparc machine, so we could not verify it.
>>>
>>> The programs work fine on my Solaris x86_64 and Linux machines
>>> as well. I only have a problem on Solaris Sparc.
>>>
>>>
>>>> Sometimes it can happen that you need to add "./" in front of the
>>>> program path; could you try it?
>>>>
>>>> For example, in spawn_master.c:
>>>>> #define SLAVE_PROG "./spawn_slave"
>>>
>>> No, that will not work, because the programs are stored in a
>>> different directory ($HOME/{SunOS, Linux}/{sparc, x86_64}/bin)
>>> which is part of PATH (as well as ".").
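>>>
>>> Would it help to hand the directory to the spawn call explicitly via
>>> the reserved "path" info key instead of relying on PATH? A minimal
>>> sketch (the directory below is only a placeholder for my real bin
>>> directory, and I don't know whether MPICH honors this key):
>>>
>>> #include <stdlib.h>
>>> #include "mpi.h"
>>>
>>> int main (int argc, char *argv[])
>>> {
>>>   MPI_Comm intercomm;
>>>   MPI_Info info;
>>>   int errcodes[4];
>>>
>>>   MPI_Init (&argc, &argv);
>>>   MPI_Info_create (&info);
>>>   /* placeholder: the real directory is $HOME/SunOS/sparc/bin */
>>>   MPI_Info_set (info, "path", "/path/to/SunOS/sparc/bin");
>>>   MPI_Comm_spawn ("spawn_slave", MPI_ARGV_NULL, 4, info, 0,
>>>                   MPI_COMM_WORLD, &intercomm, errcodes);
>>>   MPI_Info_free (&info);
>>>   MPI_Finalize ();
>>>   return EXIT_SUCCESS;
>>> }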
>>>
>>> Can I do anything to track the source of the error?
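>>>
>>> For example, I could switch MPI_COMM_WORLD to MPI_ERRORS_RETURN and
>>> print the error string myself instead of letting the fatal error
>>> handler abort, roughly like this (only a sketch, not one of the
>>> attached programs):
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include "mpi.h"
>>>
>>> int main (int argc, char *argv[])
>>> {
>>>   MPI_Comm intercomm;
>>>   int errcodes[4], err, msg_len;
>>>   char msg[MPI_MAX_ERROR_STRING];
>>>
>>>   MPI_Init (&argc, &argv);
>>>   /* return errors to the caller instead of aborting */
>>>   MPI_Comm_set_errhandler (MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>>   err = MPI_Comm_spawn ("spawn_slave", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
>>>                         0, MPI_COMM_WORLD, &intercomm, errcodes);
>>>   if (err != MPI_SUCCESS)
>>>   {
>>>     MPI_Error_string (err, msg, &msg_len);
>>>     fprintf (stderr, "MPI_Comm_spawn failed: %s\n", msg);
>>>   }
>>>   MPI_Finalize ();
>>>   return EXIT_SUCCESS;
>>> }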
>>>
>>>
>>> Kind regards
>>>
>>> Siegmar
>>>
>>>>
>>>> Min
>>>>
>>>> On 10/7/15 5:03 AM, Siegmar Gross wrote:
>>>>> Hi,
>>>>>
>>>>> today I've built mpich-3.2rc1 on my machines (Solaris 10 Sparc,
>>>>> Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-5.1.0
>>>>> and Sun C 5.13. I still get the following errors on my Sparc machine,
>>>>> which I had already reported on September 8th. "mpiexec" is aliased
>>>>> to 'mpiexec -genvnone'. It still doesn't matter whether I use my cc
>>>>> or gcc build of MPICH.
>>>>>
>>>>>
>>>>> tyr spawn 119 mpichversion
>>>>> MPICH Version:          3.2rc1
>>>>> MPICH Release date:     Wed Oct  7 00:00:33 CDT 2015
>>>>> MPICH Device:           ch3:nemesis
>>>>> MPICH configure: --prefix=/usr/local/mpich-3.2_64_cc
>>>>> --libdir=/usr/local/mpich-3.2_64_cc/lib64
>>>>> --includedir=/usr/local/mpich-3.2_64_cc/include64 CC=cc CXX=CC 
>>>>> F77=f77
>>>>> FC=f95 CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64 
>>>>> LDFLAGS=-m64
>>>>> -L/usr/lib/sparcv9 -R/usr/lib/sparcv9 --enable-fortran=yes
>>>>> --enable-cxx --enable-romio --enable-debuginfo --enable-smpcoll
>>>>> --enable-threads=multiple --with-thread-package=posix --enable-shared
>>>>> MPICH CC:       cc -m64   -O2
>>>>> MPICH CXX:      CC -m64  -O2
>>>>> MPICH F77:      f77 -m64
>>>>> MPICH FC:       f95 -m64  -O2
>>>>> tyr spawn 120
>>>>>
>>>>>
>>>>>
>>>>> tyr spawn 111 mpiexec -np 1 spawn_master
>>>>>
>>>>> Parent process 0 running on tyr.informatik.hs-fulda.de
>>>>>   I create 4 slave processes
>>>>>
>>>>> Fatal error in MPI_Comm_spawn: Unknown error class, error stack:
>>>>> MPI_Comm_spawn(144)...........: MPI_Comm_spawn(cmd="spawn_slave",
>>>>> argv=0, maxprocs=4, MPI_INFO_NULL, root=0, MPI_COMM_WORLD,
>>>>> intercomm=ffffffff7fffde50, errors=0) failed
>>>>> MPIDI_Comm_spawn_multiple(274):
>>>>> MPID_Comm_accept(153).........:
>>>>> MPIDI_Comm_accept(1057).......:
>>>>> MPIR_Bcast_intra(1287)........:
>>>>> MPIR_Bcast_binomial(310)......: Failure during collective
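>>>>>
>>>>> For reference, the failing call in spawn_master is essentially the
>>>>> following (a simplified sketch reconstructed from the parameters in
>>>>> the error stack; the attached program contains the full version):
>>>>>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>> #include "mpi.h"
>>>>>
>>>>> #define NUM_SLAVES 4
>>>>> #define SLAVE_PROG "spawn_slave"
>>>>>
>>>>> int main (int argc, char *argv[])
>>>>> {
>>>>>   MPI_Comm intercomm;
>>>>>   int errcodes[NUM_SLAVES], rank, len;
>>>>>   char hostname[MPI_MAX_PROCESSOR_NAME];
>>>>>
>>>>>   MPI_Init (&argc, &argv);
>>>>>   MPI_Comm_rank (MPI_COMM_WORLD, &rank);
>>>>>   MPI_Get_processor_name (hostname, &len);
>>>>>   printf ("\nParent process %d running on %s\n"
>>>>>           "  I create %d slave processes\n\n",
>>>>>           rank, hostname, NUM_SLAVES);
>>>>>   /* this is the call that dies with "Failure during collective" */
>>>>>   MPI_Comm_spawn (SLAVE_PROG, MPI_ARGV_NULL, NUM_SLAVES, MPI_INFO_NULL,
>>>>>                   0, MPI_COMM_WORLD, &intercomm, errcodes);
>>>>>   MPI_Finalize ();
>>>>>   return EXIT_SUCCESS;
>>>>> }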
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> tyr spawn 112 mpiexec -np 1 spawn_multiple_master
>>>>>
>>>>> Parent process 0 running on tyr.informatik.hs-fulda.de
>>>>>   I create 3 slave processes.
>>>>>
>>>>> Fatal error in MPI_Comm_spawn_multiple: Unknown error class, error
>>>>> stack:
>>>>> MPI_Comm_spawn_multiple(162)..: MPI_Comm_spawn_multiple(count=2,
>>>>> cmds=ffffffff7fffde08, argvs=ffffffff7fffddf8,
>>>>> maxprocs=ffffffff7fffddf0, infos=ffffffff7fffdde8, root=0,
>>>>> MPI_COMM_WORLD, intercomm=ffffffff7fffdde4, errors=0) failed
>>>>> MPIDI_Comm_spawn_multiple(274):
>>>>> MPID_Comm_accept(153).........:
>>>>> MPIDI_Comm_accept(1057).......:
>>>>> MPIR_Bcast_intra(1287)........:
>>>>> MPIR_Bcast_binomial(310)......: Failure during collective
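>>>>>
>>>>> spawn_multiple_master essentially does the following (again only a
>>>>> sketch; for the illustration I assume one slave of the first kind and
>>>>> two of the second, the attached program has the real commands and
>>>>> counts):
>>>>>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>> #include "mpi.h"
>>>>>
>>>>> int main (int argc, char *argv[])
>>>>> {
>>>>>   MPI_Comm intercomm;
>>>>>   char *cmds[2]     = { "spawn_slave", "spawn_slave" };
>>>>>   int maxprocs[2]   = { 1, 2 };           /* 3 slaves altogether */
>>>>>   MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };
>>>>>   int errcodes[3];
>>>>>
>>>>>   MPI_Init (&argc, &argv);
>>>>>   printf ("\nParent process 0: I create 3 slave processes.\n\n");
>>>>>   /* the collective inside this call fails on Solaris Sparc */
>>>>>   MPI_Comm_spawn_multiple (2, cmds, MPI_ARGVS_NULL, maxprocs, infos,
>>>>>                            0, MPI_COMM_WORLD, &intercomm, errcodes);
>>>>>   MPI_Finalize ();
>>>>>   return EXIT_SUCCESS;
>>>>> }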
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> tyr spawn 113 mpiexec -np 1 spawn_intra_comm
>>>>> Parent process 0: I create 2 slave processes
>>>>> Fatal error in MPI_Comm_spawn: Unknown error class, error stack:
>>>>> MPI_Comm_spawn(144)...........: 
>>>>> MPI_Comm_spawn(cmd="spawn_intra_comm",
>>>>> argv=0, maxprocs=2, MPI_INFO_NULL, root=0, MPI_COMM_WORLD,
>>>>> intercomm=ffffffff7fffded4, errors=0) failed
>>>>> MPIDI_Comm_spawn_multiple(274):
>>>>> MPID_Comm_accept(153).........:
>>>>> MPIDI_Comm_accept(1057).......:
>>>>> MPIR_Bcast_intra(1287)........:
>>>>> MPIR_Bcast_binomial(310)......: Failure during collective
>>>>> tyr spawn 114
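>>>>>
>>>>> spawn_intra_comm spawns two copies of itself and merges parent and
>>>>> children into one intra-communicator, roughly like this (a simplified
>>>>> sketch; the attached program may differ in the details):
>>>>>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>> #include "mpi.h"
>>>>>
>>>>> int main (int argc, char *argv[])
>>>>> {
>>>>>   MPI_Comm parent, intercomm, intracomm;
>>>>>   int errcodes[2];
>>>>>
>>>>>   MPI_Init (&argc, &argv);
>>>>>   MPI_Comm_get_parent (&parent);
>>>>>   if (parent == MPI_COMM_NULL)
>>>>>   {
>>>>>     /* parent: spawn two copies of this program */
>>>>>     printf ("Parent process 0: I create 2 slave processes\n");
>>>>>     MPI_Comm_spawn ("spawn_intra_comm", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
>>>>>                     0, MPI_COMM_WORLD, &intercomm, errcodes);
>>>>>     MPI_Intercomm_merge (intercomm, 0, &intracomm);
>>>>>   }
>>>>>   else
>>>>>   {
>>>>>     /* child: join the merged intra-communicator */
>>>>>     MPI_Intercomm_merge (parent, 1, &intracomm);
>>>>>   }
>>>>>   MPI_Comm_free (&intracomm);
>>>>>   MPI_Finalize ();
>>>>>   return EXIT_SUCCESS;
>>>>> }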
>>>>>
>>>>>
>>>>> I would be grateful if somebody could fix the problem. Thank you very
>>>>> much in advance for any help. I've attached my programs. Please let
>>>>> me know if you need anything else.
>>>>>
>>>>>
>>>>> Kind regards
>>>>>
>>>>> Siegmar
>>>>>

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

