[mpich-discuss] error spawning processes in mpich-3.2rc1

Min Si msi at il.is.s.u-tokyo.ac.jp
Thu Oct 8 21:34:19 CDT 2015


Hi Siegmar,

It seems you have already enabled the most detailed error output. We 
cannot think of any further clue for now. If you can give us access to 
your machine, we would be glad to help you debug it there.

Min

On 10/8/15 12:02 AM, Siegmar Gross wrote:
> Hi Min,
>
> thank you very much for your answer.
>
>> We cannot reproduce this error on our test machines (Solaris i386,
>> Ubuntu x86_64) by using your programs. And unfortunately we do not have
>> Solaris Sparc machine thus could not verify it.
>
> The programs work fine on my Solaris x86_64 and Linux machines
> as well. I only have a problem on Solaris Sparc.
>
>
>> Sometime, it can happen that you need to add "./" in front of the
>> program path, could you try it ?
>>
>> For example, in spawn_master.c:
>>> #define SLAVE_PROG      "./spawn_slave"
>
> No, it will not work, because the programs are stored in a
> different directory ($HOME/{SunOS, Linux}/{sparc, x86_64}/bin)
> which is part of PATH (as well as ".").
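
A minimal sketch of sidestepping the PATH lookup altogether, assuming
MPICH honors the MPI-standard reserved "path" info key for
MPI_Comm_spawn (the directory below is only a placeholder; passing an
absolute path as the command name is the other option):

    #include <stdio.h>
    #include <mpi.h>

    /* Sketch: point MPI_Comm_spawn at the slaves' directory explicitly
     * via the reserved "path" info key instead of relying on PATH.
     * "/home/user/SunOS/sparc/bin" is only a placeholder. */
    int main (int argc, char *argv[])
    {
      MPI_Comm intercomm;
      MPI_Info info;

      MPI_Init (&argc, &argv);
      MPI_Info_create (&info);
      MPI_Info_set (info, "path", "/home/user/SunOS/sparc/bin");
      MPI_Comm_spawn ("spawn_slave", MPI_ARGV_NULL, 4, info, 0,
                      MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
      MPI_Info_free (&info);
      MPI_Finalize ();   /* the spawned slaves finalize on their side */
      return 0;
    }
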
>
> Can I do anything to track the source of the error?
>
>
> Kind regards
>
> Siegmar
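
One generic way to get a little more context at the failure point,
sketched below under the assumption that the test programs can be
modified: switch MPI_COMM_WORLD to MPI_ERRORS_RETURN and decode whatever
MPI_Comm_spawn returns, instead of letting the default handler abort.

    #include <stdio.h>
    #include <mpi.h>

    /* Sketch: request error codes instead of the default abort, then
     * print the error class and message text that MPICH attaches. */
    int main (int argc, char *argv[])
    {
      MPI_Comm intercomm;
      int      rc, eclass, len;
      char     msg[MPI_MAX_ERROR_STRING];

      MPI_Init (&argc, &argv);
      MPI_Comm_set_errhandler (MPI_COMM_WORLD, MPI_ERRORS_RETURN);
      rc = MPI_Comm_spawn ("spawn_slave", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                           0, MPI_COMM_WORLD, &intercomm,
                           MPI_ERRCODES_IGNORE);
      if (rc != MPI_SUCCESS) {
        MPI_Error_class (rc, &eclass);
        MPI_Error_string (rc, msg, &len);
        fprintf (stderr, "MPI_Comm_spawn failed: class %d: %s\n",
                 eclass, msg);
      }
      MPI_Finalize ();
      return 0;
    }
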
>
>>
>> Min
>>
>> On 10/7/15 5:03 AM, Siegmar Gross wrote:
>>> Hi,
>>>
>>> today I've built mpich-3.2rc1 on my machines (Solaris 10 Sparc,
>>> Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-5.1.0
>>> and Sun C 5.13. I still get the following errors on my Sparc machine,
>>> which I had already reported on September 8th. "mpiexec" is aliased
>>> to 'mpiexec -genvnone'. It still doesn't matter whether I use my cc
>>> or gcc version of MPICH.
>>>
>>>
>>> tyr spawn 119 mpichversion
>>> MPICH Version:          3.2rc1
>>> MPICH Release date:     Wed Oct  7 00:00:33 CDT 2015
>>> MPICH Device:           ch3:nemesis
>>> MPICH configure:        --prefix=/usr/local/mpich-3.2_64_cc
>>> --libdir=/usr/local/mpich-3.2_64_cc/lib64
>>> --includedir=/usr/local/mpich-3.2_64_cc/include64 CC=cc CXX=CC F77=f77
>>> FC=f95 CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64 LDFLAGS=-m64
>>> -L/usr/lib/sparcv9 -R/usr/lib/sparcv9 --enable-fortran=yes
>>> --enable-cxx --enable-romio --enable-debuginfo --enable-smpcoll
>>> --enable-threads=multiple --with-thread-package=posix --enable-shared
>>> MPICH CC:       cc -m64   -O2
>>> MPICH CXX:      CC -m64  -O2
>>> MPICH F77:      f77 -m64
>>> MPICH FC:       f95 -m64  -O2
>>> tyr spawn 120
>>>
>>>
>>>
>>> tyr spawn 111 mpiexec -np 1 spawn_master
>>>
>>> Parent process 0 running on tyr.informatik.hs-fulda.de
>>>   I create 4 slave processes
>>>
>>> Fatal error in MPI_Comm_spawn: Unknown error class, error stack:
>>> MPI_Comm_spawn(144)...........: MPI_Comm_spawn(cmd="spawn_slave",
>>> argv=0, maxprocs=4, MPI_INFO_NULL, root=0, MPI_COMM_WORLD,
>>> intercomm=ffffffff7fffde50, errors=0) failed
>>> MPIDI_Comm_spawn_multiple(274):
>>> MPID_Comm_accept(153).........:
>>> MPIDI_Comm_accept(1057).......:
>>> MPIR_Bcast_intra(1287)........:
>>> MPIR_Bcast_binomial(310)......: Failure during collective
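
The attached programs are not reproduced in the archive; the following
is only a minimal sketch of a master/slave pair consistent with the
parameters visible in the error stack above (cmd="spawn_slave",
maxprocs=4, MPI_INFO_NULL, root=0, MPI_COMM_WORLD), for anyone who wants
to try reproducing the failure without the original files.

    /* spawn_master.c -- minimal sketch, not the original program */
    #include <stdio.h>
    #include <mpi.h>

    #define NUM_SLAVES  4
    #define SLAVE_PROG  "spawn_slave"

    int main (int argc, char *argv[])
    {
      MPI_Comm intercomm;
      char     host[MPI_MAX_PROCESSOR_NAME];
      int      rank, len;

      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);
      MPI_Get_processor_name (host, &len);
      printf ("Parent process %d running on %s\n"
              "  I create %d slave processes\n", rank, host, NUM_SLAVES);
      MPI_Comm_spawn (SLAVE_PROG, MPI_ARGV_NULL, NUM_SLAVES, MPI_INFO_NULL,
                      0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
      MPI_Finalize ();
      return 0;
    }

    /* spawn_slave.c -- minimal sketch: report rank and exit */
    #include <stdio.h>
    #include <mpi.h>

    int main (int argc, char *argv[])
    {
      int rank;

      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);
      printf ("Slave process %d\n", rank);
      MPI_Finalize ();
      return 0;
    }
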
>>>
>>>
>>>
>>>
>>> tyr spawn 112 mpiexec -np 1 spawn_multiple_master
>>>
>>> Parent process 0 running on tyr.informatik.hs-fulda.de
>>>   I create 3 slave processes.
>>>
>>> Fatal error in MPI_Comm_spawn_multiple: Unknown error class, error 
>>> stack:
>>> MPI_Comm_spawn_multiple(162)..: MPI_Comm_spawn_multiple(count=2,
>>> cmds=ffffffff7fffde08, argvs=ffffffff7fffddf8,
>>> maxprocs=ffffffff7fffddf0, infos=ffffffff7fffdde8, root=0,
>>> MPI_COMM_WORLD, intercomm=ffffffff7fffdde4, errors=0) failed
>>> MPIDI_Comm_spawn_multiple(274):
>>> MPID_Comm_accept(153).........:
>>> MPIDI_Comm_accept(1057).......:
>>> MPIR_Bcast_intra(1287)........:
>>> MPIR_Bcast_binomial(310)......: Failure during collective
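
Again only a hedged sketch (the real spawn_multiple_master.c is in the
attachments): a call consistent with count=2, root=0, and MPI_COMM_WORLD
from the error stack above; splitting the 3 slaves as 1+2 between the two
commands is an assumption.

    /* spawn_multiple_master.c -- minimal sketch, not the original */
    #include <stdio.h>
    #include <mpi.h>

    int main (int argc, char *argv[])
    {
      MPI_Comm  intercomm;
      char     *cmds[2]     = { "spawn_slave", "spawn_slave" };
      int       maxprocs[2] = { 1, 2 };     /* 3 slaves total (assumed) */
      MPI_Info  infos[2]    = { MPI_INFO_NULL, MPI_INFO_NULL };

      MPI_Init (&argc, &argv);
      MPI_Comm_spawn_multiple (2, cmds, MPI_ARGVS_NULL, maxprocs, infos,
                               0, MPI_COMM_WORLD, &intercomm,
                               MPI_ERRCODES_IGNORE);
      MPI_Finalize ();
      return 0;
    }
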
>>>
>>>
>>>
>>>
>>> tyr spawn 113 mpiexec -np 1 spawn_intra_comm
>>> Parent process 0: I create 2 slave processes
>>> Fatal error in MPI_Comm_spawn: Unknown error class, error stack:
>>> MPI_Comm_spawn(144)...........: MPI_Comm_spawn(cmd="spawn_intra_comm",
>>> argv=0, maxprocs=2, MPI_INFO_NULL, root=0, MPI_COMM_WORLD,
>>> intercomm=ffffffff7fffded4, errors=0) failed
>>> MPIDI_Comm_spawn_multiple(274):
>>> MPID_Comm_accept(153).........:
>>> MPIDI_Comm_accept(1057).......:
>>> MPIR_Bcast_intra(1287)........:
>>> MPIR_Bcast_binomial(310)......: Failure during collective
>>> tyr spawn 114
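
And a last sketch, consistent with cmd="spawn_intra_comm" and maxprocs=2
above: a program that spawns two copies of itself and merges parent and
children into one intra-communicator (the MPI_Intercomm_merge step is an
assumption about what the original spawn_intra_comm.c does).

    /* spawn_intra_comm.c -- minimal sketch, not the original program */
    #include <stdio.h>
    #include <mpi.h>

    int main (int argc, char *argv[])
    {
      MPI_Comm parent, intercomm, intracomm;
      int      rank;

      MPI_Init (&argc, &argv);
      MPI_Comm_get_parent (&parent);
      if (parent == MPI_COMM_NULL) {          /* started by mpiexec */
        MPI_Comm_rank (MPI_COMM_WORLD, &rank);
        printf ("Parent process %d: I create 2 slave processes\n", rank);
        MPI_Comm_spawn ("spawn_intra_comm", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                        0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
        MPI_Intercomm_merge (intercomm, 0, &intracomm);
      } else {                                /* spawned copy */
        MPI_Intercomm_merge (parent, 1, &intracomm);
      }
      MPI_Comm_rank (intracomm, &rank);
      printf ("Process %d in the merged intra-communicator\n", rank);
      MPI_Comm_free (&intracomm);
      MPI_Finalize ();
      return 0;
    }
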
>>>
>>>
>>> I would be grateful if somebody can fix the problem. Thank you very
>>> much for any help in advance. I've attached my programs. Please let
>>> me know if you need anything else.
>>>
>>>
>>> Kind regards
>>>
>>> Siegmar
>>>
>>>

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

