[mpich-discuss] Internal Error: invalid error code 409e10 (Ring ids do not match)

Jakub Łuczyński doubleloop at o2.pl
Mon Jun 2 18:09:41 CDT 2014


After installing the current version (3.1), the error changes:

$ mpiexec -n 2 ~/tmp/opencl/msp-par.exe 10 10 1
Fatal error in PMPI_Reduce: Internal MPI error!, error stack:
PMPI_Reduce(1259)........: MPI_Reduce(sbuf=0x7fff40ad30c8,
rbuf=0x7fff40ad30e0, count=1, dtype=USER<struct>, op=0x98000000, root=0,
MPI_COMM_WORLD) failed
MPIR_Reduce_impl(1071)...:
MPIR_Reduce_intra(822)...:
MPIR_Reduce_impl(1071)...:
MPIR_Reduce_intra(877)...:
MPIR_Reduce_binomial(130):
MPIR_Localcopy(123)......: memcpy arguments alias each other,
dst=0x7fff40ad30e0 src=0x7fff40ad30c8 len=32
MPIR_Reduce_intra(842)...:
MPIR_Reduce_impl(1071)...:
MPIR_Reduce_intra(877)...:
MPIR_Reduce_binomial(246): Failure during collective


On Tue, Jun 3, 2014 at 12:25 AM, Gus Correa <gus at ldeo.columbia.edu> wrote:

> This is an old version of mpich.
> Is it perhaps still using the mpd ring?
> [If so, you need to start the mpd ring, if not already running,
> before you launch the job. But that method was phased out.]
> It may be worth updating to the latest stable mpich and using the
> current mpiexec (hydra) to launch the job, as in the sketch after
> the links below.
>
> http://www.mpich.org/downloads/
> http://www.mpich.org/documentation/guides/
> http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
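>
> For example, hydra needs no daemon ring started beforehand; you launch
> directly, optionally pointing it at a host file (a sketch; hosts.txt
> is a placeholder):
>
>   $ mpiexec -f hosts.txt -n 2 ./msp-par.exe 10 10 1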
>
>
>
> On 06/02/2014 06:03 PM, Lu, Huiwei wrote:
>
>> Hi Kuba,
>>
>> Since it works with both Open MPI and the Blue Gene/P, it is most
>> likely a problem with your MPICH installation or your platform.
>>
>> We stopped supporting the Windows platform a while ago due to a lack
>> of developer resources. Please refer to our FAQ for more information:
>> http://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Q:_Why_can.27t_I_build_MPICH_on_Windows_anymore.3F
>>
>> If you are on the Windows platform, we recommend you use Microsoft
>> MPI, which can be found here:
>> http://msdn.microsoft.com/en-us/library/bb524831(v=vs.85).aspx
>>
>> We also encourage you to use the latest MPICH on the Linux or OS X platforms,
>> which can be downloaded here: http://www.mpich.org/downloads/
>>
>> Huiwei
>>
>> On Jun 2, 2014, at 4:49 PM, Jakub Łuczyński <doubleloop at o2.pl> wrote:
>>
>>> I wrote my assignment using MPI and tested it both locally with Open
>>> MPI (1.6.5) and on an IBM Blue Gene/P (with the MPI implementation
>>> provided by IBM). Everything worked fine. It turns out that our
>>> solutions are also tested in our labs, where MPICH is installed:
>>>
>>> $ mpich2version
>>> MPICH2 Version:        1.4.1p1
>>>
>>> When I run my solution there, I get this strange error:
>>> $ mpirun -n 2 msp-par.exe 10 10 1
>>> Internal Error: invalid error code 409e10 (Ring ids do not match) in
>>> MPIR_Reduce_impl:1087
>>> Fatal error in PMPI_Reduce: Other MPI error, error stack:
>>> PMPI_Reduce(1270).....: MPI_Reduce(sbuf=0x7fff693a92e8,
>>> rbuf=0x7fff693a9300, count=1, dtype=USER<struct>, op=0x98000000, root=0,
>>> MPI_COMM_WORLD) failed
>>> MPIR_Reduce_impl(1087):
>>>
>>> I am out of ideas as to what is wrong!
>>>
>>> Below are the relevant source code fragments (C++):
>>>
>>> struct msp_solution
>>> {
>>>     int x1, y1, x2, y2;
>>>     m_entry_t max_sum;
>>>     msp_solution();
>>>     msp_solution(const pair<int, int> &c1, const pair<int, int> &c2,
>>>                  int max_sum);
>>>     friend bool operator<(const msp_solution &s1,
>>>                           const msp_solution &s2);
>>> };
>>>
>>> // user-defined reduction: keep the larger of the two partial solutions
>>> void max_msp_solution(msp_solution *in, msp_solution *inout, int,
>>>                       MPI_Datatype *)
>>> {
>>>     *inout = max(*in, *inout);
>>> }
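>>>
>>> (For reference, mpi.h declares the reduction callback as
>>>
>>>     typedef void (MPI_User_function)(void *invec, void *inoutvec,
>>>                                      int *len, MPI_Datatype *datatype);
>>>
>>> so my function only matches after the cast in MPI_Op_create below, and
>>> it ignores len, i.e. it handles only count == 1.)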
>>>
>>> // somewhere in code
>>> {
>>>      MPI_Datatype MPI_msp_solution_t;
>>>      MPI_Op max_msp_solution_op;
>>>
>>>      // create MPI struct from msp_solution
>>>      MPI_Datatype types[] = { MPI_INT, MPI_LONG_LONG_INT };
>>>      int block_lengths[] = { 4, 2 };
>>>      MPI_Aint base_addr, x1_addr, max_sum_addr;
>>>      MPI_Get_address(&collected_solution, &base_addr);
>>>      MPI_Get_address(&collected_solution.x1, &x1_addr);
>>>      MPI_Get_address(&collected_solution.max_sum, &max_sum_addr);
>>>
>>>      MPI_Aint displacements[] =
>>>      {
>>>          x1_addr - base_addr,
>>>          max_sum_addr - base_addr
>>>      };
>>>
>>>      MPI_Type_create_struct(2, block_lengths, displacements, types,
>>>                             &MPI_msp_solution_t);
>>>      MPI_Type_commit(&MPI_msp_solution_t);
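>>>
>>>      // If the committed type's extent ever disagreed with
>>>      // sizeof(msp_solution) (padding, for instance), I believe
>>>      // MPI_Type_create_resized could pin down the extent. A sketch:
>>>      //
>>>      //     MPI_Datatype resized;
>>>      //     MPI_Type_create_resized(MPI_msp_solution_t, 0,
>>>      //                             sizeof(msp_solution), &resized);
>>>      //     MPI_Type_commit(&resized);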
>>>
>>>      // max reduction function (second argument 1 = commutative)
>>>      MPI_Op_create((MPI_User_function *) max_msp_solution, 1,
>>>                    &max_msp_solution_op);
>>>
>>>     ...
>>>
>>>      msp_solution solution, received_solution;
>>>      MPI_Comm comm;
>>>      ...
>>>      // comm is created using MPI_Comm_split
>>>      // solution is initialized
>>>      MPI_Reduce(&solution, &received_solution, 1, MPI_msp_solution_t,
>>>                 max_msp_solution_op, 0, MPI_COMM_WORLD);
>>>      // ERROR above!!!
>>> }
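>>>
>>> One sanity check I can still add after MPI_Type_commit: compare the
>>> committed type's size and extent against the C++ struct (a sketch;
>>> needs <cstdio>):
>>>
>>>      int size;
>>>      MPI_Aint lb, extent;
>>>      MPI_Type_size(MPI_msp_solution_t, &size);
>>>      MPI_Type_get_extent(MPI_msp_solution_t, &lb, &extent);
>>>      printf("size=%d extent=%ld sizeof=%zu\n",
>>>             size, (long) extent, sizeof(msp_solution));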
>>>
>>>
>>> Is there some error in this? How can I make it work?
>>> P.S. MPI_Send and MPI_Recv with my struct type MPI_msp_solution_t seem
>>> to work fine.
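>>>
>>> (Roughly the round trip I used for that test, assuming rank comes
>>> from MPI_Comm_rank on MPI_COMM_WORLD:
>>>
>>>      if (rank == 0)
>>>          MPI_Send(&solution, 1, MPI_msp_solution_t, 1, 0,
>>>                   MPI_COMM_WORLD);
>>>      else if (rank == 1)
>>>          MPI_Recv(&received_solution, 1, MPI_msp_solution_t, 0, 0,
>>>                   MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>> )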
>>>
>>> Thanks in advance!
>>> Best regards,
>>> Kuba