[mpich-discuss] Internal Error: invalid error code 409e10 (Ring ids do not match)

Jakub Łuczyński doubleloop at o2.pl
Mon Jun 2 19:53:59 CDT 2014


Solved, it was my error.
int block_lengths[] = { 4, 2 };
should be:
int block_lengths[] = { 4, 1 };

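I did not read the man page carefully.

With a block length of 2 the datatype described two long long values where
the struct has only one, so the overflow was there all along. On MPICH it
caused an error; on Open MPI and BGP it went unnoticed.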
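
For anyone who finds this thread later, here is a minimal sketch of the overflow that does not need MPI. It assumes a plain-data stand-in for msp_solution; the names msp_solution_pod, described_end, fits_in_struct and report are made up for illustration. It computes where the max_sum block ends for a given block length. With a 4-byte int and an 8-byte long long the struct is 24 bytes, while a block length of 2 describes 32 bytes. That matches len=32 in the MPICH 3.1 error stack quoted below, where the two buffers are 0x18 = 24 bytes apart, so a 32-byte copy overlaps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

typedef long long m_entry_t;

// Hypothetical plain-data stand-in for msp_solution (constructors omitted).
struct msp_solution_pod {
    int x1, y1, x2, y2;
    m_entry_t max_sum;
};

// Byte offset just past the max_sum block when the struct datatype
// declares `block_length` elements of m_entry_t at max_sum's displacement.
std::size_t described_end(int block_length) {
    return offsetof(msp_solution_pod, max_sum)
         + static_cast<std::size_t>(block_length) * sizeof(m_entry_t);
}

// True when the described datatype stays inside one struct object.
bool fits_in_struct(int block_length) {
    return described_end(block_length) <= sizeof(msp_solution_pod);
}

// Prints the sizes involved; the corrected block length must fit.
void report() {
    assert(fits_in_struct(1));
    std::printf("sizeof(struct)        = %zu\n", sizeof(msp_solution_pod));
    std::printf("end with block_len 2  = %zu\n", described_end(2));
    std::printf("end with block_len 1  = %zu\n", described_end(1));
}
```

Calling report() from main shows the mismatch directly. Whether an overflow like this produces an error or goes unnoticed depends on the MPI implementation, as seen above.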


On Tue, Jun 3, 2014 at 1:15 AM, Jakub Łuczyński <doubleloop at o2.pl> wrote:

> P.S.
> for clarity
>
> typedef long long m_entry_t;
>
>
> On Tue, Jun 3, 2014 at 1:09 AM, Jakub Łuczyński <doubleloop at o2.pl> wrote:
>
>> After installing the current version (3.1):
>>
>> $ mpiexec -n 2 ~/tmp/opencl/msp-par.exe 10 10 1
>> Fatal error in PMPI_Reduce: Internal MPI error!, error stack:
>> PMPI_Reduce(1259)........: MPI_Reduce(sbuf=0x7fff40ad30c8,
>> rbuf=0x7fff40ad30e0, count=1, dtype=USER<struct>, op=0x98000000, root=0,
>> MPI_COMM_WORLD) failed
>> MPIR_Reduce_impl(1071)...:
>> MPIR_Reduce_intra(822)...:
>> MPIR_Reduce_impl(1071)...:
>> MPIR_Reduce_intra(877)...:
>> MPIR_Reduce_binomial(130):
>> MPIR_Localcopy(123)......: memcpy arguments alias each other,
>> dst=0x7fff40ad30e0 src=0x7fff40ad30c8 len=32
>> MPIR_Reduce_intra(842)...:
>> MPIR_Reduce_impl(1071)...:
>> MPIR_Reduce_intra(877)...:
>> MPIR_Reduce_binomial(246): Failure during collective
>>
>>
>> On Tue, Jun 3, 2014 at 12:25 AM, Gus Correa <gus at ldeo.columbia.edu>
>> wrote:
>>
>>> This is an old version of mpich.
>>> Is it perhaps still using the mpd ring?
>>> [If so, you need to start the mpd ring, if not already set,
>>> before you launch the job. But that method was phased out.]
>>> It may be worth updating to the latest stable MPICH
>>> and using the current mpiexec (hydra) to launch the job.
>>>
>>> http://www.mpich.org/downloads/
>>> http://www.mpich.org/documentation/guides/
>>> http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
>>>
>>>
>>>
>>> On 06/02/2014 06:03 PM, Lu, Huiwei wrote:
>>>
>>>> Hi Kuba,
>>>>
>>>> Since it works with both Open MPI and BGP, it is most likely a problem
>>>> of your MPICH installation or your platform.
>>>>
>>>> We stopped supporting the Windows platform a while ago due to lack
>>>> of developer resources. Please refer to our FAQ for more information:
>>>> http://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Q:_Why_can.27t_I_build_MPICH_on_Windows_anymore.3F
>>>>
>>>> If you are on the Windows platform, we recommend Microsoft MPI, which
>>>> can be found here:
>>>> http://msdn.microsoft.com/en-us/library/bb524831(v=vs.85).aspx
>>>>
>>>> We also encourage you to use the latest MPICH on Linux or OS X
>>>> platforms, which can be downloaded here: http://www.mpich.org/downloads/
>>>>
>>>> Huiwei
>>>>
>>>> On Jun 2, 2014, at 4:49 PM, Jakub Łuczyński <doubleloop at o2.pl> wrote:
>>>>
>>>>> I wrote my assignment using MPI and tested it both locally on Open
>>>>> MPI (1.6.5) and on IBM Blue Gene/P (with the MPI implementation provided
>>>>> by IBM). Everything worked fine. It turns out that our solutions are also
>>>>> tested in our labs, where MPICH is installed:
>>>>>
>>>>> $ mpich2version
>>>>> MPICH2 Version:        1.4.1p1
>>>>>
>>>>> And when I run my solution there I get this strange error:
>>>>> $ mpirun -n 2 msp-par.exe 10 10 1
>>>>> Internal Error: invalid error code 409e10 (Ring ids do not match) in
>>>>> MPIR_Reduce_impl:1087
>>>>> Fatal error in PMPI_Reduce: Other MPI error, error stack:
>>>>> PMPI_Reduce(1270).....: MPI_Reduce(sbuf=0x7fff693a92e8,
>>>>> rbuf=0x7fff693a9300, count=1, dtype=USER<struct>, op=0x98000000, root=0,
>>>>> MPI_COMM_WORLD) failed
>>>>> MPIR_Reduce_impl(1087):
>>>>>
>>>>> I am literally out of ideas what is wrong!
>>>>>
>>>>> Source code fragments (C++) are below:
>>>>>
>>>>> struct msp_solution
>>>>> {
>>>>>     int x1, y1, x2, y2;
>>>>>     m_entry_t max_sum;
>>>>>     msp_solution();
>>>>>     msp_solution(const pair<int, int> &c1, const pair<int, int> &c2, int max_sum);
>>>>>     friend bool operator<(const msp_solution &s1, const msp_solution &s2);
>>>>> };
>>>>>
>>>>> void max_msp_solution(msp_solution *in, msp_solution *inout, int, MPI_Datatype*)
>>>>> {
>>>>>      *inout = max(*in, *inout);
>>>>> }
>>>>>
>>>>> // somewhere in code
>>>>> {
>>>>>      MPI_Datatype MPI_msp_solution_t;
>>>>>      MPI_Op max_msp_solution_op;
>>>>>
>>>>>      // create MPI struct from msp_solution
>>>>>      MPI_Datatype types[] = { MPI_INT, MPI_LONG_LONG_INT };
>>>>>      int block_lengths[] = { 4, 2 };
>>>>>      MPI_Aint base_addr, x1_addr, max_sum_addr;
>>>>>      MPI_Get_address(&collected_solution, &base_addr);
>>>>>      MPI_Get_address(&collected_solution.x1, &x1_addr);
>>>>>      MPI_Get_address(&collected_solution.max_sum, &max_sum_addr);
>>>>>
>>>>>      MPI_Aint displacements[] =
>>>>>      {
>>>>>          x1_addr - base_addr,
>>>>>          max_sum_addr - base_addr
>>>>>      };
>>>>>
>>>>>      MPI_Type_create_struct(2, block_lengths, displacements, types, &MPI_msp_solution_t);
>>>>>      MPI_Type_commit(&MPI_msp_solution_t);
>>>>>
>>>>>      // max reduction function
>>>>>      MPI_Op_create((MPI_User_function *) max_msp_solution, 1, &max_msp_solution_op);
>>>>>
>>>>>     ...
>>>>>
>>>>>      msp_solution solution, received_solution;
>>>>>      MPI_Comm comm;
>>>>>      ...
>>>>>      // comm is created using MPI_Comm_split
>>>>>      // solution is initialized
>>>>>      MPI_Reduce(&solution, &received_solution, 1, MPI_msp_solution_t, max_msp_solution_op, 0, MPI_COMM_WORLD);
>>>>>      // ERROR above!!!
>>>>> }
>>>>>
>>>>>
>>>>> Is there an error in this? How can I make it run?
>>>>> P.S. MPI_Send and MPI_Recv on my struct MPI_msp_solution_t seem to
>>>>> work fine.
>>>>>
>>>>> Thanks in advance!
>>>>> Best regards,
>>>>> Kuba
>>>>> _______________________________________________
>>>>> discuss mailing list     discuss at mpich.org
>>>>> To manage subscription options or unsubscribe:
>>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>>>