[mpich-discuss] Internal Error: invalid error code 409e10 (Ring ids do not match)
Jakub Łuczyński
doubleloop at o2.pl
Mon Jun 2 18:15:05 CDT 2014
P.S. For clarity:
typedef long long m_entry_t;
On Tue, Jun 3, 2014 at 1:09 AM, Jakub Łuczyński <doubleloop at o2.pl> wrote:
> After installing the current version (3.1):
>
> $ mpiexec -n 2 ~/tmp/opencl/msp-par.exe 10 10 1
> Fatal error in PMPI_Reduce: Internal MPI error!, error stack:
> PMPI_Reduce(1259)........: MPI_Reduce(sbuf=0x7fff40ad30c8,
> rbuf=0x7fff40ad30e0, count=1, dtype=USER<struct>, op=0x98000000, root=0,
> MPI_COMM_WORLD) failed
> MPIR_Reduce_impl(1071)...:
> MPIR_Reduce_intra(822)...:
> MPIR_Reduce_impl(1071)...:
> MPIR_Reduce_intra(877)...:
> MPIR_Reduce_binomial(130):
> MPIR_Localcopy(123)......: memcpy arguments alias each other,
> dst=0x7fff40ad30e0 src=0x7fff40ad30c8 len=32
> MPIR_Reduce_intra(842)...:
> MPIR_Reduce_impl(1071)...:
> MPIR_Reduce_intra(877)...:
> MPIR_Reduce_binomial(246): Failure during collective
>
>
> On Tue, Jun 3, 2014 at 12:25 AM, Gus Correa <gus at ldeo.columbia.edu> wrote:
>
>> This is an old version of mpich.
>> Is it perhaps still using the mpd ring?
>> [If so, you need to start the mpd ring, if it is not already running,
>> before you launch the job. But that method has been phased out.]
>> It may be worth updating to the latest stable MPICH
>> and using the current mpiexec (hydra) to launch the job.
>>
>> http://www.mpich.org/downloads/
>> http://www.mpich.org/documentation/guides/
>> http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
>>
>>
>>
>> On 06/02/2014 06:03 PM, Lu, Huiwei wrote:
>>
>>> Hi Kuba,
>>>
>>> Since it works with both Open MPI and BGP, it is most likely a problem
>>> with your MPICH installation or your platform.
>>>
>>> We stopped supporting the Windows platform a while ago due to a lack
>>> of developer resources. Please refer to our FAQ for more information:
>>> http://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Q:_Why_can.27t_I_build_MPICH_on_Windows_anymore.3F
>>>
>>> If you are on the Windows platform, we recommend that you use Microsoft MPI,
>>> which can be found here:
>>> http://msdn.microsoft.com/en-us/library/bb524831(v=vs.85).aspx
>>>
>>> We also encourage you to use the latest MPICH on Linux or OSX platforms,
>>> which can be downloaded here: http://www.mpich.org/downloads/
>>>
>>> --
>>> Huiwei
>>>
>>> On Jun 2, 2014, at 4:49 PM, Jakub Łuczyński <doubleloop at o2.pl> wrote:
>>>
>>>> I wrote my assignment using MPI and tested it both locally on Open MPI
>>>> (1.6.5) and on IBM Blue Gene/P (with the MPI implementation provided by IBM).
>>>> Everything worked fine. It turns out that our solutions are also tested in our
>>>> labs, where MPICH is installed:
>>>>
>>>> $ mpich2version
>>>> MPICH2 Version: 1.4.1p1
>>>>
>>>> And when I run my solution there I get this strange error:
>>>> $ mpirun -n 2 msp-par.exe 10 10 1
>>>> Internal Error: invalid error code 409e10 (Ring ids do not match) in
>>>> MPIR_Reduce_impl:1087
>>>> Fatal error in PMPI_Reduce: Other MPI error, error stack:
>>>> PMPI_Reduce(1270).....: MPI_Reduce(sbuf=0x7fff693a92e8,
>>>> rbuf=0x7fff693a9300, count=1, dtype=USER<struct>, op=0x98000000, root=0,
>>>> MPI_COMM_WORLD) failed
>>>> MPIR_Reduce_impl(1087):
>>>>
>>>> I am out of ideas about what is wrong!
>>>>
>>>> Source code fragments (C++) are below:
>>>>
>>>> struct msp_solution
>>>> {
>>>>     int x1, y1, x2, y2;
>>>>     m_entry_t max_sum;
>>>>     msp_solution();
>>>>     msp_solution(const pair<int, int> &c1, const pair<int, int> &c2,
>>>>                  int max_sum);
>>>>     friend bool operator<(const msp_solution &s1, const msp_solution &s2);
>>>> };
>>>>
>>>> void max_msp_solution(msp_solution *in, msp_solution *inout, int,
>>>>                       MPI_Datatype*)
>>>> {
>>>>     *inout = max(*in, *inout);
>>>> }
>>>>
>>>> // somewhere in code
>>>> {
>>>>     MPI_Datatype MPI_msp_solution_t;
>>>>     MPI_Op max_msp_solution_op;
>>>>
>>>>     // create MPI struct from msp_solution
>>>>     MPI_Datatype types[] = { MPI_INT, MPI_LONG_LONG_INT };
>>>>     int block_lengths[] = { 4, 2 };
>>>>     MPI_Aint base_addr, x1_addr, max_sum_addr;
>>>>     MPI_Get_address(&collected_solution, &base_addr);
>>>>     MPI_Get_address(&collected_solution.x1, &x1_addr);
>>>>     MPI_Get_address(&collected_solution.max_sum, &max_sum_addr);
>>>>
>>>>     MPI_Aint displacements[] =
>>>>     {
>>>>         x1_addr - base_addr,
>>>>         max_sum_addr - base_addr
>>>>     };
>>>>
>>>>     MPI_Type_create_struct(2, block_lengths, displacements, types,
>>>>                            &MPI_msp_solution_t);
>>>>     MPI_Type_commit(&MPI_msp_solution_t);
>>>>
>>>>     // max reduction function
>>>>     MPI_Op_create((MPI_User_function *) max_msp_solution, 1,
>>>>                   &max_msp_solution_op);
>>>>
>>>>     ...
>>>>
>>>>     msp_solution solution, received_solution;
>>>>     MPI_Comm comm,
>>>>     ...
>>>>     // comm is created using MPI_Comm_split
>>>>     // solution is initialized
>>>>     MPI_Reduce(&solution, &received_solution, 1, MPI_msp_solution_t,
>>>>                max_msp_solution_op, 0, MPI_COMM_WORLD);
>>>>     // ERROR above!!!
>>>> }
>>>>
>>>>
>>>> Is there an error in this code? How can I make it run?
>>>> P.S. MPI_Send and MPI_Recv on my struct MPI_msp_solution_t seem to
>>>> work fine.
>>>>
>>>> Thanks in advance!
>>>> Best regards,
>>>> Kuba
>>>> _______________________________________________
>>>> discuss mailing list discuss at mpich.org
>>>> To manage subscription options or unsubscribe:
>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>>