[mpich-discuss] mpi assertion error

Danilo apeironoriepa at aol.com
Fri Jun 28 10:28:28 CDT 2013


Hi Jeff,
The program was tested intensively on the previous cluster. The changes I made are in the scatter/gather calls (it seems that in this version sendbuf and recvbuf have to be different). The other main change concerns Hydra, because the previous cluster didn't have such a process management system. But I'm quite new to programming, so I'm not sure...
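
For what it's worth, the usual way to keep the old "same buffer" behaviour without aliasing sendbuf and recvbuf is MPI_IN_PLACE at the root rank. Below is only a minimal sketch, assuming a row-wise split of a realDim x (realDim*2) array of doubles; the function and variable names (scatter_rows, gather_rows, full, local) are illustrative and not taken from my code:

    #include <mpi.h>

    /* Scatter rows of the full array from rank 0, letting the root keep
     * its own block in place instead of aliasing sendbuf and recvbuf. */
    void scatter_rows(double *full, double *local, int realDim,
                      int rank, int nprocs)
    {
        /* each row holds realDim*2 doubles (real + imaginary columns);
         * assumes realDim is divisible by nprocs */
        int count = (realDim / nprocs) * realDim * 2;

        if (rank == 0) {
            /* with MPI_IN_PLACE the receive arguments are ignored at root */
            MPI_Scatter(full, count, MPI_DOUBLE,
                        MPI_IN_PLACE, count, MPI_DOUBLE,
                        0, MPI_COMM_WORLD);
        } else {
            /* the send arguments are significant only at the root */
            MPI_Scatter(NULL, 0, MPI_DOUBLE,
                        local, count, MPI_DOUBLE,
                        0, MPI_COMM_WORLD);
        }
    }

    /* The matching gather: the root contributes MPI_IN_PLACE as sendbuf. */
    void gather_rows(double *full, double *local, int realDim,
                     int rank, int nprocs)
    {
        int count = (realDim / nprocs) * realDim * 2;

        if (rank == 0)
            MPI_Gather(MPI_IN_PLACE, count, MPI_DOUBLE,
                       full, count, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        else
            MPI_Gather(local, count, MPI_DOUBLE,
                       NULL, 0, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }

With this pattern only rank 0 ever needs the full array; the other ranks just work with their local block.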
 

Thanks for your help.

Regards

 

-----Original Message-----
From: Jeff Hammond <jeff.science at gmail.com>
To: discuss <discuss at mpich.org>
Sent: Fri, Jun 28, 2013 5:12 pm
Subject: Re: [mpich-discuss] mpi assertion error


Null buffer assertions are suggestive of incorrect programs.  Can you
share the source of this program?
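
One common way to end up with a NULL sendbuf is a malloc that failed silently. A minimal defensive sketch you could drop in before the collectives (the helper name below is made up, not part of your code or of MPICH):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Abort loudly if a large allocation fails, instead of handing a
     * NULL pointer to MPI_Scatter/MPI_Gather later on. */
    static double *alloc_checked(size_t nelems, int rank, const char *what)
    {
        double *p = malloc(nelems * sizeof *p);  /* byte count computed in size_t */
        if (p == NULL) {
            fprintf(stderr, "rank %d: allocating %zu doubles for %s failed\n",
                    rank, nelems, what);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        return p;
    }

If the data are doubles, realDim=16384 means the full array is 16384 x 16384 x 2 x 8 bytes, i.e. about 4 GiB, so a failed allocation (or a byte count that is ever computed in a 32-bit int) is worth ruling out first.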

As for the inline vs attached files debate, I think that pastebin is a
superior option for large output since it is plain-text readable from
any internet-enabled device and doesn't lead to huge messages on the
list.  But for short messages, inlining is definitely good for email
reading on phones.

Jeff

On Fri, Jun 28, 2013 at 9:46 AM, Danilo <apeironoriepa at aol.com> wrote:
> In the last topic I read, it was asked more than once to zip the files,
> so I did. By the way, this is the first error:
> Assertion failed in file helper_fns.c at line 361: ((((char *) sendbuf +
> sendtype_true_lb))) != NULL
> internal ABORT - process 2
>
> Starting from the second execution I get:
>
> =====================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 139
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =====================================================================================
>
> HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:928): assert (!closed)
> failed
> HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback
> returned error status
> main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
> HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:928): assert (!closed)
> failed
> HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback
> returned error status
> main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
> HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:928): assert (!closed)
> failed
> HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback
> returned error status
> main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
> HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one
> of the processes terminated badly; aborting
> HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23):
> launcher returned error waiting for completion
> HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:191): launcher
> returned error waiting for completion
> main (./ui/mpich/mpiexec.c:405): process manager error waiting for
> completion
>
>
> Regards,
> Danilo
>
>
> -----Original Message-----
> From: Wesley Bland <wbland at mcs.anl.gov>
> To: discuss <discuss at mpich.org>
> Sent: Fri, Jun 28, 2013 3:22 pm
> Subject: Re: [mpich-discuss] mpi assertion error
>
> Can you just copy and paste your error into the email? Most of us will
> probably not be too excited about opening strange tarballs attached to
> an email. Also, we read these emails on our phones and tablets, where
> unzipping source code isn't really an option.
>
> Wesley
>
> On Jun 28, 2013, at 8:17 AM, Danilo <apeironoriepa at aol.com> wrote:
>
> Good afternoon,
>
> I wrote a small application in C to compute a 2D FFT. The app was first
> run on a cluster with a 2007 MPI installation (I don't remember the
> package name) and then adapted for a different cluster with MPICH 1.4.1
> (I had to change the scatter/gather calls because with the previous
> version I could use the same buffer for both sendbuf and recvbuf). With
> 2 processes it works fine. With 4/8/16/32 and so on it first gives an
> assertion error, as shown in the attached file, and from the second run
> onwards on more than 2 processes it gives exit code 139. The error only
> appears when running with "realDim=16384" (i.e. 16384 rows and 16384x2
> columns, since the data are stored as real/imaginary pairs). I know the
> code works, since everything was fine on the previous cluster (even
> with 4/8/16/32 processes), and I can't figure out what the problem is
> now. Can you help?
>
> As mentioned, attached you can find my application as well as the
> errors and the MPI info.
>
> Regards,
> Danilo
> <error+app.tar.gz>



-- 
Jeff Hammond
jeff.science at gmail.com
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

 