[mpich-discuss] BLCR kernel module not present
michael
michael.bane at manchester.ac.uk
Fri Apr 26 15:20:15 CDT 2013
Thanks, Pavan.
Just to be clear, are you saying that if I use mvapich with BLCR, a
running multi-node MPI job that is killed by a batch scheduler (e.g. for
running out of time) can be restarted (from which checkpoint?), presuming
it doesn't have open files?
Many thanks, M
On Fri, 2013-04-26 at 15:12 -0500, Pavan Balaji wrote:
> Michael,
>
> BLCR support for mpich should work fine; if something is broken, please
> let us know.
>
> However, the core BLCR group itself hadn't released updates in a while,
> primarily because they didn't have direct funding for it. I believe
> that's fixed now and they are working on newer releases.
>
> -- Pavan
>
> On 04/26/2013 03:07 PM US Central Time, michael wrote:
> > Hi folks
> > I was wondering what the state of BLCR for mpich/mvapich is, e.g. how
> > reliable can one presume it to be?
> > Thanks, Michael
> >
> >
> > On Fri, 2013-04-26 at 14:33 -0500, Wesley Bland wrote:
> >> It looks like you might have missed installing the kernel module for
> >> BLCR. What is the output of `lsmod`?
> >>
> >>
> >> Alternatively, if you installed BLCR by using apt-get in Ubuntu, you
> >> should be able to use dkms to manage your kernel modules
> >> automatically. Make sure you have the package 'blcr-dkms' installed
> >> (you should be able to check this by typing `dkms status`).
> >>
> >>
> >> Does either of those solutions solve your issue?
> >>
> >>
> >> Wesley
> >>
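The `lsmod` check suggested above can be scripted. What follows is a minimal sketch, not something posted in the thread. The helper name `has_blcr_module` is made up for illustration. It assumes the BLCR kernel module is listed under the name `blcr` (BLCR builds normally also load a companion `blcr_imports` module) and that `modprobe blcr` is an appropriate way to load it on your system.

```shell
#!/bin/sh
# has_blcr_module (hypothetical helper): reads `lsmod`-style output on stdin
# and succeeds only if a module named exactly "blcr" appears in column 1.
has_blcr_module() {
    awk '$1 == "blcr" { found = 1 } END { exit !found }'
}

# Report the state of the running kernel; this part never fails the script.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | has_blcr_module; then
        echo "blcr kernel module is loaded"
    else
        # Assumption: the module is installed and named "blcr".
        echo "blcr kernel module is not loaded; try: sudo modprobe blcr"
    fi
fi
```

If the module still cannot be loaded, one likely cause is that it was built for a different kernel than the one currently running, which is the situation `dkms` is meant to handle.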
> >> On Apr 26, 2013, at 1:55 PM, basma a.azeem
> >> <basmaabdelazeem at hotmail.com> wrote:
> >>
> >>>
> >>>
> >>> Thank you for your help
> >>>
> >>>
> >>> I installed BLCR 0.8.5 on my Ubuntu 12.10 system for use with MPICH 3.0.3.
> >>> This version of BLCR should support kernels through 3.7.1.
> >>>
> >>> When I run the command:
> >>> basma at basma-Satellite-A500:~$ mpiexec --info
> >>>
> >>> results:
> >>>
> >>> HYDRA build details:
> >>> Version: 3.0.3
> >>> Release Date: Thu Mar 28 16:01:21 CDT 2013
> >>> CC: gcc
> >>> CXX: c++
> >>> F77: no
> >>> F90: no
> >>> Configure options:
> >>> '--disable-option-checking' '--prefix=/home/basma/mpich2-install'
> >>> '--disable-f77' '--disable-fc' '--enable-checkpointing'
> >>> '--with-hydra-ckpointlib=blcr' '--cache-file=/dev/null' '--srcdir=.'
> >>> 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS= ' 'LIBS=-lrt -lcr -lpthread '
> >>> 'CPPFLAGS= -I/home/basma/libraries/mpich-3.0.3/src/mpl/include
> >>> -I/home/basma/libraries/mpich-3.0.3/src/mpl/include
> >>> -I/home/basma/libraries/mpich-3.0.3/src/openpa/src
> >>> -I/home/basma/libraries/mpich-3.0.3/src/openpa/src
> >>> -I/home/basma/libraries/mpich-3.0.3/src/mpi/romio/include'
> >>> Process Manager: pmi
> >>> Launchers available: ssh rsh fork slurm ll
> >>> lsf sge manual persist
> >>> Topology libraries available: hwloc
> >>> Resource management kernels available: user slurm ll lsf sge
> >>> pbs cobalt
> >>> Checkpointing libraries available: blcr
> >>> Demux engines available: poll select
> >>>
> >>> So I thought that everything was OK, but when I tried to run mpiexec it
> >>> failed:
> >>>
> >>> basma at basma-Satellite-A500:~$ mpiexec -ckpointlib blcr
> >>> -ckpoint-prefix /home/business/ckpts/app.ckpoint -ckpoint-interval
> >>> 3600 -n 4 /home/basma/libraries/mpich-3.0.3/examples/cpi
> >>>
> >>> results:
> >>>
> >>> Fatal error in MPI_Init: Other MPI error, error stack:
> >>> MPIR_Init_thread(433)...:
> >>> MPID_Init(151)..........: channel initialization failed
> >>> MPIDI_CH3_Init(70)......:
> >>> MPID_nem_init(379)......:
> >>> MPIDI_nem_ckpt_init(153): BLCR kernel module not present
> >>> Fatal error in MPI_Init: Other MPI error, error stack:
> >>> MPIR_Init_thread(433)...:
> >>> MPID_Init(151)..........: channel initialization failed
> >>> MPIDI_CH3_Init(70)......:
> >>> MPID_nem_init(379)......:
> >>> MPIDI_nem_ckpt_init(153): BLCR kernel module not present
> >>> Fatal error in MPI_Init: Other MPI error, error stack:
> >>> MPIR_Init_thread(433)...:
> >>> MPID_Init(151)..........: channel initialization failed
> >>> MPIDI_CH3_Init(70)......:
> >>> MPID_nem_init(379)......:
> >>> MPIDI_nem_ckpt_init(153): BLCR kernel module not present
> >>> Fatal error in MPI_Init: Other MPI error, error stack:
> >>> MPIR_Init_thread(433)...:
> >>> MPID_Init(151)..........: channel initialization failed
> >>> MPIDI_CH3_Init(70)......:
> >>> MPID_nem_init(379)......:
> >>> MPIDI_nem_ckpt_init(153): BLCR kernel module not present
> >>>
> >>> ===================================================================================
> >>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> >>> = EXIT CODE: 1
> >>> = CLEANING UP REMAINING PROCESSES
> >>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> >>> ===================================================================================
> >>>
> >>>
> >>>
> >>> I am a beginner in Linux and parallel programming.
> >>>
> >>> Thank you
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> discuss mailing list discuss at mpich.org
> >>> To manage subscription options or unsubscribe:
> >>> https://lists.mpich.org/mailman/listinfo/discuss