[mpich-discuss] BLCR kernel module not present

Pavan Balaji balaji at mcs.anl.gov
Fri Apr 26 15:25:44 CDT 2013


I can't speak for mvapich, of course.  I'm only speaking about mpich.
However, most of our derivatives don't destroy the features that are in
stock mpich.  So I'd think it'll work fine with mvapich as well.

 -- Pavan

On 04/26/2013 03:20 PM US Central Time, michael wrote:
> Thanks Pavan
> 
> Just to be clear, you're saying that if I use mvapich with BLCR, a
> running multi-node MPI job that is killed by a batch scheduler (e.g., for
> running out of time) can be restarted (from which checkpoint?), presuming
> it doesn't have open files?
> 
> Many thanks, M
> 
> On Fri, 2013-04-26 at 15:12 -0500, Pavan Balaji wrote:
>> Michael,
>>
>> BLCR support for mpich should work fine; if something is broken, please
>> let us know.
>>
>> However, the core BLCR group itself hadn't released updates in a while,
>> primarily because they didn't have direct funding for it.  I believe
>> that's fixed now and they are working on newer releases.
>>
>>  -- Pavan
>>
>> On 04/26/2013 03:07 PM US Central Time, michael wrote:
>> > Hi folks
>> > I was wondering what the state of BLCR support for mpich/mvapich is,
>> > e.g., how reliable can one presume it to be?
>> > Thanks, Michael
>> > 
>> > 
>> > On Fri, 2013-04-26 at 14:33 -0500, Wesley Bland wrote:
>> >> It looks like you might have missed installing the kernel module for
>> >> BLCR. What is the output of `lsmod`? 
>> >>
>> >>
>> >> Alternatively, if you installed BLCR by using apt-get in Ubuntu, you
>> >> should be able to use dkms to manage your kernel modules
>> >> automatically. Make sure you have the package 'blcr-dkms' installed
>> >> (you should be able to check this by typing `dkms status`).
>> >>
>> >>
>> >> Do either of those solutions solve your issue? 
>> >>
>> >>
>> >> Wesley 
>> >>
>> >> On Apr 26, 2013, at 1:55 PM, basma a.azeem
>> >> <basmaabdelazeem at hotmail.com> wrote:
>> >>
>> >>>
>> >>>
>> >>> Thank you for your help
>> >>>
>> >>>
>> >>> I installed BLCR 0.8.5 on my Ubuntu 12.10 system to use with MPICH 3.0.3.
>> >>> This version of BLCR should support kernels through 3.7.1.
>> >>>
>> >>> When I run the command:
>> >>> basma at basma-Satellite-A500:~$ mpiexec --info
>> >>>
>> >>> results:
>> >>>
>> >>> HYDRA build details:
>> >>>     Version:                                 3.0.3
>> >>>     Release Date:                            Thu Mar 28 16:01:21 CDT 2013
>> >>>     CC:                              gcc    
>> >>>     CXX:                             c++    
>> >>>     F77:                             no   
>> >>>     F90:                             no   
>> >>>     Configure options:                      
>> >>> '--disable-option-checking' '--prefix=/home/basma/mpich2-install'
>> >>> '--disable-f77' '--disable-fc' '--enable-checkpointing'
>> >>> '--with-hydra-ckpointlib=blcr' '--cache-file=/dev/null' '--srcdir=.'
>> >>> 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS= ' 'LIBS=-lrt -lcr -lpthread '
>> >>> 'CPPFLAGS= -I/home/basma/libraries/mpich-3.0.3/src/mpl/include
>> >>> -I/home/basma/libraries/mpich-3.0.3/src/mpl/include
>> >>> -I/home/basma/libraries/mpich-3.0.3/src/openpa/src
>> >>> -I/home/basma/libraries/mpich-3.0.3/src/openpa/src
>> >>> -I/home/basma/libraries/mpich-3.0.3/src/mpi/romio/include'
>> >>>     Process Manager:                         pmi
>> >>>     Launchers available:                     ssh rsh fork slurm ll
>> >>> lsf sge manual persist
>> >>>     Topology libraries available:            hwloc
>> >>>     Resource management kernels available:   user slurm ll lsf sge
>> >>> pbs cobalt
>> >>>     Checkpointing libraries available:       blcr
>> >>>     Demux engines available:                 poll select
>> >>>
>> >>> So I thought that everything was OK, but when I tried to run mpiexec,
>> >>> it failed:
>> >>>
>> >>> basma at basma-Satellite-A500:~$ mpiexec -ckpointlib blcr
>> >>> -ckpoint-prefix /home/business/ckpts/app.ckpoint -ckpoint-interval
>> >>> 3600  -n 4 /home/basma/libraries/mpich-3.0.3/examples/cpi
>> >>>
>> >>> results:
>> >>>
>> >>> Fatal error in MPI_Init: Other MPI error, error stack:
>> >>> MPIR_Init_thread(433)...: 
>> >>> MPID_Init(151)..........: channel initialization failed
>> >>> MPIDI_CH3_Init(70)......: 
>> >>> MPID_nem_init(379)......: 
>> >>> MPIDI_nem_ckpt_init(153): BLCR kernel module not present
>> >>> Fatal error in MPI_Init: Other MPI error, error stack:
>> >>> MPIR_Init_thread(433)...: 
>> >>> MPID_Init(151)..........: channel initialization failed
>> >>> MPIDI_CH3_Init(70)......: 
>> >>> MPID_nem_init(379)......: 
>> >>> MPIDI_nem_ckpt_init(153): BLCR kernel module not present
>> >>> Fatal error in MPI_Init: Other MPI error, error stack:
>> >>> MPIR_Init_thread(433)...: 
>> >>> MPID_Init(151)..........: channel initialization failed
>> >>> MPIDI_CH3_Init(70)......: 
>> >>> MPID_nem_init(379)......: 
>> >>> MPIDI_nem_ckpt_init(153): BLCR kernel module not present
>> >>> Fatal error in MPI_Init: Other MPI error, error stack:
>> >>> MPIR_Init_thread(433)...: 
>> >>> MPID_Init(151)..........: channel initialization failed
>> >>> MPIDI_CH3_Init(70)......: 
>> >>> MPID_nem_init(379)......: 
>> >>> MPIDI_nem_ckpt_init(153): BLCR kernel module not present
>> >>>
>> >>> ===================================================================================
>> >>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> >>> =   EXIT CODE: 1
>> >>> =   CLEANING UP REMAINING PROCESSES
>> >>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> >>> ===================================================================================
>> >>>
>> >>>
>> >>>  
>> >>> I am a beginner in Linux and parallel programming.
>> >>>
>> >>> Thank you
>> >>>
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> discuss mailing list     discuss at mpich.org
>> >>> To manage subscription options or unsubscribe:
>> >>> https://lists.mpich.org/mailman/listinfo/discuss 

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji
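
The `lsmod` check suggested earlier in the thread for the "BLCR kernel
module not present" error can be scripted. The following is only a minimal
sketch, not part of MPICH or BLCR: the function name blcr_module_loaded is
made up for illustration, and it assumes the main BLCR module is named
"blcr" (with a helper module "blcr_imports"). Adjust the pattern if your
build uses different names.

```shell
#!/bin/sh
# Sketch: report whether a BLCR kernel module appears in a module listing.
# Assumption: the main module is named "blcr"; the helper "blcr_imports"
# alone does not count as BLCR being available.

# blcr_module_loaded [FILE]
# FILE defaults to /proc/modules, the kernel's list that lsmod formats.
# Returns 0 if a line starts with the module name "blcr", non-zero otherwise.
blcr_module_loaded() {
    modules_file="${1:-/proc/modules}"
    # Each line begins with the module name followed by a space.
    grep -q '^blcr ' "$modules_file" 2>/dev/null
}

if blcr_module_loaded; then
    echo "blcr module is loaded"
else
    echo "blcr module is not loaded; try: sudo modprobe blcr"
fi
```

If the module is reported as not loaded on Ubuntu with the blcr-dkms
package, `dkms status` shows whether the module was built for the running
kernel (compare against `uname -r`).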


