Sun ONE Grid Engine, Parallel Environment Integration for SGE with LAM

updated: 7/29/03

This is a test version of an SGE and LAM MPI integration package. It has been only minimally tested with MPI codes, and full LAM functionality had not yet been tested as of 7/22/03. This release is intended to get the code into users' hands for testing and early feedback.

This code was tested against SGE 5.3p2 and LAM 6.5.9 on Solaris 8.

Download Here: sge-lam.tar

For updates and info regarding this code email: christopher.duncan@sun.com

This code is provided AS-IS with no implied warranty or support.

Tar Dist File Contents

README.sge-lam

Directions and info

qrsh-lam

qrsh wrapper used for remote lamboot and for local lamd

sge-lam

SGE-compatible lamboot and lamhalt, for use as the start_proc_args and stop_proc_args of an SGE PE
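A large part of what a PE start procedure such as "sge-lam start" has to do is turn the hosts and slots SGE granted the job into a host list that lamboot can boot. The Python sketch below illustrates only that conversion. It is not the code in sge-lam (which is Perl), and it rests on assumptions: that the SGE PE host file named by $PE_HOSTFILE has lines of the form "<host> <slots> <queue> <processor-range>", and that the target is a LAM boot schema with one "<host> cpu=<N>" line per host. The helper name pe_hostfile_to_bhost is hypothetical.

```python
import os
import sys


def pe_hostfile_to_bhost(pe_hostfile_text):
    """Convert SGE PE host file text into LAM boot schema lines (hypothetical helper).

    Assumed input line format:  "<host> <slots> <queue> <processor-range>"
    Output line format:         "<host> cpu=<slots>"
    A host listed more than once (e.g. slots from two queues on one host)
    gets the sum of its slots. Hosts keep their order of first appearance.
    """
    slots_per_host = {}  # dicts preserve insertion order in Python 3.7+
    for line in pe_hostfile_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        slots_per_host[host] = slots_per_host.get(host, 0) + slots
    return ["%s cpu=%d" % (host, n) for host, n in slots_per_host.items()]


if __name__ == "__main__":
    # SGE sets PE_HOSTFILE in the job environment; outside a PE job it is unset.
    path = os.environ.get("PE_HOSTFILE")
    if path:
        with open(path) as f:
            print("\n".join(pe_hostfile_to_bhost(f.read())))
    else:
        print("PE_HOSTFILE is not set; run this inside an SGE PE job", file=sys.stderr)
```

A real start procedure would write these lines to a file and pass that file to lamboot; how sge-lam actually does this is defined by the script itself.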


Setup Instructions:

  1. Install LAM MPI and SGE. This code was tested against SGE 5.3p2 and LAM 6.5.9 and should work with later releases. It may work with earlier versions of SGE and LAM.
    NOTE: Make sure your shell startup environment puts both the LAM and SGE bin directories in your PATH.

  2. Install the two Perl scripts, qrsh-lam and sge-lam, in the LAM installation's bin directory, and make sure they are executable.

  3. Set the LAMBINDIR and SGEBINDIR variables in both sge-lam and qrsh-lam to match the LAM and SGE installation directories at your site.

  4. Create an SGE parallel environment (PE) for submitting LAM jobs. The following example assumes the scripts are installed in /usr/local/lam/bin; replace queue_list and slots with your site-specific values.

        % qconf -sp lammpi 
        pe_name lammpi
        queue_list hpc-v880.q polarbear.q
        slots 6
        user_lists NONE
        xuser_lists NONE
        start_proc_args /usr/local/lam/bin/sge-lam start
        stop_proc_args /usr/local/lam/bin/sge-lam stop
        allocation_rule $fill_up
        control_slaves TRUE
        job_is_first_task TRUE

    NOTE: It is probably easiest to use the qmon GUI to create the PE.

  5. Modify the LAM startup configuration so that lamd is started through qrsh-lam. This is normally the file $LAMHOME/etc/lam-conf.lam. Give full paths to both qrsh-lam and lamd. The default entry normally looks like:

        lamd $inet_topo $debug

    Change it to the following (assuming your LAMBINDIR is /usr/local/lam/bin):

        /usr/local/lam/bin/qrsh-lam local /usr/local/lam/bin/lamd $inet_topo $debug

  6. With this PE in place, users submit jobs as usual, requesting the PE with qsub's -pe option (for example, qsub -pe lammpi 4 lamjob.csh), and do not need to run lamboot themselves. They only need to call mpirun for their MPI programs. Here is an example job:

        % cat lamjob.csh
        #$ -cwd
        set path=(/usr/local/lam/bin $path)
        echo "Starting my LAM MPI job"
        mpirun C conn-60
        echo "LAM MPI job done"

Passing C to mpirun is the easiest way to start an MPI job that spans all the allocated slots, since LAM then starts one process per CPU in the booted LAM universe.
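The arithmetic behind "C uses all the allocated slots" can be sketched as follows, assuming the boot schema built at job start lists each host with a cpu= count equal to the slots SGE granted on it. The Python helper c_process_count and the host names are hypothetical; the code does not talk to LAM or SGE. It only sums CPU counts the way LAM's C node specification counts CPUs, treating a host without cpu= as one CPU.

```python
def c_process_count(bhost_lines):
    """Return how many processes `mpirun C` would start for a boot schema.

    Assumes LAM boot schema lines of the form "<host> [cpu=N]". A host
    without a cpu= entry counts as one CPU; blank lines and '#' comments
    are ignored. `mpirun C` starts one process per CPU, so the result is
    the sum of the per-host CPU counts.
    """
    total = 0
    for raw in bhost_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        cpus = 1  # default when no cpu= is given
        for token in line.split()[1:]:
            if token.startswith("cpu="):
                cpus = int(token[len("cpu="):])
        total += cpus
    return total


if __name__ == "__main__":
    # Hypothetical schema for a job granted all 6 slots of the example PE.
    schema = ["node1 cpu=4", "node2 cpu=2"]
    print(c_process_count(schema))  # prints 6
```

If sge-lam sets the CPU counts differently, the process count under C changes accordingly, so check the generated boot schema when the number of MPI processes is not what you expect.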


Current Issues:


TODO: