SPW 2.3.9 Install Procedures
Parallel Geoscience Corporation
support@parallelgeo.com

------------------------------------
Compatibility
------------------------------------

This release was built using the following Linux distribution:

   CentOS Linux 4 Update 6 (kernel 2.6.9-67.EL)

It is compatible with:

   Red Hat Enterprise Linux 4.6

------------------------------------
Installation
------------------------------------

1. From this directory, move the spw directory into your login directory
   using the following commands:

   $ cd SPW-2.3.9-rhel4
   $ mv spw ~/.

   *Note: if this is not the first time SPW is being installed on the
   system, you may want to back up the exec.conf file located in the old
   spw directory before moving this spw directory into place. Otherwise,
   you will have to reconfigure the exec.conf file after installation.

2. Several optional installations and/or configurations may need to be
   performed. Sections A - H below describe each of these optional
   installations/configurations in detail. In order for SPW applications
   to run properly, each of the following must be in place:

   A. Optional runtime libraries
   B. Sentinel SuperPro system driver
   C. Linux network security pertaining to rsh access
   D. Shell environment variables
   E. Device files for SCSI tape drive access
   F. Environment variable for SCSI tape block size limit
   G. Executor config file, exec.conf
   H. xterm system utility

   *Note: these optional installations and configurations only need to be
   done the first time SPW is installed on a given system. It is also
   possible that a Linux system upgrade will change some of these
   configurations. These should be the only times that the installations
   and configurations need to be done.

3. After all of the optional installations and/or configurations are
   completed, the install directory (e.g. SPW-2.3.9-rhel4) itself can be
   deleted, along with its remaining contents. The entire directory can
   be removed using the following commands:

   $ cd
   $ \rm -r SPW-2.3.9-rhel4

------------------------------------------------------------------------
A. Optional runtime libraries
------------------------------------------------------------------------

Motif and GTK runtime libraries are required to run SPW. A graphical
login session must be available, along with the versions of Motif and
GTK that are appropriate for the RHEL 4 system.

Compatibility libraries are provided with this install in the event that
the system on which SPW is running does not have the correct version of
the Motif or GTK libraries installed. If the compatibility libraries are
required on the installed system, it is essential that the
LD_LIBRARY_PATH environment variable (discussed in item D below) is set
correctly.

------------------------------------------------------------------------
B. Sentinel system driver
------------------------------------------------------------------------

The Sentinel System Driver is required by the Sentinel SuperPro license
key for running applications in non-demo mode. The USB driver is the
only driver for Sentinel SuperPro keys that can be used with this
release.

It can be installed by running the drvr_install.sh script located in the
SPW-2.3.9-rhel4/sentinel directory. The script must be run with
super-user authority in order for the drivers to be installed properly.
An example command sequence is as follows:

   $ su
   # sh drvr_install.sh

Follow the prompts during script execution to install the USB driver.
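
*Note: as an optional sanity check (this command is not part of the
install package, and its location can vary by distribution), the lsusb
utility from the usbutils package can be used to confirm that the
SuperPro key is visible on the USB bus once the driver is installed and
the key is plugged in:

   $ lsusb

If lsusb is not in the PATH for a regular user, the full path (for
example /usr/sbin/lsusb) may be needed, or the file
/proc/bus/usb/devices can be examined instead. The SuperPro key should
appear as one of the listed USB devices.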
------------------------------------------------------------------------
C. Linux network security files
------------------------------------------------------------------------

 ************************************************************************
 ****                                                                ****
 ****   Only required for parallel processing on a network of PCs   ****
 ****                                                                ****
 ************************************************************************

Several Linux network security related files must be properly configured
in order to run the executor in a parallel processing environment.
Several files have been provided in the install package as examples of
properly configured network security files.

1. rsh access

   The system must be enabled to allow rsh access. In order for rsh
   access to be activated, the rsh-server package must be installed. To
   verify this, look for the rsh and rlogin files in the /etc/xinetd.d
   directory. If the package is not already installed on your system, it
   must be added before parallel processing will be functional. After
   installing rsh-server, configure as described below, then restart the
   xinetd daemon.

   The rsh and rlogin files need to be checked to verify that the entry
   'disable = ' is set to 'no'. The files in the /etc/xinetd.d directory
   can be examined with a text editor, and edited if necessary (as
   super-user). If a file requires editing, the xinetd daemon can be
   restarted without rebooting the system with the following commands
   (again as super-user):

   # cd /etc/rc.d/init.d
   # ./xinetd restart

2. 'hosts.equiv'

   This file must exist in the /etc directory. The format of this file
   is as follows:

   hostname.domain     user

   The first field, 'hostname.domain', should contain the fully qualified
   domain name of each host that has SPW software installed and will
   participate in parallel processing. The second field, 'user', should
   contain the name of the user account that will be used while running
   the executor/execstream programs.

   *Note: to obtain the correct hostname.domain for your system, the
   command "hostname" can be run from the command line in a shell. If no
   domain has been specified for the system, only the hostname part of
   the name will exist. In that case, the .domain extension will not be
   needed in the hosts.equiv file.

   As an example, the hosts.equiv file on earth would look like the
   following:

   earth.acompany.com     pgc
   mars.acompany.com      pgc
   venus.acompany.com     pgc
   mercury.acompany.com   pgc

   (the network domain in this case is acompany.com)
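
*Note: as a quick sanity check (these commands are not part of the
install package; the hostname 'mars' is taken from the example above),
the rsh-server package and passwordless rsh access can be verified from
the host that will run the executor:

   $ rpm -q rsh-server
   $ rsh mars uname -n

If rsh-server is installed, rpm reports the package name and version. If
rsh access is configured correctly, the rsh command prints the remote
host's name; if it reports 'Permission denied' or prompts for a
password, recheck the xinetd and hosts.equiv configuration described
above.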
------------------------------------------------------------------------
D. Shell environment variables
------------------------------------------------------------------------

A number of environment variables need to be set for use by the SPW
applications. The syntax of the commands for setting the variables
depends on the type of shell being used by the login account. Listed
here are the commands for setting the variables in a bash shell or a
c-shell:

1. bash

   $ export SPWHOME=$HOME/spw
   $ export IOHOME=$HOME/spw
 * $ export STREAMHOME=$SPWHOME
   $ export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:${SPWHOME}:${SPWHOME}/spwGtk:${SPWHOME}/spwMotif"
   $ export MPIRUN_DEVICE="ch_p4"
   $ export ST_BUFFER="256"
** $ export XKEYSYMDB=/usr/X11R6/lib/X11/XKeysymDB

2. csh

   $ setenv SPWHOME $HOME/spw
   $ setenv IOHOME $HOME/spw
 * $ setenv STREAMHOME $SPWHOME
   $ setenv LD_LIBRARY_PATH "$LD_LIBRARY_PATH:${SPWHOME}:${SPWHOME}/spwGtk:${SPWHOME}/spwMotif"
   $ setenv MPIRUN_DEVICE "ch_p4"
   $ setenv ST_BUFFER "256"
** $ setenv XKEYSYMDB /usr/X11R6/lib/X11/XKeysymDB

In order for these environment variables to be available in any given
shell, add these command lines to the .bashrc or .cshrc file. You will
find it in your login directory by entering ls -a, and you can edit it
with any text editor. Keep in mind that these environment variables will
not take effect until you either run the commands in your current shell,
or open a new shell after adding them to your shell resource file.

*Note: the STREAMHOME environment variable needs to be set to the
directory that holds the execstream software on the stream nodes. For
simplicity's sake, it is best to install the execstream software in a
directory called 'spw' in the login directory of the user account on the
secondary nodes, which effectively mirrors the installed directory path
on the primary node. STREAMHOME can then be set to SPWHOME, although it
is possible for the directory structure to differ from this.

**Note: the XKEYSYMDB environment variable needs to be set to the
location of the XKeysymDB file on the Linux system onto which SPW is
being installed. The user needs to locate this file (which is stored in
different locations on different Linux distributions) and set the
environment variable accordingly.

------------------------------------------------------------------------
E. Device files for SCSI tape drive access
------------------------------------------------------------------------

A number of files in the /dev directory need to have the correct
permissions set in order for regular users to access SCSI tape devices.
Use the following commands to ensure that all of the st and nst device
files have the proper permissions set:

   $ su
   # cd /dev
   # chmod 666 st[0-9]*
   # chmod 666 nst[0-9]*

------------------------------------------------------------------------
F. Environment variable for SCSI tape block size limit
------------------------------------------------------------------------

The environment variable ST_BUFFER, which is set in step D above, is
used by IO Utility to determine the maximum block size that will be
handled during SCSI I/O. The default setting of 256 (kb) should be
adequate for most tape I/O involving trace sequential seismic data. If
the maximum block size for SCSI I/O needs to be larger, changing this
environment variable and restarting IO Utility will increase the limit.
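
For example, to raise the limit to 512 kb (an illustrative value only),
set the variable before restarting IO Utility, and update the shell
resource file from section D so that the new value persists:

   bash:  $ export ST_BUFFER="512"
   csh:   $ setenv ST_BUFFER "512"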
------------------------------------------------------------------------
G. Executor configuration file (exec.conf)
------------------------------------------------------------------------

This file must reside in the same directory as the 'executor'
application. The executor will not run in 'Parallel' mode without it.
The entries contained within the file are used to identify:

1. the host (computer) and program filename that will be used by the
   executor.
2. the host(s) and program path/filename that will be used by the
   streams.

The format of this file is as follows:

   hostname 0 executor
   hostname 1 $STREAMHOME/execstream
   hostname 1 $STREAMHOME/execstream
   hostname 1 $STREAMHOME/execstream

In addition to the entry for the executor, there needs to be one entry
for every stream process that will participate in the parallel
processing. It is possible (and can be desirable) to run more than a
single stream 'process' on another host on the network. Every line other
than the first causes an instance of the execstream program (a process)
to be created on the specified host, and each of these stream
'processes' contributes to the parallel processing done during execution
of the flow.

Below is an example file for a small network or cluster of PCs:

   earth 0 executor
   mars 1 $STREAMHOME/execstream
   mars 1 $STREAMHOME/execstream
   venus 1 $STREAMHOME/execstream

Edit the exec.conf file so that it contains the correct hostname for the
host running the executor, and the correct hostname(s) of the host(s)
running the stream(s). Delete any unneeded lines from the existing
exec.conf file.

*Note: the executable file 'execstream' must exist at the location given
by the path string in the exec.conf file for the streams. In the example
above, the path is $STREAMHOME, which was set as an environment variable
in section D above. The most straightforward way to install SPW is to
put the executables in a directory called spw directly within the login
directory of the user account that will be running the software.

------------------------------------------------------------------------
H. xterm system utility
------------------------------------------------------------------------

Job execution is launched by the FlowChart application within an xterm
window. It is essential that the xterm system utility be installed on
the system that will be running processing flows in SPW. To verify that
the xterm utility is available, the following command can be used:

   $ which xterm
   /usr/bin/xterm
   $

If the response to the which command does not indicate that the program
is available on the system, it must be installed before attempting to
execute processing jobs in SPW.

------------------------------------------------------------------------
Performance considerations in the parallel processing environment
------------------------------------------------------------------------

It is up to the user to determine the optimal number of stream processes
running on a given computer. It is possible to run as many stream
processes as desired on a particular machine, but once memory or cpu
cycle time becomes saturated, the benefit of additional streams is lost.
PGC recommends that the executor initially be configured with one stream
process per cpu involved in the processing. After becoming familiar with
the runtimes of the processing steps, the user is encouraged to
experiment with a different number of stream processes per host.

Following are some points to consider when approaching the distribution
of the processing workload in a networked environment:

* By convention, a 'node' typically refers to a piece of hardware (a
  computer). When configuring a parallel processing scheme with SPW,
  there may not always be a one-to-one correspondence between the number
  of nodes on the network and the number of stream processes being used
  by the executor. It is up to the user to decide how to optimize the
  use of resources on the network.

* If a particular host has dual cpus, two streams running on that host
  should perform about as well as two streams running on two single-cpu
  hosts.

* Most of the processing done by the executor does not require a great
  deal of overhead in terms of memory used by each stream. However, two
  exceptions to this are the pstm (Pre-Stack Time Migration) step and
  the 3D dmo step.
  Because of the nature of their algorithms, each of the streams
  participating in this processing requires a fairly large amount of
  memory. The actual amount of memory depends upon the data set size and
  the migration/dmo aperture. This must be kept in mind when planning
  the stream configuration for a migration run, because a stream
  configuration that works well for regular processing may exhaust the
  available RAM if several streams are designated to run on a single
  host.
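
As an illustration of the one stream process per cpu starting point (the
hostnames are taken from the earlier examples; the cpu counts are
assumed purely for illustration), a network where earth runs the
executor and mars and venus each have dual cpus might start with the
following exec.conf:

   earth 0 executor
   mars 1 $STREAMHOME/execstream
   mars 1 $STREAMHOME/execstream
   venus 1 $STREAMHOME/execstream
   venus 1 $STREAMHOME/execstream

Streams can then be added or removed per host after observing memory use
and runtimes, particularly before running the pstm or 3D dmo steps.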