4. Configuring resources

Applications

For our purposes, an "application" is any scientific code that takes a text-based parameter file and perhaps one or more data files as input, runs noninteractively, and produces a set of output files, including a text log file and checkpoint files which can be used to restart the code.  In order to let Teuthis know how to work with an application, you must configure it using the applications dialog, which is accessed through the Settings menu in the main window.  An example application dialog is shown in Figure 1.

Figure 1.  Editing application profiles.

Configured applications are listed in the scrolled window on the left.  Selecting one of them fills the fields on the right with the configuration data for that application.  New applications can be added to the list by clicking on the "New" button at the bottom of the dialog.  Applications can be cloned by clicking on the "Clone" button; this will create a copy of the application with "Copy" appended to the application name.  Applications can also be removed from the list by clicking the "Remove" button.  Action buttons like "New," "Clone," and "Remove" take effect immediately and are not cancelled if the dialog is exited via the "Cancel" button.

Descriptive fields

The application name is used to populate the application pulldown list in the experiment dialog, so it is best to keep it short.  It is also used to name the build directory on the execution machine if the "Build executable" box is checked.  The name may include spaces; these will be "munged" into underscores when naming directories.  (The same happens when using the name of a job to create a unique run directory.)  Two distinct applications may have the same name, but this will be confusing and may have unpredictable results.
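As a minimal sketch of the munging described above, the snippet below replaces each space in a name with an underscore.  The function name munge_name is hypothetical and is not part of Teuthis; the real implementation may differ in detail.

```python
def munge_name(name: str) -> str:
    """Replace every space in a name with an underscore.

    Hypothetical helper for illustration only; not a Teuthis function.
    """
    return name.replace(" ", "_")


if __name__ == "__main__":
    # Application names as they might be typed into the dialog.
    for name in ("FLASH 2.x", "Gadget 2"):
        print(name, "->", munge_name(name))
```

Names that contain no spaces pass through unchanged, which is one more reason to keep application names short and simple.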

The description field is purely informative and is not used by Teuthis at this time.

Building and configuring

If your application needs to be built from source code, Teuthis can manage the configuration and building of the application's executable file, or else you can log in to each of your remote execution machines and build it by hand.  If Teuthis is to manage the build, you must check the "Build executable" box.  You will need to select a path on your local machine where the source code may be found.  Ideally this will be a directory into which you have checked out a copy of the code from a source code repository such as CVS or Subversion; however, this is not a requirement.  You will also need to specify the name of the configuration command (if any) and the name of the file containing configuration information (if any).  The configuration command should be as generic as possible (e.g., ./configure --parallel); you will have the opportunity to add arguments within the experiment dialog before issuing the command for each experiment.  For some codes no configuration is necessary; instead, you might expect to upload a local makefile named "Makefile.myproject," rename it "Makefile," and then build the code.  In such cases you should leave the "Configure command" field blank and fill in the "Configuration file" field with the name the makefile should be given on the remote machine.  Then use the configuration file selection and upload function of the experiment dialog to choose and upload your makefile.

If Teuthis is managing the build, you should also fill in the "Build command" field.  For most codes this will be make or gmake; others may use tools such as ant or cmake.  By checking the box labeled "Move to remote exec dir" you can have the executable moved to the executable directory you have configured for the machine on which the build is to take place.  If you check this box, jobs using this application will expect to run the executable from the executable directory.  Teuthis will expect the build process to leave the executable in the build directory for "pickup."  If you do not check the box, jobs will attempt to run the executable straight from the build directory.  In either case the "Executable name" field should contain just the name of the file that is produced by the build process.

If Teuthis is not managing the build, you should leave the "Build executable" and "Move to remote exec dir" boxes unchecked.  The "Executable name" field should contain either the name of the executable if it is in your default path or the absolute path to the executable if it is not.

If your code is a parallel code, check the "Parallel executable" box.  Job scripts using this application will prefix your executable's name with the parallel execution command configured for the execution machine.
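The rules above for locating and launching the executable can be sketched as follows.  This is an illustration under stated assumptions, not Teuthis code: the AppProfile class, the function names, the example paths, and the "mpirun -np 8" command are all hypothetical.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class AppProfile:
    """Hypothetical stand-in for the fields of the application dialog."""
    executable_name: str
    build_executable: bool = False   # "Build executable" box
    move_to_exec_dir: bool = False   # "Move to remote exec dir" box
    parallel: bool = False           # "Parallel executable" box


def executable_path(app: AppProfile, build_dir: str, exec_dir: str) -> str:
    """Return the path a job would use to run the executable."""
    if not app.build_executable:
        # Unmanaged build: the field holds a name on the default path
        # or an absolute path, so it is used as given.
        return app.executable_name
    base = exec_dir if app.move_to_exec_dir else build_dir
    return posixpath.join(base, app.executable_name)


def command_line(app: AppProfile, build_dir: str, exec_dir: str,
                 parallel_exec: str, args: str = "") -> str:
    """Assemble the command a job script would run."""
    parts = []
    if app.parallel:
        # Parallel applications are prefixed with the machine's
        # parallel execution command.
        parts.append(parallel_exec)
    parts.append(executable_path(app, build_dir, exec_dir))
    if args:
        parts.append(args)
    return " ".join(parts)


if __name__ == "__main__":
    app = AppProfile("mycode", build_executable=True,
                     move_to_exec_dir=True, parallel=True)
    print(command_line(app, "/home/alice/build/mycode", "/home/alice/bin",
                       "mpirun -np 8", "params.txt"))
```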

Parameter files and restarting

For each job, Teuthis will create a parameter file in the job's run directory with the name given in the "Parameter file" field.  Your code should expect to find this file in the directory from which it is run, either by default or through the use of command-line options (which can be specified in the experiment dialog).  (As with the configuration file, this need not be the local name you use for your parameter file templates.)  Teuthis will expect your code to generate a log file with the name given in the "Log file" field.  This file will be downloaded and stored within your project file when you click on the "View log file" button or menu item associated with jobs using this application.

Teuthis can automatically create restart jobs for you with a little help from your application.  Three methods are supported at present.  By default, Teuthis will expect a control file with a specific name to exist in the run directory when you ask to continue a job.  If you choose this option and leave the control file entry blank, Teuthis will not look for any specific file and will simply create a job that re-executes the original job.  It will be up to your application to detect that it is running a restart job.  The second option is to specify a special command-line argument (e.g., "--restart-from checkpoint_file").  With the third option, Teuthis expects your code to write a special version of the runtime parameter file after each successful checkpoint.  The name of this file is entered in the space supplied.  The file should contain all of the original runtime parameter settings for the run plus any settings needed to cause the application to restart from the checkpoint just written.  When asked to continue a job, Teuthis will download this file and use it for the runtime parameter settings of the restart job.
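To make the three restart methods concrete, here is a rough sketch of how a restart job might be derived from the original job under each one.  Everything in it is an assumption made for illustration: the Job class, the method labels "control_file", "argument", and "parameter_file", and the use of a dictionary to stand in for the remote run directory are not taken from Teuthis itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    """Hypothetical job record: arguments plus parameter file contents."""
    args: str
    parameters: str


def make_restart_job(job: Job, method: str, value: str = "",
                     run_dir: Optional[Dict[str, str]] = None) -> Job:
    """Derive a restart job; run_dir maps file names to file contents."""
    files = run_dir or {}
    if method == "control_file":
        # A named control file must exist; a blank name skips the check.
        if value and value not in files:
            raise FileNotFoundError(f"control file {value!r} not found")
        # The restart job simply re-executes the original job.
        return Job(job.args, job.parameters)
    if method == "argument":
        # Append the special restart argument to the original arguments.
        return Job(f"{job.args} {value}".strip(), job.parameters)
    if method == "parameter_file":
        # Use the restart parameter file written by the application.
        return Job(job.args, files[value])
    raise ValueError(f"unknown restart method: {method}")


if __name__ == "__main__":
    original = Job(args="-v", parameters="nsteps = 100\n")
    restart = make_restart_job(original, "argument",
                               "--restart-from checkpoint_file")
    print(restart.args)
```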

Settings for common applications

In the table below we list some of the applications that have been used with Teuthis and the suggested settings for them.  If your application isn't listed and you find some settings that work, please let us know and we will add your application to the list.  If you are reading this document as part of your Teuthis distribution, please see the online user's guide for the latest version of this table.  Note that these suggestions only cover the basic operation of each code; codes may have additional capabilities that are enabled by configuration or make arguments.  Please see your application's documentation for any special features.

Application:          FLASH 2.x
Exec name and args:   flash2
Config file:          Modules
Config command:       ./setup
Build command:        gmake
Move executable:      optional
Parallel:             yes
Parameter file:       flash.par
Log file:             flash.log
Auto restart method:  Copy from flash.par.restart; or command line argument "-chk_file" followed by manual addition of checkpoint file name
Notes:                flash.par.restart not available in standard distribution; need patch
                      Need to set up site directory for remote site
                      Leave log_file parameter unset

Application:          Gadget 2
Exec name and args:   Gadget2 gadget.param
Config file:          Makefile
Config command:       N/A
Build command:        gmake
Move executable:      optional
Parallel:             yes
Parameter file:       gadget.param
Log file:             info.txt
Auto restart method:  Command line argument "1"
Notes:                Upload custom makefile as your configuration file
                      Use "." for OutputDir parameter
                      Leave InfoFile parameter unset

Application:          Enzo 1.0.1
Exec name and args:   enzo.exe EnzoParms
Config file:          N/A
Config command:       ./configure --bindir=XX
Build command:        cd amr_mpi/src; gmake mach-YY; gmake; gmake install
Move executable:      yes
Parallel:             yes
Parameter file:       EnzoParms
Log file:             OutputLevelInformation
Auto restart method:  Command line argument "-r"; must manually add name of last restart file
Notes:                XX = absolute path to build directory
                      Need to set up Make.mach.YY file for remote machine; place in config directory

Application:          Hydra 4.2
Exec name and args:   hydra
Config file:          makeflags
Config command:       N/A
Build command:        make clean; make
Move executable:      yes
Parallel:             no
Parameter file:       prun.dat
Log file:             pr0001.log
Auto restart method:  None; manually edit prun.dat
Notes:                May need to create a new src/system.YY file for remote machine YY
                      Modify src/dumpdata.F, src/readdata.F, and src/gravsubs.F to read/write to ./ rather than data/
                      To change array sizes, edit include/psize.inc on local machine and sync source
                      Upload custom makeflags file as your configuration file; set RUNDIR to ".."
                      Use 0001 as run number in prun.dat


Machines

At present, Teuthis does not support dynamic resource discovery over the Grid.  Hence you, the user, must create profiles for each of the machines you expect to use with Teuthis.  This is done through the machines dialog, which is accessed through the Settings menu in the main window.  An example is shown in Figure 2.

Figure 2.  Editing machine profiles.

The machine dialog is very similar to the application dialog in that configured machines are listed on the left, machine data are entered on the right, and action buttons are provided to create new machine profiles and to clone or remove them.  You can create multiple profiles for the same machine, for instance to use different access methods as shown.  In such cases you should make sure the machine names are distinct in order to avoid unpredictable results.

Descriptive fields

Currently the machine description and type fields are for information only; Teuthis does not make use of them.

Access method

The first thing to configure for each machine is the access method you will be using to execute remote commands on the machine and to transfer files.  Enter the fully qualified hostname to which you will connect in the "Login host" field and your username on this host in the "User ID" field.  Then choose an access method from the list box.  Doing so will immediately fill in the command fields with patterns appropriate to that access method.  In most cases you will not need to modify these, but in case you do, you will need to refer to the table of command patterns below.

Patterns understood by Teuthis in remote command fields

Pattern   Meaning
%A        Account name
%a        Application arguments
%b        Remote batch file name
%C        Remote change directory command
%c        Remote command
%D        Remote path create command
%d        Remote run path
%e        Path to remote executable
%f        "From" file in file transfers
%H        Wall clock time limit hours
%h        Remote host; or "from" host in third-party transfers
%i        "To" host in third-party transfers
%j        Remote job identifier
%K        Local certificate file name
%k        Local key file name
%L        Remote job name
%l        Proxy lifetime in hours
%M        Wall clock time limit minutes
%m        Memory per node (MB)
%N        Number of nodes
%n        Number of CPUs
%P        Remote parallel execution command
%p        Remote password
%Q        Queue name
%R        Kerberos realm
%r        "To" path in file transfers
%S        Wall clock time limit seconds
%T        CPU tiling (CPUs/node)
%t        Wall clock time limit (%H:%M)
%u        Remote userid; or "from" userid in third-party transfers
%v        "To" userid in third-party transfers
%z        Remote directory for Teuthis files
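To illustrate how such patterns might be filled in, the sketch below expands single-letter %X patterns in a command string from a dictionary of values.  It is not Teuthis's own substitution code.  The function name expand, the example command "ssh %u@%h %c", the host and user names, and the choice to leave unknown patterns untouched are all assumptions made for this example.

```python
import re


def expand(template: str, values: dict) -> str:
    """Replace each %X pattern with values["X"]; matching is case-sensitive.

    Patterns with no entry in values are left unchanged (an assumption;
    the real behavior is not documented here).
    """
    def substitute(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))

    return re.sub(r"%([A-Za-z])", substitute, template)


if __name__ == "__main__":
    # Hypothetical command pattern and values, for illustration only.
    values = {"u": "alice", "h": "login.example.org", "c": "ls"}
    print(expand("ssh %u@%h %c", values))
```

Because %H and %h (like several other pairs) differ only in case, any substitution scheme has to treat the pattern letters as case-sensitive, as this sketch does.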




The following choices of access method are currently available:

The most secure and full-featured access methods at present are the ssh-agent and gsissh* methods.  For third-party file transfers to work, both remote machines must be configured with the same access method, and that method must support third-party transfers.

Note that third-party uberftp file transfers occasionally fail with a PORT command error.  Repeating the transfer will sometimes clear the error.  The cause is not clear, but it appears to be a GridFTP issue rather than a Teuthis one, because the same error shows up when uberftp is invoked from the command line.

Job submission

Next you must configure the queuing system used by the remote machine.  If you choose, you can supply a job script template that uses the command patterns given above.  Most of the time, however, you can leave this field blank and simply choose the queuing system from the list box.  If the job template field is blank, Teuthis will use the queuing system selection to construct a basic job script for you.  Currently Teuthis knows about PBS, LSF, and LoadLeveler, the most commonly used queuing systems.  Unix process management (without queues) is also supported, but currently this option does not work reliably.  You can add the names of the charging accounts you use and the names of the queues that are supported on the machine using the "Accounts" and "Queues" comboboxes.  For each, type an entry into the text field, then click the "A" button next to it to add it to the list of values.  To remove an entry, select it and then click the "R" button.

If this machine can run parallel jobs, give the command used to run parallel jobs in the "Parallel exec" field.  You can use the %N and %n command patterns to indicate the number of nodes or CPUs, respectively.  Parallel applications will be appended to this command when the job script is generated.

Set the number of CPUs per node, the default choice for this value, and the amount of memory per node in the appropriate boxes.  The number of CPUs per node and the memory requested can be changed in the job dialog before job submission.
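The following sketch shows how the resource-related pattern values could be derived from a job's CPU count, CPUs per node, memory, and time limit.  It rests on several assumptions that are not confirmed by this guide: that the node count is the CPU count divided by the CPUs per node, rounded up; that %M and %S hold the minutes and seconds remaining after whole hours and minutes are taken out; and that %t pads the minutes to two digits.  The function name resource_values is hypothetical.

```python
import math


def resource_values(cpus: int, cpus_per_node: int,
                    wall_seconds: int, memory_mb: int) -> dict:
    """Build a dictionary of pattern letters to values for one job."""
    hours, remainder = divmod(wall_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return {
        "n": cpus,                                 # number of CPUs
        "T": cpus_per_node,                        # CPU tiling (CPUs/node)
        "N": math.ceil(cpus / cpus_per_node),      # nodes, rounded up (assumed)
        "m": memory_mb,                            # memory per node (MB)
        "H": hours,                                # time limit, hours part
        "M": minutes,                              # time limit, minutes part
        "S": seconds,                              # time limit, seconds part
        "t": f"{hours}:{minutes:02d}",             # combined limit (assumed format)
    }


if __name__ == "__main__":
    # 10 CPUs at 4 CPUs per node for 90 minutes with 2048 MB per node.
    print(resource_values(10, 4, 5400, 2048))
```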

Path settings

Finally, configure the paths under which applications will be built, executables will be stored (for those applications with "Move to remote exec dir" checked), jobs will be run, and Teuthis files will be placed.  In general the build, executable, and Teuthis directories can be placed under your home directory on the remote machine, but runs should be performed on a scratch partition.  For each job, Teuthis will construct a run directory of the form

/run_root/project_name/experiment_name/run_name/job_name

where the names (except for the run root) will be munged to remove spaces and other troublesome characters.
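The construction of the run directory can be sketched as below.  The set of characters treated as "troublesome" is an assumption (anything other than letters, digits, period, underscore, and hyphen is replaced by an underscore), and the function names munge and run_directory are hypothetical rather than part of Teuthis.

```python
import posixpath
import re


def munge(name: str) -> str:
    """Replace spaces and other assumed-troublesome characters with "_"."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


def run_directory(run_root: str, project: str, experiment: str,
                  run: str, job: str) -> str:
    """Build run_root/project/experiment/run/job; the run root is not munged."""
    parts = [munge(part) for part in (project, experiment, run, job)]
    return posixpath.join(run_root, *parts)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(run_directory("/scratch/alice", "My Project", "Exp 1",
                        "run a", "job 1"))
```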

The "OS Type" setting determines how remote paths are constructed (as well as a few other minor things, such as what test command to use to check whether authentication is valid).  Available choices are "Windows" and "Unix."  The Windows setting has not yet been extensively tested.
