2. Getting started
Teuthis is written in Python and uses the GTK+ graphics library. To run Teuthis, you must have the following packages (or later versions) installed on your system:

- Python version 2.0
- PyGTK version 2.4
- libglade version 2
- pexpect version 0.999 (a version of pexpect.py modified to retry reads on EINTR signals is included with Teuthis)
To use remote machines from within Teuthis, you also need to have at least one of the following client packages installed:

- OpenSSH or GSI-enabled OpenSSH
- Kerberos
To make use of Grid services for single signon authentication and third-party file transfers, you need to have GSI-enabled OpenSSH and the Globus Toolkit (version 4.0) installed. (At this time pyGlobus is not required.) Teuthis can also make use of the following optional Grid software packages:
- UberFTP file transfer client
- MyProxy credential management client
To build these Grid tools from source, it helps to have the Grid Packaging Toolkit installed.
Finally, to take notes you will need a text editor and/or web browser.
At present Teuthis is only supported under Linux. However, its dependencies are all open-source projects with versions available for Windows and Macintosh computers, so it should be usable on these systems as well. Users with these systems are invited to try out Teuthis and share their experiences with us.
To install Teuthis, make sure you have installed the required packages, then follow these instructions:
1. Unpack the archive by issuing the command zcat file | tar xvf -, where file is the name of the archive file you downloaded. This will create a directory named teuthis-XX/, where XX is the version number.
2. Run the install-teuthis.py script found there. Note that if you plan to install Teuthis into a system-wide directory such as /usr/local/, you will need system administrator privileges for this step. If you choose /usr/local/, the Teuthis files will be copied into /usr/local/teuthis-XX. You will also be asked if you would like to create symbolic links to the teuthis.py and tendril.py scripts in another directory, such as /usr/local/bin/.
3. To launch Teuthis, issue the command teuthis-dir/bin/teuthis.py, where teuthis-dir is the directory into which you installed Teuthis in the previous step. If you chose to create links in a directory that is in your path, you may also simply issue the command teuthis.

Teuthis is based on an experimental view of computational science. To be scientifically useful, calculations must be performed in a repeatable fashion, systematically varying input parameters and cataloging the dependence of the results on these parameters. When many cases are to be run on different computers, with source code and data stored on still other computers, managing such experiments can be a daunting task, particularly when jobs fail or machines go down. By reducing the labor required to create and track a large number of computational experiments, Teuthis makes it easier to be careful and harder to make mistakes.
Traditionally, a simulation user interacts from his or her workstation with remote computing resources using ssh and scp (or Kerberized rlogin and ftp, or Grid equivalents) running within a terminal window. The user must upload, configure, and build source code, upload any data files needed, and manually create parameter files and a job script. After submitting the job to a remote queuing system, the user must periodically return to check on the job's status. If the job succeeds, the job script (or the user) transfers the resulting files to an archival data system. To continue the run, the user must edit the parameter file, maneuver a restart file into position, edit the job script, and resubmit. At several points in this process a typographical error can lead to a wasted trip through the queue or possibly lost data. The user must make a special effort to keep a record of the work that has been done (including the locations of the data files produced and the disposition of runs).

Teuthis does not replace the tools used to authenticate users, transfer data between machines, queue jobs, and the like. Instead, it replaces the cumbersome and error-prone process described above with a single point of control and recordkeeping, using existing tools in the background.
A typical network setup for Teuthis appears in Figure 1. Teuthis runs together with an integrated development environment (IDE) such as Eclipse on the user's workstation (a future release may include a plugin for Eclipse). The user edits code within the IDE, synchronizing it with a source code repository (e.g., CVS or Subversion) on a separate machine accessible to other members of the user's research group. When needed, Teuthis synchronizes the source code with a remote compute server (perhaps a large parallel computer) and issues the instructions to configure and build the code there. Also using Teuthis, the user configures runs on the workstation and submits the resulting scripts to the compute server, transferring data to and from the data server as needed. Small files such as log files and standard output are brought back to the workstation and stored within Teuthis. Pointers to large data sets on the data server are also maintained.
Figure 1: Network envisioned for use with Teuthis.
At present Teuthis does not support the use of Grid services for remote
resource discovery and job submission. As a result, Teuthis must know
something about how to interact with different authentication methods
(always using external tools, to limit the chances of introducing new
security holes) and queuing systems, and machine information must be
entered directly by the user. These requirements are not too onerous
for now, and it is anticipated that future versions of Teuthis will
support these services.
A more serious limitation of the present version, however, is
that asynchronous data transfers are not yet
possible, making it infeasible to transfer very large files as part
of a job. Background file transfers are a major priority for the next
release of Teuthis.
Here we present an example showing how one typically might interact with Teuthis. After launching Teuthis (e.g., by issuing the command python teuthis.py from a terminal), we are presented with the window shown in Figure 2.
Figure 2: Main window.
The blank project view area will show summary
information for the different projects we will create. Before
creating a project, though, we need to configure at least one
application and machine. Choosing “Applications...” from
the Settings menu brings up the dialog shown in Figure 3, allowing us
to configure an application. In the figure we have filled in the
dialog with entries appropriate to the FLASH
simulation code as an example. FLASH is a parallel code that must be
built anew for different problems. It is configured using a command
called setup which expects to
find a configuration file called Modules
containing the list of code modules to include. After configuration,
the executable flash2 is built
using make. At runtime, FLASH
looks for a file called flash.par
containing a list of control parameter values. As it runs, it
produces a second parameter file called flash.par.restart
every time it successfully writes a checkpoint file. A log of FLASH's
activity is left in the file flash.log.
Figure 3: Application window.
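For orientation, flash.par is a plain-text file of name = value settings, one per line. A minimal example might look like the following (lrefine_max appears later in this manual; the other entries are illustrative assumptions, not a complete or authoritative FLASH configuration):

```
# illustrative flash.par fragment -- parameter names are examples only
lrefine_max = 6
basenm      = "sedov_"
restart     = .false.
```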
Although FLASH uses most of the features of this dialog, other applications might have simpler requirements; for example, an application configured using GNU configure might not require a configuration file. To add another application, you can click the “New” button and another entry will be created. You can also configure similar applications using the "Clone" button, which will create a separate application entry called "FLASH 2.4 Copy" and allow it to be edited. Most of the entities manipulated by Teuthis (applications, machines, projects, experiments, runs, and jobs) can be cloned. Click “OK” to exit the dialog and return to the main window.
To configure a machine, we select “Machines...” from the Settings menu, bringing up the dialog shown in Figure 4.
Figure 4: Machine window.
In Figure 4 this dialog is shown with three
different machines configured. The properties of a machine named
cobalt (the Altix machine at NCSA) are being edited. After
filling in the "Login host" and "User ID" fields, we selected
"gsissh+uberftp" from the access method list box. Doing so filled
in the fields giving the commands to initiate authentication, remotely
execute commands, and perform file transfers. The command fields
include patterns such as "%h" that will be matched against file, host,
and user names; the complete
list of patterns understood in different
fields by Teuthis is given in the Configuring
resources section. We can modify the commands as needed for
this particular machine if, for example, we need to pass through a
front-end machine to reach it. Since the access method
we have chosen uses GSI
authentication, we also need to fill in the X.509 certificate
subject field with the subject or distinguished name of the certificate
that we will use to authenticate to this machine. (The
certificate itself should be located either on the machine running
Teuthis or on a MyProxy
server; the choice between these options is made in the user preferences dialog.)
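As a rough sketch of how such patterns behave, simple string substitution along these lines is all that is involved (only "%h" is taken from this manual; "%u" and "%f" are assumed placeholder names, and the helper below is an illustration, not Teuthis's actual code):

```python
# Illustrative sketch of %-pattern expansion in a command field.
# Only "%h" (host name) comes from this manual; "%u" (user name) and
# "%f" (file name) are assumed pattern names for this example.
def expand(template, host="", user="", file=""):
    return (template.replace("%h", host)
                    .replace("%u", user)
                    .replace("%f", file))

cmd = expand("gsissh %u@%h", host="cobalt.ncsa.uiuc.edu", user="jdoe")
# cmd == "gsissh jdoe@cobalt.ncsa.uiuc.edu"
```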
After filling in the access fields, we configured
the machine for job submission by indicating the queuing system used,
the default number of CPUs and memory per node to request, the parallel
execution command, and the account and queue names available to
us. By choosing a queuing system, we can let Teuthis create a job
script for us, or else choose our own job template file. Like the
command fields, job templates include patterns that allow job
properties such as name, wall clock time, and run directory to be
inserted into the job script actually submitted to the remote system.
Having completed the access and job submission
sections, we finally provide the paths on the remote system under which
applications should be built, executables should be stored, and jobs
should be run. We also provide a path for Teuthis to store its
files, such as the tendril.py
script and standard output
files. Usually all of the paths can be placed under the user's
remote home directory except for the run root, which generally goes on
some large scratch file system. The "OS type" setting determines
whether to use forward or reverse slashes for remote paths, as well as
a few other minor issues.
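The effect of the "OS type" setting on path separators can be pictured with a small sketch (the function name and os_type values here are assumptions for illustration; this is not Teuthis's implementation):

```python
# Illustrative sketch of the "OS type" slash rule described above.
# The function and the os_type values are assumed names, not Teuthis code.
def join_remote(os_type, root, *parts):
    sep = "\\" if os_type == "windows" else "/"
    return sep.join([root.rstrip("/\\")] + [p.strip("/\\") for p in parts])

join_remote("unix", "/home/jdoe", "teuthis")         # "/home/jdoe/teuthis"
join_remote("windows", "C:\\Users\\jdoe", "teuthis") # "C:\Users\jdoe\teuthis"
```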
As with applications, we can clone, remove, or
add machine definitions as needed. Click "OK" to return to the
main window.
Finally we are ready to start a project. Select “New project...” from the File menu to bring up the dialog shown in Figure 5.
Figure 5: Project properties dialog.
Here we can edit the project name and description as well as take notes on the project by editing either a local text file or an online web page (wiki). The "Creator" field is filled in using information taken from the user preferences dialog. For now let's create a new experiment by clicking on the “New experiment” button.
After clicking on the “New experiment” button, we are faced with the formidable-looking dialog shown in Figure 6. Some of the fields are shown already filled in.
Figure 6: Experiment properties dialog.
The really essential sections of this dialog to
complete are the ones labeled “Application” and
“Execution.” Using the appropriate list boxes, we
select the application FLASH 2.4 and machine cobalt that we
configured earlier. Based on the information we gave then, the build
directory, configuration command, build command, and executable
fields are filled in for us. We can make some modifications specific
to the experiment if we wish; for example, in Figure 6 we have
manually added the argument “sedov -auto” to the
configuration command. This sets up FLASH to solve the standard Sedov
blast wave test problem, and by using “-auto” we have
avoided the need to specify a configuration file.
To continue, we need to upload the source code
to the execution machine and configure and build it there. By
selecting FLASH 2.4 as the application we have already chosen a local
source directory; clicking the “Upload source” button
initiates a file transfer to the execution machine. Since we have
chosen to access cobalt via GSI, an authentication dialog comes up
asking for our certificate passphrase (Figure 7). The certificate
subject used is the one specified for this machine in the machine
dialog.
Figure 7: Authentication dialog for a machine configured to use GSI authentication.
After clicking “OK” in the
authentication dialog we wait for the source code to be uploaded.
When the upload finishes, we are returned to the experiment dialog,
at which point we click “Do it” next to “Configure
command” and then “Do it” next to “Build
command.” We must wait for each of these tasks to complete
before going on. The "Output" button next to each "Do it" button
allows us to view the output from each task, and the color of the
button border indicates whether the task was completed
successfully. Green indicates success, while red indicates that
an error occurred.
When using GSI authentication with any given
machine, a proxy check for the identity associated with that machine is
performed before every remote command is issued. If a valid proxy
is found, the command is issued without prompting the user again for a
passphrase. Thus if you are using multiple machines with GSI
authentication that recognize the same identity, you need only
authenticate yourself once.
For this example, we have also chosen to stage
files from a third machine (tungsten) and archive the files produced by
our run to that same machine. The files we are staging are listed
with their full paths, one per line, in the "Src files" box. The
ones listed in our example are just some random large files, but in
general one might stage files containing initial conditions data,
equation of state tables, etc. At the end of the run, we will
archive all of the files in the run directory to the data destination
machine, placing them in the directory indicated in the "Dest path"
field. If this directory does not exist, it will be created.
In the next step, we set the number of CPUs to use on the execution machine, then go on to choose the runtime parameter file template to use with our experiment. FLASH uses a plain-text runtime parameter file containing lines of the form variable = value. Most of the parameters will not need to vary from one run to the next, so we create a normal parameter file containing these lines. For those parameters we wish to vary, we use lines of the form variable = %variable%, allowing Teuthis to substitute values for the variables as specified in the experiment dialog. In Figure 6 we have selected such a file containing a line “lrefine_max = %lrefine_max%”. We then typed lrefine_max into the list of parameters to vary and provided a list of values. These will be the values lrefine_max will take on in each of the runs generated for this experiment.
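The substitution behind “Generate runs” can be sketched roughly as follows (a simplified illustration of the %variable% convention just described, not Teuthis's actual code):

```python
# Simplified sketch of run generation: each varied parameter appears in
# the template as %name%, and one parameter file is produced per value.
# Not Teuthis's actual implementation.
template = 'basenm = "sedov_"\nlrefine_max = %lrefine_max%\n'

def generate_parfiles(template, name, values):
    return [template.replace("%" + name + "%", str(v)) for v in values]

parfiles = generate_parfiles(template, "lrefine_max", [4, 5, 6])
# parfiles[0] contains the line: lrefine_max = 4
```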
Clicking the “Generate runs” button, we find that a number of new runs have been added under our experiment in the main window's project view. Click the “OK” buttons to return to the main window, then select “Expand all” from the View menu. You should see something like Figure 8.
Figure 8: Main window with project view expanded to show new runs.
Runs corresponding to each of our lrefine_max parameter values have been created for us. Each includes a separate runtime parameter list. By double-clicking on one of the runs or right-clicking on it and selecting “Properties...” from its popup menu, we obtain a dialog like that shown in Figure 9.
Figure 9: Run properties dialog.
Here we can review and add to or override the experiment settings for executable arguments, number of processors, files to transfer, etc. We can also review the parameter file that was generated for this run. The "Varied parameters" area shows just the parameters that were varied in generating the runs and the values they took for this particular run. Once everything is satisfactory, we click on the "Create job" button to prepare the run for submission. This brings up the dialog shown in Figure 10.
Figure 10: Job properties dialog.
Here we get one final opportunity to add to or
override the run settings. We also set the wall clock time to
request for this calculation. Notice that otherwise this looks
very similar to the run dialog. So why the additional "job" layer
of organization? A run is meant to be abstracted from the details
of whether a calculation succeeded or not, whether it crashed and had
to be restarted, whether it had to be resubmitted to the queue several
times, etc. These details are encapsulated in jobs. When we
decide to actually carry out a job, we click the "Submit" button in the
job dialog to perform the pre-job data file staging and to generate and
submit the job script. If the remote submission is successful,
the job is assigned a remote job identifier and a run status.
From the job dialog or the main window we can request updates to a
job's status, and when a job is found to be complete, we are offered
the opportunity to perform any data archiving that we have
requested. At any time after the job starts to run we can also
request the standard output, standard error, and application log file
produced by the job. After the job is complete, the next request
for these files downloads them and stores them within the Teuthis
project file, so thereafter they can be viewed offline.
The "Disposition" fields in both the run and job
dialogs are meant to be set by the user and contain a short comment on
the status of each run or job -- for example, "Ran to completion" or
"Crashed on startup." These comments appear in the main window
when the status field is enabled. Together with the data archived
within Teuthis, the notes files linked to in the project dialog, and
the recorded locations of archival data, the status fields allow you to
see at a glance where your calculations stand and what you need to do
next.
If the application has been configured with a
restart method, we can optionally continue the job by clicking on the
"Continue job" button. This will create a new unsubmitted job
that will run in the same directory as the original job. How the
restart is handled depends on the restart method chosen; for example,
FLASH has been designed so that it writes a new runtime parameter file
called "flash.par.restart" after each checkpoint file is successfully
written. Since we have configured Teuthis to look for this file
when continuing a job, it will download the contents of
flash.par.restart and use them as the runtime parameter file for the
restart job. This feature makes it straightforward to continue a
job after a crash or after running out of time in the queue.
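The restart-file rule just described can be sketched as follows (an illustration of the behavior, not Teuthis's actual code; the function name is an assumption):

```python
import os

# Illustrative sketch (not Teuthis's code) of the restart rule described
# above: when continuing a job, prefer the flash.par.restart that the
# application wrote at its last checkpoint; otherwise fall back to the
# original flash.par.
def restart_parfile(run_dir):
    restart = os.path.join(run_dir, "flash.par.restart")
    if os.path.exists(restart):
        return restart
    return os.path.join(run_dir, "flash.par")
```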