2. Getting started

Requirements

Teuthis is written in Python and uses the GTK+ graphics library. To run Teuthis, you must have the following packages (or later versions) installed on your system:



* Python version 2.0

* PyGTK version 2.4

* libglade version 2

* pexpect version 0.999 (a version of pexpect.py modified to retry reads that are interrupted by signals and fail with EINTR is included with Teuthis)

To use remote machines from within Teuthis, you also need to have at least one of the following client packages installed:



* OpenSSH or GSI-enabled OpenSSH

* Kerberos

To make use of Grid services for single sign-on authentication and third-party file transfers, you need to have GSI-enabled OpenSSH and the Globus Toolkit (version 4.0) installed. (At this time pyGlobus is not required.) Teuthis can also make use of the following optional Grid software packages:



* UberFTP file transfer client

* MyProxy credential management client

To build these Grid tools from source, it helps to have the Grid Packaging Toolkit installed.

Finally, to take notes you will need a text editor and/or web browser.

At present Teuthis is only supported under Linux. However, its dependencies are all open-source projects with versions available for Windows and Macintosh computers, so it should be usable on these systems as well. Users with these systems are invited to try out Teuthis and share their experiences with us.

Installation

To install Teuthis, make sure you have installed the required packages, then follow these instructions:

  1. Obtain the appropriate Teuthis installation archive file from our downloads page.
  2. Extract the Teuthis files from the archive using the command zcat file | tar xvf -, where file is the name of the archive file you downloaded. This will create a directory named teuthis-XX/, where XX is the version number.
  3. Move to this directory and execute the install-teuthis.py script found there. Note that if you plan to install Teuthis into a system-wide directory such as /usr/local/, you will need to have system administrator privileges for this step.
  4. The installation script will ask you for the directory into which the Teuthis directory should be placed. For example, if you choose /usr/local/, the Teuthis files will be copied into /usr/local/teuthis-XX. You will also be asked if you would like to create symbolic links to the teuthis.py and tendril.py scripts in another directory like /usr/local/bin/.
  5. The script will create directories and links and copy files as needed. No compilation is required. After the script is finished, you may start Teuthis by running teuthis-dir/bin/teuthis.py, where teuthis-dir is the directory into which you installed Teuthis in the previous step. If you chose to create links in a directory that is in your path, you may also simply issue the command teuthis.

Philosophy

Teuthis is based on an experimental view of computational science. To be scientifically useful, calculations must be performed in a repeatable fashion, systematically varying input parameters and cataloging the dependence of the results on these parameters. When many cases are to be run on different computers, with source code and data stored on still other computers, managing such experiments can be a daunting task, particularly when jobs fail or machines go down. By reducing the labor required to create and track a large number of computational experiments, Teuthis makes it easier to be careful and harder to make mistakes.

Traditionally, a simulation user interacts from his or her workstation with remote computing resources using ssh and scp (or Kerberized rlogin and ftp, or Grid equivalents) running within a terminal window. The user must upload, configure, and build source code, upload any data files needed, and manually create parameter files and a job script. After submitting the job to a remote queuing system, the user must periodically return to check on the job's status. If the job succeeds, the job script (or the user) transfers the resulting files to an archival data system. To continue the run, the user must edit the parameter file, maneuver a restart file into position, edit the job script, and resubmit. At several points in this process a typographical error can lead to a wasted trip through the queue or possibly lost data. The user must make a special effort to keep a record of the work that has been done (including the locations of the data files produced and the disposition of runs). Teuthis does not replace the tools used to authenticate users, transfer data between machines, queue jobs, and the like. Instead, it replaces the cumbersome and error-prone process described above with a single point of control and recordkeeping, using existing tools in the background.

A typical network setup for Teuthis appears in Figure 1. Teuthis runs together with an integrated development environment (IDE) such as Eclipse on the user's workstation (a future release may include a plugin for Eclipse). The user edits code within the IDE, synchronizing it with a source code repository (e.g., CVS or Subversion) on a separate machine accessible to other members of the user's research group. When needed, Teuthis synchronizes the source code with a remote compute server (perhaps a large parallel computer) and issues the instructions to configure and build the code there. Also using Teuthis, the user configures runs on the workstation and submits the resulting scripts to the compute server, transferring data to and from the data server as needed. Small files such as log files and standard output are brought back to the workstation and stored within Teuthis. Pointers to large data sets on the data server are also maintained.

Figure 1: Network envisioned for use with Teuthis.









At present Teuthis does not support the use of Grid services for remote resource discovery and job submission. This fact requires Teuthis to know something about how to interact with different authentication methods (always using external tools to limit the chances of introducing new security holes) and queuing systems. Also, machine information must be entered directly by the user. These requirements are not too onerous for now, but it is anticipated that future versions of Teuthis will support these services. A more serious limitation of the present version, however, is that asynchronous data transfers are not yet possible, making it infeasible to transfer very large files as part of a job. Background file transfers are a major priority for the next release of Teuthis.

A sample session

Here we present an example showing how one typically might interact with Teuthis. After launching Teuthis (e.g., by issuing the command python teuthis.py from a terminal), we are presented with the window shown in Figure 2.

Figure 2: Main window.







The blank project view area will show summary information for the different projects we will create. Before creating a project, though, we need to configure at least one application and machine. Choosing “Applications...” from the Settings menu brings up the dialog shown in Figure 3, allowing us to configure an application. In the figure we have filled in the dialog with entries appropriate to the FLASH simulation code as an example. FLASH is a parallel code that must be built anew for different problems. It is configured using a command called setup which expects to find a configuration file called Modules containing the list of code modules to include. After configuration, the executable flash2 is built using make. At runtime, FLASH looks for a file called flash.par containing a list of control parameter values. As it runs, it produces a second parameter file called flash.par.restart every time it successfully writes a checkpoint file. A log of FLASH's activity is left in the file flash.log.

Figure 3: Application window.






Although FLASH uses most of the features of this dialog, other applications might have simpler requirements; for example, an application configured using GNU configure might not require a configuration file. To add another application, you can click the “New” button and another entry will be created. You can also configure similar applications using the "Clone" button, which will create a separate application entry called "FLASH 2.4 Copy" and allow it to be edited.  Most of the entities manipulated by Teuthis (applications, machines, projects, experiments, runs, and jobs) can be cloned.  Click “OK” to exit the dialog and return to the main window.

To configure a machine, we select “Machines...” from the Settings menu, bringing up the dialog shown in Figure 4.

Figure 4: Machine window.










In Figure 4 this dialog is shown with three different machines configured. The properties of a machine named cobalt (the Altix machine at NCSA) are being edited. After filling in the "Login host" and "User ID" fields, we selected "gsissh+uberftp" from the access method list box. Doing so filled in the fields giving the commands to initiate authentication, remotely execute commands, and perform file transfers. The command fields include patterns such as "%h" that will be replaced with file, host, and user names when the commands are run; the complete list of patterns understood in different fields by Teuthis is given in the Configuring resources section. We can modify the commands as needed for this particular machine if, for example, we need to pass through a front end machine to get to this machine. Since the access method we have chosen uses GSI authentication, we also need to fill in the X.509 certificate subject field with the subject or distinguished name of the certificate that we will use to authenticate to this machine. (The certificate itself should be located either on the machine running Teuthis or on a MyProxy server; the choice between these options is made in the user preferences dialog.)
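To illustrate how a pattern such as "%h" might be expanded into a concrete command, here is a minimal sketch in Python. Only "%h" appears in the text above; the "%u" code, the expand_command function, the host name, and the sample command string are assumptions made for illustration. The actual patterns are those listed in the Configuring resources section.

```python
import re


def expand_command(template, values):
    """Replace each %x code in `template` with values["x"].

    Codes missing from `values` are left untouched; "%%" yields a literal "%".
    Illustrative sketch only, not Teuthis's own implementation.
    """
    def substitute(match):
        code = match.group(1)
        if code == "%":
            return "%"
        return values.get(code, match.group(0))

    return re.sub(r"%(.)", substitute, template)


# Hypothetical remote-execution command; "%u" (user) is an assumed code.
command = expand_command("gsissh %u@%h ls", {"h": "cobalt.example.org", "u": "alice"})
print(command)
```

Leaving unknown codes untouched, rather than raising an error, makes it easy to spot a mistyped pattern in the resulting command line.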

After filling in the access fields, we configured the machine for job submission by indicating the queuing system used, the default number of CPUs and memory per node to request, the parallel execution command, and the account and queue names available to us.  By choosing a queuing system, we can let Teuthis create a job script for us, or else choose our own job template file.  Like the command fields, job templates include patterns that allow job properties such as name, wall clock time, and run directory to be inserted into the job script actually submitted to the remote system.
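The idea of a job template can be sketched as follows. This example uses Python's string.Template and its "$name" placeholders in place of Teuthis's own pattern syntax, which is not reproduced here. The choice of PBS directives, the placeholder names, and all of the sample values are assumptions for illustration only.

```python
from string import Template

# Hypothetical PBS-style job template. The placeholder names and the use of
# PBS directives are assumptions, not taken from Teuthis.
JOB_TEMPLATE = Template("""\
#!/bin/sh
#PBS -N $job_name
#PBS -l walltime=$wall_clock
#PBS -q $queue
cd $run_dir
$exec_command
""")


def render_job_script(**properties):
    """Fill the template; raises KeyError if a job property is missing."""
    return JOB_TEMPLATE.substitute(**properties)


script = render_job_script(
    job_name="sedov_lref4",
    wall_clock="01:00:00",
    queue="standard",
    run_dir="/scratch/alice/sedov_lref4",
    exec_command="mpirun -np 8 ./flash2",
)
print(script)
```

Using substitute rather than safe_substitute means a forgotten property fails before anything is sent to the queue, which is the kind of wasted trip the template mechanism is meant to prevent.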

Having completed the access and job submission sections, we finally provide the paths on the remote system under which applications should be built, executables should be stored, and jobs should be run. We also provide a path for Teuthis to store its files, such as the tendril.py script and standard output files. Usually all of the paths can be placed under the user's remote home directory except for the run root, which generally goes on some large scratch file system. The "OS type" setting determines whether to use forward slashes or backslashes for remote paths, as well as a few other minor issues.
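The effect of the "OS type" setting on path separators can be illustrated with Python's standard posixpath and ntpath modules. This is a sketch only: the join_remote_path function, the "unix" and "windows" labels, and the sample paths are hypothetical, and how Teuthis itself builds remote paths is not shown in this manual.

```python
import ntpath
import posixpath


def join_remote_path(os_type, *parts):
    """Join path components using the separator of the remote OS.

    The `os_type` values "unix" and "windows" are hypothetical labels.
    """
    path_module = ntpath if os_type == "windows" else posixpath
    return path_module.join(*parts)


print(join_remote_path("unix", "/scratch/alice", "runs", "run001"))
print(join_remote_path("windows", r"C:\scratch", "runs", "run001"))
```

Choosing the path module explicitly matters because os.path follows the conventions of the local workstation, which may differ from those of the remote machine.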

As with applications, we can clone, remove, or add machine definitions as needed.  Click "OK" to return to the main window.

Finally we are ready to start a project. Select “New project...” from the File menu to bring up the dialog shown in Figure 5.

Figure 5: Project properties dialog.






Here we can edit the project name and description as well as take notes on the project by editing either a local text file or an online web page (wiki).  The "Creator" field is filled in using information taken from the user preferences dialog.  For now let's create a new experiment by clicking on the “New experiment” button.

After clicking on the “New experiment” button, we are faced with the formidable-looking dialog shown in Figure 6. Some of the fields are shown already filled in.

Figure 6: Experiment properties dialog.







The really essential sections of this dialog to complete are the ones labeled “Application” and “Execution.” Using the appropriate list boxes, we select the application FLASH 2.4 and machine cobalt that we configured earlier. Based on the information we gave then, the build directory, configuration command, build command, and executable fields are filled in for us. We can make some modifications specific to the experiment if we wish; for example, in Figure 6 we have manually added the argument “sedov -auto” to the configuration command. This sets up FLASH to solve the standard Sedov blast wave test problem, and by using “-auto” we have avoided the need to specify a configuration file.

To continue, we need to upload the source code to the execution machine and configure and build it there. By selecting FLASH 2.4 as the application we have already chosen a local source directory; clicking the “Upload source” button initiates a file transfer to the execution machine. Since we have chosen to access cobalt via GSI, an authentication dialog comes up asking for our certificate passphrase (Figure 7).  The certificate subject used is the one specified for this machine in the machine dialog.

Figure 7: Authentication dialog for a machine configured to use GSI authentication.







After clicking “OK” in the authentication dialog we wait for the source code to be uploaded. When the upload finishes, we are returned to the experiment dialog, at which point we click “Do it” next to “Configure command” and then “Do it” next to “Build command.” We must wait for each of these tasks to complete before going on.  The "Output" button next to each "Do it" button allows us to view the output from each task, and the color of the button border indicates whether the task was completed successfully.  Green indicates success, while red indicates that an error occurred.

When using GSI authentication with any given machine, a proxy check for the identity associated with that machine is performed before every remote command is issued.  If a valid proxy is found, the command is issued without prompting the user again for a passphrase.  Thus if you are using multiple machines with GSI authentication that recognize the same identity, you need only authenticate yourself once.
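The check-then-run logic described above can be sketched as follows. The three callables passed to run_remote are hypothetical stand-ins for the external tools Teuthis invokes, and the certificate subject is made up; the sketch only shows why two machines that recognize the same identity lead to a single passphrase prompt.

```python
def run_remote(command, identity, proxy_is_valid, authenticate, execute):
    """Run `command`, prompting for a passphrase only if no valid proxy exists.

    All three callables are hypothetical stand-ins for external tools.
    """
    if not proxy_is_valid(identity):
        authenticate(identity)
    return execute(command)


valid_proxies = set()
prompts = []


def proxy_is_valid(identity):
    return identity in valid_proxies


def authenticate(identity):
    prompts.append(identity)      # stands in for the passphrase dialog
    valid_proxies.add(identity)


def execute(command):
    return "ran: " + command


subject = "/O=Example/CN=Alice"   # hypothetical certificate subject
run_remote("ls", subject, proxy_is_valid, authenticate, execute)      # e.g. on cobalt
run_remote("qstat", subject, proxy_is_valid, authenticate, execute)   # e.g. on tungsten
print(len(prompts))
```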

For this example, we have also chosen to stage files from a third machine (tungsten) and archive the files produced by our run to that same machine.  The files we are staging are listed with their full paths, one per line, in the "Src files" box.  The ones listed in our example are just some random large files, but in general one might stage files containing initial conditions data, equation of state tables, etc.  At the end of the run, we will archive all of the files in the run directory to the data destination machine, placing them in the directory indicated in the "Dest path" field.  If this directory does not exist, it will be created.

In the next step, we set the number of CPUs to use on the execution machine, then go on to choose the runtime parameter file template to use with our experiment. FLASH uses a text runtime parameter file containing lines of the form variable = value. Most of the parameters will not need to vary from one run to the next, so we create a normal parameter file containing these lines. For those parameters we wish to vary, we use lines of the form variable = %variable%, allowing Teuthis to substitute values for the variables as specified in the experiment dialog. In Figure 6 we have selected such a file containing a line “lrefine_max = %lrefine_max%”. We then typed lrefine_max into the list of parameters to vary and provided a list of values. These will be the values lrefine_max will take on in each of the runs generated for this experiment.
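The substitution of %variable% markers can be sketched in a few lines of Python. The fill_template and generate_runs functions, the fixed lrefine_min line, and the list of values are assumptions for illustration. The sketch covers a single varied parameter, since how Teuthis combines several varied parameters is not described here.

```python
import re

# Illustrative template: one fixed line and one line with a varied parameter.
TEMPLATE = """\
lrefine_min = 1
lrefine_max = %lrefine_max%
"""


def fill_template(template, values):
    """Replace each %name% marker with values[name]; unknown markers raise KeyError."""
    return re.sub(r"%(\w+)%", lambda m: str(values[m.group(1)]), template)


def generate_runs(template, name, value_list):
    """Return one (values, parameter_file_text) pair per value of `name`."""
    return [({name: v}, fill_template(template, {name: v})) for v in value_list]


runs = generate_runs(TEMPLATE, "lrefine_max", [4, 5, 6])
for values, text in runs:
    print(values, "->", text.splitlines()[-1])
```

Each generated pair corresponds to one run: the varied values are kept alongside the complete parameter file text, so the two can be reviewed separately later.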

Clicking the “Generate runs” button, we find that a number of new runs have been added under our experiment in the main window's project view. Click the “OK” button in each open dialog to return to the main window, then select “Expand all” from the View menu. You should see something like Figure 8.

Figure 8: Main window with project view expanded to show new runs.






Runs corresponding to each of our lrefine_max parameter values have been created for us. Each includes a separate runtime parameter list. By double-clicking on one of the runs or right-clicking on it and selecting “Properties...” from its popup menu, we obtain a dialog like that shown in Figure 9.

Figure 9: Run properties dialog.






Here we can review and add to or override the experiment settings for executable arguments, number of processors, files to transfer, etc.  We can also review the parameter file that was generated for this run.  The "Varied parameters" area shows just the parameters that were varied in generating the runs and the values they took for this particular run.  Once everything is satisfactory, we click on the "Create job" button to prepare the run for submission.  This brings up the dialog shown in Figure 10.

Figure 10: Job properties dialog.






Here we get one final opportunity to add to or override the run settings.  We also set the wall clock time to request for this calculation.  Notice that otherwise this looks very similar to the run dialog.  So why the additional "job" layer of organization?  A run is meant to be abstracted from the details of whether a calculation succeeded or not, whether it crashed and had to be restarted, whether it had to be resubmitted to the queue several times, etc.  These details are encapsulated in jobs.  When we decide to actually carry out a job, we click the "Submit" button in the job dialog to perform the pre-job data file staging and to generate and submit the job script.  If the remote submission is successful, the job is assigned a remote job identifier and a run status.  From the job dialog or the main window we can request updates to a job's status, and when a job is found to be complete, we are offered the opportunity to perform any data archiving that we have requested.  At any time after the job starts to run we can also request the standard output, standard error, and application log file produced by the job.  After the job is complete, the next request for these files downloads them and stores them within the Teuthis project file, so thereafter they can be viewed offline.
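The distinction between runs and jobs can be pictured as a simple data model. The class and field names below are hypothetical and are not taken from Teuthis's source; the sketch only shows one run accumulating several jobs, each with its own submission details.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    """One attempt to execute a run through the remote queue."""
    wall_clock: str
    remote_id: Optional[str] = None   # assigned on successful submission
    status: str = "unsubmitted"
    disposition: str = ""             # free-form user comment


@dataclass
class Run:
    """A calculation defined by its parameters, independent of how many jobs it took."""
    varied_parameters: dict
    jobs: List[Job] = field(default_factory=list)
    disposition: str = ""

    def create_job(self, wall_clock):
        job = Job(wall_clock=wall_clock)
        self.jobs.append(job)
        return job


run = Run(varied_parameters={"lrefine_max": 4})
first = run.create_job("01:00:00")
first.remote_id, first.status = "12345", "queued"   # hypothetical values
second = run.create_job("02:00:00")                  # e.g. a resubmission
print(len(run.jobs), second.status)
```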

The "Disposition" fields in both the run and job dialogs are meant to be set by the user and contain a short comment on the status of each run or job -- for example, "Ran to completion" or "Crashed on startup."  These comments appear in the main window when the status field is enabled.  Together with the data archived within Teuthis, the notes files linked to in the project dialog, and the recorded locations of archival data, the status fields allow you to see at a glance where your calculations stand and what you need to do next.

If the application has been configured with a restart method, we can optionally continue the job by clicking on the "Continue job" button.  This will create a new unsubmitted job that will run in the same directory as the original job.  How the restart is handled depends on the restart method chosen; for example, FLASH has been designed so that it writes a new runtime parameter file called "flash.par.restart" after each checkpoint file is successfully written.  Since we have configured Teuthis to look for this file when continuing a job, it will download the contents of flash.par.restart and use them as the runtime parameter file for the restart job.  This feature makes it straightforward to continue a job after a crash or after running out of time in the queue.
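The restart behavior described above might be sketched like this. The job is represented as a plain dictionary and download is a callable that returns the text of a remote file. Both are hypothetical stand-ins for Teuthis internals, as are the sample paths and file contents.

```python
def continue_job(job, download, restart_file="flash.par.restart"):
    """Build a new, unsubmitted job that restarts `job` in the same run directory.

    `job` is a plain dict and `download` a callable returning the text of a
    remote file; both are hypothetical stand-ins for Teuthis internals.
    """
    remote_path = job["run_dir"].rstrip("/") + "/" + restart_file
    return {
        "run_dir": job["run_dir"],                 # same directory as the original job
        "parameter_file": download(remote_path),   # restart parameters written by the code
        "status": "unsubmitted",
    }


# Stand-in for a remote file system, for illustration only.
remote_files = {"/scratch/alice/run001/flash.par.restart": "restart = .true.\n"}
original = {"run_dir": "/scratch/alice/run001", "status": "crashed"}
restarted = continue_job(original, remote_files.__getitem__)
print(restarted["status"], restarted["run_dir"])
```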
