The Wharton School Grid Project

FAQ ::: The Grid

Grid Overview

What Is The Grid?

The Grid provides access to computational software for Wharton faculty, faculty research assistants, and PhD students. It is set up to allow simple and parallel processing across a large set of hardware.

Grid Architecture, Programs, and Compilers

The Grid runs on a large set of tightly integrated hardware with dedicated networking and storage. The hosts themselves utilize the VMware ESX hypervisor allowing multiple virtual machines on each system. For more information about the hardware, please see the Hardware page.

Grid users have access to a multitude of scientific, mathematics, and analytics software including Matlab, R, Mathematica, and more. MySQL access can be provided if necessary. The Grid also has a variety of Fortran, C, and C++ compilers in GNU and Intel versions. For more information about each of the software packages and compilers available, please see the Software page.

Getting an Account

In order to obtain access to The Grid, you must meet the following requirements:

  • Be Wharton faculty, a faculty research assistant, or a PhD student
  • Have a Wharton domain account

If you meet these requirements, please think about what your project will need. If you're not sure of some of the details, just let us know and we'll get you started.

  • How much disk space will you need? [Information about Grid storage]
  • What programs do you want to run? (software not listed may be installed based on user demands)
  • What kind of RAM usage are you expecting?
  • How long will your project last?

Once you have thought about these items, please use the Apply Here page. It may take up to 72 hours to process your account request, after which you will receive an email with further information and an email inviting you to join the grid-users mailing list. The rest of this FAQ should help you get started with the Grid.

Working with The Grid

Connecting

All Grid access is done through the standard Unix command line: SSH for commands and SSH-FTP/SMB for file transfers. (File transfer access is described in the Transferring Files section.) Mac OS X and Linux have these built in; if you are using Windows, you will need an SSH v2 client for command access (we recommend the University-supported SecureCRT). You may wish to consult your department's distributed representative to determine which SSH client they support.

Your username and password are your Wharton domain credentials. The first time you log in, you may receive a message similar to "Host key not found from list of known hosts. Are you sure you want to continue connecting?" Answer yes to make the connection. You should not receive this message on subsequent connections.

To log in from Windows or using an SSH client:

  • Set the host to: unix.wharton.upenn.edu
  • Connect using your Wharton username and password

To log in from Mac OS X/Linux/Unix:

  • Open a Terminal window (/Applications/Utilities/Terminal on OS X)
  • Type this command and hit enter: ssh username@unix.wharton.upenn.edu
  • When prompted for a password, enter your Wharton password

PC SAS CONNECT/SASTCPD

PC SAS along with SAS CONNECT is a powerful alternative to logging directly onto the Grid Unix server and running SAS programs in a "shell" session. The main advantage is that you can avoid learning and remembering Unix syntax and software, staying almost entirely in a Windows PC environment to access and process data on the WRDS system. In other words, SAS CONNECT lets you use the resources of the remote machine without having to work on it through an SSH connection.

When running a SAS program that will make use of SAS CONNECT, the following steps are automatically executed by SAS:

  1. Sign on to the remote server.
  2. Run the program on the remote machine.
  3. Return the output and the log generated on the remote computer to the local SAS session.
  4. Sign off from the remote server.

In order to use SAS CONNECT and run a program remotely, the SAS program must contain some extra code.

At the beginning of the program, add the three lines of code below, which allow the remote system to identify the user and establish the remote connection. Once the connection is established, it will remain available for future instructions or programs until PC SAS encounters the sign-off instructions.

The code for the sign-in is:

%let grid=sastcpd.wharton.upenn.edu 7551;
options comamid=TCP remote=GRID;
signon username=_prompt_;
    

When these lines are executed, a window will appear on the local system asking for a username and password (see NOTE below), which are then used to log into the server sastcpd.wharton.upenn.edu (which is actually a group of servers).

Since the sign-in is NOT required every time you run a program (unless you submit the sign-off lines), you do not need to write these lines at the beginning of every program. However, if you do that, PC SAS will check if a connection is already established, and in that case, it will just skip the sign-in part. Therefore, it is perfectly safe to write the sign-in lines at the beginning of every program you intend to be executed on the remote server.

The instruction to sign off the remote server is:

signoff;
    

NOTE: SAS/CONNECT NO LONGER requires a separate, local grid password (YOU MAY use your Wharton credentials)!

Files

Storage/Quotas

The Grid has storage connected to each compute node to work with files. Your main area for file storage is your "home" directory: /home/department/username (sometimes abbreviated as ~, also available as the $HOME variable). You can get the full path to your home directory using the command: echo $HOME

While the Grid has dedicated storage (in addition to WRDS data access), resources are not unlimited. Therefore, each user and department as a whole has a quota or limit on the size of files allowed. The default quotas are:

  • Per User: 50GB
  • Per Department: 500GB

These limits can be adjusted above the defaults by special request and chargeback. Please consult with your department's distributed representative and business administrator for more information.

To check on your quota usage, use the following command: quota
As with all Unix commands, more usage information can be found with the manual pages using: man quota
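
As a rough cross-check against the default 50GB per-user quota, the sketch below compares the size of a directory with a limit. The helper names dir_usage_kb and usage_percent are made up for this illustration and are not Grid commands. The quota command remains the authoritative figure, since du only counts the files it can read.

```shell
#!/bin/bash

# Hypothetical helpers (not Grid commands): estimate quota usage with du.

# Print the size of a directory in kilobytes.
dir_usage_kb() {
    du -sk "$1" | awk '{print $1}'
}

# Print used/limit as a whole-number percentage (both arguments in KB).
usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Default per-user quota quoted in this FAQ: 50GB, expressed in KB.
LIMIT_KB=$(( 50 * 1024 * 1024 ))
```

To try it on your home directory: usage_percent "$(dir_usage_kb "$HOME")" "$LIMIT_KB"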

Transferring Files

There are two currently supported file transfer protocols for moving files on and off the Grid: SSH-FTP (SSH File Transfer Protocol, not Secure FTP) and SMB (Windows file sharing). SSH-FTP is available through the scp command and many file transfer clients. SMB is commonly known as the Windows file sharing protocol, but is also implemented in other operating systems like Mac OS X and Linux. Please Note: By design, you can only move files to and from your home directory. Each of the compute nodes has access to this directory.

Your username and password are your Wharton domain credentials. The first time you connect, you may receive a message similar to "Host key not found from list of known hosts. Are you sure you want to continue connecting?" Answer yes to make the connection. You should not receive this message on subsequent connections.

If you are using a file transfer client (such as SecureCRT), use the following settings:

  • File Transfer Protocol (if selectable): SSH-FTP
  • Host: unix.wharton.upenn.edu

Please Note: if you are transferring a file from Windows, you should set the transfer mode to ASCII for program and job script files, otherwise line breaks will not translate correctly. If you find a file with incorrect line breaks from Windows, you can use the dos2unix filename command to fix it.
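
If you are not sure whether a script picked up Windows line breaks, a small check like the one below can tell you before you submit it. This is only a sketch: has_crlf and strip_cr are hypothetical helper names, and strip_cr is a fallback for systems where dos2unix is not at hand (on the Grid, dos2unix filename is the documented fix).

```shell
#!/bin/bash

# Hypothetical helper: succeed if the file contains a carriage return,
# which is the sign of Windows (CRLF) line endings.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Hypothetical helper: strip carriage returns from a file,
# writing the cleaned copy to a second file.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example: has_crlf job-script.sh && echo "job-script.sh has Windows line breaks"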

If you are using Windows file sharing, type \\unix.wharton.upenn.edu\username in the address bar of a Windows Explorer window. Mac OS X and Linux can connect to smb://unix.wharton.upenn.edu/username.

If you are using Mac OS X or Linux, the scp command is available. The format for the command is scp source-filename target-filename. When referencing a remote file, you must use the full syntax for the file (username@system:/filename). Some examples:

  • From your local computer, copy a local file to the Grid: scp local-file username@unix.wharton.upenn.edu:/home/department/username/grid-file
  • From your local computer, copy a file from the Grid to your local computer: scp username@unix.wharton.upenn.edu:/home/department/username/grid-file local-file
  • From the Grid (if you have SSH properly set up on your local system), copy a file from your local computer to the Grid: scp local-username@your-system:/local-file grid-file

Please Note: if you are transferring a large number of files between your local computer and the Grid, it is much more efficient to tar or zip them into a single file and then untar or unzip it once it's transferred. For more details, check out the manual pages for the commands: man tar / man zip / man unzip.
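
The pack-then-unpack step can be sketched as follows. The helper names pack_dir and unpack_archive are hypothetical, and gzip compression (the z flag) is an assumption; plain tar works the same way without it.

```shell
#!/bin/bash

# Hypothetical helper: pack a directory into one compressed archive.
# Usage: pack_dir <directory> <archive.tar.gz>
pack_dir() {
    # -C switches to the parent directory so the archive stores a relative path.
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Hypothetical helper: unpack an archive into a destination directory.
# Usage: unpack_archive <archive.tar.gz> <destination>
unpack_archive() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

A typical sequence would be to run pack_dir on your local computer, copy the single archive with scp, and then run unpack_archive in your home directory on the Grid.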

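Advanced users: you may be able to improve the scp transfer rate by choosing the blowfish encryption method rather than using the default. To do this, use: scp -c blowfish local-file username@remote-system:/remote-file (this only works if your SSH client still offers the blowfish cipher; recent OpenSSH releases no longer include it).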

Using Software on the Grid

There are two methods of using the software on the Grid: interactively and using scripts.

  • Interactive usage involves running the software manually on a compute node. If you are new to the Grid, this might be the way you are used to using the software; however, it does not fully use the power or utility of the Grid. This method is good for learning the software, testing, and debugging. See the Interactive Program Access section below for information about how to use the software this way.
  • Using scripts is the preferred method for running computations on the Grid. With scripts, computations can be run without any user interaction. See the Running a Job section, especially the Overview section, below for information about how to use the software this way.

Interactive Program Access

To run software interactively on the Grid, use the qrsh command followed by the program name. If there is an available compute node, the Grid will automatically connect you to it and start the software. (Please Note: By design, the software is not directly available on unix.wharton.upenn.edu; you cannot just run matlab, etc.) Some examples:

Software Product    Text Startup              Graphical Startup**
Mathematica         qrsh math                 qrsh mathematica
Matlab              qrsh matlab -nodisplay    qrsh matlab
R                   qrsh R --no-save          not implemented
SAS                 qrsh sas -nodms           qrsh sas
Stata               qrsh stata-se             qrsh xstata-se

**Please Note: to view the graphical interface (if available), you need to have an X-server installed on your local computer and be properly tunneling the display (you may need to use the ssh -X or ssh -Y commands when connecting to the Grid). Otherwise, you must use the Text Startup software arguments in your command (described briefly above and in the Creating Job Scripts section). For assistance with X-server setup, please contact your departmental Wharton Computing Representative.

Please Note: to view more usage information for the Grid software, you can execute just qrsh then:

Running a Job

Overview

Simply stated, running calculations on the Grid without user interaction involves:

  • Taking your software commands and placing them in a software script file
  • Creating a job script that calls the software and your software script file
  • Using a command that submits your job script to the Grid queues
  • When resources are available, your job script is pulled out of its queue and executed
  • While your job script is running, the output is placed where you specified in your job script (home directory by default)
  • If you chained additional job scripts, they are submitted to the Grid queues and the process repeats

Creating Job Scripts

You will need to write a shell script that wraps your software execution(s), referred to here as a job script. Generally these are saved with a .sh extension (although they don't have to be). It's fairly simple:

			#!/bin/bash
			
			# Executes an echo on the compute node it runs on, 
			# printing the hostname and timestamp it was run
			# Note the use of backticks, not single quotes
			echo "`hostname` is running my program at `date`"
		

A more useful job script will execute the software on the Grid with your software commands script. Here's a matlab example:

			#!/bin/bash
			
			# Executes matlab on a compute node 
			# Matlab input commands: /home/dept/username/matlab-commands.m
			# Matlab output: /home/dept/username/matlab-output.`date +%Y%m%d-%H%M%S`.txt
			# Where `date +%Y%m%d-%H%M%S` inserts the timestamp when the job was executed
			# (plain `date` output contains spaces, which bash rejects in a redirect target)
			# Note that date is surrounded by backticks, not single quotes
			matlab -nodisplay -nodesktop -nosplash < /home/dept/username/matlab-commands.m > /home/dept/username/matlab-output.`date +%Y%m%d-%H%M%S`.txt
		

Below are a few required (and otherwise useful) software execution arguments in your job scripts. Most are needed to turn off the graphical interfaces:

  • Mathematica: math -noprompt
  • Matlab: matlab -nodisplay -nodesktop -nosplash
  • R: R --no-save
  • SAS: sas -nodms -noterminal

There are more complete examples available in /usr/local/demo - please check them out. Also, if you need more information for the software arguments, look in the Interactive Program Access section.

Once you have a job script, see the Submitting a Job section below to get the Grid to execute your job.

Please Note: Unless you specify an output file (using command > outputfile) for your software executions, the Grid will automatically send your output to your home directory using the defaults below (where XXXX is the job ID):

  • yourscript.sh.oXXXX - standard out
  • yourscript.sh.eXXXX - standard error

Also, if the output file already exists, your job will fail. You can work around this by specifying your output file with a timestamp or changing variable. A timestamp example with a sample matlab execution:

  • Before: matlab -nodisplay -nodesktop -nosplash < matlab-commands.m > matlab-output.txt
  • After: matlab -nodisplay -nodesktop -nosplash < matlab-commands.m > matlab-output.`date +%Y%m%d-%H%M%S`.txt (those are backticks, not single quotes; the format string keeps spaces out of the file name)
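
Plain date output contains spaces, which do not work well in a file name used as a redirect target, so a compact format is safer. The sketch below builds such a name once and reuses it. The helper name timestamped_name and the format string are assumptions for illustration, and it uses the $(...) form, which is equivalent to backticks.

```shell
#!/bin/bash

# Hypothetical helper: build a file name with a compact timestamp,
# in the form prefix.YYYYMMDD-HHMMSS.extension (no spaces).
timestamped_name() {
    echo "$1.$(date +%Y%m%d-%H%M%S).$2"
}

# Build the name once so the same file can be referred to later in the script.
OUTPUT_FILE=$(timestamped_name matlab-output txt)
echo "Output will be written to $OUTPUT_FILE"
```

In a job script, the last line would be followed by the software call, for example: matlab -nodisplay -nodesktop -nosplash < matlab-commands.m > "$OUTPUT_FILE"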

Submitting a Job

To submit your job script to the Grid, use the qsub command. To submit your job script named job-script.sh in your current directory, execute: qsub job-script.sh

Please Note: when your script begins execution, the working directory is your home directory. Use the -cwd argument with qsub to use the current directory you are in as the working directory.
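
If you submit often, a small wrapper can check that the job script exists and always pass -cwd. This is a sketch only: submit_job and the DRY_RUN variable are hypothetical names introduced here, not part of the Grid. With DRY_RUN=1 the wrapper prints the qsub command instead of running it, which is handy for checking what would be submitted.

```shell
#!/bin/bash

# Hypothetical wrapper (not a Grid command): submit a job script using the
# current directory as the working directory, or print the command
# instead of running it when DRY_RUN=1.
submit_job() {
    local script="$1"
    if [ ! -f "$script" ]; then
        echo "No such job script: $script" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "qsub -cwd $script"
    else
        qsub -cwd "$script"
    fi
}
```

For example: DRY_RUN=1 submit_job job-script.sh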

WARNING: as noted in the Transferring Files section, if you transferred your file from Windows using a file transfer client, line breaks may be broken. Use dos2unix filename to fix any program or job scripts and set your client to ASCII mode to transfer your scripts.

Please Note: if you are receiving the warning:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
		

You can fix it by submitting your job with qsub -S /bin/bash job-script.sh.

Multiple Runs of a Job

There are a number of ways to programmatically have a job do multiple runs (if you are using randomizing functions, etc). Shells provide looping constructs such as while and for (bash) or foreach (csh/tcsh). Their discussion is outside the scope of this FAQ; however, here is a simple example:

			#!/bin/bash
			
			# this will run an R job 5 times
			# there are many ways to specify loops, read your shell's documentation!
			for i in 1 2 3 4 5
			do
				R --no-save < /home/dept/username/commands.R
			done	
		

Please Note: Make sure that you do not create an infinite loop by using while [[ 1 ]], or something that will always evaluate to true.
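
One way to keep a while loop from running forever is to drive it with a counter and a fixed upper bound, as in the sketch below. MAX_RUNS and the echo placeholder are assumptions for illustration; in a real job script the echo would be replaced by your software call, such as R --no-save < /home/dept/username/commands.R.

```shell
#!/bin/bash

# Sketch of a bounded loop: the counter guarantees the loop ends.
# MAX_RUNS is a made-up name for the number of runs you want.
MAX_RUNS=5
run=1
while [ "$run" -le "$MAX_RUNS" ]
do
    # Placeholder for the real work of each run.
    echo "run $run of $MAX_RUNS"
    # Advance the counter so the test above eventually fails.
    run=$(( run + 1 ))
done
```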

Chaining Jobs

You may need to chain jobs to perform operations on output files of previous jobs or to ensure you only use certain resources. To run jobs that are chained together, you simply need to add a qsub next-job-script.sh to the end of your job script. A simple example:

			#!/bin/bash
			
			# this will run an R job, then chain (submit) the next job specified in next-job-script.sh
			R --no-save < /home/dept/username/commands.R
			qsub /home/dept/username/next-job-script.sh	
		

Please Note: When chaining jobs together, do NOT qsub to the same job script file! It will create an infinite loop of job submissions.
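
A guard like the one below can catch the mistake of chaining a script to itself before anything is submitted. It is a sketch under assumptions: chain_next and DRY_RUN are hypothetical names, both paths are passed in explicitly, and symbolic links are not resolved, so two different links to the same file would not be detected.

```shell
#!/bin/bash

# Hypothetical guard (not a Grid command): refuse to chain a job script
# to itself; otherwise submit the next script, or print the command
# instead of running it when DRY_RUN=1.
# Usage: chain_next <current-script> <next-script>
chain_next() {
    local current next
    # Turn both arguments into absolute paths so they can be compared.
    current="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
    next="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")"
    if [ "$current" = "$next" ]; then
        echo "Refusing to resubmit $next: it would loop forever" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "qsub $next"
    else
        qsub "$next"
    fi
}
```

At the end of a job script this would replace the bare qsub line, for example: chain_next /home/dept/username/job-script.sh /home/dept/username/next-job-script.sh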

Job Submission Queues

The Sun Grid Engine scheduler controls all access to the Grid's compute nodes. All jobs must use the qrsh or qsub commands, which submit jobs to the Grid in an orderly fashion and allocate available resources.

There are policies in place to prevent a single user from dominating the machine by flooding the queue with jobs, particularly a job limit per user. THIS SYSTEM IS NOT FOOLPROOF. Please be courteous and run as few simultaneous jobs as possible, particularly if you notice that there is a lot of usage (with the qstat command). If it is determined that you seem to be 'gaming' the system in any way at the expense of other users, your access to these systems may be revoked without warning.

You can check the status of the submission queues by using the qstat and qhost commands. For more information, see the Monitoring Jobs/Hosts & Maintenance section below.

Job/CPU/Memory Limitations

The Grid is designed to prevent any job or user from taking resources away from other jobs and users. While not foolproof, there are some limitations to be aware of. Each user (or your group, if you are sponsored) is limited to 20 running slots and 500 total queued slots. Slots are defined as "slices" within the Grid. Non-parallel jobs (such as R) are allocated 1 slot. Parallel jobs (such as MPI, parallel Matlab) can allocate slots up to the limit. Users/groups can mix any number of non-parallel and parallel jobs up to their 20 slot limit.

The other limitations are strictly for protecting the CPU and memory resources within a single compute node, since each node handles more than 1 slot. Currently, each slot is limited to 1 CPU and 4GB of RAM. The CPU and memory restrictions can usually be worked around in your jobs by splitting the workflow differently or into more jobs. Please contact grid-admin@wharton.upenn.edu with any questions about these limitations.
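
The slot arithmetic can be sketched in a few lines of shell. The 20 running-slot figure comes from this section; the helper names max_concurrent_jobs and slots_remaining are made up for this illustration.

```shell
#!/bin/bash

# Running-slot limit per user or group, as quoted in this FAQ.
RUNNING_SLOT_LIMIT=20

# Hypothetical helper: how many jobs of a given slot size can run at once.
# Usage: max_concurrent_jobs <slots-per-job>
max_concurrent_jobs() {
    echo $(( RUNNING_SLOT_LIMIT / $1 ))
}

# Hypothetical helper: how many running slots are left after some are in use.
# Usage: slots_remaining <slots-in-use>
slots_remaining() {
    echo $(( RUNNING_SLOT_LIMIT - $1 ))
}
```

For example, max_concurrent_jobs 8 reports that two 8-slot parallel jobs can run at the same time, and slots_remaining 16 shows that 4 slots would then be left for single-slot jobs such as R.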

Monitoring Jobs/Hosts & Maintenance

For general information about overall host and job status and upcoming scheduled maintenance for the Grid, please check out the Status page. There are also some useful commands for job and host monitoring (use man command for more usage information):

  • qstat: displays the status of the Grid queues. It includes running and queued jobs
  • qhost: displays the status of all the nodes of the Grid

Altering Jobs

The qalter command is used to alter queue options for queued and running jobs. Be sure to check out more usage information with man qalter.

Deleting Jobs

The qdel command is used to kill queued and running jobs. Be sure to check out more usage information with man qdel.

Advanced Topics

Login Shells

Please email grid-admin@wharton.upenn.edu if you would like to change your shell.

Please Note: You cannot change your default login shell with the chsh command. The setting will not remain.

Aliases

Feel free to simplify some of your common commands by adding aliases to your local .aliases file. Please be sure to check out man alias and online documentation for more information.

Compilers

Fortran, C, and C++ compilers are available, in both GNU and Intel versions. Parallel environments include OpenMPI and MPICH2 (currently in development).

  • To choose the parallel environment setup you wish to use: mpi-selector
    Please Note: you must log out and log back in to activate the new environment.
  • To see what your default is: mpi-selector --query
  • To see the available options: mpi-selector --list
  • To set your new default: mpi-selector --set selectorname

Then, you can compile your program by executing one of the following sets of statements based on whether you are compiling singly or in parallel, and depending on which MPI parallel environment you have chosen.

  • C Compiler: mpicc program-name.c for the MPI compiler, gcc program-name.c for the GNU compiler
  • C++ Compiler: mpicxx program-name.C for the MPI compiler or c++/g++ program-name.C for GNU compiler
  • Fortran Compiler: mpif90/mpif77 program-name.f for the MPI fortran compilers, ifort program-name.f for the Intel fortran compiler (with IMSL libraries), gfortran/f95 program-name.f for the GNU fortran compiler
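
As a quick reminder of which GNU compiler goes with which source file, the sketch below maps the file extensions used in the list above to a compile command. The helper name gnu_compile_cmd is hypothetical, and it only prints the command rather than running it.

```shell
#!/bin/bash

# Hypothetical helper: print the GNU compile command for a source file,
# chosen by file extension (.c, .C, and .f as used in this FAQ).
gnu_compile_cmd() {
    case "$1" in
        *.c) echo "gcc $1" ;;
        *.C) echo "g++ $1" ;;
        *.f) echo "gfortran $1" ;;
        *)   echo "unknown source type: $1" >&2; return 1 ;;
    esac
}
```

For example: gnu_compile_cmd program-name.f prints the gfortran command for that file.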

The make command executes /usr/bin/make. It is GNU make (gmake).

IMSL Libraries

To use the IMSL Libraries (ONLY with Intel products):

  • Select the Intel MPI setup with the command: mpi-selector --set mpich2-intel
  • Log off and back on
  • To test, and for sample code and a job submission script, follow the instructions in /usr/local/demo/imsl/README.

Reporting Problems/Comments

Please send any problems, questions, or comments to grid-admin@wharton.upenn.edu

More Documentation

Most Unix commands and MPI routines have manual (man) pages associated with them to provide usage information. To view a man page, execute: man command

This documentation only touches on some of the available features and ways to use the software. There is much more widely available (and more up to date) documentation on the web to supplement this FAQ.

[Images: Dell M1000e Chassis (Front); Dell M1000e Chassis (Back); Dell/EqualLogic PS5000 iSCSI Chassis]