This lesson is in the early stages of development (Alpha version)

Basic Shell Interaction

Overview

Teaching: 20 min
Exercises: 30 min
Questions
  • How do we interact with a UNIX shell, the basic HPC interface?

  • What is a file system?

  • How do we navigate around files and directories from a UNIX shell?

  • How do we manage files and directories from a UNIX shell?

  • How do we work with text files from a UNIX shell?

  • How do we get help on UNIX shell commands?

Objectives
  • Navigating around files and directories

  • Managing files and directories: copying, moving/renaming, and erasing

  • Viewing and editing text files

  • Preparing files for the upcoming hands-on exercises

The UNIX shell plays an important role in user interaction with HPC systems. This episode serves as a crash course on the UNIX shell and its tools. The material is short enough to let newcomers become productive quickly in the (apparently) clunky interface used on many HPC systems. Take heart: despite their archaic look, the tools are very powerful and will allow you to be very productive.

In this episode…

We will learn how to navigate file systems with pwd, ls, and cd; how to manage files and directories (create, copy, move/rename, delete) using mkdir, cp, mv, rm, and rmdir; and how to view and edit text files using cat, less, and nano. We will also show you how to find documentation on these tools so that you can learn more about these commands and become effective in using them. There is a summary at the end of this lesson for your quick reference.

Log In to Wahab First!

For this session, we assume that you have just logged in to the Wahab HPC cluster. Please review the previous episode for instructions on how to connect to this cluster.

UNIX Shell: First Impression

On Wahab, the UNIX shell prompt looks like this:

wahab-01:~> 

This prompt tells you that you are interacting with a computer named wahab-01. (Frequently, but not always, your user name is also shown on the prompt.) We can also find this fact out by using the hostname command. Type hostname followed by Enter, and see what comes out:

wahab-01:~>  hostname
wahab-01

This type of interface with a computer is called a command-line interface (CLI). You, as the user, give instructions to the computer by typing a command (and optionally some arguments), then pressing Enter (also known as Return) to execute the command. This read-evaluate-print loop (REPL) is the heart of a command-line interface. Depending on the command, some output may be printed on the screen. In the example above, the hostname command produces a short string: wahab-01. The UNIX shell is a very basic CLI: without additional software, the shell only accepts text from the keyboard as its input, and only produces text as its output. The combination of keyboard and text-based screen is often referred to as a terminal and, for our purposes, is equivalent to the CLI.

Why Learn the Command-Line Interface?

The vast majority of computer systems today have a graphical user interface (GUI) with windows, images, and videos for output, whereas a mouse, trackpad, or touch screen is often used for input. The CLI or terminal interface is a very old computer interface, dating back to the 1960s.

You may wonder: Why do I need to learn to use an ancient CLI at all? As we shall see in this lesson, the CLI allows computer users to become very effective and productive. Just by typing a few commands you are able to accomplish many actions that would require many points and clicks on a GUI. Frequently used combinations of commands can be saved into a script: just running this script will allow us to accomplish a large unit of tasks, and repeat them with ease. Next to the CLI, scripting (the action of creating and using scripts) is another foundational skill in using a supercomputer.

Throughout the workshops, you will encounter many commands and examples on how to use them. Do not be afraid to try out and experiment with these commands—that is the only way to learn, become familiar, and eventually skilled with the CLI.

But wait! Will I damage the computer if I type a wrong command?
Good question! On a shared UNIX-like system like Wahab, there are plenty of protection measures that make it very difficult for you to inflict fatal damage. For example, you are not able to delete or alter system files or files belonging to other people.

First Exploration with UNIX Shell

IMPORTANT

Please follow along all the commands shown below, as they are also meant to prepare your files for the hands-on exercises later on.

Just as on your laptop computer, files on an HPC system are organized in terms of (sub)directories and files. (Some operating systems use the term “folder” for a directory; the two terms are interchangeable.) The following illustration shows representative directories and files on Turing HPC (the layout on your specific HPC system may vary):

Tree diagram showing key/illustrative directories and files on Turing

In UNIX, there is only one filesystem, starting with the root directory (/) as the top-level directory. Directly under /, there are many directories. We will explore some of these interesting directories shortly.

There are three life-saver commands on the UNIX shell that you must always remember:

  • pwd: print the current working directory
  • cd: change the working directory
  • ls: list the contents of a directory

These three commands correspond to the most frequently used actions in a graphical file explorer (such as Windows Explorer or Finder): knowing where we are in a folder structure, changing folders, and listing the folder contents.

pwd — Print Working Directory

When you first log in to a computer, you will be placed in your home directory. To find out the location of your home directory, we invoke the pwd command (short for print working directory):

$ pwd
/home/USER_ID

USER_ID in the output above is just a placeholder for your user ID on Wahab; in your case, you will see your own user ID after /home/. So, /home/USER_ID is your home directory. (For ODU students and faculty, the user ID is the same as the MIDAS ID.)

Your home directory is denoted by the ~ character in the shell prompt. This ~ is a common shortcut in UNIX, and is very useful in many instances when working on a UNIX shell.

ls — List the Contents of a Directory

Use the ls command to list the contents of a given directory. Used by itself, ls lists the contents of the current working directory. Let’s try this now:

$ ls

This shows the contents of your home directory, which is most likely empty if you are a new HPC user.

Did I See Nothing … Or Not?

Unless you are an existing HPC user who has already created or transferred files and directories into your home directory, you will see no output. Otherwise, you will see your own files and directories. For example, in the author’s home directory, ls gives the following output (truncated):

admin  CItraining    Consult     Desktop    HPC              ...
bin    CLC_Data      data-shell  Documents  hpc-analytics-1  ...
BUILD  CLCdatabases  DeapSECURE  Downloads  hpc-analytics-2  ...

(All these are directories that contain many other files and directories. We encourage you to use directories to organize your data in a way that’s logical to you.)

When invoking the ls command, you can specify the directory whose contents you want to list. Let’s peek into the contents of the root (/) directory:

$ ls /
RC    etc             lib64       proc     shared  usr
bin   home            lost+found  root     snap    var
boot  initrd.img      media       run      srv     vmlinuz
cm    initrd.img.old  mnt         sbin     sys     vmlinuz.old
dev   lib             opt         scratch  tmp

There are many objects in /, and they can be either files or directories. A more descriptive output can be obtained by adding the -F option (also known as a flag):

$ ls -F /
RC/    etc/             lib64/       proc/     shared/  usr/
bin/   home/            lost+found/  root/     snap/    var/
boot/  initrd.img@      media/       run/      srv/     vmlinuz@
cm/    initrd.img.old@  mnt/         sbin/     sys/     vmlinuz.old@
dev/   lib/             opt/         scratch/  tmp/

The slash (/) character appended after bin, dev, etc. indicates that these objects are directories. Other indicator characters exist, most notably * for executable files and @ for symbolic links.
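The indicator characters can be tried out safely in a throwaway directory. The following is a minimal sketch, not part of the workshop files: the scratch directory made by mktemp and the names subdir, notes.txt, run.sh, and shortcut are all hypothetical.

```shell
# Scratch directory for the demo (hypothetical; removed at the end).
demo=$(mktemp -d)
mkdir "$demo/subdir"                              # a directory
touch "$demo/notes.txt"                           # a regular file
printf '#!/bin/sh\necho hi\n' > "$demo/run.sh"
chmod +x "$demo/run.sh"                           # an executable file
ln -s notes.txt "$demo/shortcut"                  # a symbolic link

# -F appends / to directories, * to executables, and @ to symbolic links.
# Regular files such as notes.txt get no indicator.
listing=$(ls -F "$demo")
echo "$listing"

# Clean up the scratch directory.
rm -r "$demo"
```

In the output, subdir should appear with a trailing /, run.sh with a trailing *, and shortcut with a trailing @.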

ls Colors

On modern HPC systems running Linux, you will most likely get colored output with or without the -F flag, as an alternative indicator of the type of the files/directories. This is a feature of the modern ls command to improve the command-line user experience. You can learn more about this in our extra lesson.

Activity: Touring the Cluster Filesystem

We already took a peek at the / directory. Now let’s use ls to peek at some other directories on Wahab:

  • /bin
  • /shared/apps
  • /scratch/Workshops/DeapSECURE
  • /home/USER_ID (your own home directory)
  • /home/hpc-0081 (somebody else’s home directory)

If you can see additional directories inside, you’re welcome to browse further.

With someone sitting next to you, please answer the following questions:

  • What do you see in each directory?
  • Can you decide what each directory contains (e.g. what kind of files, what’s the purpose of that directory)?
  • Do you have a directory you can’t see?
  • Do you observe any problems with typing the directory name?

Solution

  • /bin contains executable programs, such as ls and hostname.
  • /shared/apps contains many programs that users can use on the cluster.
  • /scratch/Workshops/DeapSECURE is a shared directory containing datasets, programs, and exercise files for our workshop.
  • /home/USER_ID contains, well, your own files.
  • /home/hpc-0081: You can’t read this directory using ls, because the HPC system is designed to protect individual home directories so that only the owner can read and write files in their own home directory.

In UNIX, file names are generally case-sensitive. This means that /Bin and /bin are not equal.

Please look deep inside

/scratch/Workshops/DeapSECURE/module-hpc/Exercises/Unix

and answer the following questions:

  • How many files end with .txt?
  • What other file extension(s) do you see there?

Directory within Directory?

A directory is like an envelope that can contain other files (letters) and directories (other envelopes). Or it can be empty. For example, a directory named

/scratch/Workshops/DeapSECURE

means that directory DeapSECURE is contained within the Workshops directory, which in turn is contained within scratch, which in turn is contained within the root (/) directory of the filesystem. This kind of nesting can go arbitrarily deep—subject to the operating system’s limitation.

Suppose that one user has a DeapSECURE directory within his home directory, e.g.

/home/tjones/DeapSECURE

To refer to a particular DeapSECURE directory, one has to specify which “bigger envelope” contains it (/home/tjones or /scratch/Workshops). This is analogous to a postal address: saying that Mr. Thompson lives at 1800 Thomas Lane is not sufficient; we also need to specify the city and state.

File and directory names are case sensitive!

Please be aware that in Linux and UNIX, file names are case sensitive. For example, this means that DeapSECURE and deapsecure are not equal, and both names can exist in the same directory.

More About UNIX Commands

An anatomy of "ls -F /bin" command

Each UNIX command comes with a rich set of options (sometimes called flags) that extend what the command can do. For example, ls with the -l option shows a lot more information about each file or directory printed.

$ ls -l /scratch/Workshops/DeapSECURE
total 60
-rw-r--r-- 1 wpurwant users 1422 Sep 30 11:04 README.DeapSECURE-workshops.txt
-rw-r--r-- 1 wpurwant users 1422 Sep 30 11:04 README.txt
drwxr-xr-x 6 wpurwant users 4096 Oct  5 00:17 _scratch
drwxr-xr-x 2 wpurwant users 4096 Sep 30 13:14 bin
drwxr-xr-x 7 wpurwant users 4096 Nov 14  2019 datasets
drwxr-xr-x 4 wpurwant users 4096 Oct  5 00:16 handson
drwxr-xr-x 2 wpurwant users 4096 Oct  5 07:36 lib
drwxr-xr-x 2 wpurwant users 4096 Oct  5 00:32 module-bd
drwxr-xr-x 2 wpurwant users 4096 Oct  5 00:32 module-crypt
drwxr-xr-x 2 wpurwant users 4096 Oct  4 23:54 module-hpc
drwxr-xr-x 2 wpurwant users 4096 Oct  5 00:32 module-ml
drwxr-xr-x 2 wpurwant users 4096 Oct  5 00:32 module-nn
drwxr-xr-x 2 wpurwant users 4096 Oct  5 00:32 module-par
drwxr-xr-x 4 wpurwant users 4096 Sep 30 17:05 src
drwxr-xr-x 4 wpurwant users 4096 Sep 30 17:06 tools

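Each file is printed with the following attributes (let’s take README.txt as an example):

  • -rw-r--r-- : the file type (first character) and the access permissions
  • 1 : the number of hard links
  • wpurwant : the owner of the file
  • users : the group of the file
  • 1422 : the file size in bytes
  • Sep 30 11:04 : the time of last modification
  • README.txt : the file name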

As you can see, directories have d as their first letter in the file attribute.

The -l option can be specified along with the file name. (The typical UNIX convention specifies that flags need to come before the file/directory arguments.)
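As a small illustration of these points, the sketch below runs ls -l on a single file and on a directory inside a scratch directory. The names README.txt and datasets merely mirror the listing above, and the scratch directory is hypothetical. The -d flag, which is not covered in this lesson, is added here so that ls describes the directory itself instead of its contents.

```shell
# Scratch directory for the demo (hypothetical; removed at the end).
demo=$(mktemp -d)
mkdir "$demo/datasets"
printf 'hello\n' > "$demo/README.txt"

# Flags (-l, -d) come before the file/directory arguments.
ls -l "$demo/README.txt"       # long listing for a single file
ls -ld "$demo/datasets"        # -d: describe the directory itself

# The first character of the attribute string tells the type:
# "-" for a regular file, "d" for a directory.
file_type=$(ls -l "$demo/README.txt" | cut -c1)
dir_type=$(ls -ld "$demo/datasets" | cut -c1)
echo "file: $file_type  directory: $dir_type"

rm -r "$demo"
```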

Another important option is -a, which means “show all files”. In UNIX, files whose names begin with a period are called hidden files; they will not be printed by ls, unless the -a option is given.
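The effect of -a is easy to verify in a scratch directory. In this sketch the file names visible.txt and .hidden_config are made up for illustration.

```shell
# Scratch directory for the demo (hypothetical; removed at the end).
demo=$(mktemp -d)
touch "$demo/visible.txt" "$demo/.hidden_config"

plain=$(ls "$demo")        # the hidden file is skipped
all=$(ls -a "$demo")       # also shows ., .., and .hidden_config
echo "Without -a:"
echo "$plain"
echo "With -a:"
echo "$all"

rm -r "$demo"
```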

What’s in My Home Directory

Please perform ls -a on your own home directory: What did you see?

Solutions

On the author’s home directory, he finds:

.              .emacs.d     .mc                   .turing_tcshrc
..             .fontconfig  .mozilla              .viminfo
.bash_logout   .gnome2      .ssh                  .wahab_bash_logout
.bash_profile  .gnupg       .tcshrc               .wahab_bash_profile
.bashrc        .history     .turing_bash_history  .wahab_bashrc
.cache         .ipython     .turing_bash_logout   .wahab_history
CItraining     .keras       .turing_bash_profile  .wahab_tcshrc
.config        .kshrc       .turing_bashrc        .Xauthority
.emacs         .lesshst     .turing_history

The precise answer will vary for your home directory; it partly depends on what programs you have launched so far on the cluster.

You may wonder why there are so many hidden files in your home directory. No, someone did not hack into your account and create these. Many hidden files are configuration files that you (end users) do not need to touch on a daily basis. For example, .bashrc and .tcshrc are configuration files for the bash and tcsh shells, respectively.

cd – Changing Working Directory

The UNIX shell and programs have the concept of a current working directory. This is the directory that the shell regards as the place where it is currently working. For a given shell, there is only one working directory at any point in time. The pwd command prints this working directory. The cd command allows us to change this working directory to a different one.

$ cd /

Explore Working Directory

After the last command (cd /), please verify that your current working directory is indeed changed by invoking the __ command. Also, use ls to list the files/directories in the new working directory.

Now use cd and ls to go into the previously visited directories (from the previous hands-on activity) and see their contents again.

Notice how the shell prompt changes as we change our working directory.

Too Much Typing? History and Tab to the Rescue

Feeling like typing too much already? Well, UNIX tool developers have put a lot of thought into making your life easier! There are three features of the UNIX shell that are worth knowing:

  1. Command history: The shell keeps a history of previously executed commands. To recall these, simply use the Up and Down arrow keys.

  2. Tab completion: The shell’s input editor allows you to complete a partially typed word simply by pressing the Tab key. Depending on the context of the word, the shell will choose to complete it based on the available command names, or file and directory names. We will demonstrate this feature below, as this is truly a keystroke saver.

  3. Line editing: The UNIX CLI relies completely on keyboard navigation. Mouse clicks will not change the position of the cursor. The Left and Right keys are used to move the cursor one character at a time. Use the Ctrl+A and Ctrl+E key combinations to jump to the beginning or end of the input line, respectively. (On many systems, the Home and End keys can be used as well.)

Tab Completion—Let’s say you want to change the current directory to /scratch/Workshops/DeapSECURE/module-hpc. You begin by typing

$ cd /s

then press Tab. You will get several suggestions:

sbin/    scratch/ shared/  snap/    srv/     sys/

(If the suggestions do not appear, you may need to press Tab twice.) The shell completes the path only as far as the longest prefix shared by all the candidates, so here nothing beyond /s is added. Now you need to type more letter(s) to narrow down the choice. Please append c or cr (don’t add any whitespace), then hit the Tab key again; the command line will become:

$ cd /scratch/

This is already a valid command, and the path is correct. But the directory is not yet the full path that we want.

Exercise

Keep completing the full path using the tab completion feature.

In the example above, you are trying to complete an argument to the cd command. The shell was using the choices of files and directories that begin with /s.

We can also complete a command name. As an illustration, suppose we type py on a blank input line, then press Tab. We will get some choices:

py3clean             pygettext2.7         python3-jsondiff
py3compile           pygettext3           python3-jsonpatch
py3versions          pygettext3.6         python3-jsonpointer
pyclean              pygmentize           python3-jsonschema
pycompile            pyhtmlizer3          python3.6
pydoc                pyjwt3               python3.6m
pydoc2.7             python               python3m
pydoc3               python2              pyversions
pydoc3.6             python2.7
pygettext            python3

Since this is the first word of the statement, the shell looks for command names as the possible completion options.

While this explanation may look complicated, a little practice will help you get used to this tab completion facility. You will find that this little feature is such a tremendous keystroke saver!

More on UNIX Path and File Name

Absolute Path

The few examples with ls and cd above show that we can view (and access) any file and directory located anywhere within the UNIX filesystem. But so far, we have done so by specifying the absolute path of the file or directory, which, as you can tell, is very tedious. An absolute path shows the way to get to a particular file from the root directory, passing through all the intermediate directories, like this one:

/scratch/Workshops/DeapSECURE/README.txt

An absolute path always begins with a leading slash. An absolute path leaves no room for ambiguity. For example, the following three paths are all distinct:

/scratch/Workshops/DeapSECURE/README.txt
/home/tjones/DeapSECURE/README.txt
/backup/projects/DeapSECURE/README.txt

But an absolute path involves a lot of typing, which also makes it error-prone.

Relative Path

Let us change directory to our shared training directory:

$ cd /scratch/Workshops/DeapSECURE/module-hpc
$ ls -F
Exercises/  geoip@  spams/

Both Exercises and spams are directories located within module-hpc. We can therefore peek into the contents of these directories simply by using their base names (i.e. without any / character):

$ ls -F Exercises
Bandwidth_test/  Slurm/  Spam_bash/  Unix/  results/
$ ls -F spams
chmod.py  Results_amp  Results_par  Results_seq  Results_seq_alg2  untroubled@

That’s a whole lot shorter than writing the full absolute path like /scratch/Workshops/DeapSECURE/module-hpc/Exercises! Why? Because our current working directory is already where Exercises is located. A relative path gives the location of a file or directory in relation to the current working directory.

A relative path is an alternative way to refer to a file or a directory, and it is often much shorter than the absolute path. In the real world, you typically work with files and directories that are in your current working directory, or located only one or two subdirectories away. If your focus shifts to another group of files in a different directory, you can change to that directory before working on these files.
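The equivalence of the two kinds of path can be checked with a short experiment. This is a sketch in a scratch directory; the names project, Exercises, and hello.sh are hypothetical stand-ins for the workshop directories.

```shell
# Scratch directory for the demo (hypothetical; removed at the end).
demo=$(mktemp -d)
mkdir "$demo/project"
mkdir "$demo/project/Exercises"
touch "$demo/project/Exercises/hello.sh"

cd "$demo/project"
abs=$(ls "$demo/project/Exercises")   # absolute path: works from any directory
rel=$(ls Exercises)                   # relative path: resolved from the working directory
echo "absolute: $abs"
echo "relative: $rel"

# Leave the scratch directory before removing it.
cd /
rm -r "$demo"
```

Both listings show the same file, because both paths lead to the same directory as long as the working directory is project.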

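There are two magic directory names that always exist in every directory:

  • . (a single period) refers to the directory itself.
  • .. (two periods) refers to the parent directory, i.e. the directory that contains it.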

Understanding . and ..

Our shell is currently at the /scratch/Workshops/DeapSECURE/module-hpc directory. In this context, . refers to this very directory, whereas .. refers to /scratch/Workshops/DeapSECURE.

Other than the special meanings above, . and .. work just like any other relative path.
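Here is a minimal sketch of how . and .. behave, using a made-up directory tree (project, data, and results are hypothetical names). Each cd runs inside $( ... ), which executes in a subshell, so the working directory of the main shell stays at data throughout.

```shell
# Scratch directory for the demo (hypothetical; removed at the end).
demo=$(mktemp -d)
mkdir "$demo/project" "$demo/project/data" "$demo/project/results"
cd "$demo/project/data"

here=$(cd . && pwd)               # . is the current directory: nothing changes
parent=$(cd .. && pwd)            # .. is the parent directory: .../project
sibling=$(cd ../results && pwd)   # up one level, then down into results
echo "$here"
echo "$parent"
echo "$sibling"

cd /
rm -r "$demo"
```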

  • From the current directory, what are the contents of . and .. directories?

Solution

Exercises/  geoip@  spams/
README.DeapSECURE-workshops.txt  datasets/   module-crypt/  module-par/
README.txt                       handson/    module-hpc/    src/
_scratch/                        lib/        module-ml/     tools/
bin/                             module-bd/  module-nn/
  • The directory /scratch/Workshops/DeapSECURE contains other directories like module-crypt and module-par. How would you go to one of these directories using a single cd command?

Solution

Using absolute path:

$ cd /scratch/Workshops/DeapSECURE/module-crypt

Using relative path:

$ cd ../module-crypt
  • From the current directory /scratch/Workshops/DeapSECURE/module-hpc, what would be the output of pwd after we cd to one of the following directories?

    • ../.
    • ../..
    • ../../..
    • ../module-hpc
    • ./module-hpc

    (Clarification: These are not meant to be a sequence of relative directories to be cd-ed into in that order. Rather, execute only one cd in your mind, then think about what the output of pwd would be.)

Home Directory

The home directory is so important that it has a magic directory name to denote it: ~. We saw this already in the first shell prompt. No matter where you are on the filesystem, just invoke cd ~ to return to your home directory. Even niftier: invoking the cd command with no argument will also return you to your home directory. This is handy in many situations.

Finding Your Way Back to Home Directory

Starting from /scratch/Workshops/DeapSECURE/module-hpc, which of the following commands could you use to navigate to your home directory, which is /home/USER_ID?

  1. cd .
  2. cd /
  3. cd /home/USER_ID
  4. cd ../..
  5. cd ~
  6. cd home
  7. cd ../../../../home/USER_ID
  8. home
  9. cd
  10. cd USER_ID

Solution

  1. No: . stands for the current directory.
  2. No: / stands for the root directory.
  3. Yes: /home/USER_ID is the absolute path of your home directory.
  4. No: this goes up two levels, i.e. ends in /scratch/Workshops.
  5. Yes: ~ stands for the user’s home directory, in this case /home/USER_ID.
  6. No: this would navigate into a directory home in the current directory if it exists (otherwise, an error message would appear).
  7. Yes: unnecessarily complicated, but correct.
  8. No: home is not a valid Linux command, therefore this will result in an error.
  9. Yes: this is a special way to go to your home directory.
  10. No: There is no directory named USER_ID in /scratch/Workshops/DeapSECURE/module-hpc.

Wildcards

UNIX shell has a pattern-matching capability to generate a list of files that match a given pattern. This is an important feature to allow bulk processing.

As an example, let’s start from the /scratch/Workshops/DeapSECURE/module-hpc/Exercises/Unix directory. This directory contains the following files:

cat1.txt  clutter.sh  garbage1.txt  garbage4.txt  hello.sh    input3.txt
cat2.txt  create.sh   garbage2.txt  garbage5.txt  input1.txt  my-ntoebok.txt
cat3.txt  destroy.sh  garbage3.txt  garbage6.txt  input2.txt  Tutorial.txt

We can list only files that have the .txt extension:

$ ls *.txt
cat1.txt  garbage1.txt  garbage4.txt  input1.txt  my-ntoebok.txt
cat2.txt  garbage2.txt  garbage5.txt  input2.txt  Tutorial.txt
cat3.txt  garbage3.txt  garbage6.txt  input3.txt

What Do These Patterns Do?

In the same directory, try the following and observe the results:

$ ls c*
$ ls C*
$ ls *m*
$ ls *r*
$ ls *r*sh
$ ls cat?.txt
$ ls *[135].txt
$ ls *[2-5].txt
$ ls *a[gl]*
$ ls ../R*

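When an argument contains one or more of these special characters: ?, *, or a sequence enclosed in square brackets such as [A-Z], [0-9], or [01234], the shell treats the argument as a pattern (sometimes called a glob pattern). It then searches for all file/directory names matching this pattern and feeds the matching names to the command. The special characters and the [...] sequences are called wildcards, and here are their meanings:

  • * matches any sequence of characters, including none.
  • ? matches exactly one character.
  • [...] matches exactly one character from the enclosed set or range; for example, [135] matches 1, 3, or 5, and [2-5] matches 2, 3, 4, or 5.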

It is the shell that does the matching and list generation, not the ls command. For example, with ls c*, the shell will look for all file/directory names that begin with a c, and translate the statement to

ls cat1.txt cat2.txt cat3.txt clutter.sh create.sh

then execute the ls command.
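One way to see the shell's expansion directly is to hand the pattern to echo, which simply prints whatever arguments it receives. The sketch below recreates a few of the file names from the listing above as empty files in a hypothetical scratch directory.

```shell
# Scratch directory for the demo (hypothetical; removed at the end).
demo=$(mktemp -d)
cd "$demo"
touch cat1.txt cat2.txt cat3.txt clutter.sh create.sh hello.sh

# echo prints its arguments, so it reveals what the shell expanded.
echo c*           # cat1.txt cat2.txt cat3.txt clutter.sh create.sh
echo cat?.txt     # cat1.txt cat2.txt cat3.txt
echo *[13].txt    # cat1.txt cat3.txt
echo *.sh         # clutter.sh create.sh hello.sh
expanded=$(echo c*)

cd /
rm -r "$demo"
```

Since echo knows nothing about files, the fact that it prints file names shows that the matching was done by the shell before the command ran.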

Quoting an Argument

The shell treats the following characters as having special meanings:

|  &  ;  (  )  [  ]  {  }  <  >   `  '  "  \  #  *  ~  $  ?  !

The Space and Tab characters are also special. If an argument really has to contain any of these special characters, the argument has to be quoted within a pair of either single or double quotes. Single quotes preserve all the characters literally, whereas double quotes still give special treatment to the $, !, ` (backtick), and \ (backslash) characters ($ invokes variable value substitution, as we will learn later).
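The difference between the two kinds of quotes can be seen in a short sketch. The file name and the variable fruit below are hypothetical; variable assignment is used only to make the effect of $ visible.

```shell
# Scratch directory for the demo (hypothetical; removed at the end).
demo=$(mktemp -d)
cd "$demo"

# A file name containing spaces and parentheses must be quoted.
touch 'draft notes (old).txt'
ls 'draft notes (old).txt'      # single quotes work
ls "draft notes (old).txt"      # double quotes work here too

fruit=apple
single='I like $fruit'          # single quotes: $fruit stays literal
double="I like $fruit"          # double quotes: $fruit is replaced by its value
echo "$single"
echo "$double"

cd /
rm -r "$demo"
```

The first echo prints the text with $fruit unchanged, while the second prints "I like apple".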

Good Names for Files and Directories

Complicated names of files and directories can make your life painful when working on the command line. Here we provide a few useful tips for the names of your files.

  1. Don’t use spaces.

    Spaces can make a name more meaningful, but since spaces are used to separate arguments on the command line it is better to avoid them in names of files and directories. You can use - or _ instead (e.g. event-logs-2019/ rather than event logs 2019/).

  2. Don’t begin the name with - (dash).

    Commands treat names starting with - as options.

  3. Stick with letters, numbers, . (period or ‘full stop’), - (dash) and _ (underscore).

    Many other characters have special meanings on the command line. We will learn about some of these during this lesson. There are special characters that can cause your command to not work as expected and can even result in data loss.

Sanitizing File Names

On GUI-centric platforms such as Windows, Mac, and Android, many users are used to using a whole gamut of characters in file names, which tends to cause trouble with the UNIX way of processing. Here are some examples:

  1. Member list.xlsx
  2. Thesis revision 5 (backup).pptx
  3. Tom & Jerry.txt
  4. Question #70.pdf
  5. Caution!
  6. Stocks valued above $10.xlsx

How should we quote these file names?

Solution

In all cases but the last two, one can use either single quotes or double quotes: 'Member list.xlsx' or "Member list.xlsx". In the last two cases, however, the ! and $ would trigger additional actions inside double quotes; hence these names must be quoted with a pair of single quotes for use in UNIX commands such as:

$ ls -l 'Caution!'
$ ls -l 'Stocks valued above $10.xlsx'

Editing Directory Content

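Now that you are familiar with the location-based commands, let us learn how to edit directory contents. In this section, we will learn the following:

  • mkdir: creating new directories
  • cp: copying files and directories
  • mv: moving and renaming files and directories
  • rm and rmdir: deleting files and directories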

Here’s our goal: we will create some new directories in your own home directory and copy the exercise files for our hands-on activities to this directory. We will also rename and delete some files.

IMPORTANT

We are going to use the directories created and the files copied below in the subsequent exercises. Please be sure to follow along and run the commands shown below on your HPC account.

mkdir — Creating New Directories

Let’s create a directory called CItraining in your home directory, then another one called module-hpc within that directory. mkdir is the command to create new directories.

Preparing Our Hands-On Activities (Part 1 of 2)

Please follow along and invoke the following commands in this order:

$ mkdir ~/CItraining
$ cd ~/CItraining
$ mkdir module-hpc
$ cd module-hpc

Review

  1. What did we just do?

  2. What does ~/CItraining mean? What absolute path does this translate to?

  3. Use pwd and check where your working directory is now.

Solution

  • We just created a directory named CItraining in your home directory. We then created another directory named module-hpc within CItraining.

    • In the first mkdir, we use an absolute path to specify the new directory’s name.
    • In the second mkdir, we use a relative path instead.
  • That little wiggle ~ character stands for /home/USER_ID, therefore ~/CItraining actually stands for /home/USER_ID/CItraining.

    • You can substitute /home/USER_ID/ in place of ~/. This is necessary outside of the context of UNIX shell, as many programs actually do not know how to interpret the ~ at the beginning of a path name.
    • The ~/ prefix is not necessary if you are already in the home directory.
  • If all is well, pwd should give: /home/USER_ID/CItraining/module-hpc.
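The tilde expansion described in this solution can be inspected with echo. The sketch below assumes the HOME environment variable is set, as it normally is in a login session; the comments show what one would expect for a home directory of /home/USER_ID.

```shell
echo ~                   # your home directory, e.g. /home/USER_ID
echo ~/CItraining        # e.g. /home/USER_ID/CItraining
echo "~/CItraining"      # quoted: printed literally, the shell does not expand it

expanded=$(echo ~/CItraining)
literal=$(echo "~/CItraining")
```

The last echo hints at why other programs may not understand ~: the expansion is done by the shell, and only when the ~ is unquoted at the beginning of a word.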

cp – Copy files and directories

The cp command is used to copy one or more files and/or directories.

Preparing Our Hands-On Activities (Part 2 of 2)

Check your current working directory first! Please make sure that it is /home/USER_ID/CItraining/module-hpc before issuing the copying statement below. Otherwise, change to this directory, or make one if you don’t have it.

Now we will copy the entire hands-on directory for this lesson so you have your own files to work on. DO NOT miss even a period.

$ cp -r /scratch/Workshops/DeapSECURE/module-hpc/Exercises/. .

Check the Result

  • What does this cp command do?
  • What files and directories get copied to your directory? Use ls, pwd, and/or cd to find out.

The -r (recursive copy) option is powerful: It allows us to copy an entire directory with subdirectories and files contained therein.

Congratulations! You just copied this lesson’s hands-on directory with all the files and directories contained therein to your own home directory. Now we are ready to do some activities with files and directories.


The following activities show some capabilities of cp. Let us go to the Unix subdirectory first. Now create a directory called junk then copy garbage1.txt and garbage2.txt to it:

$ cd Unix
$ mkdir junk
$ cp garbage1.txt garbage2.txt junk

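The cp command can be invoked in several forms:

  • cp SOURCE_FILE TARGET_FILE copies one file to a new file name.
  • cp SOURCE_FILE1 SOURCE_FILE2 ... TARGET_DIR copies one or more files into an existing directory.
  • cp -r SOURCE_DIR TARGET_DIR copies an entire directory, including its contents.

These forms are fairly natural to our understanding: by reading the statements above, we can deduce what each cp invocation is trying to accomplish.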
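The common ways of invoking cp can be tried out without touching the exercise files. The following is a sketch in a hypothetical scratch directory; a.txt, b.txt, a_copy.txt, and junk_backup are made-up names.

```shell
# Scratch directory for the demo (hypothetical; removed at the end).
demo=$(mktemp -d)
cd "$demo"
printf 'one\n' > a.txt
printf 'two\n' > b.txt
mkdir junk

cp a.txt a_copy.txt         # one file -> a new file name
cp a.txt b.txt junk         # several files -> an existing directory
cp -r junk junk_backup      # a whole directory -> a new directory

ls junk junk_backup
copy_content=$(cat a_copy.txt)
backup_files=$(ls junk_backup)

cd /
rm -r "$demo"
```

After these commands, a_copy.txt holds the same text as a.txt, and both junk and junk_backup contain a.txt and b.txt.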

Copying Multiple Files

Now copy all files that begin with garbage and end with .txt to the junk directory.

Solution

There are many possible solutions, but the most compact statement would be:

$ cp garbage*.txt junk/

Please check the contents of the junk directory to confirm that cp did what you wanted it to do.

mv – Moving and Renaming Files and Directories

Moving and renaming files and directories can be done using the mv command.
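As a minimal sketch of both uses, the commands below rename a file, move it into a directory, and then rename the directory. Everything happens in a hypothetical scratch directory, and the names draft.txt, report.txt, archive, and old_reports are made up for illustration.

```shell
# Scratch directory for the demo (hypothetical; removed at the end).
demo=$(mktemp -d)
cd "$demo"
printf 'first draft\n' > draft.txt
mkdir archive

mv draft.txt report.txt       # rename: same directory, new name
mv report.txt archive/        # move: into another directory
mv archive old_reports        # directories can be renamed too

result=$(ls old_reports)
echo "$result"

cd /
rm -r "$demo"
```

Unlike cp, mv leaves no copy behind: after the first command, draft.txt no longer exists under its old name.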

rm and rmdir – Deleting Files and Directories

The rm command deletes a file. For example, all the garbage*.txt files are, well, garbage.
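A cautious habit is to list what a pattern matches before handing the same pattern to rm. The sketch below does this in a hypothetical scratch directory with empty stand-in files, so that the real exercise files are left alone.

```shell
# Scratch directory for the demo (hypothetical; removed at the end).
demo=$(mktemp -d)
cd "$demo"
touch garbage1.txt garbage2.txt garbage3.txt keep.txt

ls garbage*.txt        # first check what the pattern matches
rm garbage*.txt        # then delete exactly those files
remaining=$(ls)
echo "$remaining"      # only keep.txt is left

# Clean up the scratch directory.
cd /
rm -r "$demo"
```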

Warning: File Deletion is Permanent!

In the UNIX world there is no concept of “Recycle Bin” or “Trash Bin”. Once a file is deleted, it is permanently inaccessible. Therefore always perform rm with extra care!

The rmdir command can be used to delete an empty directory. It refuses to delete a directory that is not empty. For example:

$ mkdir meow
$ cp cat*.txt meow/
$ rmdir meow
rmdir: failed to remove 'meow': Directory not empty

How to do this, then?

$ rm meow/*
$ rmdir meow

In the example above we create a directory meow to contain the copy of cat*.txt files. To delete the meow directory, the files and directories inside it must be removed first before rmdir can work.

Zapping an Entire Directory Tree

The vanilla rm command will not delete a directory:

$ mkdir meow
$ cp cat*.txt meow/
$ rm meow
rm: cannot remove `meow': Is a directory

But rm comes with the -r (recursive) flag which can delete an entire directory tree:

$ rm -r meow
$ ls meow
ls: cannot access meow: No such file or directory

Warnings: Power Tools are Dangerous!

UNIX utilities are very powerful, therefore you must use them with some care. Here are some warnings:

  • Recursive copy (cp -r): With recursive copy, everything in the source directory will be copied without warning. If the source directory has extremely large files and/or has many files, you can easily run out of storage. Therefore, know what you’re copying before executing cp -r.

  • Remove (rm) command: once a file is deleted, it is permanently inaccessible to you. Unlike GUI file managers such as Windows Explorer or Finder, there is no recycle bin from which to retrieve recently deleted files. (File recovery is actually possible, but it is an extremely difficult operation that requires forensic tools.) In an HPC environment, simply assume that a file is “gone” once rm removes it. The rm -r command is even more dangerous: it deletes every file and subdirectory under the directories given as arguments.
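One possible habit that follows from these warnings is to inspect a directory tree before copying or deleting it. The sketch below uses du -sh to report the total size of a tree; du is not covered in this lesson and is included only as an assumption about what is convenient. The project directory is a made-up stand-in.

```shell
# Work in a throwaway directory with a small stand-in tree (hypothetical names).
scratch=$(mktemp -d)
cd "$scratch"
mkdir -p project/data
echo "results" > project/data/run1.txt

# Before cp -r: check how much space the tree occupies.
du -sh project

# Before rm -r: list everything that would be deleted.
ls -R project

# Only after checking, remove the whole tree.
rm -r project
```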

Asking Before Clobbering Files

The cp, mv, and rm commands have the -i option to prevent accidental overwriting or deletion of files: with -i, the command asks for confirmation first. If you are not sure whether you will clobber existing files, it is a good idea to include this option.
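The sketch below shows the effect of -i. Normally you would type y or n at the prompt; here the answer n is fed in through a pipe so that the example runs unattended. The file names are made up for illustration.

```shell
# Work in a throwaway directory with stand-in files (hypothetical names).
scratch=$(mktemp -d)
cd "$scratch"
echo "important" > report.txt
echo "new text"  > update.txt

# cp -i asks before overwriting report.txt; answering "n" leaves it untouched.
# (Some cp versions report a non-zero exit status when you decline, hence "|| true".)
echo n | cp -i update.txt report.txt || true

# rm -i asks before deleting; answering "n" keeps the file.
echo n | rm -i report.txt

# report.txt should still hold its original content.
cat report.txt
```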

Viewing and Editing Text Files

The UNIX terminal is also a useful interface for viewing and editing text files.

cat — Simple Viewing

The cat command concatenates the contents of the files given as input, then writes the result to standard output (i.e., the terminal). If you provide one file, the content of that file is displayed. If you call cat without any file, it echoes back what you type until you press Ctrl+D.

Some example uses: In ~/CItraining/module-hpc/Unix, there are three files whose names begin with cat:

$ cat cat1.txt
Life is an interesting adventure.
$ cat cat1.txt cat2.txt
Life is an interesting adventure.
If you like adventures then you probably like life.
It is full of surprises in both good and bad forms.
$ cat cat2.txt cat2.txt
If you like adventures then you probably like life.
It is full of surprises in both good and bad forms.
If you like adventures then you probably like life.
It is full of surprises in both good and bad forms.
$ cat cat3.txt cat2.txt cat1.txt
You can laugh, you can cry.
All this makes life interesting.
And that is why life is an adventure!
If you like adventures then you probably like life.
It is full of surprises in both good and bad forms.
Life is an interesting adventure.

less and more — Paged Viewing

less is a more sophisticated and versatile pager. With less, you can view the file in both directions.

Basic Navigation with less

  • Up and Down: Scrolling up and down, one line at a time.
  • Page Down or d or Space: Show the next page.
  • Page Up or u or Backspace: Show the previous page.
  • q: Exit less.

The more command allows you only to view a file, one screen at a time, in a forward manner. You cannot go back. The command terminates once the end of the file is reached. more is an older and more primitive pager.

Our recommendation: whenever possible, use the less command because it is more powerful. However, in the rare situation where you do not have a fully functioning terminal or less is not available, more is a useful tool to have.

nano – Editing text files

Several text editors are popular in the UNIX-like world:

In this training module, we will focus on the nano text editor because it is lightweight, open-source software that is widely available: macOS and most Linux distributions include it by default, and it is available for Windows in several ways, one of which is the “Git for Windows” package. Its interface is also the most intuitive of the editors mentioned here.

vi is the bread-and-butter editor for many UNIX users. It is available on almost any UNIX-like system, from bare-bones servers and Raspberry Pi (tiny computers) to supercomputers and cloud computers. However, its interface is terse and the learning curve is rather steep.

emacs is a good alternative editor. Its interface looks somewhat similar to other text editors in the GUI world, and there is a graphical interface for emacs if so desired. However, its keyboard shortcuts are a different breed compared to what many Windows or Mac users are used to.

Creating and Editing Text Files with nano

When working on an HPC system, we will frequently need to create or edit text files. Text is one of the simplest computer file formats: a plain sequence of text lines. There are a few ways to create a text file, the easiest of which is to use a text editor. To create or edit a file, type nano [FILENAME] on the terminal, where [FILENAME] is the name of the file. If the file does not already exist, it will be created when you save. Let’s make a new file now, type whatever you want in it, and save it.

$ nano draft.txt

Nano text editor

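Nano defines a number of shortcut keys (prefixed by the Control or Ctrl key) to perform actions such as saving the file or exiting the editor. Here are the shortcut keys for a few common actions:

  • Ctrl+O: Save the file (“Write Out”).
  • Ctrl+X: Exit nano.
  • Ctrl+W: Search for text (“Where Is”).
  • Ctrl+K: Cut the current line.
  • Ctrl+U: Paste the text that was cut.
  • Ctrl+G: Show the built-in help.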

Editing Tryout

Please open the cat1.txt file and add a few lines of your favorite quotes. Save the file, exit nano, and view the contents using cat, less, or more.

Getting Help on UNIX Shell

How can you find more information about a UNIX command? How on earth could anyone remember all the options for these commands? There are at least three ways to find help:

  1. Using the --help option;

  2. Using the man (manual) page;

  3. Using web search (e.g. Google, Bing, DuckDuckGo) to look for references or tutorials.

In the following subsections we will touch on each of these.

Program’s Help Option

At least on Linux, many programs accept the --help option, which prints fairly brief documentation on how to use the program. For example, ls --help gives output like this:

Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuvSUX nor --sort.

Mandatory arguments to long options are mandatory for short options too.
  -a, --all                  do not ignore entries starting with .
  -A, --almost-all           do not list implied . and ..
      --author               with -l, print the author of each file
  -b, --escape               print octal escapes for nongraphic characters
      --block-size=SIZE      use SIZE-byte blocks.  See SIZE format below
  -B, --ignore-backups       do not list implied entries ending with ~
  -c                         with -lt: sort by, and show, ctime (time of last
                               modification of file status information)
                               with -l: show ctime and sort by name
                               otherwise: sort by ctime
  -C                         list entries by columns
      --color[=WHEN]         colorize the output.  WHEN defaults to `always'
                               or can be `never' or `auto'.  More info below
  -d, --directory            list directory entries instead of contents,
                               and do not dereference symbolic links
  -D, --dired                generate output designed for Emacs' dired mode
  -f                         do not sort, enable -aU, disable -ls --color
  -F, --classify             append indicator (one of */=>@|) to entries
      --file-type            likewise, except do not append `*'
      --format=WORD          across -x, commas -m, horizontal -x, long -l,
                               single-column -1, verbose -l, vertical -C
      --full-time            like -l --time-style=full-iso
  -g                         like -l, but do not list owner
      --group-directories-first
                             group directories before files.
                               augment with a --sort option, but any
                               use of --sort=none (-U) disables grouping
  -G, --no-group             in a long listing, don't print group names
  -h, --human-readable       with -l, print sizes in human readable format
                               (e.g., 1K 234M 2G)
      --si                   likewise, but use powers of 1000 not 1024
  -H, --dereference-command-line
                             follow symbolic links listed on the command line
      --dereference-command-line-symlink-to-dir
                             follow each command line symbolic link
                             that points to a directory
      --hide=PATTERN         do not list implied entries matching shell PATTERN
                               (overridden by -a or -A)
      --indicator-style=WORD  append indicator with style WORD to entry names:
                               none (default), slash (-p),
                               file-type (--file-type), classify (-F)
  -i, --inode                print the index number of each file
  -I, --ignore=PATTERN       do not list implied entries matching shell PATTERN
  -k                         like --block-size=1K
  -l                         use a long listing format
  -L, --dereference          when showing file information for a symbolic
                               link, show information for the file the link
                               references rather than for the link itself
...

(The output was truncated because it was so long—more than 100 lines.)

A few things are worth mentioning:

Reading a Command Syntax Specification

Near the top of this help text, the syntax of using the command is spelled out. For ls:

Usage: ls [OPTION]... [FILE]...

For mv:

Usage: mv [OPTION]... [-T] SOURCE DEST
  or:  mv [OPTION]... SOURCE... DIRECTORY
  or:  mv [OPTION]... -t DIRECTORY SOURCE...

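This syntax specification is key to using these commands correctly. Here is some help on how to read it:

  • Words in capital letters (OPTION, FILE, SOURCE, DEST, DIRECTORY) are placeholders that you replace with actual values.
  • Square brackets ([...]) mark a part that is optional.
  • An ellipsis (...) means that the preceding item can be repeated.
  • Several usage lines (as for mv) list alternative forms of the same command.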
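As an illustration, each of the three usage lines of mv shown above can be matched to a concrete invocation. This is a sketch in a scratch directory with made-up file names; the -t form is taken from the usage text above, which comes from the GNU version of mv found on Linux.

```shell
# Work in a throwaway directory with stand-in files (hypothetical names).
scratch=$(mktemp -d)
cd "$scratch"
echo 1 > one.txt
echo 2 > two.txt
echo 3 > three.txt
mkdir box

# mv [OPTION]... [-T] SOURCE DEST
# Exactly one SOURCE and one DEST; the optional parts are left out here.
mv one.txt uno.txt

# mv [OPTION]... SOURCE... DIRECTORY
# "SOURCE..." means one or more sources; the last argument is the directory.
mv uno.txt two.txt box

# mv [OPTION]... -t DIRECTORY SOURCE...
# With -t, the target directory is named first and the sources follow.
mv -t box three.txt

ls box
```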

man — UNIX Manual Page

On most UNIX/Linux systems, many programs come with a manual page that can be read using the man command. Most of the time, you only need to know how to use a command in its most basic form. But if you go beyond the basics and need to find the option that performs a certain action, man can tell you. Here is an example of the output of man:

$ man ls
LS(1)                            User Commands                           LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List  information  about  the FILEs (the current directory by default).
       Sort entries alphabetically if none of -cftuvSUX nor --sort.

       Mandatory arguments to long options are  mandatory  for  short  options
       too.

       -a, --all
              do not ignore entries starting with .

       -A, --almost-all
              do not list implied . and ..

       --author
              with -l, print the author of each file

       -b, --escape
              print octal escapes for nongraphic characters

       --block-size=SIZE
              use SIZE-byte blocks.  See SIZE format below

       -B, --ignore-backups
              do not list implied entries ending with ~

       -c     with -lt: sort by, and show, ctime (time of last modification of
              file status information) with -l: show ctime and  sort  by  name
              otherwise: sort by ctime

       -C     list entries by columns

       --color[=WHEN]
              colorize  the  output.   WHEN  defaults  to  ‘always’  or can be
              ‘never’ or ‘auto’.  More info below
...

The output is often very similar to the --help output, but the manual page frequently contains details that --help does not mention. man presents the documentation through a pager (e.g. less or more), so you can move up and down through the documentation and use the full capabilities of the pager. Press q to quit the pager and return to the shell.

Internet sites and search engines can also provide us with information on UNIX commands. On your favorite search engine, type man ls or ls manual page. Add UNIX or Linux if the first few hits are not what you want. Several sites may turn up, such as:

(These websites are good ones that you can bookmark for future reference.)

The Shell Command Reference

Summary of Common UNIX Commands

  1. Querying the shell’s current working directory:

    $ pwd
    
  2. Change directory:

    $ cd DIR
    
  3. List content of current directory

    $ ls
    
  4. Paths

    • Relative paths start in relation to a given directory such as current directory or parent directory.
    • Absolute paths are given from the root directory and start with /
    • Home directory : ~
    • Current directory: .
    • Parent directory: ..
  5. Create new directory

    $ mkdir DIR_NAME | DIR_PATH
    
  6. Copy file:

    $ cp [-p] SRC_FILE DEST_FILE
    $ cp [-p] SRC_FILE [SRC_FILE2 ...] DEST_DIR
    

    The optional -p flag preserves the file modification time and permissions. The first form copies the file under a new file name.

  7. Copy directory tree:

    $ cp -r [-p] SRC_DIR DEST_DIR
    

    If DEST_DIR exists, then a subdirectory with the base name of SRC_DIR will be made to contain the copy; otherwise, the copy will be stored in DEST_DIR.

  8. Move or rename a file or directory:

    $ mv SRC_FILE DEST_FILE
    $ mv SRC_FILE [SRC_FILE2 ...] DEST_DIR
    $ mv SRC_DIR DEST_DIR
    
  9. Remove file or directory

    # Remove file
    rm FILE_NAME | FILE_PATH [FILE_NAME2 | FILE_PATH2 | ...]
    
    # Remove directory
    rm -r DIR_NAME | DIR_PATH
    rmdir DIR_NAME | DIR_PATH
    
  10. Inspecting the content of a (text) file

    $ cat FILE
    $ more FILE
    $ less FILE
    
    • Use cat for relatively short text files.
    • more command allows pause between pages, but no scrolling backward.
    • less is the most sophisticated of all, allowing forward and backward scrolls, and more.
  11. Determining the type of a file

    $ file FILE
    
  12. Text editors for terminal

    $ nano [FILE]
    $ vi [FILE]
    $ emacs [-nw] [FILE]
    
    • New users are encouraged to use nano, which is the easiest to use of all.
    • vi is the classic editor on UNIX platforms (it is actually vim on many modern Linux distributions).
    • emacs is another favorite editor on UNIX-like platforms. The -nw optional flag can be used to suppress the X11 GUI window of Emacs, if your SSH connection supports X11 programs.
  13. Command documentation:

    $ man COMMAND
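Several items in this summary can be tried together in one short session. The sketch below works in a scratch directory with made-up names and shows, in particular, how cp -r behaves differently depending on whether DEST_DIR already exists.

```shell
# Work in a throwaway directory; $scratch holds its absolute path.
scratch=$(mktemp -d)
cd "$scratch"
pwd

# A small source tree (hypothetical names).
mkdir src
echo "hello" > src/hello.txt

# DEST_DIR does not exist yet: the copy itself is stored as copy1.
cp -r src copy1

# DEST_DIR exists: a subdirectory named src is created inside it.
mkdir existing
cp -r src existing

# Navigate with a relative path, view the file, then go back up two levels.
cd existing/src
cat hello.txt
cd ../..

ls -R
```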
    

Key Points

  • The UNIX shell provides a basic means to interact with HPC systems

  • pwd, cd, and ls provide essential means to navigate around files and directories

  • Directories and files are addressed by their paths, which can be relative or absolute

  • Basic file management tools: mkdir, cp, mv, rm, rmdir

  • Basic text viewing and editing tools: cat, less, nano