This is a page with extra lesson materials that we feel would be too much to include in the main lesson pages. They are included for completeness, and to entertain curious minds (which we hope you have).
Key Places on Turing HPC Filesystem
In this section, we discuss in more depth a few key places (directories) in the filesystem of the Turing cluster. Let us revisit the following illustration, which shows representative directories and files on Turing HPC:
Directly under the top-level `/`, there are many directories. These are a few locations that you need to be aware of, because they have specific purposes.
- `/bin` contains essential executable programs. In the illustration above, two programs are mentioned: `hostname` and `ls`. You can see for yourself that `/bin` contains over 100 programs, including `cp`, `mv`, `rm`, and `nano`. These programs are part of the Linux OS. Additional programs (over 1700 on Turing) are stored in `/usr/bin`, also part of the Linux OS.
- `/home` contains user home directories (one per user). There are many users on an HPC system, so `/home` contains a great many directories. Your own home directory is `/home/YOUR_MIDAS_ID`. In the picture above, we illustrate directories belonging to three users: `hpc-0123` (a guest account), `tjon012` (a student account), and `tjones` (a faculty/staff account).
- `/cm` contains software provisioned specifically for the Turing cluster (i.e. not part of the base Linux OS). The most relevant location is `/cm/shared/applications`, where software applications and libraries are stored. For example, the newest Python 3.6 software suite is stored in the following subdirectory: `/cm/shared/applications/Python/3.6.8`.
- `/etc` contains many system-wide configuration files. This directory is part of the standard Linux OS, and many applications also use it to store their system-wide configurations.
- `/scratch-lustre` contains user scratch directories. You have your own scratch space in `/scratch-lustre/YOUR_MIDAS_ID`. This directory is special because it features a fast storage system. You will notice that reading and writing large files in this directory is faster than doing the same in your home directory. However, files in the scratch space are NOT backed up. This space is meant for temporary storage while processing data or performing calculations.
- `/scratch-lustre/DeapSECURE` contains shared programs, libraries, and datasets for this training program. You will become more familiar with the files in this directory as we progress in this training.
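The locations described above can be checked from the command line. The following is a minimal sketch, not an official Turing utility: the variable names `bin_count` and `usr_bin_count` are made up for illustration, and the Turing-specific paths are only tested for existence because they will be absent on other machines. On many newer Linux distributions `/bin` is itself a symlink to `/usr/bin`, so the two counts may be identical there.

```shell
# Count the entries in the standard program directories.
bin_count=$(ls /bin | wc -l)
usr_bin_count=$(ls /usr/bin | wc -l)
echo "/bin contains $bin_count entries"
echo "/usr/bin contains $usr_bin_count entries"

# Your home directory is stored in the HOME environment variable.
echo "Home directory: $HOME"

# Turing-specific locations; they may not exist on other systems.
for dir in /cm/shared/applications /scratch-lustre /scratch-lustre/DeapSECURE; do
    if [ -d "$dir" ]; then
        echo "$dir exists"
    else
        echo "$dir not found on this system"
    fi
done
```

The counts you see will depend on the system and on the software installed at the time you run the commands.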
More about UNIX-like Filesystem Hierarchy
UNIX-like operating systems name their directories in very similar ways, according to purpose. For example, `/bin`, `/usr/bin`, and `/usr/local/bin` contain executable programs. (The name `bin` stems from binary, because many executable programs contain machine instructions in binary form, generally not meant for human comprehension.) This Wikipedia article contains a brief listing of directories found on many Linux operating systems.
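One way to see these `bin` directories in action is to look at the shell's program search path. This is a small sketch that assumes it runs as a script (in an interactive shell, `ls` may be an alias, in which case `command -v` reports the alias instead of a path). The variable name `ls_path` is chosen only for illustration.

```shell
# Print each directory on the program search path, one per line.
echo "$PATH" | tr ':' '\n'

# Ask the shell where it finds the ls program.
ls_path=$(command -v ls)
echo "ls is found at: $ls_path"
```

Most of the directories printed by the first command typically end in `bin`, following the naming convention described above.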
`ls` Colors and File Type Indicators
On Turing, `ls` will most likely produce colored output with or without the `-F` flag. This is a feature of a modern `ls` command (GNU `ls`). The table below lists the meanings of the colors, and of the indicator characters that `ls -F` places after the file/object name:
Color | Indicator | Meaning |
---|---|---|
Black/white | (no character) | Regular file |
Blue | `/` | Directory |
Green | `*` | Executable file |
Teal (cyan) | `@` | Symbolic link |
There are many more colors and indicators; only the most frequently used ones are noted here. Further, the colors may be different on other systems (e.g. on macOS, where BSD `ls` is used). Colors are a matter of preference; there are ways to tweak them to suit your taste. This is an advanced topic that we leave for you to pursue if interested.
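To see the indicator characters for yourself, you can build a small throwaway directory that holds one object of each type from the table. This is an illustrative sketch: the names `demo`, `data.txt`, `mydir`, `run.sh`, and `link.txt` are invented for the example. Because the output is captured into a variable rather than printed directly to the terminal, `ls` lists one entry per line and does not add colors.

```shell
# Create a temporary directory with one object of each type.
demo=$(mktemp -d)
touch "$demo/data.txt"                 # regular file
mkdir "$demo/mydir"                    # directory
touch "$demo/run.sh"
chmod +x "$demo/run.sh"                # executable file
ln -s data.txt "$demo/link.txt"        # symbolic link

# List with indicators; LC_ALL=C gives a predictable sort order.
listing=$(LC_ALL=C ls -F "$demo")
echo "$listing"
```

The listing should show `data.txt` with no indicator, `link.txt@`, `mydir/`, and `run.sh*`, matching the table above.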
Symbolic Link
A symbolic link, also known as a symlink or soft link, is an “alias” to a file or directory residing elsewhere on the filesystem. With a symlink, we can refer to, open, read, and write files using the symlink’s name instead of the original path. For example:
$ ls -l /etc/system-release
lrwxrwxrwx 1 root root 14 Sep 10 2018 /etc/system-release -> redhat-release
$ cat /etc/system-release
Red Hat Enterprise Linux Server release 6.10 (Santiago)
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.10 (Santiago)
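The same behavior can be reproduced without touching system files. The sketch below works in a temporary directory with made-up names (`original.txt`, `alias.txt`); it uses `readlink` to print where a symlink points and the `-L` test to check whether a path is a symlink.

```shell
# Work in a temporary directory so nothing important is touched.
demo=$(mktemp -d)
echo "hello" > "$demo/original.txt"

# Create a symlink named alias.txt that points to original.txt.
ln -s original.txt "$demo/alias.txt"

# readlink prints the path stored inside the symlink.
target=$(readlink "$demo/alias.txt")

# Reading through the symlink yields the content of the original file.
content=$(cat "$demo/alias.txt")

if [ -L "$demo/alias.txt" ]; then
    echo "alias.txt is a symlink to $target"
fi
echo "Content read through the symlink: $content"
```

Note that the link target here is a relative path, just like `redhat-release` in the example above: it is resolved relative to the directory that holds the symlink.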
Why Use Symlink?
Symlinks are useful for three purposes:
- To shorten or simplify access to files or directories that have complicated path names. Suppose you found a nice reference on the Dvorak keyboard layout on your system, located at `/usr/share/doc/kbd-1.15/dvorak.txt`:

      $ head -n 10 /usr/share/doc/kbd-1.15/dvorak.txt
      Dvorak Layout Diagram

      By request, here's a quick typographical representation of the Dvorak
      keyboard layout. I never put this up before because when I created my
      pages all (more like "both of") the other Dvorak pages had layout
      pictures. For a prettier picture, try the Dvorak International Web Page
      or The Dvorak Keyboard by Marcos Huerta.

      Shifted:

  You can make a reference to this file in your current directory by making a symlink to it:

      $ ln -s /usr/share/doc/kbd-1.15/dvorak.txt .

  Now you can read this reference by typing, e.g., `less dvorak.txt` from this directory instead of the lengthy `less /usr/share/doc/kbd-1.15/dvorak.txt`.

- To provide backward-compatible names when you reorganize your files. Suppose you originally wrote a program named `jobs.sh`. Later you realize that this name is not very descriptive: it should have been named `filter-my-jobs.sh` and placed in your `~/bin` as a general-purpose tool. So you rename the file. But you have already written many other ad-hoc scripts that need access to `jobs.sh` in your current directory. What should you do? Ideally you should edit all those ad-hoc scripts to reflect the new name and location, but that may be extensive work. In the short term, you can provide a symlink named `jobs.sh` that points to `~/bin/filter-my-jobs.sh`. Here is the complete sequence of commands:

      # Move the current tool to the general location
      $ mv jobs.sh ~/bin/filter-my-jobs.sh
      # Check: jobs.sh no longer exists here
      $ ls -l jobs.sh
      ls: cannot access jobs.sh: No such file or directory
      # Provide link in place of the old name, that points to the new location:
      $ ln -s ~/bin/filter-my-jobs.sh jobs.sh
      # Check again: jobs.sh is a symlink
      $ ls -l jobs.sh
      lrwxr-xr-x 1 tjones users 35 Aug 23 2016 jobs.sh -> /home/tjones/bin/filter-my-jobs.sh

  Now your old scripts will still work using the old name (`jobs.sh`), but newer tools can use `filter-my-jobs.sh` as the name of this program.

- To save space by avoiding duplicates of large files, especially those that are accessed read-only. In the DeapSECURE shared directory, we have a number of very large datasets. (For example, the complete spam dataset is more than 45 GB!) It does not make sense for everyone to make a copy of these datasets when all you need is to read them.

  In this case, you will want to make a symlink in an appropriate directory that points to the location of the original dataset. Here is an example:

      # Check: spam-dataset does not exist yet
      $ ls -l spam-dataset
      ls: cannot access spam-dataset: No such file or directory
      $ ln -s /scratch-lustre/DeapSECURE/datasets/untroubled-spam spam-dataset

  Now you can work with the spam dataset by using a much shorter file path:

      $ head spam-dataset/1998/03/890929468.24864.txt
      Return-Path: <aj881c@ix.netcom.com>
      Delivered-To: bguenter-bait@mikhail.qcc.sk.ca
      Received: (qmail 881 invoked by alias); 1 Feb 1998 08:47:36 -0000
      Delivered-To: bait@mikhail.qcc.sk.ca
      Received: (qmail 875 invoked from network); 1 Feb 1998 08:47:35 -0000
      Received: from iis.cybermania.net (208.135.0.2)
        by mikhail.qcc.sk.ca with SMTP; 1 Feb 1998 08:47:35 -0000
      Received: from [204.31.253.89] by iis.cybermania.net
        (SMTPD32-3.03) id A5CC641B01EA; Sun, 01 Feb 1998 03:43:56 -0500
      From: aj881c <aj881c@ix.netcom.com>
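The backward-compatibility use case can be rehearsed safely in a sandbox before you try it on real files. The following is a minimal sketch under stated assumptions: the `sandbox` directory and its `bin` and `work` subdirectories are stand-ins for `~/bin` and your working directory, and the one-line script is a placeholder for a real tool. It also shows that removing a symlink deletes only the link, not the file it points to.

```shell
# Set up a sandbox that mimics ~/bin and a working directory.
sandbox=$(mktemp -d)
mkdir "$sandbox/bin" "$sandbox/work"

# Placeholder tool, originally named jobs.sh in the working directory.
printf '#!/bin/sh\necho "filtering jobs"\n' > "$sandbox/work/jobs.sh"

# Move the tool to its new name and location.
mv "$sandbox/work/jobs.sh" "$sandbox/bin/filter-my-jobs.sh"

# Provide a symlink under the old name that points to the new location.
ln -s "$sandbox/bin/filter-my-jobs.sh" "$sandbox/work/jobs.sh"

# The old name still works. (Run through sh so that the sketch does
# not depend on execute permission in the temporary directory.)
output=$(sh "$sandbox/work/jobs.sh")
echo "$output"

# Removing the symlink deletes only the link; the tool itself remains.
rm "$sandbox/work/jobs.sh"
ls -l "$sandbox/bin/filter-my-jobs.sh"
```

Using an absolute path as the link target, as done here and in the examples above, keeps the link valid even if the symlink itself is later moved to another directory.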