This lesson is in the early stages of development (Alpha version)

DeapSECURE module 1: Introduction to HPC: Extra Lesson Materials: UNIX Filesystem

This is a page with extra lesson materials that we feel would be too much to include in the main lesson pages. They are included for completeness, and to entertain curious minds (which we hope you have).

Key Places on Turing HPC Filesystem

In this section, we discuss in more depth a few key places (directories) in the filesystem of the Turing cluster. Let us revisit the following illustration, which shows representative directories and files on Turing HPC:

[Figure: tree diagram showing key/illustrative directories and files on Turing]

Directly under the top-level /, there are many directories. These are a few locations that you need to be aware of, because they have specific purposes.

More about UNIX-like Filesystem Hierarchy

UNIX-like operating systems name their directories in very similar ways, according to purpose. For example, /bin, /usr/bin, and /usr/local/bin contain executable programs. (The name bin stems from binary, because many executable programs contain machine instructions encoded as binary numbers, generally not meant for human comprehension.) This Wikipedia article contains a brief listing of directories found on many Linux operating systems.
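As a quick way to see these directories on whatever system you are logged in to, the following is a minimal sketch. It only assumes a POSIX shell; the choice of cat as the program to look up is arbitrary, and some of the three directories may be absent or may be symlinks to one another on a given system.

```shell
# Count the entries in each of the usual directories for executables
for dir in /bin /usr/bin /usr/local/bin; do
    if [ -d "$dir" ]; then
        count=$(ls "$dir" | wc -l)
        echo "$dir holds $count entries"
    else
        echo "$dir does not exist on this system"
    fi
done

# Ask the shell where it finds one particular program (cat is an arbitrary pick)
cat_path=$(command -v cat)
echo "cat is found at: $cat_path"
```

The exact counts and the location of cat differ from system to system, so no expected output is shown here.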

ls Colors and File Type Indicators

On Turing, ls will most likely produce colored output with or without the -F flag. Coloring is a feature of modern ls commands (GNU ls); the indicator characters are what -F adds after each file/object name. The colors and indicators have the following meanings:

| Color       | Indicator      | Meaning         |
|-------------|----------------|-----------------|
| Black/white | (no character) | Regular file    |
| Blue        | /              | Directory       |
| Green       | *              | Executable file |
| Teal (cyan) | @              | Symbolic link   |

There are many more colors and indicators; only the most frequently used ones are noted here. Further, the colors may be different on other systems (e.g. on macOS, where BSD ls is used). Colors are a matter of preference, and there are ways to tweak them to suit your taste. This is an advanced topic that we leave for you to pursue if interested.
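To see the indicator characters without depending on terminal colors, you can build one object of each kind in a scratch directory and list it. This is a small sketch: the file names (notes.txt, data, run.sh, shortcut) are made up for illustration, and mktemp -d is used only to get a throwaway directory.

```shell
# Create a throwaway directory with one object of each type
demo_dir=$(mktemp -d)
touch "$demo_dir/notes.txt"               # regular file
mkdir "$demo_dir/data"                    # directory
touch "$demo_dir/run.sh"
chmod +x "$demo_dir/run.sh"               # executable file
ln -s notes.txt "$demo_dir/shortcut"      # symbolic link to notes.txt

# -1 prints one name per line; -F appends the type indicator
listing=$(ls -1F "$demo_dir")
echo "$listing"
# Output:
#   data/
#   notes.txt
#   run.sh*
#   shortcut@
```

Running plain ls on the same directory in a terminal should show the matching colors, provided coloring is enabled on your system.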

A symbolic link, also known as a symlink or soft link, is an “alias” to a file or directory object residing elsewhere on the filesystem. With a symlink, we can refer to, open, read, and write files using the symlink’s name instead of the original path. For example:

$ ls -l /etc/system-release
lrwxrwxrwx 1 root root 14 Sep 10  2018 /etc/system-release -> redhat-release

$ cat /etc/system-release
Red Hat Enterprise Linux Server release 6.10 (Santiago)

$ cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.10 (Santiago)
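The same behavior can be reproduced with files you own, including writing through the link. The sketch below uses hypothetical names (original.txt, alias.txt) in a throwaway directory; readlink is the standard command that prints the path stored inside a symlink.

```shell
# Work in a throwaway directory so nothing important is touched
work_dir=$(mktemp -d)
echo "first line" > "$work_dir/original.txt"

# Create a symlink named alias.txt that points to original.txt
ln -s original.txt "$work_dir/alias.txt"

# Writing through the symlink changes the original file
echo "second line" >> "$work_dir/alias.txt"
cat "$work_dir/original.txt"
# Output:
#   first line
#   second line

# readlink shows where the symlink points
readlink "$work_dir/alias.txt"
# Output:
#   original.txt
```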

Symlinks are useful for three purposes:

  1. To shorten or simplify access to files or directories that have a complicated path name. Suppose you found a nice reference on the Dvorak keyboard layout on your system, located at /usr/share/doc/kbd-1.15/dvorak.txt:

    $ head -n 10 /usr/share/doc/kbd-1.15/dvorak.txt
    
                                 Dvorak Layout Diagram
    
       By request, here's a quick typographical representation of the Dvorak
       keyboard layout. I never put this up before because when I created my
       pages all (more like "both of") the other Dvorak pages had layout
       pictures. For a prettier picture, try the Dvorak International Web
       Page or The Dvorak Keyboard by Marcos Huerta.
    
    Shifted:
    

    You can make a reference to this file in your current directory by making a symlink to that file:

    $ ln -s /usr/share/doc/kbd-1.15/dvorak.txt .
    

    Now you can read this reference by typing, e.g., less dvorak.txt from this directory instead of writing the lengthy less /usr/share/doc/kbd-1.15/dvorak.txt.

  2. To provide backward-compatible names when you reorganize your files. Suppose you originally wrote a program named jobs.sh. Later you realize that this name is not very descriptive: it should have been named filter-my-jobs.sh and placed in your ~/bin as a general-purpose tool. So you rename and move the file. But you have already written many other ad-hoc scripts that need access to jobs.sh in your current directory. What should you do? Ideally, you would edit all those ad-hoc scripts to reflect the new name and location, but that may be a lot of work. In the short term, you can provide a symlink named jobs.sh that points to ~/bin/filter-my-jobs.sh. Here is the complete sequence of commands:

    # Move the current tool to the general location
    $ mv jobs.sh ~/bin/filter-my-jobs.sh
    
    # Check: jobs.sh no longer exists here
    $ ls -l jobs.sh
    ls: cannot access jobs.sh: No such file or directory
    
    # Provide link in place of the old name, that points to the new location:
    $ ln -s ~/bin/filter-my-jobs.sh jobs.sh
    
    # Check again: jobs.sh is a symlink
    $ ls -l jobs.sh
    lrwxr-xr-x 1 tjones users 35 Aug 23  2016 jobs.sh -> /home/tjones/bin/filter-my-jobs.sh
    

    Now your old scripts will still work using the old name (jobs.sh), but newer tools can use filter-my-jobs.sh for the name of this program.

  3. To save space by avoiding duplicates of large files, especially those that are accessed read-only. In the DeapSECURE shared directory, we have a number of very large datasets. (For example, the complete spam dataset is more than 45 GB!) It does not make sense for everyone to make a copy of these datasets when all you need is to read them.

    In this case, you will want to make a symlink in an appropriate directory that points to the location of the original dataset. Here is an example:

    # Check: spam-dataset does not exist yet
    $ ls -l spam-dataset
    ls: cannot access spam-dataset: No such file or directory
    
    $ ln -s /scratch-lustre/DeapSECURE/datasets/untroubled-spam spam-dataset
    

    Now you can work with the spam dataset by using a much shorter file path:

    $ head spam-dataset/1998/03/890929468.24864.txt
    
    Return-Path: <aj881c@ix.netcom.com>
    Delivered-To: bguenter-bait@mikhail.qcc.sk.ca
    Received: (qmail 881 invoked by alias); 1 Feb 1998 08:47:36 -0000
    Delivered-To: bait@mikhail.qcc.sk.ca
    Received: (qmail 875 invoked from network); 1 Feb 1998 08:47:35 -0000
    Received: from iis.cybermania.net (208.135.0.2)
      by mikhail.qcc.sk.ca with SMTP; 1 Feb 1998 08:47:35 -0000
    Received: from [204.31.253.89] by iis.cybermania.net
      (SMTPD32-3.03) id A5CC641B01EA; Sun, 01 Feb 1998 03:43:56 -0500
    From:     aj881c <aj881c@ix.netcom.com>
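The rename-and-link pattern from the second purpose can be rehearsed safely in a scratch directory before you try it on real files. The following is only a sketch: the directory layout under mktemp -d and the one-line body of jobs.sh are invented for illustration and do not come from Turing. It also shows one caveat worth knowing: a symlink stores only a path, so it stays behind, dangling, if its target is later removed.

```shell
# Hypothetical scratch layout standing in for ~/bin and a project directory
scratch=$(mktemp -d)
mkdir "$scratch/bin" "$scratch/project"
printf '#!/bin/sh\necho "filtering jobs"\n' > "$scratch/project/jobs.sh"

# Move the tool to the general location under its new name
mv "$scratch/project/jobs.sh" "$scratch/bin/filter-my-jobs.sh"

# Provide a symlink under the old name that points to the new location
ln -s "$scratch/bin/filter-my-jobs.sh" "$scratch/project/jobs.sh"

# The old name still runs the program
old_name_output=$(sh "$scratch/project/jobs.sh")
echo "$old_name_output"
# Output:
#   filtering jobs

# -L tests "is a symlink"; -e follows the link and tests "target exists"
if [ -L "$scratch/project/jobs.sh" ] && [ -e "$scratch/project/jobs.sh" ]; then
    link_status="valid"
else
    link_status="broken"
fi
echo "jobs.sh link is $link_status"

# If the target is removed, the symlink remains but no longer resolves
rm "$scratch/bin/filter-my-jobs.sh"
if [ -L "$scratch/project/jobs.sh" ] && [ ! -e "$scratch/project/jobs.sh" ]; then
    echo "jobs.sh is now a dangling symlink"
fi
```

The same -L and -e checks apply to a dataset link such as spam-dataset: if the shared dataset directory is ever moved, the link reports as dangling and has to be recreated with the new path.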