A Brief Introduction to Python

Overview

Teaching: 30 min
Exercises: 10 min

Questions

What is Python?

How do I use Python to perform computation with numbers and text?

How do I handle a large amount of data using Python?

How do I write a tool in Python?

Objectives

Understand the essential elements of the Python programming language.

Be able to write simple Python programs to process data.

This episode serves as a crash course in the Python programming language. It covers only the bare essentials to get you started with Python. At the end of this lesson, there will be a pointer to a lesson series which you can pursue on your own to become proficient with Python.

What Is Python? Why Python?

Python is a high-level, general purpose programming language. Python emphasizes code readability and ease of use, and its syntax encourages good programming practices as well as productivity. Today, Python has become one of the most popular programming languages, even widely deployed by tech giants such as Google, Amazon, Facebook, … Python comes with a vast array of powerful libraries, such as:

numpy and scipy for numerical calculations (“number crunching”),
pandas for data analytics,
matplotlib and seaborn for plotting and visualization,
scikit-learn, tensorflow, keras, pytorch for machine learning,
nltk for natural language processing,
pycrypto for cryptography,
scrapy, beautifulsoup, and selenium for web scraping.

These libraries enable programmers to accomplish their goals (e.g. “obtain the largest five eigenvalues of a matrix” or “create a machine learning model to flag spam emails”) without having to know the details of the complex underlying algorithms.

Those who have used other programming languages will find it rather easy to pick up Python. Python is an interpreted language, which means that a Python interpreter is required to run a Python program.

How Does Python Compare to C/C++ Language?

C emphasizes low-level details of the program, down to the bare-metal details (such as integers, bits, pointers, and data alignment). C++ provides much more convenience by adding object-oriented capabilities and significantly expanded productivity libraries (standard C++ library, Boost), while still allowing (and often requiring) programmers to take care of machine-level issues. Python, on the other hand, emphasizes high-level programmability without forcing programmers to worry about gory details.

Compared to C/C++, Python provides a much gentler learning curve to new programmers, and much shorter time-to-productivity.

C/C++ programs are compiled to produce binary executable programs, containing machine instructions that can be executed directly by the processor (CPU). In contrast, because Python is an interpreted language, a Python interpreter is always required to run a Python program. The interpreter translates the human-friendly Python statements into instructions to be executed by the CPU, piece by piece, as the program runs. The process of interpreting the program in this way takes time; therefore interpreted computer programs run significantly slower than binary executable programs. Quite often, a well-written C/C++ program can accomplish the same task 5 to 100 times faster than an equivalent implementation written in pure Python. We will discuss this issue more in module 6 of this training program. However, this picture changes for programs that rely heavily on high-performance libraries made available for Python (such as NumPy, TensorFlow, etc.). In this training program, we strive to use Python libraries in a manner that is conducive to high-performance computation.

In this training program we will focus on Python 3, which is the current version of Python.

Accessing Python from Turing HPC

On Turing, as on many HPC systems, access to available software packages is managed using a shell command called module. (Due to the age of the cluster, two variants of module are available on Turing. We recommend using the newer variant called lmod, by first invoking enable_lmod.) Here is the sequence of commands needed:
$ enable_lmod
$ module load python
$ module load ipython    # recommended, see below
$ module list            # optional, but please try
(You will need to repeat these commands the next time you log in to Turing.)

By default, module load python will load Python 3.6 on Turing. Python 3.7 is also available; use module load python/3.7 to specify the exact software version.

The module command requires a subcommand name, and sometimes additional arguments. Here are the most frequently used invocations:

module avail — lists the available modules on the system;

module load PACKAGENAME — loads the module PACKAGENAME, i.e. makes this software available in the shell;

module unload PACKAGENAME — unloads the module PACKAGENAME;

module whatis PACKAGENAME — prints information about module PACKAGENAME;

module list — lists the currently loaded modules.

Python Operating Modes

There are two ways to interact with the Python interpreter: the interactive mode and the script mode.

Interactive mode

Python interactive mode allows you to execute Python statements instantly from the command line. This is very much like the way we interact with the UNIX shell: we enter a Python statement and press Enter; Python runs the statement, prints the result (if applicable), then returns to the prompt. To launch the Python interpreter, invoke the python command from your (UNIX) command line:

$ python

Python 3.6.9 (default, Sep 17 2019, 12:17:19)
[GCC Intel(R) C++ gcc 4.9.4 mode] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

The >>> indicates Python’s prompt; it tells you that you are in the interactive mode.

To exit Python, use the quit() or exit() command, or use the Ctrl+D keyboard shortcut.

>>> quit()

ipython—Better Interactive Python

In this workshop, we recommend a more sophisticated Python front-end called ipython (short for interactive Python). It provides better command history, output history, syntax highlighting, tab completion, and customizability. It even doubles as a UNIX-like “shell” for the most commonly used commands. To use ipython, please make sure that the ipython module is loaded after the python module.
$ module load python      # if python has not been loaded
$ module load ipython     # if ipython has not been loaded
$ ipython
Python 3.6.9 (default, Sep 17 2019, 12:17:19)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:
Notice that the prompt is different. The number 1 will be incremented every time a new command is entered.

Arithmetic with Python

From now on, you can type valid Python statements, execute them, and see their output. Python in interactive mode can be used to perform arithmetic:

>>> 2 + 3

5

>>> 5 / 2

2.5

Python supports the usual computer arithmetic operators:

operator	Meaning	Examples
`+`	Addition	`5 + 7` , `7.43 + 54`
`-`	Subtraction	`10 - 3` , `3.5 - 10`
`*`	Multiplication	`5 * 7` , `3.5 * 10`
`/`	Division	`4 / 2` , `10 / 3` , `1 / 7`, `14 / 0.2`
`**`	Exponentiation	`2 ** 3` , `9 ** 0.5`
`(` and `)`	Group expression for evaluation; override standard operator precedence	`5 * (3 + 4)` , compare against `5 * 3 + 4`

The usual mathematical convention for operation order (which one gets computed first, also termed operator precedence) applies: Exponentiation is computed first, followed by multiplication and division, then addition and subtraction. The Python documentation has a reference table of operator precedence, which can help you ensure that you write correct Python expressions. Please try many other expressions you can think of so you become comfortable with Python as a calculator.
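The precedence rules described above can be checked directly at the prompt. The following is a small sketch; the variable names are chosen only for illustration.

```python
# Exponentiation binds tighter than multiplication,
# which in turn binds tighter than addition.
no_parens = 2 + 3 * 4 ** 2      # 4 ** 2 = 16, then 3 * 16 = 48, then 2 + 48
with_parens = (2 + 3) * 4 ** 2  # parentheses first: 5 * 16

print(no_parens)    # 50
print(with_parens)  # 80
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit to the reader.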

Computing Total Price and Splitting the Bill

Mary and her two roommates split their grocery expenses evenly. Mary just bought the following:

A dozen eggs for $1.49

A loaf of bread for $2.79

A bag of potato chips for $1.99

A bottle of hand soap for $2.49

A stack of paper plates for $3.99

In Virginia, tax rates are 2.5% for food and 6% for non-food. Please create Python expressions to do the following:

Compute the total cost of the grocery bill.

Compute the payment of each person to cover this bill.
Solutions

We can use parentheses to compute the quantities in one step. This is just one way among many ways to get the computation done.
(1.49 + 2.79 + 1.99) * 1.025 + (2.49 + 3.99) * 1.06
((1.49 + 2.79 + 1.99) * 1.025 + (2.49 + 3.99) * 1.06) / 3
The total cost is $13.30 and each one has to pay $4.43 (but one has to pay one cent more to cover the total cost).

The examples above show that Python can represent whole numbers (integers) and real numbers (those with decimal points). While integers are represented perfectly, real numbers are represented with a limited number of digits (approximately 15 on today’s computers) and are subject to roundoff errors. This shows up in at least one of the examples above, where 13.29555 was printed as 13.295550000000002. Real numbers carry enough digits to allow for reliable computation in the vast majority of cases. We will not discuss this further, as it is an advanced topic.
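For readers who want to see roundoff in isolation, here is a minimal sketch using a well-known example. The round and math.isclose functions are part of standard Python, but they are not otherwise covered in this episode.

```python
import math

total = 0.1 + 0.2
print(total)                      # 0.30000000000000004
print(total == 0.3)               # False: the two values differ in the last digit

# Two common ways to cope with roundoff:
print(round(total, 2))            # 0.3 (round to two decimal places)
print(math.isclose(total, 0.3))   # True (compare with a small tolerance)
```

For money amounts such as the grocery bill above, rounding the final result to two decimal places is usually sufficient.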

Scripting with Python

The interactive mode is useful but not always practical. Imagine you have thousands of statements to execute in order, and you need to repeat the process at least one more time. In such a case, we can save these Python statements into a text file called a Python script and have the Python interpreter run them. Python scripts, or programs, usually have the .py filename extension.

`Hello World`: Our First Script

In interactive mode, Python prints the result of an expression immediately after it is executed. In the scripting mode, an expression has to be printed in order to be output (to the terminal or to a file).

Let us print a simple “Hello World” message—which is a tradition when learning a new programming language. Using a text editor (e.g. nano), create a text file named hello.py containing one statement:

print("Hello World")

Save the file, then execute the script:

$ python hello.py

Hello World

In general, a Python script is executed from the UNIX shell in this way: python /PATH/TO/SCRIPT.py. Any command you are able to run in the interactive mode can be added to a script and executed by the Python interpreter. We can print the result of a math expression:

print(1 + 2 + 3 + 4 + 5)

We can use multiple print statements to print multiple text lines. A print can also print multiple items in a single statement:

print("Some math examples")
print("Sum of 1 through 5 is ", 1 + 2 + 3 + 4 + 5)
print("Square root of 1, 2, 3, 4 are", 1**0.5, 2**0.5, 3**0.5, 4**0.5)

Please run this script several times. You will notice that the output is always printed in the order the statements appear in the script. This is a bedrock principle of a sequential computer program: the computer will read and execute the commands/statements one at a time, and in the order these commands appear in the program. Remembering this principle will help you (1) predict the outcome of a computer program just by reading it, and (2) avoid confusion about what a program will do.

Statements, Indentation, Code Blocks

Python language syntax has a few rules that distinguish it from other computer languages. Let us use the following program snippets to illustrate the notable features of Python language:

# This is a sample program written in Python

def greet(name, gender, majors, graduate_year):
    if gender == "M":
        pronoun = "He"
        pronoun3 = "his"  # third-person pronoun
    else:
        pronoun = "She"
        pronoun3 = "her"
    print("Hello, this is", name)
    print(pronoun, "has", len(majors), "major(s):")
    for m in majors:
        print("-", m)
    print(pronoun, "completed", pronoun3, "education in", graduate_year)
    print()

greet("Elaine", "F", ["Art", "Mathematics", "History"], 1993)
greet("Johnson", "M", ["Sociology"], 2000)

That is a complete Python program which can be run to produce the following output:

Hello, this is Elaine
She has 3 major(s):
- Art
- Mathematics
- History
She completed her education in 1993

Hello, this is Johnson
He has 1 major(s):
- Sociology
He completed his education in 2000

Unlike C/C++, a Python statement generally ends with the new line. There is no mandatory end-of-statement marker like a semicolon (;), although a semicolon is indeed recognized as such. If a line is so long that it has to wrap, please terminate the incomplete line by appending a backslash character (\).
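The rules about line endings can be sketched as follows. The prices are reused from the grocery exercise, and the variable names are only illustrative.

```python
# One statement per line is the norm; no semicolon is needed.
subtotal = 1.49 + 2.79 + 1.99

# A backslash at the end of a line continues the statement on the next line.
total = 1.49 + 2.79 + 1.99 + \
        2.49 + 3.99

# Two statements on one line, separated by a semicolon.
# This is legal, but it is rarely seen in well-written Python code.
a = 1; b = 2

print(subtotal, total, a + b)
```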

Python is well known for its extensive and strict use of whitespace to indent program lines. (To indent a program line means to add a number of whitespace characters before the first non-space character in that line.) In Python, as in any other programming language, a set of commands or statements can be grouped into a block, which then becomes an integral part of a language construct (loops, conditionals, function definitions). Python uses indentation to distinguish a block of statements. A block is clearly identified by its consistent indentation level. In the example program shown earlier, the def greet(...): statement is followed by a code block that starts with if gender == "M": and terminates after the lone print() statement. Similarly, the if clause initiates a new block containing two program lines, followed by the else clause and yet another block. The “if–block–else–block” sequence constitutes a complete construct for conditional execution, as we will learn later in this episode.

Python’s convention for code blocks is in contrast to that of the C/C++ language, where a code block in an if, a for, or a function definition like int main(...) is clearly delimited by a matching pair of curly brackets {…}. In Python, if you have started a new block using four-space indentation, you must indent every statement in this block with four space characters. Python will catch inconsistencies in block-level indentation and issue a syntax error. While this may appear to be overly restrictive, it actually encourages good programming behavior and readability of Python programs. The most widespread convention among Python programmers is to prepend four extra space characters to introduce a new (sub)block. We recommend that you also follow this practice. We will see this more practically in the next sections.
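To see how Python reacts to inconsistent indentation without crashing a script, one can hand a deliberately broken snippet to the built-in compile function. This is only a sketch: the snippet in bad_source is made up for illustration, and the try/except construct used to catch the error is not covered in this episode.

```python
# A deliberately broken snippet, stored as a string so that this
# example itself remains valid Python.
bad_source = (
    "if True:\n"
    "    x = 1\n"
    "      y = 2\n"   # two extra spaces: inconsistent with the line above
)

try:
    compile(bad_source, "<example>", "exec")
    rejected = False
except IndentationError as err:   # IndentationError is a kind of SyntaxError
    rejected = True
    print("Python rejected the code:", err.msg)
```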

Comments

Comments (non-executable texts) can be added to a Python script by prepending the text with the hash # character. Comments can also appear after a statement. Both cases appear in the sample program above.

Fear not, all the constructs used in the program above will be explained shortly, so you will understand what the program is doing after finishing this episode. Finally, the rules regarding indentations and comments above apply not only to Python scripts, but also to Python statements entered in the interactive mode.

Basic Elements of a Program

For the rest of this episode, you will learn the basic building blocks of a program. We will learn these in the context of the Python programming language, but they are applicable in many other languages. These are just a few things that you can use in your scripts to get started with computer programming. At the end of this episode we will provide some pointers for further learning.

As a roadmap, here are the key elements included in this episode:

Variables;
Data types, with initial emphasis on numbers and strings;
Statement block;
Looping through iteration using the for statement;
Conditional statements (if, elif, else);
Lists;
Arrays using numpy;
Data structure using dict;
Functions;
Script arguments.

We will also present a quick overview of key Python libraries that you may find useful for cybersecurity applications.

Variables

Arithmetic is useful, but algebra makes mathematics even more useful by allowing us to manipulate and define relationships among yet-to-be-specified quantities, denoted by symbols such as x, y, and so on. The same goes for computer programs: a variable plays the role that a symbol plays in algebra. In Python, a variable is simply a label for a value (or another type of object, as we will learn shortly). This gives us a handle to refer to that value indirectly by the name of the variable. We define a variable by assigning a value to it, using the = operator. Some examples:

a = 4
b = 5 / 2
c = "Hello World"
d = a + b
name = "Thomas"

Several rules regarding Python variables:

A variable name can contain only letters (a-z, A-Z), digits (0-9), and underscores (_). The name cannot start with a digit. Names that start with an underscore are often reserved (e.g. __file__, __name__) or have certain meanings. (If you are just getting started with Python, it is best to use variable names that begin with a letter until you know the uses of names that start with an underscore.)
Names are case sensitive: For example, name and Name and NAME are three distinct variables.
Use names that convey the meaning of the value in a concise way. For example, use weight = 127 to represent a weight quantity with a value of 127; a name like iurj2k3u = 127 obfuscates the meaning of the variable.

Be aware of Python’s reserved words and absolutely avoid using them for variable names:

False      await      else       import     pass
None       break      except     in         raise
True       class      finally    is         return
and        continue   for        lambda     try
as         def        from       nonlocal   while
assert     del        global     not        with
async      elif       if         or         yield

It is best to also avoid built-in function and type names and standard Python library names as well as popular library names like numpy and others mentioned at the beginning of this episode.
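Python's standard keyword module offers a way to check whether a name is a reserved word. The sketch below uses it on a few sample names, which are chosen only for illustration.

```python
import keyword

print(keyword.iskeyword("for"))     # True: cannot be used as a variable name
print(keyword.iskeyword("weight"))  # False: safe to use

# Built-in function names such as len are NOT reserved words, so Python
# will not stop you from reusing them. Doing so hides the original
# function, which is why it is best avoided.
print(keyword.iskeyword("len"))     # False
```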

Variables can be printed using the print function, as before:

print(a)
print(a, "+", b, "=", d)
print("Greetings ", name)

4
4 + 2.5 = 6.5
Greetings  Thomas

The value of a variable can be updated; after that, the variable reflects the updated value. For example:

a = 4
print(a)
a = 27
print(a)
b = a - 7
print(b)

4
27
20

Data Types

Python supports various data types. We have seen three data types so far:

integers (whole numbers), representing discrete quantities, such as 4 and -30;
floats (real numbers), representing continuous values, such as 2.7 and 0.00471659, clearly indicated by the presence of a decimal point;
strings (sequence of characters), such as "Hello World" or 'Hello Thomas'. Both single and double quotes are supported, but a string has to be opened and closed using the same quotation character, not like: "This is a bad string'.

Integers are used to represent discrete quantities such as the count of a particular type of event or the number of processors a computer has. Real numbers are needed for quantities that can be arbitrary in value, such as lengths, power, probability, etc.

The type function can be used to determine the type of a variable. For example,

a = 4
b = a / 2
c = "Hello world"
print(type(a))
print(type(b))
print(type(c))

<class 'int'>
<class 'float'>
<class 'str'>

Assigned Value Determines Data Type

Unlike C and C++, where the data type of a variable is fixed at compile-time by explicitly declaring the type, the data type of a Python variable is determined by the value assigned to that variable. It is possible that a variable will change type because it is assigned a new value with a different datatype:
a = 24
print(type(a))
a = "Hello world"
print(type(a))

Integers

An integer is simply a whole number, which can be positive, zero, or negative. It cannot store fractional numbers (thus no decimal point). Python’s integer has an unlimited precision: It can store an arbitrarily large number! (Try it.)
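As a quick way to try this out, the following sketch computes a power of two that does not fit into a 64-bit register. The variable name big is arbitrary.

```python
big = 2 ** 100
print(big)          # 1267650600228229401496703205376
print(type(big))    # <class 'int'>

# Arithmetic on large integers remains exact.
print(big + 1)      # 1267650600228229401496703205377
```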

Why Integers?

An integer is actually the most basic representation of data in a digital computer. In digital computers, integers are represented as binary numbers (zeros and ones) with a fixed number of digits. When we speak of a 32-bit or 64-bit computer, this term actually refers to the number of bits that the processor registers have (a register can be thought of as a computer’s native “variable”). For example, as of year 2019, an Intel Core i3, i5, i7, or i9 processor has native 64-bit registers, but it can also process 8-, 16-, and 32-bit numbers. How can Python support arbitrarily long integers? It uses software to emulate the operations on such long integers.

In Python, the / division operator always produces a real number, regardless of the type of the operands involved. To force an integer division, use the // operator.

Division Exercises

What are the results of the following statements? Before running these on the IPython prompt, think about what the results should be. Then run them and observe the outcome.
7 / 3
7 // 3
7.0 // 3
8 / 3
8 // 3
8.9 // 3
8 // 2.95
This exercise exhibits the nature of Python division operators under varying circumstances.
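To complement the exercise, here is a small sketch, with different numbers, of how the two division operators relate to each other. It also uses the remainder operator %, which is standard Python but not introduced in the text above.

```python
n = 17
d = 5

quotient = n // d     # integer (floor) division
remainder = n % d     # remainder left over after integer division

print(n / d)          # 3.4
print(quotient)       # 3
print(remainder)      # 2
print(quotient * d + remainder == n)  # True

# Floor division rounds toward negative infinity, not toward zero.
print(-17 // 5)       # -4
```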

Binary, Octal, and Hexadecimal Numbers

For system-level programming (involving operating system, hardware, networking, etc.), integers play a crucial role. In this context, the underlying binary nature of integers is very much exploited.

Python makes it convenient to deal with this type of data.

Binary (base-two) numbers are denoted by the prefix 0b (that is, a digit zero plus a literal letter b). It must be followed by a sequence of ones and zeros. For example, 0b101110 is equal to the number 46 (forty-six) = 1×32 + 0×16 + 1×8 + 1×4 + 1×2 + 0×1 in our customary numbering system—the decimal system.
Octal (base-eight) numbers are denoted by the prefix 0o (a digit zero plus a literal letter o), followed by sequence of digits, each ranging from 0 through 7, inclusive. As an example, 0o775 stands for 509 in the decimal system.
Hexadecimal (base-sixteen) numbers are denoted by the prefix 0x (a digit zero plus a literal letter x), followed by a sequence of digits 0 through 9 as well as letters a through f. This gives a total of sixteen symbols (ten digits and six letters) to represent a number in a base-sixteen representation. For example, 0xa7f3 stands for number 42995 in the decimal system.

(Note that the letters in the representation above are case-insensitive. For example, 0B0110 is equivalent to 0b0110; 0xA7F3 and 0Xa7F3 and 0XA7F3 are all equivalent to 0xa7f3. However, be consistent with the case convention you use in your program. Most people use lowercase for the prefix, so please follow that also in your program.)
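The literals discussed above can be verified at the prompt. This sketch also shows the built-in bin, oct, and hex functions, as well as int with a base argument; these are standard Python but are not otherwise covered in this episode.

```python
# Integer literals written in different bases are ordinary integers.
print(0b101110)         # 46
print(0o775)            # 509
print(0xa7f3)           # 42995

# Built-in functions convert an integer to a string in another base.
print(bin(46))          # 0b101110
print(oct(509))         # 0o775
print(hex(42995))       # 0xa7f3

# int() with a second argument parses a string in the given base.
print(int("a7f3", 16))  # 42995
```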

If you need a tutorial or review of these seemingly fancy numbering systems, we would refer you to some excellent resources on the Internet:

Number Systems—Decimal, Binary, Octal and Hexadecimal, a brief tutorial by Rukshani Athapathu on building a number in binary, octal, and hexadecimal representations.
Number Conversion—Binary Octal Hexadecimal, a brief tutorial by DYclassroom (Yusuf Shakeel) on the various number systems and how to convert a number between different representations.
Binary number (from Wikipedia), an in-depth treatise covering history, representation, arithmetic, etc.
Hexadecimal (from Wikipedia).

Strings

A string is simply an ordered sequence of characters. Unlike in the C language, a Python string comes natively with a rich set of features which make string processing a breeze. Here is a quick overview (much of it “by example”) of the features of Python strings.

The length of the string can be queried using the len function.

A = "Hello world"
print(len(A))

Quite often, one would need to extract a character or a substring from a string. Python uses the same convention as C for element indexing: the characters in a string are indexed using an integer from 0, 1, 2, …, using the square brackets as the indexing operator.

A = "Hello world"
B = A[0]
print(B)
print(A[4])
print(A[11])

H
o

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

The last line raises an error called IndexError because the index (11) is beyond what is valid for the string (0 .. 10 in this case). Python also supports negative indices, which are counted from the end of the string:

A = "Hello world"
print(A[-1])
print(A[-5])
print(A[-11])

d
w
H

How do we extract a substring? Python has the concept of a slice, which defines a range of elements to pick out from a given sequence. Let’s start with some examples:

A = "Hello world"   # A has 11 characters
print(A[1:5])
print(A[:3])
print(A[3:])
print(A[3:-3])

ello
Hel
lo world
lo wo

With a slice, two numbers are given, separated by a colon. The slice syntax is therefore

STRING_VAR [ START : STOP ]

If START or STOP is omitted, then it is implicitly taken to be the beginning or end of the sequence, depending on which index is omitted. Python takes a rather quirky convention: whereas the START element is included in the slice result, the STOP element is not.

Strings can be joined (concatenated) to form a longer string. For example:

A = "Hello world"
B = A + "Thomas"
print(B)

Hello worldThomas

Modifying a String

A string is an immutable object in Python, which means that once created, it cannot be modified. For example, this kind of statement is invalid for a string: A[1] = "a". To modify a string, we need to create a new string that includes the modification. How can we change A from “Hello world” to “Hallo world”? (Hint: use the concatenation (+) operator.)
Solution
B = A[:1] + "a" + A[2:]
print(B)

There are many more capabilities built into a Python string! Python uses an object-oriented approach to manipulate strings. A string has the upper and lower methods to create uppercase and lowercase versions of the string, respectively, and the split method to split the string at a specified separator. Some examples:

A = "Hello world"
print(A.upper())
print(A.split())

HELLO WORLD
['Hello', 'world']

(The last command yields a list, which we will cover very soon.)
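The split method also accepts an explicit separator. As a sketch, the following splits a hypothetical comma-separated line (modeled on the spam record used later in this episode) into its fields.

```python
# A made-up line of comma-separated text, for illustration only.
line = "1998,3,204.31.253.89,United States"

fields = line.split(",")     # split at every comma
print(fields)                # ['1998', '3', '204.31.253.89', 'United States']
print(fields[3].lower())     # united states
print(fields[3].upper())     # UNITED STATES

# The original string is unchanged, because strings are immutable.
print(line)                  # 1998,3,204.31.253.89,United States
```

Note that every field is still a string, including '1998' and '3'.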

If you are working with text, we recommend that you learn more about Python strings through the following resources:

Python string tutorial (from TutorialsPoint).
Reference documentation for string methods.

We encourage you to experiment using interactive Python mode to gain an understanding of strings and other topics related to Python. There is no better way to learn than to experiment with the language elements directly!

Converting Data Types

Quite often, we have to convert data from one type to another. For example, data read from a text file would be a string. To convert this to a number that can be processed numerically, we use the int or float functions (for conversion to an integer or a real number):

rr = '71'
ss = '142.5'
R = int(rr)
S = float(ss)
print(R)
print(S)
print("Double all up:")
print(R * 2)
print(S * 2)

Conversely, a number can be converted to a string by the str function.

Convert ‘Em!

What would be the output of the following snippet?

R = 71
S = 142.5
print(float(R))
print(int(S))
R_str = str(R)
print(R_str, "is a", type(R_str))
age_message = "My age is " + str(R) + "years old"
print(age_message)

Solutions

71.0
142
71 is a <class 'str'>
My age is 71years old

Lists

One important reason for using a computer is its ability to store and process a lot of data. For this reason, Python provides a number of container data types, which are capable of containing multiple values (or objects). In this short lesson we will cover only two types in detail, namely list and dict.

A list is an ordered sequence of values or objects. Here are a few examples of list objects:

blank = []
trio = [1, 2, 3]
record = [1998, 3, "204.31.253.89", "United States"]

# Now print them out:
print(blank)
print(trio)
print(trio[0])
print(trio[1:3])
print(len(record))
print(record[2])

[]
[1, 2, 3]
1
[2, 3]
4
204.31.253.89

A list object has many similarities to a string:

The contained elements are ordered and can be indexed by integers from 0, 1, … ;
A list supports the slicing operator;
The len function acting on a list object returns the number of elements in that list.

However, unlike a string, a list can contain values with arbitrary data types, and its contents can be altered (i.e. it is mutable). Items can be added to, or removed from, the list. Here are a few actions that can be done for a list object named L:

Add a new item at an arbitrary location: L.insert method;
Add a new item at the end of the list: L.append method;
Update the value of the i-th element: L[i] = new_value;
Sort the entire list: L.sort method;
Delete an item at index i using del L[i];
Delete the contents of the entire list: L.clear method.

List Manipulation in Action

What is the output of this program?

fruits = ["banana", "apple", "mango"]
print(fruits)
fruits.append("pineapple")
print(fruits)
fruits.insert(1, "orange")
print(fruits)
fruits.sort()
print(fruits)
fruits[1] = "pear"
print(fruits)
del fruits[2]
print(fruits)
fruits.clear()
print(fruits)

Solution

['banana', 'apple', 'mango']
['banana', 'apple', 'mango', 'pineapple']
['banana', 'orange', 'apple', 'mango', 'pineapple']
['apple', 'banana', 'mango', 'orange', 'pineapple']
['apple', 'pear', 'mango', 'orange', 'pineapple']
['apple', 'pear', 'orange', 'pineapple']
[]

To learn more about lists and how to use them effectively, please refer to the TutorialsPoint lesson on lists.

A list can be used in many ways in Python; but the most common uses are:

to store a collection of items of the same type (this collection is often termed an array):
```
trio = [1, 2, 3]
```
to store a collection of items that has a defined structure (this kind of collection is often termed a data structure or a record):
```
record = [1998, 3, "204.31.253.89", "United States"]
```
In the example above, numbers 1998 and 3 refer to the year and month of a spam email, whereas the strings 204.31.253.89 and United States refer to the deduced originating IP and country. The example also shows that the data types of the items are not uniform (two integers and two strings).

This is not an exhaustive list of the possible uses of a list.

A list can also be nested, that is, contain other lists, to create a multidimensional array or a table:

# An example two-dimensional array
sudoku = [ [ 4, 9, 2 ],
           [ 3, 5, 7 ],
           [ 8, 1, 6 ] ]

# An example of a structured table
# (an array of records)
results = [ [1998, 3, "204.31.253.89", "United States"],
            [1999, 1, "194.213.210.20", "Czech Republic" ],
            [1999, 12, "202.96.198.238", "China"] ]

Accessing elements would involve two indexing operators:

print(sudoku[0][1])
print(results[1][2])

9
194.213.210.20

The first number indexes the outermost dimension, i.e. the “row”, the second number indexes the inner dimension, i.e. the “column”.
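A few more operations on a nested list are sketched below. The table is repeated so that the snippet can be run on its own; the name second_row is chosen only for illustration.

```python
results = [[1998, 3, "204.31.253.89", "United States"],
           [1999, 1, "194.213.210.20", "Czech Republic"],
           [1999, 12, "202.96.198.238", "China"]]

second_row = results[1]     # indexing once gives a whole row (itself a list)
print(second_row)           # [1999, 1, '194.213.210.20', 'Czech Republic']

print(len(results))         # 3: the number of rows
print(len(results[0]))      # 4: the number of columns in the first row

# Negative indices work at every level: last column of the last row.
print(results[-1][-1])      # China
```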

Repeating Actions: `for` Loop

Now that we have a way to store a bunch of data in a list, we need a way to perform repetitive actions on these data. Python uses the for statement to define a loop construct to repeat actions over a sequence of data.

Here is an illustration of the for statement:

for A in [ 0, 1, 2, 3 ]:
    print(A)

The syntax of a for statement is:

for LOOP_VARIABLE in SEQUENCE:
    STATEMENTS...

Here, SEQUENCE is a sequence object (list, string, dict, etc.) containing zero or more items which we want to iterate over. STATEMENTS is a placeholder for a code block (explained earlier), which contains one or more Python statements to be repeated. Python’s for statement has a different behavior from C-style for. In most common cases, where SEQUENCE contains n items, the STATEMENTS block will be executed n times (once for every element in SEQUENCE). The items will be iterated in order (from the beginning to the end of the sequence), and the value of the LOOP_VARIABLE will be set to the current item. Be aware that the colon after the SEQUENCE is mandatory, as well as the indentation of STATEMENTS.

Let’s revisit the illustration above: the SEQUENCE is a list of four elements: [0, 1, 2, 3]; therefore the for loop will execute the STATEMENTS four times. In this case, the STATEMENTS simply consists of one statement: print(A). The value of A is set to 0 at the first iteration, then it is updated to 1 at the second iteration, and so on.

LOOP_VARIABLE, as the name suggests, is a variable whose value changes at every iteration: in each iteration, it is set to one item from SEQUENCE.

A string is a sequence of characters, therefore it can also be used as the sequence to loop over:

word = "oxygen"
for char in word:
    print(char)

The action of this loop is illustrated as follows:

Illustration of a loop over the word "oxygen"

It is very common to iterate over a range of values, something like 0, 1, 2, ..., 100; or 1, 4, 7, 10, ... 28. Python provides a range function to define a sequence-like object that can be iterated using for. The range function can be called in several ways:

range(STOP): yields the sequence 0, 1, 2, … STOP-1.
range(START, STOP): yields START, START+1, … STOP-1. Again, Python’s convention is that STOP is excluded from the result.
range(START, STOP, STEP): yields START, START+STEP, START+2*STEP, …, not including STOP and beyond.

All START, STOP, and STEP arguments have to be integers.
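One way to inspect what a range produces is to convert it to a list with the built-in list function. The sketch below uses argument values chosen for illustration; the last call reproduces the 1, 4, 7, 10, ... 28 sequence mentioned above. The variable names are hypothetical.

```python
# A range object does not print its items directly;
# wrapping it in list() shows the values it would yield.
one_arg = list(range(3))
two_args = list(range(2, 6))
three_args = list(range(1, 29, 3))

print(one_arg)      # [0, 1, 2]
print(two_args)     # [2, 3, 4, 5]
print(three_args)   # [1, 4, 7, 10, 13, 16, 19, 22, 25, 28]
```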

`range` Exercises

What are the outcomes of these statements?

for A in range(5):
    print(A)

for A in range(4,8):
    print(A)

for A in range(32,45,3):
    print(A)

Solutions

Making Sum with for Loop?

One common use of a for loop is to create a sum, or perform an aggregation (maximum value, minimum value, average, etc.)

Suppose you need to calculate the sum of values contained in a list L. One way to achieve this is to use a for statement:

L = [1.5, 3.7, 4.0, -5.1 ]
sum_L = 0.
for val in L:
    sum_L = sum_L + val
print(sum_L)

This will yield 4.1. However, Python has the built-in sum function to do exactly this:

L = [1.5, 3.7, 4.0, -5.1 ]
print(sum(L))

Voilà, you just shaved three lines off the program! Python has a lot of nifty tools like the sum function, which can make your programs cleaner, shorter, more effective, and more fun to write.

Performance Note

It pays to learn more about what Python offers. Unlike in lower-level languages like C, where we cannot avoid writing a loop to perform an aggregation such as a sum, Python provides many commonly used functions that save us from hand-writing loops. Besides making your program shorter, these functions help avoid many common mistakes. More importantly, these functions are often implemented in C/C++/Fortran, yielding much higher performance than a pure Python implementation.
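As a brief sketch of other aggregations that need no hand-written loop, the built-in functions max, min, and len can be combined with sum. The variable names below are illustrative only:

```python
L = [1.5, 3.7, 4.0, -5.1]

largest = max(L)            # maximum value in the list
smallest = min(L)           # minimum value in the list
count = len(L)              # number of items in the list
average = sum(L) / count    # arithmetic mean

print(largest, smallest, count)
print(average)
```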

Conditional Statement (`if` – `else`)

A program often has to take actions only when certain conditions are fulfilled. Sometimes there are different actions for different conditions. This is done in Python using the if statement.

gender = "M"
if gender == "M":
    pronoun = "He"
    pronoun3 = "his"
else:
    pronoun = "She"
    pronoun3 = "her"
print(pronoun, "loves", pronoun3, "cat")

Here, gender = "M" is an assignment statement, whereas gender == "M" is a comparison expression. The latter yields a logical value (True or False). The values of the pronoun and pronoun3 variables depend on whether gender is equal to the string "M"; therefore the message that is printed also depends on the value of gender.

Notice that Python does not require parentheses to enclose the condition expression. A number or a string can also be given to the if statement in lieu of a logical expression: nonzero numbers count as True, as do nonempty strings and lists; zero, empty strings, and empty lists count as False.
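The truth-value rule above can be checked with a short sketch. The sample values and the list names truthy and falsy are illustrative choices, not part of the lesson:

```python
samples = [0, 42, -1.5, "", "hello", [], [0]]

truthy = []   # values that an if statement treats as True
falsy = []    # values that an if statement treats as False
for value in samples:
    if value:             # the value itself serves as the condition
        truthy.append(value)
    else:
        falsy.append(value)

print("Treated as True: ", truthy)
print("Treated as False:", falsy)
```

Note that [0] counts as True: the list is nonempty, even though its only element is zero.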

Multiple conditions can be accommodated using the elif continuation. An example would be the determination of student’s grade based on the total score:

score = 83.5
if score > 90:
   grade = "A"
elif score > 80:
   grade = "B"
elif score > 70:
   grade = "C"
elif score >= 60:
   grade = "D"
else:
   grade = "F"
print(grade)

The first condition (score > 90) will be tested first: if it is true, then "A" is assigned to the variable grade and the rest of the conditions are not tested. Otherwise, we go to the second condition (score > 80), and so on. If none of the conditions evaluates to True, the statement block after else will be executed. The else part is optional: it can be omitted if no action is needed for “all of the other” cases.
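To see how the order of the tests plays out, here is a sketch that runs the same chain over a few boundary scores. The sample scores and the list grades are chosen for illustration. Because the first three tests use > rather than >=, a score of exactly 90 falls through to "B":

```python
grades = []
for score in [95, 90, 80.5, 60, 59.9]:
    if score > 90:
        grade = "A"
    elif score > 80:
        grade = "B"
    elif score > 70:
        grade = "C"
    elif score >= 60:
        grade = "D"
    else:
        grade = "F"
    grades.append(grade)
    print(score, grade)
```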

Is It Even or Odd?

Print a message stating whether a variable named val contains an odd or even number.

Solution

val = 3
if val % 2:
    print('odd')
else:
    print('even')

The % operator gives the remainder of the division of val by 2: It is zero for even numbers, and one for odd numbers. Because val is an odd number, val % 2 yields 1, thus the word odd will be printed.

Functions

Certain tasks are used frequently throughout a program. One example is the conversion from a score to a grade, as shown in the previous section. A function is a block of subprogram (a sequence of commands and statements) that is packaged as a unit and intended to accomplish a specific task. Functions let programmers write the code only once and reuse it as often as needed. In Python, a function is created using the def statement, as illustrated in the following snippet:

def message():
    print("Python is a great language to learn.")
    print("It is fun, easy to use, yet powerful at the same time.")
    print("With persistent use and practice, you'll master Python.")

Completing the def statement above produces no output, but you have just created a function called message, which can be called at any time afterward. The subprogram block in the function’s body will be executed whenever the function is called. Let us call message now:

message()

Python is a great language to learn.
It is fun, easy to use, yet powerful at the same time.
With persistent use and practice, you'll master Python.

We can also call the function multiple times:

message()
print()
print("LET'S SAY THAT AGAIN...")
message()

Python is a great language to learn.
It is fun, easy to use, yet powerful at the same time.
With persistent use and practice, you'll master Python.

LET'S SAY THAT AGAIN...
Python is a great language to learn.
It is fun, easy to use, yet powerful at the same time.
With persistent use and practice, you'll master Python.

What’s Different between Loops and Functions?

A function is similar to the for loop or the if-else conditional in that it forms a bigger, logical piece of a program. In particular, both functions and loops enable a block of subprogram to be executed more than once. There is one important difference, though: a loop repeats a block only at one particular location in the program. The block associated with a function, in contrast, can be executed at different locations in the program, i.e. wherever the function is called.

Parameters

A function can have one or more parameters. Inside the function body (block), they act like regular variables, but their values are not specified within this body. Rather, the values are defined at the point where the function is called.

Let us write a make_grade function that takes one parameter, namely the numerical score:

def make_grade(score):
    if score > 90:
       grade = "A"
    elif score > 80:
       grade = "B"
    elif score > 70:
       grade = "C"
    elif score >= 60:
       grade = "D"
    else:
       grade = "F"
    return grade

This function also returns a value, which will need to be captured in a variable or printed. (Otherwise, the return value will be discarded.)

alice_grade = make_grade(89)
print("Alice's grade is", alice_grade)
print("Jason's grade is", make_grade(72))

Alice's grade is B
Jason's grade is C

A function can take multiple arguments, such as:

def add(a, b):
   return a + b

A = 36
C = add(A, 2)
print(C)

Further, the arguments can be of different types. The function below expects a real number for the scale argument, and a sequence (list) for scores:

def scale_score(scale, scores):
    result = []
    for s in scores:
        result.append(scale * s)
    return result

Documenting a Function

Once a function gets more complex, it is better to document it. Python has a great way of doing this, using a triple-quoted string (basically an ordinary string that is also allowed to span multiple lines):

def scale_score(scale, scores):
    """Scales student's scores by a scale factor.

    Args:
        scale (float): scale factor.
        scores (list): a list of students' raw scores.

    Returns:
        list: a list of students' scaled scores.
    """
    result = []
    for s in scores:
        result.append(scale * s)
    return result

The documentation should state the following:

the purpose of the function;
the input argument(s);
the return value(s);
any other notes regarding the behavior of the function that users may need to be aware of.

This documentation can be queried in an interactive Python session:

help(scale_score)

Help on function scale_score in module __main__:

scale_score(scale, scores)
    Scales student's scores by a scale factor.
    
    Args:
        scale (float): scale factor.
        scores (list): a list of students' raw scores.
    
    Returns:
        list: a list of students' scaled scores.

Documenting a function is another good coding practice that you want to foster early on. It helps other people (including your future self!) to better understand your code.

Let’s try our new function:

old_scores = [50, 70, 40, 85]
new_scores = scale_score(1.25, old_scores)
print(new_scores)

[62.5, 87.5, 50.0, 106.25]

Changing the Value of a Parameter in a Function?

(Note: This is an intermediate topic, so you can skip when reading the lesson for the first time.)

In Python, arguments are passed to a function as references to objects. Technically, all variables are just references to objects residing somewhere in the Python interpreter’s memory. Changing the value of a parameter inside a function is not prohibited by Python, but it may not do what you want. Assigning a new value to a parameter (e.g. setting it to a new string, int, or list) does not change the original value in the caller’s scope. But modifying the object that a parameter refers to (e.g., appending a new element to a list using the append method) propagates the effect to the caller. This is an intended design: in this way, functions can be used to manipulate objects. Completely new data should be returned as the function’s return value.
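The three cases described above can be sketched as follows. The function names reassign, mutate, and extended are hypothetical and were introduced only for this illustration:

```python
def reassign(items):
    # Rebinds the local name only; the caller's list is untouched
    items = [0, 0, 0]

def mutate(items):
    # Modifies the list object that is shared with the caller
    items.append(99)

def extended(items, extra):
    # Builds and returns a new list, leaving the input unchanged
    return items + [extra]

data = [1, 2, 3]

reassign(data)
print(data)        # [1, 2, 3]

mutate(data)
print(data)        # [1, 2, 3, 99]

new_data = extended(data, 100)
print(data)        # [1, 2, 3, 99]
print(new_data)    # [1, 2, 3, 99, 100]
```

Returning a new object, as extended does, is usually the least surprising choice because the caller decides what happens to the original data.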

Library and Modules

A library is a collection of files (called modules) that contain functions for use by other programs. A module can be viewed as a toolbox that contains a lot of tools: These tools (hammer, screwdrivers, pliers, etc.) are analogous to the functions. This toolbox may be a part of a greater collection of tools for an auto mechanic. This mechanic may have other toolboxes: electrical toolbox, engine toolbox, etc. The entire collection of these toolboxes would be the library.

Libraries or Modules?

A library is a collection of modules, but the terms are often used interchangeably, especially since many libraries only consist of a single module, so don’t worry if you mix them.

In Python, we need to import a module to use the functions contained in this module. This is done using the import statement:

import MODULE_NAME

This will make the functions, variables, and other objects in the MODULE_NAME module accessible by the Python interpreter. From this point on, you can access the contents of this module (such as functions and variables) by prepending their names with a MODULE_NAME. prefix. The name of the module also serves as a namespace for the functions and variables provided by that module.

Let us consider a concrete example: Python has a module called math, which contains many mathematical functions and constants, such as the square root function (sqrt), exponentiation (exp, pow), logarithms (log, log2, log10), trigonometric functions (sin, cos, tan, asin, acos, …), and many more. (See the math reference documentation for more details.) To calculate the square root of a number, use the sqrt function contained in this module:

import math

a = 25
b = math.sqrt(a)
print(b)
print(math.sqrt(2))

5.0
1.4142135623730951

We can also use the name sqrt without the math. qualifier by importing the name directly into the current namespace:

from math import sqrt
from math import cos, pi

print(sqrt(81))
print(cos(pi))

9.0
-1.0

Hint: If you use from math import sqrt without also invoking import math, then the name math is not known to the interpreter, but the name sqrt is still accessible.

Out of the box, Python already comes with a fairly complete library. We will call this the “core library”. Some notable modules in the core library include:

os: operating-system related functions;
sys: Python/system related functions;
math: mathematical functions and constants;
re: regular expression search and operation;
csv: tools to read and write CSV (comma separated value) files;
json: tools to read/write data in JSON format;
mailbox: tools to read/write internet mailbox in MBOX format;
socket, ssl, urllib, http, …: Network- and Internet-related functions;
and many more!

(Clicking on the module name would lead you to the module’s reference documentation.) We recommend that you survey the Python Standard Library reference documentation to become familiar with the functionalities offered by Python core library.
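As a hedged sketch of how two of the modules listed above might be used, the snippet below writes the results table from the earlier section in CSV format and encodes one record as JSON. The column names in the header row are assumptions made for this illustration. The io module, also part of the library that ships with Python, is used here only to keep the CSV text in memory instead of writing a file.

```python
import csv
import io
import json

results = [ [1998, 3, "204.31.253.89", "United States"],
            [1999, 1, "194.213.210.20", "Czech Republic"],
            [1999, 12, "202.96.198.238", "China"] ]

# Write the table in CSV format into an in-memory text buffer
buffer = io.StringIO()
writer = csv.writer(buffer)
# Hypothetical column names; the lesson does not name the columns
writer.writerow(["year", "month", "ip_address", "country"])
writer.writerows(results)
csv_text = buffer.getvalue()
print(csv_text)

# Encode the first record as a JSON string
json_text = json.dumps(results[0])
print(json_text)
```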

Examples of Important Python Modules

A vast amount of capabilities in Python actually come from the libraries developed and maintained by many groups and communities throughout the world. In this sidebar we will survey a few important ones—those that have become every Python programmer’s essential toolboxes as well as those that may be relevant for cybersecurity applications.

NumPy and SciPy

NumPy (short for “Numerical Python”) and SciPy (“Scientific Python”) are packages designed for numerical computation. NumPy provides a powerful N-dimensional array object and an assortment of routines for fast operations on arrays, including mathematical and logical operations, shape manipulation, sorting, selecting, input/output, Fourier transforms, basic linear algebra, statistical operations, random number generation, and more. SciPy contains modules for optimization, advanced linear algebra, integration, interpolation, special functions, Fourier transforms, signal and image processing, ordinary differential equation solvers, and other tasks common in science and engineering.

Numpy website: https://numpy.org/

Scipy website: https://www.scipy.org/

Pandas

Pandas stands for Python Data Analysis Library. It is designed to offer data structures for manipulating numerical tables and time series. It is a powerful tool when dealing with large tables.

Pandas website: https://pandas.pydata.org/

Matplotlib and Seaborn

These are plotting libraries. Matplotlib is a Python 2D plotting library which produces figures in a variety of formats and interactive environments across platforms. You can use Matplotlib to generate plots, histograms, power spectra, bar charts, errorcharts, scatterplots, etc. Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

Matplotlib website: https://matplotlib.org/

Seaborn website: https://seaborn.pydata.org/

Scikit-learn, Tensorflow, Theano, Keras, and Pytorch

Scikit-learn, Tensorflow, Theano, Keras, and Pytorch are libraries designed for machine learning and deep learning applications. Scikit-learn provides methods for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. Tensorflow and Theano are low-level tools for developing neural network models. Keras is a high-level package for neural networks that can run on top of Tensorflow and Theano. Pytorch is a package designed as a replacement for NumPy that takes advantage of the power of GPUs; it is also a deep learning platform that provides flexibility and speed.

Scikit-learn website: https://scikit-learn.org/stable/

Tensorflow website: https://www.tensorflow.org/

Keras website: https://keras.io/

Pytorch website: https://pytorch.org/

NLTK

NLTK stands for Natural Language Toolkit. It is a package designed for working with human language data. It provides interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.

NLTK website: https://www.nltk.org/

Pycrypto

Pycrypto, as you have probably guessed, is a collection of tools for cryptography work. It provides various encryption algorithms such as AES, DES, and RSA, to name a few.

Pycrypto website: https://pypi.org/project/pycrypto/

scrapy, beautifulsoup, and selenium

If you are looking for web-related packages, then these three are what you need. Scrapy and Beautifulsoup provide ways to extract data out of HTML content, meaning web pages. Selenium, on the other hand, is a web browser automation tool. It is useful for writing test scripts for web-based applications.

Scrapy website: https://scrapy.org/

Beautifulsoup website: https://www.crummy.com/software/BeautifulSoup/

Selenium website: https://www.seleniumhq.org/

Exercises

Exercise 1

Write a function that takes a list of integers as its input parameter and then prints two lists: the first containing only the even numbers from the input list, the second containing only the odd numbers.

Exercise 2

Write a function that takes two numbers as parameters and returns the maximum of the two numbers.

Exercise 3

Write a function called fizz_buzz that takes a number.

If the number is divisible by 3, it should return “Fizz”.
If it is divisible by 5, it should return “Buzz”.
If it is divisible by both 3 and 5, it should return “FizzBuzz”.
Otherwise, it should return the same number.

Exercise 4

Write a function called show_stars(rows). If rows is 5, it should print the following:

*
**
***
****
*****

Hint: Think of addition on strings of characters.

Further Learning

Python’s official documentation: https://docs.python.org/3/

Key Points

Python is a high-level, interpreted, general-purpose programming language.

Key data types are integers, floats (real numbers), and strings.

Lists and arrays are container data types that store a large amount of data.

The for statement is useful to repeat actions by looping over a list of values.

The if – elif – else construct allows a program to execute commands conditionally.

There is a vast number of libraries, which makes Python a productive computing platform.

DeapSECURE Lesson 5: Cryptography for Privacy-Preserving Computation