A Brief Introduction to Python
Overview
Teaching: 30 min
Exercises: 10 min
Questions
What is Python?
How do I use Python to perform computation with numbers and text?
How do I handle a large amount of data using Python?
How do I write a tool in Python?
Objectives
Understand the essential elements of the Python programming language.
Be able to write simple Python programs to process data.
This episode serves as a crash course in the Python programming language. It covers only the bare essentials to get you started with Python. At the end of this lesson, there will be a pointer to a lesson series which you can pursue on your own to become proficient with Python.
What Is Python? Why Python?
Python is a high-level, general purpose programming language. Python emphasizes code readability and ease of use, and its syntax encourages good programming practices as well as productivity. Today, Python has become one of the most popular programming languages, even widely deployed by tech giants such as Google, Amazon, Facebook, … Python comes with a vast array of powerful libraries, such as:
- numpy and scipy for numerical calculations (“number crunching”),
- pandas for data analytics,
- matplotlib and seaborn for plotting and visualization,
- scikit-learn, tensorflow, keras, pytorch for machine learning,
- nltk for natural language processing,
- pycrypto for cryptography,
- scrapy, beautifulsoup, and selenium for web scraping.
These libraries enable programmers to accomplish their goals (e.g. “obtain the largest five eigenvalues of a matrix” or “create a machine learning model to flag spam emails”) without having to know the details of the complex underlying algorithms.
Those who have used other programming languages will find it rather easy to pick up Python. Python is an interpreted language, which means that a Python interpreter is required to run a Python program.
How Does Python Compare to C/C++ Language?
C emphasizes low-level details of the program, down to the bare metal details (such as, integers, bits, pointers, data alignment). C++ provides much more convenience by adding object-oriented capabilities and significantly expanded productivity libraries (standard C++ library, Boost), yet still allowing (and often requiring) programmers to take care of machine-level issues. Python, on the other hand, emphasizes high-level programmability without forcing programmers to worry about gory details.
Compared to C/C++, Python provides a much gentler learning curve to new programmers, and much shorter time-to-productivity.
C/C++ programs are compiled to produce binary executable programs containing machine instructions that can be executed directly by the processor (CPU). In contrast, because Python is an interpreted language, a Python interpreter is always required to run a Python program. The interpreter translates the human-friendly Python statements into instructions to be executed by the CPU, one piece at a time, while the program runs. The process of interpreting the program in this way takes time; therefore interpreted computer programs run significantly slower than binary executable programs. Quite often, a well-written C/C++ program can accomplish the same task 5 to 100 times faster than an equivalent implementation written in pure Python. We will discuss this issue more in module 6 of this training program. However, this picture changes for programs that rely heavily on high-performance libraries made available for Python (such as NumPy, TensorFlow, etc.). In this training program, we strive to use Python libraries in a manner that is conducive to high-performance computation.
In this training program we will focus on Python 3, which is the current version of Python.
Accessing Python from Turing HPC
On Turing, as in many HPC systems, access to available software packages is managed using a shell command called module. (Due to the age of the cluster, two variants of module are available on Turing. We recommend using the newer variant called lmod, by first invoking enable_lmod.) Here is the sequence of commands needed:
$ enable_lmod
$ module load python
$ module load ipython   # recommended, see below
$ module list           # optional, but please try
(You will need to repeat these commands the next time you log in to Turing.)
By default, module load python will load Python 3.6 on Turing. Python 3.7 is also available; use module load python/3.7 to specify the exact software version.
The module command requires a subcommand name, and sometimes additional arguments. Here are the most frequently used invocations:
- module avail: lists the available modules on the system;
- module load PACKAGENAME: loads the module PACKAGENAME, i.e. makes this software available in the shell;
- module unload PACKAGENAME: unloads the module PACKAGENAME;
- module whatis PACKAGENAME: prints information about the module PACKAGENAME;
- module list: lists the currently loaded modules.
Python Operating Modes
There are two ways to interact with the Python interpreter: the interactive mode and the script mode.
Interactive mode
Python interactive mode allows you
to execute Python statements instantly from the command line.
This is very much like the way we interact with UNIX shell:
we enter a Python statement, press Enter,
Python runs the statement, and prints the result (if applicable),
then returns to the prompt again.
To launch the Python interpreter, invoke the python command
from your (UNIX) command line:
$ python
Python 3.6.9 (default, Sep 17 2019, 12:17:19)
[GCC Intel(R) C++ gcc 4.9.4 mode] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
The >>> indicates Python’s prompt;
it tells you that you are in the interactive mode.
To exit Python, use the quit() or exit() command,
or use the Ctrl+D keyboard shortcut.
>>> quit()
ipython: Better Interactive Python
In this workshop, we recommend a more sophisticated Python front-end called ipython, short for interactive Python. It provides better command history, output history, syntax highlighting, tab completion, and customizability. It even doubles as a UNIX-like “shell” for the most commonly used commands. To use ipython, please make sure that the ipython module is loaded after the python module.
$ module load python    # if python has not been loaded
$ module load ipython   # if ipython has not been loaded
$ ipython
Python 3.6.9 (default, Sep 17 2019, 12:17:19)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
Notice that the prompt is different. The number 1 will be incremented every time a new command is entered.
Arithmetic with Python
From now on, you can type valid Python statements to execute and get outputs from. Python in interactive mode can be used to perform arithmetic:
>>> 2 + 3
5
>>> 5 / 2
2.5
Python supports the usual computer arithmetic operators:
| Operator | Meaning | Examples |
|---|---|---|
| + | Addition | 5 + 7 , 7.43 + 54 |
| - | Subtraction | 10 - 3 , 3.5 - 10 |
| * | Multiplication | 5 * 7 , 3.5 * 10 |
| / | Division | 4 / 2 , 10 / 3 , 1 / 7 , 14 / 0.2 |
| ** | Exponentiation | 2 ** 3 , 9 ** 0.5 |
| ( and ) | Group expression for evaluation; override standard operator precedence | 5 * (3 + 4) , compare against 5 * 3 + 4 |
The usual mathematical convention for operation order (which one gets computed first, also termed operator precedence) applies: Exponentiation is computed first, followed by multiplication and division, then addition and subtraction. The Python documentation has a reference table of operator precedence, which can be helpful to ensure that you write correct Python expressions. Please try many other expressions you can think of so you become comfortable with Python as a calculator.
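As a quick check of the precedence rules described above, the following sketch evaluates the same operands with and without parentheses. The particular expressions are chosen only for illustration.

```python
# Multiplication is evaluated before addition
print(5 * 3 + 4)        # 19, same as (5 * 3) + 4
print(5 * (3 + 4))      # 35, parentheses force the addition first

# Exponentiation is evaluated before multiplication
print(2 * 3 ** 2)       # 18, same as 2 * (3 ** 2)
print((2 * 3) ** 2)     # 36, parentheses force the multiplication first
```

When in doubt about the order, adding parentheses costs nothing and makes the intent explicit.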
Computing Total Price and Splitting the Bill
Mary and her two roommates split their grocery expenses evenly. Mary just bought the following:
- A dozen eggs for $1.49
- A loaf of bread for $2.79
- A bag of potato chips for $1.99
- A bottle of hand soap for $2.49
- A stack of paper plates for $3.99
In Virginia, tax rates are 2.5% for food and 6% for non-food. Please create Python expressions to do the following:
- Compute the total cost of the grocery bill.
- Compute the payment of each person to cover this bill.
Solutions
We can use parentheses to compute the quantities in one step. This is just one way among many ways to get the computation done.
(1.49 + 2.79 + 1.99) * 1.025 + (2.49 + 3.99) * 1.06
((1.49 + 2.79 + 1.99) * 1.025 + (2.49 + 3.99) * 1.06) / 3
The total cost is $13.30 and each one has to pay $4.43 (but one has to pay one cent more to cover the total cost).
The examples above show that
Python can represent whole numbers (integers)
and real numbers (those with decimal points).
While integers are represented perfectly,
real numbers are represented with a limited number of digits
(approximately 15 significant digits on today’s computers)
and are subject to roundoff errors.
This shows up in at least one of the examples above, where
13.29555 was printed as 13.295550000000002.
Real numbers carry enough digits
to allow for reliable computation in the vast majority of cases.
We will not discuss this further as it is an advanced topic.
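The roundoff effect mentioned above can be seen with even simpler numbers. This is a minimal sketch; the use of the built-in round function to tidy up a result for display is our suggestion, not something the lesson requires.

```python
# 0.1 and 0.2 cannot be stored exactly in binary, so their sum is slightly off
print(0.1 + 0.2)             # 0.30000000000000004
print(0.1 + 0.2 == 0.3)      # False

# Rounding to a fixed number of decimals is usually enough for display
print(round(0.1 + 0.2, 2))   # 0.3
```

For money amounts such as the grocery bill, rounding the final result to two decimals is a reasonable way to present it.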
Scripting with Python
The interactive mode is useful but not always practical.
Imagine you have thousands of statements to execute in order,
and you need to repeat the process at least one more time.
In such a case,
we can save these Python statements into a text file called a Python script
and have the Python interpreter run them.
Python scripts, or programs, usually have the .py filename extension.
Hello World: Our First Script
In interactive mode, Python prints the result of an expression immediately after it is executed. In the scripting mode, an expression has to be printed in order to be output (to the terminal or to a file).
Let us print a simple “Hello World” message—which is
a tradition when learning a new programming language.
Using a text editor (e.g. nano),
create a text file named hello.py containing one statement:
print("Hello World")
Save the file, then execute the script:
$ python hello.py
Hello World
In general, a Python script is executed from the UNIX shell in this way:
python /PATH/TO/SCRIPT.py.
Any command you are able to run in the interactive mode
can be added to a script and executed by the Python interpreter.
We can print the result of a math expression:
print(1 + 2 + 3 + 4 + 5)
We can use multiple print statements to print multiple text lines.
A print can also print multiple items in a single statement:
print("Some math examples")
print("Sum of 1 through 5 is ", 1 + 2 + 3 + 4 + 5)
print("Square root of 1, 2, 3, 4 are", 1**0.5, 2**0.5, 3**0.5, 4**0.5)
Please run this script several times. You will notice that the outputs are always printed in the order the statements appear in the script. This is a bedrock principle of a sequential computer program: the computer will read and execute the commands/statements one at a time, and in the order these commands appear in the program. Remembering this principle will help you (1) predict the outcome of a computer program just by reading it, and (2) avoid confusion about what a program will do.
Statements, Indentation, Code Blocks
Python language syntax has a few rules that distinguish it from other computer languages. Let us use the following program snippets to illustrate the notable features of Python language:
# This is a sample program written in Python
def greet(name, gender, majors, graduate_year):
    if gender == "M":
        pronoun = "He"
        pronoun3 = "his"  # third-person pronoun
    else:
        pronoun = "She"
        pronoun3 = "her"
    print("Hello, this is", name)
    print(pronoun, "has", len(majors), "major(s):")
    for m in majors:
        print("-", m)
    print(pronoun, "completed", pronoun3, "education in", graduate_year)
    print()

greet("Elaine", "F", ["Art", "Mathematics", "History"], 1993)
greet("Johnson", "M", ["Sociology"], 2000)
That is a complete Python program which can be run to produce the following output:
Hello, this is Elaine
She has 3 major(s):
- Art
- Mathematics
- History
She completed her education in 1993
Hello, this is Johnson
He has 1 major(s):
- Sociology
He completed his education in 2000
Unlike C/C++, a Python statement generally ends with the new line.
There is no mandatory end-of-statement marker like a semicolon (;),
although a semicolon is indeed recognized as such.
If a line is so long that it has to wrap, please terminate the
incomplete line by appending a backslash character (\).
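The statement-ending rules above can be illustrated with a short sketch. The last form, wrapping a long expression in parentheses so that it can span several lines without a backslash, is not mentioned in the text above but is standard Python; the variable names are only for illustration.

```python
# Two statements on one line, separated by a semicolon (legal, but rarely used)
a = 1; b = 2

# A long expression continued on the next line with a trailing backslash
total = 1 + 2 + 3 + \
        4 + 5

# An expression inside parentheses may span several lines without a backslash
total_alt = (1 + 2 + 3 +
             4 + 5)

print(a, b, total, total_alt)   # 1 2 15 15
```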
Python is well known for its extensive and strict use of whitespace
to indent program lines.
(To indent a program line means to add a number of whitespace characters
before the first non-space character in that line.)
In Python, as in any other programming language,
a set of commands or statements can be grouped into a block,
which then becomes an integral part of a language construct
(loops, conditionals, function definition).
Python uses indentation to distinguish a block of statements.
A block is clearly identified by its consistent indentation level.
In the example program shown earlier, the def greet(...):
statement is followed by a code block that starts with
if gender == "M": and terminates after
the lone print() statement.
Similarly, the if clause initiates a new block containing two program lines,
followed by the else clause and yet another block.
The “if–block–else–block” sequence constitutes
a complete construct for conditional execution—as we will learn later
in this episode.
Python’s convention for code blocks
is in contrast to the case of C/C++ language, where
a code block in the if, for or a function definition
like int main(...)
is clearly delimited by a matching pair of curly brackets {…}.
This means if you have started a new block using four-whitespace indentation,
you must indent every command in this block
with four whitespace characters.
Python will catch inconsistencies in block-level indentation and issue a syntax error.
While this may appear to be overly restrictive,
it actually encourages good programming behavior and readability of Python programs.
The most widespread convention among Python programmers is
to prepend four extra whitespace characters to introduce a new (sub)block.
We recommend that you also follow this practice.
We will see this more practically in the next sections.
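To see how Python reacts to inconsistent indentation, here is a small sketch. The faulty code is kept inside a string and handed to the built-in compile function, so that the example itself remains a valid program; the try/except construct used to catch the error is not covered in this episode and is used here only for demonstration.

```python
# Source text with inconsistent indentation: the third line is indented by
# two spaces, which matches neither the block (four) nor the outer level (zero).
bad_source = "if True:\n    x = 1\n  y = 2\n"

try:
    compile(bad_source, "<example>", "exec")
    outcome = "accepted"
except IndentationError as err:
    outcome = "rejected"
    print("Python rejected the code:", err.msg)

print(outcome)   # rejected
```

IndentationError is a special kind of syntax error, so the program is refused before any of its statements run.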
Comments
Comments (non-executable texts) can be added to a Python script
by prepending the text with the hash # character.
Comments can also appear after a statement.
Both cases appear in the sample program above.
Fear not, all the constructs used in the program above will be explained shortly, so you will understand what the program is doing after finishing this episode. Finally, the rules regarding indentations and comments above apply not only to Python scripts, but also to Python statements entered in the interactive mode.
Basic Elements of a Program
For the rest of this episode, you will learn the basic building blocks of a program. We will learn these in the context of the Python programming language, but they are applicable in many other languages. These are just a few things that you can use in your scripts to get started with computer programming. At the end of this episode we will provide some pointers for further learning.
As a roadmap, here are the key elements included in this episode:
- Variables;
- Data types, with initial emphasis on numbers and strings;
- Statement block;
- Looping through iteration using the for statement;
- Conditional statements (if, elif, else);
- Lists;
- Arrays using numpy;
- Data structure using dict;
- Functions;
- Script arguments.
We will also present a quick overview of key Python libraries that you may find useful for cybersecurity applications.
Variables
Arithmetic is useful, but algebra makes mathematics even more useful
by allowing us to make manipulations and define relationships
among yet-to-be-specified quantities, denoted by symbols such as
x, y, and so on.
The same goes for computer programs:
A variable plays the role of symbols in algebra.
In Python, a variable is simply a label for a value
(or another type of object, as we will learn shortly).
This gives us a handle to refer to that value indirectly
by the name of the variable.
We define a variable by assigning a value to it, using the = operator.
Some examples:
a = 4
b = 5 / 2
c = "Hello World"
d = a + b
name = "Thomas"
Several rules regarding Python variables:
- Variable names can contain only letters (a-z, A-Z), digits (0-9), and underscores (_). The name cannot start with a digit. Names that start with an underscore are often reserved (e.g. __file__, __name__) or have certain meanings. (If you are just getting started with Python, it is best to use variable names that begin with a letter until you know the uses of names that start with an underscore.)
- Names are case sensitive: For example, name and Name and NAME are three distinct variables.
- Use names that convey the meaning of the value in a concise way. For example, use weight = 127 to represent a weight quantity with a value of 127, whereas iurj2k3u = 127 obfuscates the meaning of the variable.
- Be aware of Python’s reserved words and absolutely avoid using them for variable names:
  False      await      else       import     pass
  None       break      except     in         raise
  True       class      finally    is         return
  and        continue   for        lambda     try
  as         def        from       nonlocal   while
  assert     del        global     not        with
  async      elif       if         or         yield
  It is best to also avoid built-in function and type names and standard Python library names, as well as popular library names like numpy and others mentioned at the beginning of this episode.
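The naming rules above can be checked programmatically. The sketch below uses the standard keyword module and the isidentifier string method; the candidate names are made up for illustration, and the for and if constructs are explained later in this episode.

```python
import keyword

# Hypothetical candidate names to test against the rules
candidates = ["weight", "_cache", "2fast", "class", "user_name"]

usable = []
for name in candidates:
    # A usable name must be a well-formed identifier and not a reserved word
    if name.isidentifier() and not keyword.iskeyword(name):
        usable.append(name)
        print(name, "can be used as a variable name")
    else:
        print(name, "cannot be used as a variable name")

print(usable)   # ['weight', '_cache', 'user_name']
```

Here 2fast fails because it starts with a digit, and class fails because it is a reserved word.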
Variables can be printed using the print function as before:
print(a)
print(a, "+", b, "=", d)
print("Greetings", name)
4
4 + 2.5 = 6.5
Greetings Thomas
The value of a variable can be updated; after that, the variable reflects the updated value. For example:
a = 4
print(a)
a = 27
print(a)
b = a - 7
print(b)
4
27
20
Data Types
Python supports various data types. We have seen three data types so far:
- integers (whole numbers), representing discrete quantities, such as 4 and -30;
- floats (real numbers), representing continuous values, such as 2.7 and 0.00471659, clearly indicated by the presence of a decimal point;
- strings (sequences of characters), such as "Hello World" or 'Hello Thomas'. Both single and double quotes are supported, but a string has to be opened and closed using the same quotation character; a mismatched pair such as "This is a bad string' is invalid.
Integers are used to represent discrete quantities such as the count of a particular type of event or the number of processors a computer has. Real numbers are needed for quantities that can be arbitrary in value, such as lengths, power, probability, etc.
The type function can be used to determine the type of a variable.
For example,
a = 4
b = a / 2
c = "Hello world"
print(type(a))
print(type(b))
print(type(c))
<class 'int'>
<class 'float'>
<class 'str'>
Assigned Value Determines Data Type
Unlike C and C++, where the data type of a variable is fixed at compile-time by explicitly declaring the type, the data type of a Python variable is determined by the value assigned to that variable. It is possible that a variable will change type because it is assigned a new value with a different datatype:
a = 24
print(type(a))
a = "Hello world"
print(type(a))
Integers
An integer is simply a whole number, which can be positive, zero, or negative. It cannot store fractional numbers (thus no decimal point). Python’s integer has an unlimited precision: It can store an arbitrarily large number! (Try it.)
Why Integers?
An integer is actually the most basic representation of data in a digital computer. In digital computers, integers are represented as binary numbers (zeros and ones) with a fixed number of digits. When we speak of a 32-bit or 64-bit computer, this term actually refers to the number of bits that the processor registers have (a register can be thought of as a computer’s native “variable”). For example, as of year 2019, an Intel Core i3, i5, i7, or i9 processor has native 64-bit registers, but it can also process 8-, 16-, and 32-bit numbers. How can Python support arbitrarily long integers? It uses software to emulate the operations on such long integers.
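As a small illustration of the unlimited precision of Python integers, the sketch below computes a number far too large for a 64-bit register. The bit_length method, which reports how many binary digits the value needs, is a standard integer method not otherwise covered in this episode.

```python
big = 2 ** 100
print(big)                # 1267650600228229401496703205376
print(big.bit_length())   # 101, more than a 64-bit register can hold
print(type(big))          # <class 'int'>
```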
In Python, the / division operator always produces a real number,
regardless of the type of the operands involved.
To force an integer (floor) division, use the // operator.
Division Exercises
What are the results of the following statements? Before running these on the IPython prompt, think what the results should be. Then run them and observe the outcome.
7 / 3
7 // 3
7.0 // 3
8 / 3
8 // 3
8.9 // 3
8 // 2.95
This exercise exhibits the nature of Python division operators under varying circumstances.
Binary, Octal, and Hexadecimal Numbers
For system-level programming (involving operating system, hardware, networking, etc.), integers play a crucial role. In this context, the underlying binary nature of integers is very much exploited.
Python makes it convenient to deal with this type of data.
- Binary (base-two) numbers are denoted by the prefix 0b (that is, a digit zero plus a literal letter b). It must be followed by a sequence of ones and zeros. For example, 0b101110 is equal to the number 46 (forty-six) = 1×32 + 0×16 + 1×8 + 1×4 + 1×2 + 0×1 in our customary numbering system, the decimal system.
- Octal (base-eight) numbers are denoted by the prefix 0o (a digit zero plus a literal letter o), followed by a sequence of digits, each ranging from 0 through 7, inclusive. As an example, 0o775 stands for 509 in the decimal system.
- Hexadecimal (base-sixteen) numbers are denoted by the prefix 0x (a digit zero plus a literal letter x), followed by a sequence of digits 0 through 9 as well as letters a through f. This gives a total of sixteen symbols (ten digits and six letters) to represent a number in a base-sixteen representation. For example, 0xa7f3 stands for the number 42995 in the decimal system.
(Note that the letters in the representation above are case-insensitive.
For example, 0B0110 is equivalent to 0b0110;
0xA7F3 and 0Xa7F3 and 0XA7F3 are all equivalent to 0xa7f3.
However, be consistent with the case convention you use in your program.
Most people use lowercase for the prefix, so please follow that also in your program.)
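The literals above can be verified directly in Python. The sketch also uses the built-in bin, oct, and hex functions to go the other way (from an integer to its textual representation) and int with an explicit base to parse digits stored in a string; these functions are not introduced elsewhere in this episode.

```python
# Literals in different bases all produce ordinary integers
print(0b101110)           # 46
print(0o775)              # 509
print(0xa7f3)             # 42995

# Converting an integer to a string in another base
print(bin(46))            # 0b101110
print(oct(509))           # 0o775
print(hex(42995))         # 0xa7f3

# Parsing a string of digits in a given base
print(int("101110", 2))   # 46
print(int("a7f3", 16))    # 42995
```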
If you need a tutorial or review of these seemingly fancy numbering systems, we would refer you to some excellent resources on the Internet:
- Number Systems—Decimal, Binary, Octal and Hexadecimal, a brief tutorial by Rukshani Athapathu on building a number in binary, octal, and hexadecimal representations.
- Number Conversion—Binary Octal Hexadecimal, a brief tutorial by DYclassroom (Yusuf Shakeel) on the various number systems and how to convert a number between different representations.
- Binary number (from Wikipedia), an in-depth treatise covering history, representation, arithmetic, etc.
- Hexadecimal (from Wikipedia).
Strings
A string is simply an ordered sequence of characters. Unlike strings in the C language, Python strings come natively with a rich set of features which make string processing a breeze. Here is a quick overview (much of it “by example”) of the features of Python strings.
The length of the string can be queried using the len function.
A = "Hello world"
print(len(A))
11
Quite often, one would need to extract a character or a substring from a string. Python uses the same convention as C for element indexing: the characters in a string are indexed using an integer from 0, 1, 2, …, using the square brackets as the indexing operator.
A = "Hello world"
B = A[0]
print(B)
print(A[4])
print(A[11])
H
o
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
The last line raises an error called IndexError because the index (11)
is beyond what is valid for the string (0 .. 10 in this case).
Python supports negative indices, which are counted from the end of the string:
A = "Hello world"
print(A[-1])
print(A[-5])
print(A[-11])
d
w
H
How do we extract a substring? Python has the concept of a slice, which defines a range of elements to pick out from a given sequence. Let’s start with some examples:
A = "Hello world" # A has 11 characters
print(A[1:5])
print(A[:3])
print(A[3:])
print(A[3:-3])
ello
Hel
lo world
lo wo
With a slice, two numbers are given, separated by a colon. The slice syntax is therefore
STRING_VAR [ START : STOP ]
If START or STOP is omitted,
then it is implicitly taken to be the beginning
or end of the sequence, depending on which index is omitted.
Python takes a rather quirky convention:
whereas the START element is included in the slice result,
the STOP element is not.
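One way to get comfortable with the "START included, STOP excluded" convention is to notice two consequences of it, sketched below: the length of a slice is simply STOP minus START, and splitting a string at any position gives two pieces that join back into the original. The variable names are only for illustration.

```python
A = "Hello world"
start = 3
stop = 8

piece = A[start:stop]
print(piece)                         # lo wo
print(len(piece) == stop - start)    # True

# Splitting at the same index never loses or duplicates a character
print(A[:start] + A[start:] == A)    # True
```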
Strings can be joined (concatenated) to form a longer string. For example:
A = "Hello world"
B = A + "Thomas"
print(B)
Hello worldThomas
Modifying a String
A string is an immutable object in Python, which means that, once created, it cannot be modified. For example, this kind of statement is invalid for a string: A[1] = "a". To modify a string, we will need to create a new string to include the modification. How can we change A from “Hello world” to “Hallo world”? (Hint: use the concatenation (+) operator.)
Solution
B = A[:1] + "a" + A[2:]
print(B)
There are many more capabilities built into a Python string!
Python uses an object-oriented approach to manipulate strings.
A string has the upper and lower methods to create
uppercase and lowercase versions of the string, respectively,
and the split method to split the string at a specified separator.
Some examples:
A = "Hello world"
print(A.upper())
print(A.split())
HELLO WORLD
['Hello', 'world']
(The last command yields a list, which we will cover very soon.)
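The examples above call split without an argument, which splits at whitespace. As a further sketch, the snippet below passes an explicit separator and also tries the lower method; the comma-separated line is a made-up sample record.

```python
# A hypothetical comma-separated record
line = "1998,3,204.31.253.89,United States"

fields = line.split(",")
print(fields)              # ['1998', '3', '204.31.253.89', 'United States']
print(fields[3].lower())   # united states
print(fields[3].upper())   # UNITED STATES

# The methods return new strings; the original is left unchanged
print(line)                # 1998,3,204.31.253.89,United States
```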
If you are working with text, we recommend that you learn more about Python strings through the following resource:
- Python string tutorial (from TutorialsPoint).
We encourage you to experiment using interactive Python mode to gain an understanding on string and other topics related to Python. There is no better way to learn than to experiment with the language elements directly!
Converting Data Types
Quite often, we have to convert data from one type to another.
For example, data read from a text file would be a string.
To convert this to a number that can be processed numerically,
we use the int or float functions (for conversion to
an integer or a real number):
rr = '71'
ss = '142.5'
R = int(rr)
S = float(ss)
print(R)
print(S)
print("Double all up:")
print(R * 2)
print(S * 2)
Conversely, a number can be converted to a string by the str function.
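Not every string can be converted by every function. The sketch below shows that int refuses a string containing a decimal point, and one possible workaround of converting to a float first; the try/except construct used to catch the error is not covered in this episode, and the variable names are only for illustration.

```python
text = "142.5"

value = float(text)
print(value + 1)           # 143.5

# int() accepts only whole-number strings, so this conversion fails
try:
    int(text)
    int_ok = True
except ValueError:
    int_ok = False
    print("int() cannot convert", text)

# Converting through float first works; the fractional part is dropped
whole = int(float(text))
print(whole)               # 142

# Back to a string, for example to build a message
print("Value: " + str(whole))   # Value: 142
```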
Convert ‘Em!
What would be the output of the following snippet?
R = 71
S = 142.5
print(float(R))
print(int(S))
R_str = str(R)
print(R_str, "is a", type(R_str))
age_message = "My age is " + str(R) + "years old"
print(age_message)
Solutions
71.0
142
71 is a <class 'str'>
My age is 71years old
Lists
One important reason for using a computer is its ability to store and process
a lot of data.
For this reason, Python provides a number of container data types,
which are capable of containing multiple values (or objects).
In this short lesson we will cover only two types in detail,
namely list and dict.
A list is an ordered sequence of values or objects.
Here are a few examples of list objects:
blank = []
trio = [1, 2, 3]
record = [1998, 3, "204.31.253.89", "United States"]
# Now print them out:
print(blank)
print(trio)
print(trio[0])
print(trio[1:3])
print(len(record))
print(record[2])
[]
[1, 2, 3]
1
[2, 3]
4
204.31.253.89
A list object has many similarities to a string:
- The contained elements are ordered and can be indexed by integers from 0, 1, … ;
- list supports the slicing operator;
- the len function acting on a list object returns the number of elements in that list.
However, unlike a string, a list can contain values with arbitrary data types,
and its contents can be altered (i.e. it is mutable).
Items can be added to, or removed from, the list.
Here are a few actions that can be done for a list object named L:
- Add a new item at an arbitrary location: L.insert method;
- Add a new item at the end of the list: L.append method;
- Update the value of the i-th element: L[i] = new_value;
- Sort the entire list: L.sort method;
- Delete an item at index i using del L[i];
- Delete the contents of the entire list: L.clear method.
List Manipulation in Action
What is the output of this program?
fruits = ["banana", "apple", "mango"]
print(fruits)
fruits.append("pineapple")
print(fruits)
fruits.insert(1, "orange")
print(fruits)
fruits.sort()
print(fruits)
fruits[1] = "pear"
print(fruits)
del fruits[2]
print(fruits)
fruits.clear()
print(fruits)
Solution
['banana', 'apple', 'mango']
['banana', 'apple', 'mango', 'pineapple']
['banana', 'orange', 'apple', 'mango', 'pineapple']
['apple', 'banana', 'mango', 'orange', 'pineapple']
['apple', 'pear', 'mango', 'orange', 'pineapple']
['apple', 'pear', 'orange', 'pineapple']
[]
To learn more about lists and how to use them effectively, please
refer to TutorialsPoint’s lesson on lists.
A list can be used in many ways in Python;
but the most common uses are:
- to store a collection of items of the same type (this collection is often termed an array):
  trio = [1, 2, 3]
- to store a collection of items that has a defined structure (this kind of collection is often termed a data structure or a record):
  record = [1998, 3, "204.31.253.89", "United States"]
  In the example above, the numbers 1998 and 3 refer to the year and month of a spam email, whereas the strings 204.31.253.89 and United States refer to the deduced originating IP address and country. The example also shows that the data types of the items are not uniform (two integers and two strings).
This is not an exhaustive list of possible uses of a list.
A list can also be nested, that is, contain other lists,
to create a multidimensional array or a table:
# An example two-dimensional array
sudoku = [ [ 4, 9, 2 ],
[ 3, 5, 7 ],
[ 8, 1, 6 ] ]
# An example of a structured table
# (an array of records)
results = [ [1998, 3, "204.31.253.89", "United States"],
[1999, 1, "194.213.210.20", "Czech Republic" ],
[1999, 12, "202.96.198.238", "China"] ]
Accessing elements would involve two indexing operators:
print(sudoku[0][1])
print(results[1][2])
9
194.213.210.20
The first number indexes the outermost dimension, i.e. the “row”; the second number indexes the inner dimension, i.e. the “column”.
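To make the row and column picture concrete, here is a short sketch that reuses the numbers of the sudoku example under a different, made-up name (grid), looks at its shape, and updates one element.

```python
grid = [[4, 9, 2],
        [3, 5, 7],
        [8, 1, 6]]

print(len(grid))       # 3, the number of rows
print(len(grid[0]))    # 3, the number of columns in the first row
print(grid[1])         # [3, 5, 7], an entire row is itself a list
print(grid[1][2])      # 7, row index 1, column index 2

# Nested lists are mutable too: update a single element
grid[1][2] = 0
print(grid[1])         # [3, 5, 0]
```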
Repeating Actions: for Loop
Now that we have a way to store a bunch of data in a list,
we need a way to perform repetitive actions on these data.
Python uses the for statement
to define a loop construct
to repeat actions over a sequence of data.
Here is an illustration of the for statement:
for A in [ 0, 1, 2, 3 ]:
    print(A)
0
1
2
3
The syntax of a for statement is:
for LOOP_VARIABLE in SEQUENCE:
    STATEMENTS...
Here, SEQUENCE is a sequence object (list, string, dict, etc.)
containing zero or more items which we want to iterate over.
STATEMENTS is a placeholder for a code block (explained earlier),
which contains one or more Python statements
to be repeated.
Python’s for statement has a different behavior from C-style for.
In most common cases, where SEQUENCE contains n items,
the STATEMENTS block will be executed n times
(once for every element in SEQUENCE).
The items will be iterated in order
(from the beginning to the end of the sequence), and
the value of the LOOP_VARIABLE will be set to the current item.
Be aware that the colon after the SEQUENCE is mandatory, as well as
the indentation of STATEMENTS.
Let’s revisit the illustration above:
the SEQUENCE is a list of four elements: [0, 1, 2, 3];
therefore the for loop will execute the STATEMENTS four times.
In this case, the STATEMENTS simply consists of one statement:
print(A).
The value of A is set to 0 at the first iteration, then
it is updated to 1 at the second iteration, and so on.
LOOP_VARIABLE, as the name suggests, is a variable that changes value at every iteration:
in each pass, it is set to the next item from SEQUENCE.
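The same mechanism works when the items of the sequence are themselves lists. The sketch below loops over the results table from the Lists section; the loop variable name rec and the counter are only for illustration.

```python
results = [[1998, 3, "204.31.253.89", "United States"],
           [1999, 1, "194.213.210.20", "Czech Republic"],
           [1999, 12, "202.96.198.238", "China"]]

count = 0
for rec in results:
    # In each iteration, rec is one complete record (a list of four items)
    print(rec[0], rec[1], rec[2])
    count = count + 1

print("Processed", count, "records")   # Processed 3 records
```

Both indented statements belong to the loop body and run once per record; the final print is not indented, so it runs only once, after the loop has finished.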
A string is a sequence of characters, therefore it can also be used as the sequence to loop over:
word = "oxygen"
for char in word:
    print(char)
The action of this loop is illustrated as follows:

It is very common to iterate over a range of values, something like
0, 1, 2, ..., 100; or 1, 4, 7, 10, ... 28.
Python provides a range function to define a sequence-like object
that can be iterated using for.
The range function can be called in several forms:
- range(STOP): yields the sequence 0, 1, 2, … STOP-1.
- range(START, STOP): yields START, START+1, … STOP-1. Again, Python’s convention is that STOP is excluded from the result.
- range(START, STOP, STEP): yields START, START+STEP, START+2*STEP, …, not including STOP and beyond.
All START, STOP, and STEP arguments have to be integers.
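The three forms can be tried out with a short sketch like the one below. A range object produces its values on demand, so the built-in list function is used here only to make the values visible; the variable names are illustrative.

```python
# list() turns each range into a plain list so we can inspect its values.
first_three = list(range(3))         # only STOP given
two_to_five = list(range(2, 6))      # START and STOP
by_threes = list(range(1, 29, 3))    # START, STOP, and STEP

print(first_three)   # [0, 1, 2]
print(two_to_five)   # [2, 3, 4, 5]
print(by_threes)     # [1, 4, 7, 10, 13, 16, 19, 22, 25, 28]
```

Note that the last call uses 29 as STOP so that 28 is still included, since STOP itself is always excluded.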
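range Exercises
What are the outcomes of these statements?

for A in range(5):
    print(A)

for A in range(4,8):
    print(A)

for A in range(32,45,3):
    print(A)

Solutions
(Each value is printed on its own line; the output of each loop is shown here on a single line.)
0 1 2 3 4
4 5 6 7
32 35 38 41 44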
Making a Sum with a for Loop?
One common use of a for loop is to create a sum, or perform an aggregation (maximum value, minimum value, average, etc.)
Suppose you need to calculate the sum of values contained in a list L. One way to achieve this is to use a for statement:
L = [1.5, 3.7, 4.0, -5.1]
sum_L = 0.
for val in L:
    sum_L = sum_L + val
print(sum_L)
This will yield 4.1. However, Python has the built-in sum function to do exactly this:
L = [1.5, 3.7, 4.0, -5.1]
print(sum(L))
Voila, you just shaved three lines off the program! Python has a lot of nifty tools like the sum function, which can make your programs a lot cleaner, shorter, more effective, and more fun to write.
Performance Note
It pays to learn more about Python's built-in functionality. Unlike lower-level languages like C, where we cannot avoid using a loop to perform an aggregation such as a sum, Python provides many commonly used functions which save us from writing loops by hand. Besides making your program shorter, these functions help avoid a lot of common mistakes. More importantly, these functions are often written in C/C++/Fortran, yielding much higher performance compared to a pure Python implementation.
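As a brief illustration, a few other aggregations can be obtained from built-in functions without writing a loop. The sketch below reuses the list L from the sum example; the names largest, smallest, count, and average are chosen here for illustration.

```python
L = [1.5, 3.7, 4.0, -5.1]

largest = max(L)           # maximum value in the list
smallest = min(L)          # minimum value in the list
count = len(L)             # number of items in the list
average = sum(L) / count   # arithmetic mean, built from sum and len

print(largest, smallest, count)
print(average)
```

Each of these replaces a hand-written loop of three or four lines.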
Conditional Statement (if – else)
A program often has to take actions only when certain conditions are fulfilled.
Sometimes there are different actions for different conditions.
This is done in Python using the if statement.
gender = "M"
if gender == "M":
    pronoun = "He"
    pronoun3 = "his"
else:
    pronoun = "She"
    pronoun3 = "her"
print(pronoun, "loves", pronoun3, "cat")
Here, gender = "M" is an assignment statement,
whereas gender == "M" is a comparison expression.
The latter yields a logical value (True or False).
The values of pronoun and pronoun3 variables depend on whether
gender is equal to a string "M", therefore the message that is printed
would also depend on the value of gender.
Notice that Python does not require parentheses to enclose the condition expression.
A numerical or string value can be given to the if statement
in lieu of a logical expression:
nonzero numbers count as True, as do nonempty strings and lists;
zero, empty strings, and empty lists count as False.
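The following sketch sorts a handful of sample values according to how an if statement treats them. The sample values and the names truthy and falsy are illustrative assumptions, not part of the lesson's own examples.

```python
truthy = []   # values that an if statement treats as True
falsy = []    # values that an if statement treats as False

for value in [0, 3, -1.5, "", "text", [], [0]]:
    if value:
        truthy.append(value)
    else:
        falsy.append(value)

print("Treated as True: ", truthy)   # [3, -1.5, 'text', [0]]
print("Treated as False:", falsy)    # [0, '', []]
```

Note that [0] counts as True: the list is nonempty, even though the single item it contains is zero.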
Multiple conditions can be accommodated using the elif continuation.
An example would be the determination of student’s grade based on the total score:
score = 83.5
if score > 90:
    grade = "A"
elif score > 80:
    grade = "B"
elif score > 70:
    grade = "C"
elif score >= 60:
    grade = "D"
else:
    grade = "F"
print(grade)
The first condition (score > 90) will be tested first: if it is true,
then "A" is assigned to the variable grade
and the rest of the conditions are not tested.
Otherwise, we go to the second condition (score > 80), and so on.
If none of the conditions evaluates to True, the statement block
after else will be executed.
The else part is optional: It may not exist if there is no action needed
for “all of the other” cases.
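A minimal sketch of an if statement without an else part is shown below. The variable names and the threshold are hypothetical; the point is only that nothing happens when the condition is false.

```python
temperature = 38.6   # hypothetical reading, in degrees Celsius
status = "normal"    # default value, kept unless the condition holds

if temperature > 37.5:
    status = "fever"

print(status)        # fever
```

Setting a default value first and overriding it inside the if block is a common way to avoid an else branch.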
Is It Even or Odd?
Print a message stating whether a variable named val contains an odd or even number.
Solution
val = 3
if val % 2:
    print('odd')
else:
    print('even')
The % operator gives the remainder of the division of val by 2: It is zero for even numbers, and one for odd numbers. Because val is an odd number, val % 2 yields 1, thus the word odd will be printed.
Functions
Certain tasks are used frequently throughout a program.
One example is the conversion from a score to a grade,
as shown in the previous section.
A function is a subprogram (a sequence of commands and statements)
that is packaged as a unit and intended to accomplish a specific task.
Using functions helps programmers to write the code only once
and reuse it as frequently as needed.
In Python, a function is created using the def statement,
as illustrated in the following snippet:
def message():
    print("Python is a great language to learn.")
    print("It is fun, easy to use, yet powerful at the same time.")
    print("With persistent use and practice, you'll master Python.")
There is no output when you complete the def statement above.
But you have just created a function called message, which can be called
at any time afterward.
The subprogram block in the function’s body
will be executed whenever the function is called.
Let us call message now:
message()
Python is a great language to learn.
It is fun, easy to use, yet powerful at the same time.
With persistent use and practice, you'll master Python.
We can also call the function multiple times:
message()
print()
print("LET'S SAY THAT AGAIN...")
message()
Python is a great language to learn.
It is fun, easy to use, yet powerful at the same time.
With persistent use and practice, you'll master Python.
LET'S SAY THAT AGAIN...
Python is a great language to learn.
It is fun, easy to use, yet powerful at the same time.
With persistent use and practice, you'll master Python.
What’s Different between Loops and Functions?
A function is similar to the for loop or the if-else conditional in that they form a bigger, logical piece of a program. In particular, both a function and a loop enable a block of code to be executed more than once. There is one important difference, though: a loop only repeats a block in one particular location of the program. The block associated with a function, in contrast, can be executed at different locations in the program, i.e. wherever the function call takes place.
Parameters
A function can have one or more parameters. Inside the function body (block), they act like regular variables, but their values are not specified within this body. Rather, the values are defined at the point the function is called.
Let us write a make_grade function which takes one parameter,
that is, the numerical score:
def make_grade(score):
    if score > 90:
        grade = "A"
    elif score > 80:
        grade = "B"
    elif score > 70:
        grade = "C"
    elif score >= 60:
        grade = "D"
    else:
        grade = "F"
    return grade
This function also returns a value, which will need to be captured in a variable or printed. (Otherwise, the return value will be discarded.)
alice_grade = make_grade(89)
print("Alice's grade is", alice_grade)
print("Jason's grade is", make_grade(72))
Alice's grade is B
Jason's grade is C
A function can take multiple arguments, such as:
def add(a, b):
    return a + b

A = 36
C = add(A, 2)
print(C)
Further, the arguments can be of different types.
The function below expects a real number for the scale argument,
and a sequence (list) for scores:
def scale_score(scale, scores):
    result = []
    for s in scores:
        result.append(scale * s)
    return result
Documenting a Function
Once a function gets more complex, it is better to document it. Python has a great way of doing this, using a triple-quoted string (basically an ordinary string that may span multiple lines):
def scale_score(scale, scores):
    """Scales student's scores by a scale factor.
    Args:
        scale (float): scale factor.
        scores (list): a list of students' raw scores.
    Returns:
        list: a list of students' scaled scores.
    """
    result = []
    for s in scores:
        result.append(scale * s)
    return result
The document should state the following:
- the purpose of the function;
- the input argument(s);
- the return value(s);
- any other notes regarding the behavior of the function that users may need to be aware of.
This documentation can be queried in an interactive Python session:
help(scale_score)
Help on function scale_score in module __main__:
scale_score(scale, scores)
    Scales student's scores by a scale factor.
    Args:
        scale (float): scale factor.
        scores (list): a list of students' raw scores.
    Returns:
        list: a list of students' scaled scores.
Documenting a function is another good coding practice which you want to foster in yourselves early on. This helps other people (among those are your future selves!) to better understand your code.
Let’s try our new function:
old_scores = [50, 70, 40, 85]
new_scores = scale_score(1.25, old_scores)
print(new_scores)
[62.5, 87.5, 50.0, 106.25]
Changing the Value of a Parameter in a Function?
(Note: This is an intermediate topic, so you can skip when reading the lesson for the first time.)
Arguments are passed by object reference in Python. Technically, all variables are just references to an object or a value residing somewhere in the Python interpreter’s memory. Changing the value of a parameter inside a function is not prohibited by Python, but it may not do what you want. Assigning a new value to a parameter (e.g. setting it to a new string, int, or list) does not change the original value existing in the caller’s scope. But modifying the object that a parameter refers to (e.g., appending a new element to a list using the append method) propagates the effect to the caller. This is an intended design: in this way, functions can be used to manipulate objects. Completely new data should be returned as the function's return value.
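The difference between assigning to a parameter and modifying the object it refers to can be seen in a small sketch. The function names rebind and mutate, and the variable names, are hypothetical and chosen only for this illustration.

```python
def rebind(items):
    # Assignment makes the local name point to a brand-new list;
    # the caller's list is left untouched.
    items = [0, 0, 0]

def mutate(items):
    # append() changes the list object itself, which the caller also sees.
    items.append(99)

data = [1, 2, 3]

rebind(data)
after_rebind = list(data)   # copy taken to record the state at this point
mutate(data)

print(after_rebind)   # [1, 2, 3]
print(data)           # [1, 2, 3, 99]
```

Only the call to mutate changes data, because append acts on the same list object that the caller holds.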
Library and Modules
A library is a collection of files (called modules) that contain functions for use by other programs. A module can be viewed as a toolbox that contains a lot of tools: These tools (hammer, screwdrivers, pliers, etc.) are analogous to the functions. This toolbox may be a part of a greater collection of tools for an auto mechanic. This mechanic may have other toolboxes: electrical toolbox, engine toolbox, etc. The entire collection of these toolboxes would be the library.
Libraries or Modules?
A library is a collection of modules, but the terms are often used interchangeably, especially since many libraries only consist of a single module, so don’t worry if you mix them.
In Python, we need to import a module
to use the functions contained in this module.
This is done using the import statement:
import MODULE_NAME
This will make the functions, variables, and other objects
in the MODULE_NAME module accessible by the Python interpreter.
From this point on, you can access the contents of this module
(such as functions and variables) by prepending their names
with a MODULE_NAME. prefix.
The name of the module also serves as a namespace
for the functions and variables provided by that module.
Let us consider a concrete example:
Python has a module of mathematical functions called math,
which contains many mathematical functions and constants,
such as: the square root function (sqrt),
exponentiation (exp, pow), logarithmic (log, log2, log10),
trigonometric functions (sin, cos, tan, asin, acos, …),
and many more.
(See the math reference documentation
for more details.)
To calculate the square root of a number, use the sqrt function contained in this module:
import math
a = 25
b = math.sqrt(a)
print(b)
print(math.sqrt(2))
5.0
1.4142135623730951
We can also use the name sqrt without the math. qualifier,
by importing the name directly into the current namespace
using the from ... import statement:
from math import sqrt
from math import cos, pi
print(sqrt(81))
print(cos(pi))
9.0
-1.0
Hint: If you use from math import sqrt without invoking import math beforehand,
then the name math is not known to the interpreter,
but the sqrt name will still be accessible.
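Both import styles give access to the very same function; they differ only in which names are placed in the current namespace. The sketch below uses both styles side by side to show this (the variable same_function is illustrative).

```python
import math              # makes the name "math" available
from math import sqrt    # makes the name "sqrt" available directly

# Both names refer to one and the same function object.
same_function = sqrt is math.sqrt
print(same_function)     # True

print(math.sqrt(16))     # 4.0
print(sqrt(16))          # 4.0
```

Which style to use is largely a matter of taste: the prefixed form makes it obvious where a function comes from, while the direct form is shorter to type.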
Out of the box, Python already comes with a fairly complete library. We will call this the “core library”. Some notable modules in the core library include:
- os: operating-system related functions;
- sys: Python/system related functions;
- math: mathematical functions and constants;
- re: regular expression search and operation;
- csv: tools to read and write CSV (comma separated value) files;
- json: tools to read/write data in JSON format;
- mailbox: tools to read/write internet mailbox in MBOX format;
- socket, ssl, urllib, http, …: Network- and Internet-related functions;
- and many more!
(Clicking on the module name would lead you to the module’s reference documentation.) We recommend that you survey the Python Standard Library reference documentation to become familiar with the functionality offered by the Python core library.
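As a quick taste of the core library, the sketch below uses the json module to convert a Python dictionary to JSON text and back. The record contents and variable names are made up for illustration.

```python
import json

# A hypothetical record to be stored or transmitted as text.
record = {"name": "Alice", "scores": [89, 72]}

text = json.dumps(record)     # Python object -> JSON-formatted string
restored = json.loads(text)   # JSON-formatted string -> Python object

print(text)
print(restored == record)     # True
```

The same pattern of importing a module and calling its functions with the module name as a prefix applies to the other modules in the list above.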
Examples of Important Python Modules
A vast number of capabilities in Python actually come from the libraries developed and maintained by many groups and communities throughout the world. In this sidebar we will survey a few important ones: those that have become every Python programmer’s essential toolboxes, as well as those that may be relevant for cybersecurity applications.
NumPy and SciPy
NumPy (short for “Numerical Python”) and SciPy (“Scientific Python”) are packages designed for numerical computation. NumPy provides a powerful N-dimensional array object and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, input/output, Fourier transforms, basic linear algebra, statistical operations, random number generators and more. SciPy contains modules for optimization, advanced linear algebra, integration, interpolation, special functions, Fourier transforms, signal and image processing, ordinary differential equation solvers and other tasks common in science and engineering.
- NumPy website: https://numpy.org/
- SciPy website: https://www.scipy.org/
Pandas
Pandas stands for Python Data Analysis Library. It is designed to offer data structures for manipulating numerical tables and time series. It is a powerful tool when dealing with large tables.
- Pandas website: https://pandas.pydata.org/
Matplotlib and Seaborn
These are plotting libraries. Matplotlib is a Python 2D plotting library which produces figures in a variety of formats and interactive environments across platforms. You can use Matplotlib to generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc. Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
- Matplotlib website: https://matplotlib.org/
- Seaborn website: https://seaborn.pydata.org/
Scikit-learn, Tensorflow, Theano, Keras, and Pytorch
Scikit-learn, Tensorflow, Theano, Keras, and Pytorch are libraries designed for machine learning and deep learning applications. Scikit-learn provides methods for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. Tensorflow and Theano are low-level neural network model development tools. Keras is a high-level package for neural networks capable of running on top of Tensorflow and Theano. Pytorch is a package designed as a replacement for NumPy that takes advantage of the power of GPUs; it is also a platform for deep learning providing flexibility and speed.
- Scikit-learn website: https://scikit-learn.org/stable/
- Tensorflow website: https://www.tensorflow.org/
- Keras website: https://keras.io/
- Pytorch website: https://pytorch.org/
NLTK
NLTK stands for Natural Language Toolkit. It is a package designed for working with human language data. It provides interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
- NLTK website: https://www.nltk.org/
Pycrypto
Pycrypto, as you probably guessed, is a collection of tools for cryptography work. It provides various encryption algorithms such as AES, DES, and RSA, to name a few.
- Pycrypto website: https://pypi.org/project/pycrypto/
scrapy, beautifulsoup, and selenium
If you are looking for web-related packages, then these three are what you need. Scrapy and Beautifulsoup provide ways to extract data out of HTML content, meaning web pages. Selenium, on the other hand, is a web browser automation tool. It is useful for writing test scripts for web-based applications.
- Scrapy website: https://scrapy.org/
- Beautifulsoup website: https://www.crummy.com/software/BeautifulSoup/
- Selenium website: https://www.seleniumhq.org/
Exercises
Exercise 1
Write a function that takes as input parameter a list of integer numbers and then prints two lists: the first list being the sublist of the input list containing only even numbers, the second list containing odd numbers only.
Exercise 2
Write a function that takes two numbers as parameters and returns the maximum of the two numbers.
Exercise 3
Write a function called fizz_buzz that takes a number.
- If the number is divisible by 3, it should return “Fizz”.
- If it is divisible by 5, it should return “Buzz”.
- If it is divisible by both 3 and 5, it should return “FizzBuzz”.
- Otherwise, it should return the same number.
Exercise 4
Write a function called show_stars(rows). If rows is 5, it should print the following:
*
**
***
****
*****
Hint: Think of addition on strings of characters.
Further Learning
- Python’s official documentation: https://docs.python.org/3/
Key Points
Python is a high-level, interpreted, general-purpose programming language.
Key data types are integers, floats (real numbers), and strings.
Lists and arrays are container data types to store a large amount of data.
The for statement is useful to repeat actions by looping over a list of values.
The if–elif–else construct allows a program to execute commands conditionally.
There is a vast number of libraries, which makes Python a productive computing platform.