A Brief Introduction to Python
Overview
Teaching: 30 min
Exercises: 10 min
Questions
What is Python?
How do I use Python to perform computation with numbers and text?
How do I handle a large amount of data using Python?
How do I write a tool in Python?
Objectives
Understand the essential elements of the Python programming language.
Be able to write simple Python programs to process data.
This episode serves as a crash course in the Python programming language. It covers only the bare essentials to get you started with Python. At the end of this lesson, there will be a pointer to a lesson series which you can pursue on your own to become proficient with Python.
What Is Python? Why Python?
Python is a high-level, general purpose programming language. Python emphasizes code readability and ease of use, and its syntax encourages good programming practices as well as productivity. Today, Python has become one of the most popular programming languages, even widely deployed by tech giants such as Google, Amazon, Facebook, … Python comes with a vast array of powerful libraries, such as:
- numpy and scipy for numerical calculations (“number crunching”),
- pandas for data analytics,
- matplotlib and seaborn for plotting and visualization,
- scikit-learn, tensorflow, keras, pytorch for machine learning,
- nltk for natural language processing,
- pycrypto for cryptography,
- scrapy, beautifulsoup, and selenium for web scraping.
These libraries enable programmers to accomplish their goals (e.g. “obtain the largest five eigenvalues of a matrix” or “create a machine learning model to flag spam emails”) without having to know the details of the complex underlying algorithms.
Those who have used other programming languages will find it rather easy to pick up Python. Python is an interpreted language, which means that a Python interpreter is required to run a Python program.
How Does Python Compare to C/C++ Language?
C emphasizes low-level details of the program, down to the bare metal details (such as, integers, bits, pointers, data alignment). C++ provides much more convenience by adding object-oriented capabilities and significantly expanded productivity libraries (standard C++ library, Boost), yet still allowing (and often requiring) programmers to take care of machine-level issues. Python, on the other hand, emphasizes high-level programmability without forcing programmers to worry about gory details.
Compared to C/C++, Python provides a much gentler learning curve to new programmers, and much shorter time-to-productivity.
C/C++ programs are compiled to produce binary executable programs, containing machine instructions that can be executed directly by the processor (CPU). In contrast, because Python is an interpreted language, a Python interpreter is always required to run a Python program. The interpreter translates the human-friendly Python statements into instructions to be executed by the CPU, a small piece at a time while the program is running. The process of interpreting the program in this way takes time; therefore interpreted computer programs run significantly slower than binary executable programs. Quite often, a well-written C/C++ program can accomplish the same task 5 to 100 times faster than an equivalent implementation written in pure Python. We will discuss this issue more in module 6 of this training program. However, this picture changes for programs that rely heavily on high-performance libraries made available for Python (such as NumPy, TensorFlow, etc.). In this training program, we strive to use Python libraries in a manner that is conducive to high-performance computation.
In this training program we will focus on Python 3, which is the current version of Python.
Accessing Python from Turing HPC
On Turing, as in many HPC systems, access to available software packages is managed using a shell command called module. (Due to the age of the cluster, two variants of module are available on Turing. We recommend using the newer variant called lmod, by first invoking enable_lmod.) Here is the sequence of commands needed:
$ enable_lmod
$ module load python
$ module load ipython # recommended, see below
$ module list # optional, but please try
(You will need to repeat these commands the next time you log in to Turing.)
By default, module load python will load Python 3.6 on Turing. Python 3.7 is also available; use module load python/3.7 to specify the exact software version.
The module command requires a subcommand name, and sometimes additional arguments. Here are the most frequently used invocations:
- module avail: lists the available modules on the system;
- module load PACKAGENAME: loads the module PACKAGENAME, i.e. makes this software available in the shell;
- module unload PACKAGENAME: unloads the module PACKAGENAME;
- module whatis PACKAGENAME: prints information about the module PACKAGENAME;
- module list: lists the currently loaded modules.
Python Operating Modes
There are two ways to interact with the Python interpreter: the interactive mode and the script mode.
Interactive mode
Python interactive mode allows you
to execute Python statements instantly from the command line.
This is very much like the way we interact with UNIX shell:
we enter a Python statement, press Enter,
Python runs the statement, and prints the result (if applicable),
then returns to the prompt again.
To launch the Python interpreter, invoke the python command
from your (UNIX) command line:
$ python
Python 3.6.9 (default, Sep 17 2019, 12:17:19)
[GCC Intel(R) C++ gcc 4.9.4 mode] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
The >>>
indicates Python’s prompt;
it tells you that you are in the interactive mode.
To exit Python, use the quit() or exit() command,
or use the Ctrl+D keyboard shortcut.
>>> quit()
ipython: Better Interactive Python
In this workshop, we recommend a more sophisticated Python front-end called ipython, short for interactive Python. It provides better command history, output history, syntax highlighting, tab completion, and customizability. It even doubles as a UNIX-like “shell” for the most commonly used commands. To use ipython, please make sure that the ipython module is loaded after the python module.
$ module load python # if python has not been loaded
$ module load ipython # if ipython has not been loaded
$ ipython
Python 3.6.9 (default, Sep 17 2019, 12:17:19)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:
Notice that the prompt is different. The number
1
will be incremented every time a new command is entered.
Arithmetic with Python
From now on, you can type valid Python statements to execute and get outputs from. Python in interactive mode can be used to perform arithmetic:
>>> 2 + 3
5
>>> 5 / 2
2.5
Python supports the usual computer arithmetic operators:
| Operator | Meaning | Examples |
|---|---|---|
| + | Addition | 5 + 7 , 7.43 + 54 |
| - | Subtraction | 10 - 3 , 3.5 - 10 |
| * | Multiplication | 5 * 7 , 3.5 * 10 |
| / | Division | 4 / 2 , 10 / 3 , 1 / 7 , 14 / 0.2 |
| ** | Exponentiation | 2 ** 3 , 9 ** 0.5 |
| ( and ) | Group expression for evaluation; override standard operator precedence | 5 * (3 + 4) , compare against 5 * 3 + 4 |
The usual mathematical convention for operation order (which one gets computed first, also termed operator precedence) applies: exponentiation is computed first, followed by multiplication and division, then addition and subtraction. The Python documentation has a reference table of operator precedence, which can be helpful to ensure that you write correct Python expressions. Please try many other expressions you can think of so you become comfortable with Python as a calculator.
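The precedence rules can be checked with a few short expressions. This is only a small sketch; the variable names are made up for illustration.

```python
# Parentheses change the order of evaluation.
with_parens = 5 * (3 + 4)       # addition first, then multiplication
without_parens = 5 * 3 + 4      # multiplication first, then addition

# Exponentiation binds more tightly than multiplication and unary minus.
power_then_multiply = 2 * 3 ** 2   # same as 2 * (3 ** 2)
negated_square = -2 ** 2           # same as -(2 ** 2)

print(with_parens)          # 35
print(without_parens)       # 19
print(power_then_multiply)  # 18
print(negated_square)       # -4
```

When in doubt about the order, adding parentheses costs nothing and makes the intent explicit.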
Computing Total Price and Splitting the Bill
Mary and her two roommates split their grocery expenses evenly. Mary just bought the following:
- A dozen eggs for $1.49
- A loaf of bread for $2.79
- A bag of potato chips for $1.99
- A bottle of hand soap for $2.49
- A stack of paper plates for $3.99
In Virginia, tax rates are 2.5% for food and 6% for non-food. Please create Python expressions to do the following:
- Compute the total cost of the grocery bill.
- Compute the payment of each person to cover this bill.
Solutions
We can use parentheses to compute the quantities in one step. This is just one way among many ways to get the computation done.
(1.49 + 2.79 + 1.99) * 1.025 + (2.49 + 3.99) * 1.06
((1.49 + 2.79 + 1.99) * 1.025 + (2.49 + 3.99) * 1.06) / 3
The total cost is $13.30 and each one has to pay $4.43 (but one has to pay one cent more to cover the total cost).
The examples above show that
Python can represent whole numbers (integers)
and real numbers (those with decimal points).
While integers are represented exactly,
real numbers are represented with a limited number of digits
(approximately 15 on today’s computers)
and are subject to roundoff errors.
This shows up in at least one of the examples above, where
13.29555 was printed as 13.295550000000002.
Real numbers carry enough digits
to allow for reliable computation in the vast majority of cases.
We will not discuss this further as it is an advanced topic.
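A classic way to see roundoff in action is to add 0.1 and 0.2. The sketch below also uses math.isclose and round from the standard Python library, which are not covered in this episode, as one possible way to compare or display such values.

```python
import math

total = 0.1 + 0.2

print(total)                     # 0.30000000000000004
print(total == 0.3)              # False: the two values differ in the last digits
print(math.isclose(total, 0.3))  # True: equal within a small tolerance
print(round(total, 2))           # 0.3
```

For money-like quantities such as the grocery bill above, rounding to two decimal places at the end of the computation is usually sufficient.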
Scripting with Python
The interactive mode is useful but not always practical.
Imagine you have thousands of statements to execute in order,
and you need to repeat the process at least one more time.
In such a case,
we can save these Python statements into a text file called a Python script
and have the Python interpreter run them.
Python scripts, or programs, usually have the .py
filename extension.
Hello World: Our First Script
In interactive mode, Python prints the result of an expression immediately after it is executed. In the scripting mode, an expression has to be printed in order to be output (to the terminal or to a file).
Let us print a simple “Hello World” message—which is
a tradition when learning a new programming language.
Using a text editor (e.g. nano
),
create a text file named hello.py
containing one statement:
print("Hello World")
Save the file, then execute the script:
$ python hello.py
Hello World
In general, a Python script is executed from the UNIX shell in this way:
python /PATH/TO/SCRIPT.py
.
Any command you are able to run in the interactive mode
can be added to a script and executed by the Python interpreter.
We can print the result of a math expression:
print(1 + 2 + 3 + 4 + 5)
We can use multiple print
statements to print multiple text lines.
A print
can also print multiple items in a single statement:
print("Some math examples")
print("Sum of 1 through 5 is ", 1 + 2 + 3 + 4 + 5)
print("Square root of 1, 2, 3, 4 are", 1**0.5, 2**0.5, 3**0.5, 4**0.5)
Please run this script several times. You will notice that the output is always printed in the order the statements appear in the script. This is a bedrock principle of a sequential computer program: the computer will read and execute the commands/statements one at a time, and in the order these commands appear in the program. Remembering this principle will help you (1) predict the outcome of a computer program just by reading it, and (2) avoid confusion about what a program will do.
Statements, Indentation, Code Blocks
Python language syntax has a few rules that distinguish it from other computer languages. Let us use the following program snippets to illustrate the notable features of Python language:
# This is a sample program written in Python
def greet(name, gender, majors, graduate_year):
    if gender == "M":
        pronoun = "He"
        pronoun3 = "his"  # third-person pronoun
    else:
        pronoun = "She"
        pronoun3 = "her"
    print("Hello, this is", name)
    print(pronoun, "has", len(majors), "major(s):")
    for m in majors:
        print("-", m)
    print(pronoun, "completed", pronoun3, "education in", graduate_year)
    print()

greet("Elaine", "F", ["Art", "Mathematics", "History"], 1993)
greet("Johnson", "M", ["Sociology"], 2000)
That is a complete Python program which can be run to produce the following output:
Hello, this is Elaine
She has 3 major(s):
- Art
- Mathematics
- History
She completed her education in 1993

Hello, this is Johnson
He has 1 major(s):
- Sociology
He completed his education in 2000

Unlike C/C++, a Python statement generally ends with the new line.
There is no mandatory end-of-statement marker like a semicolon (;
),
although a semicolon is indeed recognized as such.
If a line is so long that it has to wrap, please terminate the
incomplete line by appending a backslash character (\
).
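The following sketch shows the backslash continuation and the optional semicolon described above. It also shows a second way to wrap a long line, which goes beyond what the text covers: an expression enclosed in parentheses may span several lines without any backslash.

```python
# A backslash at the end of a line continues the statement on the next line.
total = 1.49 + 2.79 + 1.99 + \
        2.49 + 3.99

# Inside parentheses (or brackets), no backslash is needed.
same_total = (1.49 + 2.79 + 1.99 +
              2.49 + 3.99)

# Semicolons can separate statements on one line, though this is rarely used.
a = 1; b = 2

print(total == same_total)  # True
print(a + b)                # 3
```

Note that nothing, not even a comment, may follow the backslash on the same line.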
Python is well known for its extensive and strict use of whitespaces
to indent program lines.
(To indent a program line means to add a number of whitespace characters
before the first non-space character in that line.)
In Python, like any other programming language,
a set of commands or statements can be grouped into a block,
which then becomes an integral part of a language construct
(loops, conditionals, function definition).
Python uses indentation to distinguish a block of statements.
A block is clearly identified by its consistent indentation level.
In the example program shown earlier, the def greet(...):
statement is followed by a code block that starts with
if gender == "M":
and terminates after
the lone print()
statement.
Similarly, the if
clause initiates a new block containing two program lines,
followed by the else
clause and yet another block.
The “if
–block–else
–block” sequence constitutes
a complete construct for conditional execution—as we will learn later
in this episode.
Python’s convention for code blocks
is in contrast to the case of C/C++ language, where
a code block in the if
, for
or a function definition
like int main(...)
is clearly delimited by a matching pair of curly brackets {
…}
.
Consistency is required: if you have started a new block using four-space indentation,
you must indent every statement in this block
with four space characters.
Python will catch inconsistencies in block-level indentation and issue a syntax error.
While this may appear to be overly restrictive,
it actually encourages good programming behavior and readability of Python programs.
The most widespread convention among Python programmers is
to prepend four extra space characters to introduce a new (sub)block.
We recommend that you also follow this practice.
We will see this more practically in the next sections.
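To see how Python reacts to inconsistent indentation without crashing a script, one option is to hand a deliberately broken snippet to the built-in compile function and catch the resulting error. This is only an illustrative sketch: the try/except construct is not covered in this episode, and the snippet stored in bad_source is made up for the demonstration.

```python
# Source text of a tiny program whose third line is indented
# more deeply (six spaces) than the line before it (four spaces).
bad_source = (
    "if True:\n"
    "    x = 1\n"
    "      y = 2\n"
)

caught = None
try:
    # compile() only checks and translates the code; it does not run it.
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    caught = err
    print("Python rejected the program:", err.msg)

# IndentationError is a special kind of SyntaxError.
print(isinstance(caught, SyntaxError))  # True
```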
Comments
Comments (non-executable texts) can be added to a Python script
by prepending the text with the hash #
character.
Comments can also appear after a statement.
Both cases appear in the sample program above.
Fear not, all the constructs used in the program above will be explained shortly, so you will understand what the program is doing after finishing this episode. Finally, the rules regarding indentations and comments above apply not only to Python scripts, but also to Python statements entered in the interactive mode.
Basic Elements of a Program
For the rest of this episode, you will learn the basic building blocks of a program. We will learn these in the context of the Python programming language, but they are applicable in many other languages. These are just a few things that you can use in your scripts to get started with computer programming. At the end of this episode we will provide some pointers for further learning.
As a roadmap, here are the key elements included in this episode:
- Variables;
- Data types, with initial emphasis on numbers and strings;
- Statement block;
- Looping through iteration using the for statement;
- Conditional statements (if, elif, else);
- Lists;
- Arrays using numpy;
- Data structure using dict;
- Functions;
- Script arguments.
We will also present a quick overview of key Python libraries that you may find useful for cybersecurity applications.
Variables
Arithmetic is useful, but algebra makes mathematics even more useful
by allowing us to make manipulations and define relationships
among yet-to-be-specified quantities, denoted by symbols such as
x
, y
, and so on.
The same goes for computer programs:
a variable plays the role of a symbol in algebra.
In Python, a variable is simply a label for a value
(or another type of object we will learn shortly).
This gives us a handle to refer to that value indirectly
by the name of the variable.
We define a variable by assigning a value to it, using the =
operator.
Some examples:
a = 4
b = 5 / 2
c = "Hello World"
d = a + b
name = "Thomas"
Several rules regarding Python variables:
- A variable name can contain only letters (a-z, A-Z), digits (0-9), and underscores (_). The name cannot start with a digit. Names that start with an underscore are often reserved (e.g. __file__, __name__) or have certain meanings. (If you are just getting started with Python, it is best to use variable names that begin with a letter until you know the uses of names that start with an underscore.)
- Names are case sensitive: for example, name and Name and NAME are three distinct variables.
- Use names that convey the meaning of the value in a concise way. For example, use weight = 127 to represent a weight quantity with a value of 127, whereas iurj2k3u = 127 obfuscates the meaning of the variable.
- Be aware of Python’s reserved words and absolutely avoid using them for variable names:
  False   await     else     import    pass
  None    break     except   in        raise
  True    class     finally  is        return
  and     continue  for      lambda    try
  as      def       from     nonlocal  while
  assert  del       global   not       with
  async   elif      if       or        yield
  It is best to also avoid built-in function and type names and standard Python library names, as well as popular library names like numpy and others mentioned at the beginning of this episode.
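The naming rules above can be checked from within Python itself. The sketch below uses the standard keyword module and the isidentifier string method; neither is introduced in this episode, and the candidate names are made up for illustration.

```python
import keyword

# The list of reserved words known to the running interpreter.
print(keyword.kwlist)

# Check whether a name is a reserved word.
is_for_reserved = keyword.iskeyword("for")        # True
is_weight_reserved = keyword.iskeyword("weight")  # False

# isidentifier() checks the spelling rules only (allowed characters,
# no leading digit); it does not check for reserved words.
good_spelling = "graduate_year".isidentifier()  # True
bad_spelling = "2fast".isidentifier()           # False
keyword_spelling = "for".isidentifier()         # True, yet "for" is reserved

print(is_for_reserved, is_weight_reserved)
print(good_spelling, bad_spelling, keyword_spelling)
```

A usable variable name therefore has to pass both checks: it must be spelled like an identifier and must not be a reserved word.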
Variables can be printed using print
statement as before:
print(a)
print(a, "+", b, "=", d)
print("Greetings", name)
4
4 + 2.5 = 6.5
Greetings Thomas
The value of a variable can be updated; after that, the variable reflects the updated value. For example:
a = 4
print(a)
a = 27
print(a)
b = a - 7
print(b)
4
27
20
Data Types
Python supports various data types. We have seen three data types so far:
- integers (whole numbers), representing discrete quantities, such as 4 and -30;
- floats (real numbers), representing continuous values, such as 2.7 and 0.00471659, clearly indicated by the presence of a decimal point;
- strings (sequences of characters), such as "Hello World" or 'Hello Thomas'. Both single and double quotes are supported, but a string has to be opened and closed using the same quotation character, not like: "This is a bad string'.
Integers are used to represent discrete quantities such as the count of a particular type of event or the number of processors a computer has. Real numbers are needed for quantities that can be arbitrary in value, such as lengths, power, probability, etc.
The type
function can be used to determine the type of a variable.
For example,
a = 4
b = a / 2
c = "Hello world"
print(type(a))
print(type(b))
print(type(c))
<class 'int'>
<class 'float'>
<class 'str'>
Assigned Value Determines Data Type
Unlike C and C++, where the data type of a variable is fixed at compile-time by explicitly declaring the type, the data type of a Python variable is determined by the value assigned to that variable. It is possible that a variable will change type because it is assigned a new value with a different datatype:
a = 24
print(type(a))
a = "Hello world"
print(type(a))
Integers
An integer is simply a whole number, which can be positive, zero, or negative. It cannot store fractional numbers (thus no decimal point). Python’s integer has an unlimited precision: It can store an arbitrarily large number! (Try it.)
Why Integers?
An integer is actually the most basic representation of data in a digital computer. In digital computers, integers are represented as binary numbers (zeros and ones) with a fixed number of digits. When we speak of a 32-bit or 64-bit computer, this term actually refers to the number of bits that the processor registers have (a register can be thought of as a computer’s native “variable”). For example, as of 2019, an Intel Core i3, i5, i7, or i9 processor has native 64-bit registers, but it can also process 8-, 16-, and 32-bit numbers. How can Python support arbitrarily long integers? It uses software to emulate the operations on such long integers.
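The claim about arbitrarily large integers is easy to try out. The sketch below computes a number far beyond what a 64-bit register can hold; bit_length is a standard method of Python integers that reports how many binary digits the value needs.

```python
big = 2 ** 100

print(big)               # 1267650600228229401496703205376
print(type(big))         # <class 'int'>
print(big.bit_length())  # 101, well beyond a 64-bit register
```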
In Python, the / division operator always produces a real number,
regardless of the type of the operands involved.
To force an integer division, which rounds the quotient down to a whole number,
use the // operator.
Division Exercises
What are the results of the following statements? Before running these on the IPython prompt, think what the results should be. Then run them and observe the outcome.
7 / 3
7 // 3
7.0 // 3
8 / 3
8 // 3
8.9 // 3
8 // 2.95
This exercise exhibits the nature of Python division operators under varying circumstances.
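For reference, here is a short sketch of the two division operators using a different pair of numbers. It also shows the remainder operator %, which is not covered in the text above but is the natural companion of //.

```python
true_div = 9 / 4         # 2.25: always a float
floor_div = 9 // 4       # 2: quotient rounded down, an int for int operands
float_floor = 9.0 // 4   # 2.0: still rounded down, but a float
negative_floor = -9 // 4 # -3: rounding goes down, not toward zero
remainder = 9 % 4        # 1: what is left over after 9 // 4

print(true_div, floor_div, float_floor, negative_floor, remainder)
```

The quotient and remainder always satisfy 9 == (9 // 4) * 4 + (9 % 4).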
Binary, Octal, and Hexadecimal Numbers
For system-level programming (involving operating system, hardware, networking, etc.), integers play a crucial role. In this context, the underlying binary nature of integers is very much exploited.
Python makes it convenient to deal with this type of data.
- Binary (base-two) numbers are denoted by the prefix 0b (that is, a digit zero plus a literal letter b). It must be followed by a sequence of ones and zeros. For example, 0b101110 is equal to the number 46 (forty-six) = 1×32 + 0×16 + 1×8 + 1×4 + 1×2 + 0×1 in our customary numbering system, the decimal system.
- Octal (base-eight) numbers are denoted by the prefix 0o (a digit zero plus a literal letter o), followed by a sequence of digits, each ranging from 0 through 7, inclusive. As an example, 0o775 stands for 509 in the decimal system.
- Hexadecimal (base-sixteen) numbers are denoted by the prefix 0x (a digit zero plus a literal letter x), followed by a sequence of digits 0 through 9 as well as letters a through f. This gives a total of sixteen symbols (ten digits and six letters) to represent a number in a base-sixteen representation. For example, 0xa7f3 stands for the number 42995 in the decimal system.
(Note that the letters in the representation above are case-insensitive.
For example, 0B0110
is equivalent to 0b0110
;
0xA7F3
and 0Xa7F3
and 0XA7F3
are all equivalent to 0xa7f3
.
However, be consistent with the case convention you use in your program.
Most people use lowercase for the prefix, so please follow that also in your program.)
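Python can also convert in the other direction. The sketch below uses the built-in functions bin, oct, and hex to produce the string representation of a number in each base, and int with an explicit base to parse such digits from text. These functions are not discussed above, but they are part of standard Python.

```python
n = 0b101110
print(n)            # 46

print(bin(46))      # 0b101110
print(oct(509))     # 0o775
print(hex(42995))   # 0xa7f3

# Parsing digit strings in a given base:
from_hex = int("a7f3", 16)   # 42995
from_octal = int("775", 8)   # 509
print(from_hex, from_octal)

# Letter case does not matter in the literal itself.
print(0xA7F3 == 0xa7f3)      # True
```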
If you need a tutorial or review of these seemingly fancy numbering systems, we would refer you to some excellent resources on the Internet:
- Number Systems: Decimal, Binary, Octal and Hexadecimal, a brief tutorial by Rukshani Athapathu on building a number in binary, octal, and hexadecimal representations.
- Number Conversion: Binary Octal Hexadecimal, a brief tutorial by DYclassroom (Yusuf Shakeel) on the various number systems and how to convert a number between different representations.
- Binary number (from Wikipedia), an in-depth treatise covering history, representation, arithmetic, etc.
- Hexadecimal (from Wikipedia).
Strings
A string is simply an ordered sequence of characters. Unlike in the C language, Python strings come natively with a rich set of features which make string processing a breeze. Here is a quick overview (much of it “by example”) of the features of Python strings.
The length of the string can be queried using the len
function.
A = "Hello world"
print(len(A))
11
Quite often, one would need to extract a character or a substring from a string. Python uses the same convention as C for element indexing: the characters in a string are indexed using an integer from 0, 1, 2, …, using the square brackets as the indexing operator.
A = "Hello world"
B = A[0]
print(B)
print(A[4])
print(A[11])
H
o
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
The last line raises an error called IndexError
because the index (11)
is beyond what is valid for the string (0
.. 10
in this case).
Python supports negative indices, which are counted from the end of the string:
A = "Hello world"
print(A[-1])
print(A[-5])
print(A[-11])
d
w
H
How do we extract a substring? Python has the concept of a slice, which defines a range of elements to pick out from a given sequence. Let’s start with some examples:
A = "Hello world" # A has 11 characters
print(A[1:5])
print(A[:3])
print(A[3:])
print(A[3:-3])
ello
Hel
lo world
lo wo
With a slice, two numbers are given, separated by a colon. The slice syntax is therefore
STRING_VAR [ START : STOP ]
If START
or STOP
is omitted,
then it is implicitly taken to be the beginning
or end of the sequence, depending on which index is omitted.
Python takes a rather quirky convention:
whereas the START
element is included in the slice result,
the STOP
element is not.
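The convention of excluding STOP has two handy consequences, sketched below with the same example string: the length of a slice is simply STOP minus START, and splitting a string at any position k yields two pieces that join back into the original. The variable names are chosen for illustration only.

```python
A = "Hello world"

start, stop = 1, 5
piece = A[start:stop]
print(piece)                        # ello
print(len(piece) == stop - start)   # True

# The character at index STOP is not part of the slice.
print(repr(A[stop]))                # ' '

# Splitting at position k and rejoining restores the string.
k = 3
head, tail = A[:k], A[k:]
print(head + tail == A)             # True
```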
Strings can be joined (concatenated) to form a longer string. For example:
A = "Hello world"
B = A + "Thomas"
print(B)
Hello worldThomas
Modifying a String
A string is an immutable object in Python, which means that once created, it cannot be modified. For example, this kind of statement is invalid for a string:
A[1] = "a"
To modify a string, we need to create a new string that includes the modification. How can we change A from “Hello world” to “Hallo world”? (Hint: use the concatenation (+) operator.)
Solution
B = A[:1] + "a" + A[2:]
print(B)
There are many more capabilities built into a Python string!
Python uses an object-oriented approach to manipulate strings.
A string has the upper and lower methods to create
uppercase and lowercase versions of the string, respectively,
and the split method to split the string at a specified separator
(or at whitespace if no separator is given).
Some examples:
A = "Hello world"
print(A.upper())
print(A.split())
HELLO WORLD
['Hello', 'world']
(The last command yields a list, which we will cover very soon.)
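As a slightly larger sketch, the snippet below cleans up and splits a comma-separated line of text. The sample line is made up for illustration, and strip, replace, and join are additional standard string methods that are not covered in the text above.

```python
line = "  1998,3,204.31.253.89,United States \n"

clean = line.strip()          # remove leading/trailing whitespace
fields = clean.split(",")     # split at every comma
print(fields)                 # ['1998', '3', '204.31.253.89', 'United States']

country_lower = fields[3].lower()
print(country_lower)          # united states

piped = clean.replace(",", " | ")
print(piped)                  # 1998 | 3 | 204.31.253.89 | United States

# join is the reverse of split: it glues a list of strings together.
rejoined = ",".join(fields)
print(rejoined == clean)      # True
```

Each of these methods returns a new string (or list); the original string is left unchanged, consistent with strings being immutable.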
If you are working with text, we recommend that you learn more about Python strings through the following resource:
- Python string tutorial (from TutorialsPoint).
We encourage you to experiment using interactive Python mode to gain an understanding of strings and other topics related to Python. There is no better way to learn than to experiment with the language elements directly!
Converting Data Types
Quite often, we have to convert data from one type to another.
For example, data read from a text file would be a string.
To convert this to a number that can be processed numerically,
we use the int
or float
functions (for conversion to
an integer or a real number):
rr = '71'
ss = '142.5'
R = int(rr)
S = float(ss)
print(R)
print(S)
print("Double all up:")
print(R * 2)
print(S * 2)
Conversely, a number can be converted to a string by the str
function.
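Conversions can fail when the text does not look like the requested kind of number. The sketch below shows one such case and a possible way to handle it; the try/except construct is not covered in this episode, and the sample values are made up.

```python
label = str(142.5) + " kg"    # number to string, then concatenate
print(label)                  # 142.5 kg

count = int("  71 ")          # surrounding whitespace is tolerated
print(count)                  # 71

# int() refuses text that contains a decimal point.
failed = False
try:
    int("3.99")
except ValueError as err:
    failed = True
    print("Cannot convert:", err)

# Going through float first works; int() then drops the fractional part.
truncated = int(float("3.99"))
print(truncated)              # 3
```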
Convert ‘Em!
What would be the output of the following snippet?
R = 71
S = 142.5
print(float(R))
print(int(S))
R_str = str(R)
print(R_str, "is a", type(R_str))
age_message = "My age is " + str(R) + "years old"
print(age_message)
Solutions
71.0
142
71 is a <class 'str'>
My age is 71years old
Lists
One important reason for using a computer is its ability to store and process
a lot of data.
For this reason, Python provides a number of container data types,
which are capable of containing multiple values (or objects).
In this short lesson we will cover only two types in detail,
namely list
and dict
.
A list
is an ordered sequence of values or objects.
Here are a few examples of list
objects:
blank = []
trio = [1, 2, 3]
record = [1998, 3, "204.31.253.89", "United States"]
# Now print them out:
print(blank)
print(trio)
print(trio[0])
print(trio[1:3])
print(len(record))
print(record[2])
[]
[1, 2, 3]
1
[2, 3]
4
204.31.253.89
A list
object has many similarities to a string:
- The contained elements are ordered and can be indexed by integers from 0, 1, … ;
- list supports the slicing operator;
- the len function acting on a list object returns the number of elements in that list.
However, unlike a string, a list
can contain values with arbitrary data types,
and its contents can be altered (i.e. it is mutable).
Items can be added to, or removed from, the list.
Here are a few actions that can be done for a list object named L
:
- Add a new item at an arbitrary location: L.insert method;
- Add a new item at the end of the list: L.append method;
- Update the value of the i-th element: L[i] = new_value;
- Sort the entire list: L.sort method;
- Delete an item at index i using del L[i];
- Delete the contents of the entire list: L.clear method.
List Manipulation in Action
What is the output of this program?
fruits = ["banana", "apple", "mango"]
print(fruits)
fruits.append("pineapple")
print(fruits)
fruits.insert(1, "orange")
print(fruits)
fruits.sort()
print(fruits)
fruits[1] = "pear"
print(fruits)
del fruits[2]
print(fruits)
fruits.clear()
print(fruits)
Solution
['banana', 'apple', 'mango']
['banana', 'apple', 'mango', 'pineapple']
['banana', 'orange', 'apple', 'mango', 'pineapple']
['apple', 'banana', 'mango', 'orange', 'pineapple']
['apple', 'pear', 'mango', 'orange', 'pineapple']
['apple', 'pear', 'orange', 'pineapple']
[]
To learn more about list
and how to effectively use it, please
refer to the
TutorialsPoint’s lesson on list
.
A list
can be used in many ways in Python;
but the most common uses are:
- to store a collection of items of the same type (this collection is often termed an array):
  trio = [1, 2, 3]
- to store a collection of items that has a defined structure (this kind of collection is often termed a data structure or a record):
  record = [1998, 3, "204.31.253.89", "United States"]
  In the example above, the numbers 1998 and 3 refer to the year and month of a spam email, whereas the strings 204.31.253.89 and United States refer to the deduced originating IP address and country. The example also shows that the data types of the items are not uniform (two integers and two strings).
This is not an exhaustive list of the possible uses of a list.
A list
can also be nested, that is, contain other lists,
to create a multidimensional array or a table:
# An example two-dimensional array
sudoku = [ [ 4, 9, 2 ],
[ 3, 5, 7 ],
[ 8, 1, 6 ] ]
# An example of a structured table
# (an array of records)
results = [ [1998, 3, "204.31.253.89", "United States"],
[1999, 1, "194.213.210.20", "Czech Republic" ],
[1999, 12, "202.96.198.238", "China"] ]
Accessing elements would involve two indexing operators:
print(sudoku[0][1])
print(results[1][2])
9
194.213.210.20
The first number indexes the outermost dimension, i.e. the “row”, the second number indexes the inner dimension, i.e. the “column”.
Repeating Actions: for Loop
Now that we have a way to store a bunch of data in a list,
we need a way to perform repetitive actions on these data.
Python uses the for
statement
to define a loop construct
to repeat actions over a sequence of data.
Here is an illustration of the for
statement:
for A in [ 0, 1, 2, 3 ]:
    print(A)
0
1
2
3
The syntax of a for
statement is:
for LOOP_VARIABLE in SEQUENCE:
    STATEMENTS...
Here, SEQUENCE
is a sequence object (list
, string, dict
, etc.)
containing zero or more items which we want to iterate over.
STATEMENTS
is a placeholder for a code block (explained earlier),
which contains one or more Python statements
to be repeated.
Python’s for
statement has a different behavior from C-style for
.
In most common cases, where SEQUENCE
contains n
items,
the STATEMENTS
block will be executed n
times
(once for every element in SEQUENCE
).
The items will be iterated in order
(from the beginning to the end of the sequence), and
the value of the LOOP_VARIABLE
will be set to the current item.
Be aware that the colon after the SEQUENCE
is mandatory, as well as
the indentation of STATEMENTS
.
Let’s revisit the illustration above:
the SEQUENCE
is a list of four elements: [0, 1, 2, 3]
;
therefore the for
loop will execute the STATEMENTS
four times.
In this case, the STATEMENTS
simply consists of one statement:
print(A)
.
The value of A
is set to 0
at the first iteration, then
it is updated to 1
at the second iteration, and so on.
In every iteration, LOOP_VARIABLE is set to one item from SEQUENCE.
As the name suggests, it is a variable whose value changes at every iteration.
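The following sketch shows the loop variable taking each item in turn. The names `names`, `shouted`, and `count` are illustrative only.

```python
# Hypothetical data used only for illustration.
names = ["alice", "bob", "carol"]

shouted = []
for name in names:
    # The block may hold several statements; all of them run once per item.
    upper = name.upper()
    shouted.append(upper)
print(shouted)   # ['ALICE', 'BOB', 'CAROL']

# With an empty sequence, the block is not executed at all.
count = 0
for item in []:
    count = count + 1
print(count)     # 0
```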
A string is a sequence of characters, therefore it can also be used as the sequence to loop over:
word = "oxygen"
for char in word:
    print(char)
This loop visits the characters of the string one at a time, from left to right, and prints each one on its own line.
It is very common to iterate over a range of values, something like
0, 1, 2, ..., 100
; or 1, 4, 7, 10, ... 28
.
Python provides a range
function to define a sequence-like object
that can be iterated using for
.
The range function can be called in several ways:
- range(STOP) yields the sequence 0, 1, 2, … STOP-1.
- range(START, STOP) yields START, START+1, … STOP-1. Again, Python's convention is that STOP is excluded from the result.
- range(START, STOP, STEP) yields START, START+STEP, START+2*STEP, … not including STOP and beyond.
All START
, STOP
, and STEP
arguments have to be integers.
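As a quick sketch of the three call forms, the snippet below expands a few ranges with `list()` so that their contents can be inspected. The variable names are arbitrary.

```python
# list() expands a range object so that its contents can be printed.
first = list(range(3))
middle = list(range(2, 6))
stepped = list(range(1, 29, 3))

print(first)     # [0, 1, 2]
print(middle)    # [2, 3, 4, 5]
print(stepped)   # [1, 4, 7, 10, 13, 16, 19, 22, 25, 28]

# To include 100 in the sequence 0, 1, ..., 100, STOP must be 101.
print(len(range(101)))   # 101
```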
range Exercises
What is the output of each of these statements?
for A in range(5): print(A)
for A in range(4,8): print(A)
for A in range(32,45,3): print(A)
Solutions
0 1 2 3 4
4 5 6 7
32 35 38 41 44
Making Sum with for Loop?
One common use of a for loop is to create a sum, or to perform another aggregation (maximum value, minimum value, average, etc.).
Suppose you need to calculate the sum of the values contained in a list L. One way to achieve this is to use a for statement:
L = [1.5, 3.7, 4.0, -5.1]
sum_L = 0.
for val in L:
    sum_L = sum_L + val
print(sum_L)
This will yield 4.1. However, Python has the built-in sum function to do exactly this:
L = [1.5, 3.7, 4.0, -5.1]
print(sum(L))
Voila, you just shaved three lines off the program! Python has a lot of nifty tools like the sum function, which can make your programs cleaner, shorter, more effective, and more fun to write.
Performance Note
It pays to learn more about Python's functionality. In lower-level languages like C, we cannot avoid using a loop to perform an aggregation such as a sum; Python, in contrast, provides many commonly used functions that save us from writing such loops by hand. Besides making your program shorter, these functions help avoid a lot of common mistakes. More importantly, these functions are often written in C/C++/Fortran, yielding much higher performance compared to a pure Python implementation.
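Other common aggregations have built-in counterparts as well. The sketch below reuses the list L from the example above and relies only on the built-in functions max, min, len, and sum; the variable names are arbitrary.

```python
L = [1.5, 3.7, 4.0, -5.1]

largest = max(L)            # largest value in the list
smallest = min(L)           # smallest value in the list
count = len(L)              # number of items
average = sum(L) / count    # arithmetic mean

print(largest)    # 4.0
print(smallest)   # -5.1
print(count)      # 4
print(average)
```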
Conditional Statement (if
– else
)
A program often has to take actions only when certain conditions are fulfilled.
Sometimes there are different actions for different conditions.
This is done in Python using the if
statement.
gender = "M"
if gender == "M":
    pronoun = "He"
    pronoun3 = "his"
else:
    pronoun = "She"
    pronoun3 = "her"
print(pronoun, "loves", pronoun3, "cat")
Here, gender = "M"
is an assignment statement,
whereas gender == "M"
is a comparison expression.
The latter yields a logical value (True
or False
).
The values of pronoun
and pronoun3
variables depend on whether
gender
is equal to a string "M"
, therefore the message that is printed
would also depend on the value of gender
.
Notice that Python does not require parentheses to enclose the condition expression.
A numerical or string value can be given to an if statement in lieu of a logical expression:
nonzero numbers are treated as True, as are nonempty strings and lists;
zero, empty strings, and empty lists are treated as False.
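The truth-value rule can be checked with a short sketch. The list `values` below is an arbitrary assortment chosen for illustration.

```python
# Arbitrary sample values: numbers, strings, and lists.
values = [0, 3, 0.0, "", "hello", [], [0]]

truthy = []
falsy = []
for v in values:
    if v:                 # no comparison: the value itself is tested
        truthy.append(v)
    else:
        falsy.append(v)

print(truthy)   # [3, 'hello', [0]]
print(falsy)    # [0, 0.0, '', []]
```

Note that `[0]` counts as True: the list is nonempty, even though the number it contains is zero.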
Multiple conditions can be accommodated using the elif
continuation.
An example would be the determination of student’s grade based on the total score:
score = 83.5
if score > 90:
    grade = "A"
elif score > 80:
    grade = "B"
elif score > 70:
    grade = "C"
elif score >= 60:
    grade = "D"
else:
    grade = "F"
print(grade)
The first condition (score > 90
) will be tested first: if it is true,
then "A"
is assigned to the variable grade
and the rest of the conditions are not tested.
Otherwise, we go to the second condition (score > 80
), and so on.
If none of the conditions evaluates to True, the statement block
after else
will be executed.
The else
part is optional: It may not exist if there is no action needed
for “all of the other” cases.
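Because the conditions are tested in order, boundary values deserve attention. The sketch below runs the same grading chain over a few hand-picked scores (chosen here for illustration), then shows an if statement that has no else part.

```python
scores = [95, 90, 60, 59.9]   # hand-picked boundary cases

grades = []
for score in scores:
    if score > 90:
        grade = "A"
    elif score > 80:
        grade = "B"
    elif score > 70:
        grade = "C"
    elif score >= 60:
        grade = "D"
    else:
        grade = "F"
    grades.append(grade)
print(grades)     # ['A', 'B', 'D', 'F']

# An if statement without else: nothing happens when the test is False.
failing = []
for score in scores:
    if score < 60:
        failing.append(score)
print(failing)    # [59.9]
```

A score of exactly 90 gets a "B" because the first test uses `>` rather than `>=`, whereas a score of exactly 60 still gets a "D".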
Is It Even or Odd?
Print a message stating whether a variable named val contains an odd or even number.
Solution
val = 3
if val % 2:
    print('odd')
else:
    print('even')
The % operator gives the remainder of the division of val by 2: it is zero for even numbers, and one for odd numbers. Because val is an odd number, val % 2 yields 1, thus the word odd will be printed.
Functions
Certain tasks are used frequently throughout a program.
One example is the conversion from a score to a grade,
as shown in the previous section.
A function is a subprogram (a sequence of commands and statements)
that is packaged as a unit and intended to accomplish a specific task.
Using functions helps programmers to write the code only once
and reuse it as frequently as needed.
In Python, a function is created using the def
statement,
as illustrated in the following snippet:
def message():
    print("Python is a great language to learn.")
    print("It is fun, easy to use, yet powerful at the same time.")
    print("With persistent use and practice, you'll master Python.")
There is no output when you complete the def
statement above.
But you have just created a function called message
, which can be called
at any time afterward.
The subprogram block in the function’s body
will be executed whenever the function is called.
Let us call message
now:
message()
Python is a great language to learn.
It is fun, easy to use, yet powerful at the same time.
With persistent use and practice, you'll master Python.
We can also call the function multiple times:
message()
print()
print("LET'S SAY THAT AGAIN...")
message()
Python is a great language to learn.
It is fun, easy to use, yet powerful at the same time.
With persistent use and practice, you'll master Python.
LET'S SAY THAT AGAIN...
Python is a great language to learn.
It is fun, easy to use, yet powerful at the same time.
With persistent use and practice, you'll master Python.
What’s Different between Loops and Functions?
A function is similar to the for loop or the if-else conditional in that each forms a bigger, logical piece of a program. In particular, both functions and loops enable a block of code to be executed more than once. There is one important difference, though: a loop only repeats a block at one particular location of the program. The block associated with a function, in contrast, can be executed at different locations in the program, i.e. wherever the function is called.
Parameters
A function can have one or more parameters. Inside the function body (block), they act like regular variables, but their values are not specified within this body. Rather, the values are defined at the point where the function is called.
Let us write a make_grade
function which takes one parameter,
that is, the numerical score:
def make_grade(score):
    if score > 90:
        grade = "A"
    elif score > 80:
        grade = "B"
    elif score > 70:
        grade = "C"
    elif score >= 60:
        grade = "D"
    else:
        grade = "F"
    return grade
This function also returns a value, which will need to be captured in a variable or printed. (Otherwise, the return value will be discarded.)
alice_grade = make_grade(89)
print("Alice's grade is", alice_grade)
print("Jason's grade is", make_grade(72))
Alice's grade is B
Jason's grade is C
A function can take multiple arguments, such as:
def add(a, b):
    return a + b
A = 36
C = add(A, 2)
print(C)
Further, the arguments can be of different types.
The function below expects a real number for the scale
argument,
and a sequence (list
) for scores
:
def scale_score(scale, scores):
    result = []
    for s in scores:
        result.append(scale * s)
    return result
Documenting a Function
Once a function gets more complex, it is better to document it. Python has a great way of doing this, using a triple-quoted string (basically an ordinary string, except that it can span multiple lines):
def scale_score(scale, scores):
    """Scales student's scores by a scale factor.
    Args:
        scale (float): scale factor.
        scores (list): a list of students' raw scores.
    Returns:
        list: a list of students' scaled scores.
    """
    result = []
    for s in scores:
        result.append(scale * s)
    return result
The documentation should state the following:
- the purpose of the function;
- the input argument(s);
- the return value(s);
- any other notes regarding the behavior of the function that users may need to be aware of.
This documentation can be queried in an interactive Python session:
help(scale_score)
Help on function scale_score in module __main__:
scale_score(scale, scores)
    Scales student's scores by a scale factor.
    Args:
        scale (float): scale factor.
        scores (list): a list of students' raw scores.
    Returns:
        list: a list of students' scaled scores.
Documenting a function is another good coding practice that you should cultivate early on. It helps other people (including your future self!) to better understand your code.
Let’s try our new function:
old_scores = [50, 70, 40, 85]
new_scores = scale_score(1.25, old_scores)
print(new_scores)
[62.5, 87.5, 50.0, 106.25]
Changing the Value of a Parameter in a Function?
(Note: This is an intermediate topic, so you can skip it when reading the lesson for the first time.)
Arguments are passed by reference in Python: the function receives a reference to the same object that the caller holds, not a copy. Technically, all variables are just references to objects residing somewhere in the Python interpreter's memory. Changing the value of a parameter inside a function is not prohibited by Python, but it may not do what you want. Assigning a new value to a parameter (e.g. setting it to a new string, int, or list) does not change the original value in the caller's scope. But modifying the object that a parameter refers to (e.g. appending a new element to a list using the append method) does propagate the effect outside the function, to the caller. This is an intended design: in this way, functions can be used to manipulate objects. Completely new data should be returned as the function's return value.
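The difference between assigning to a parameter and modifying the object it refers to can be sketched as follows. The function names `reassign` and `append_item` are made up for this illustration.

```python
def reassign(values):
    # Rebinds the local name only; the caller's list is untouched.
    values = [0, 0, 0]


def append_item(values):
    # Modifies the list object itself, which the caller also refers to.
    values.append(99)


data = [1, 2, 3]

reassign(data)
after_reassign = list(data)    # snapshot copy for comparison
print(after_reassign)          # [1, 2, 3]

append_item(data)
print(data)                    # [1, 2, 3, 99]
```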
Library and Modules
A library is a collection of files (called modules) that contain functions for use by other programs. A module can be viewed as a toolbox that contains a lot of tools: these tools (hammer, screwdrivers, pliers, etc.) are analogous to the functions. This toolbox may be part of a greater collection of tools for an auto mechanic. This mechanic may have other toolboxes: an electrical toolbox, an engine toolbox, etc. The entire collection of these toolboxes would be the library.
Libraries or Modules?
A library is a collection of modules, but the terms are often used interchangeably, especially since many libraries only consist of a single module, so don’t worry if you mix them.
In Python, we need to import a module
to use the functions contained in this module.
This is done using the import
statement:
import MODULE_NAME
This will make the functions, variables, and other objects
in the MODULE_NAME
module accessible by the Python interpreter.
From this point on, you can access the contents of this module
(such as functions and variables) by prepending their names
with a MODULE_NAME.
prefix.
The name of the module also serves as a namespace
for the functions and variables provided by that module.
Let us consider a concrete example:
Python has a module of mathematical functions called math
,
which contains many mathematical functions and constants,
such as: the square root function (sqrt
),
exponentiation (exp
, pow
), logarithmic (log
, log2
, log10
),
trigonometric functions (sin
, cos
, tan
, asin
, acos
, …),
and many more.
(See the math
reference documentation
for more details.)
To calculate the square root of a number, use the sqrt
function contained in this module:
import math
a = 25
b = math.sqrt(a)
print(b)
print(math.sqrt(2))
5.0
1.4142135623730951
We can also use the name sqrt
without the math.
qualifier,
by importing the name directly into the current namespace.
This is done with the from MODULE_NAME import NAME form of the import statement:
from math import sqrt
from math import cos, pi
print(sqrt(81))
print(cos(pi))
9.0
-1.0
Hint: If you don’t invoke import math
beforehand,
then the name math
is not known to the interpreter,
but the sqrt
name will still be accessible.
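The hint can be verified with a short sketch, assuming a fresh Python session in which import math has not been run yet. The try/except construct used here to catch the error is not covered in this lesson; it only keeps the program from stopping when the error occurs.

```python
from math import sqrt

root = sqrt(16)
print(root)    # 4.0

# Only the name sqrt was imported, so the name math is still unknown here
# (assumption: "import math" has not been executed earlier in the session).
try:
    print(math.pi)
except NameError as err:
    print("NameError:", err)

import math    # now the module name itself is available too
print(math.sqrt(16) == root)   # True
```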
Out of the box, Python already comes with a fairly complete library. We will call this the “core library”. Some notable modules in the core library include:
- os: operating-system related functions;
- sys: Python/system related functions;
- math: mathematical functions and constants;
- re: regular expression search and operations;
- csv: tools to read and write CSV (comma separated value) files;
- json: tools to read/write data in JSON format;
- mailbox: tools to read/write internet mailboxes in MBOX format;
- socket, ssl, urllib, http, …: network- and Internet-related functions;
- and many more!
(Clicking on the module name would lead you to the module’s reference documentation.) We recommend that you survey the Python Standard Library reference documentation to become familiar with the functionality offered by the Python core library.
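As a small taste of the core library, the sketch below uses the json module to convert a record to JSON text and back. The dictionary keys (year, month, ip, country) are hypothetical labels chosen for this illustration.

```python
import json

# A hypothetical record; the field names are assumptions for illustration.
record = {"year": 1999, "month": 12,
          "ip": "202.96.198.238", "country": "China"}

text = json.dumps(record)       # Python object -> JSON-formatted string
print(text)

restored = json.loads(text)     # JSON-formatted string -> Python object
print(restored == record)       # True
print(restored["country"])      # China
```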
Examples of Important Python Modules
A vast number of capabilities in Python actually come from libraries developed and maintained by many groups and communities throughout the world. In this sidebar we will survey a few important ones: those that have become every Python programmer’s essential toolboxes, as well as those that may be relevant for cybersecurity applications.
NumPy and SciPy
NumPy (short for “Numerical Python”) and SciPy (“Scientific Python”) are packages designed for numerical computation. NumPy provides a powerful N-dimensional array object and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, input/output, Fourier transforms, basic linear algebra, statistical operations, random number generation, and more. SciPy contains modules for optimization, advanced linear algebra, integration, interpolation, special functions, Fourier transforms, signal and image processing, ordinary differential equation solvers, and other tasks common in science and engineering.
- NumPy website: https://numpy.org/
- SciPy website: https://www.scipy.org/
Pandas
Pandas stands for Python Data Analysis Library. It is designed to offer data structures for manipulating numerical tables and time series. It is a powerful tool when dealing with large tables.
- Pandas website: https://pandas.pydata.org/
Matplotlib and Seaborn
These are plotting libraries. Matplotlib is a Python 2D plotting library which produces figures in a variety of formats and interactive environments across platforms. You can use Matplotlib to generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc. Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
- Matplotlib website: https://matplotlib.org/
- Seaborn website: https://seaborn.pydata.org/
Scikit-learn, Tensorflow, Theano, Keras, and Pytorch
Scikit-learn, Tensorflow, Theano, and Keras are libraries designed for machine learning and deep learning applications. Scikit-learn provides methods for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. Tensorflow and Theano are low-level neural network model development tools. Keras is a high-level package for neural networks, capable of running on top of Tensorflow and Theano. Pytorch is a package designed to replace Numpy in order to take advantage of the power of GPUs; it is also a platform for deep learning providing flexibility and speed.
- Scikit-learn website: https://scikit-learn.org/stable/
- Tensorflow website: https://www.tensorflow.org/
- Keras website: https://keras.io/
- Pytorch website: https://pytorch.org/
NLTK
NLTK stands for Natural Language Toolkit. It is a package designed for working with human language data. It provides interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
- NLTK website: https://www.nltk.org/
Pycrypto
Pycrypto, as you probably guessed, is a collection of tools for cryptography work. It provides various encryption algorithms such as AES, DES, and RSA, to name a few.
- Pycrypto website: https://pypi.org/project/pycrypto/
scrapy, beautifulsoup, and selenium
If you are looking for web-related packages, then these three are what you need. Scrapy and Beautifulsoup provide ways to extract data out of HTML content, that is, web pages. Selenium, on the other hand, is a web browser automation tool. It is useful for writing test scripts for web-based applications.
- Scrapy website: https://scrapy.org/
- Beautifulsoup website: https://www.crummy.com/software/BeautifulSoup/
- Selenium website: https://www.seleniumhq.org/
Exercises
Exercise 1
Write a function that takes as input parameter a list of integer numbers and then prints two lists: the first list being the sublist of the input list containing only even numbers, the second list containing odd numbers only.
Exercise 2
Write a function that takes two numbers as parameters and returns the maximum of the two numbers.
Exercise 3
Write a function called fizz_buzz that takes a number.
- If the number is divisible by 3, it should return “Fizz”.
- If it is divisible by 5, it should return “Buzz”.
- If it is divisible by both 3 and 5, it should return “FizzBuzz”.
- Otherwise, it should return the same number.
Exercise 4
Write a function called show_stars(rows). If rows is 5, it should print the following:
*
**
***
****
*****
Hint: Think of addition on strings of characters.
Further Learning
- Python’s official documentation: https://docs.python.org/3/
Key Points
Python is a high-level, interpreted, general-purpose programming language.
Key data types are integers, floats (real numbers), and strings.
Lists and arrays are container data types to store a large amount of data.
The for statement is useful to repeat actions by looping over a list of values.
The if–elif–else construct allows a program to execute commands conditionally.
There is a vast number of libraries, which makes Python a productive computing platform.