Script 3.1 for the video lectures accompanying the online course "An Introduction to Agent-Based Modelling in Python" by Claudius Gräbner

For the course homepage see:

Introduction and motivation

One of the most commonly used packages in Python is numpy. It stands for Numeric Python since it originates from the desire to use Python for numerical analysis.

Basically, the main features numpy provides are the following:

  • A new data type called array. It is well suited to represent vectors, and matrices.
  • A submodule that allows the creation and handling of random numbers
  • All kinds of mathematical routines in the area of numerical and stochastic modeling

All data types and functions provided by numpy are very efficient. This meant they require relatively few computational ressources and they are fast. Why is that? Remember that we mentioned Python to be a relatively slow language. That is true, but most demanding functions of numpy are not written in Python, but in C or Fortran. These are very demanding languages, which require much effort and time from programmers. Yet, they are extremely fast. Thereby, numpy manages to exploit their speed, but to maintain the ease of the Python syntax.

For us, numpy becomes important because many mathematical operations in the context of ABM involve:

  • The use of matrix algebra
  • The use of random processes

For both tasks, numpy provides us with everything we need, so we will use it extensively when building ABMs.

Before we get started we need to import numpy. By convention, this is done using the alias np:

In [2]:
import numpy as np

Introducing arrays

Arrays are the basic data type of numpy (this means: the basic class; many other class objects, such as matrices, inherit from the class ndarray):

In [2]:
a_1 = np.array([1,2])

In many ways, arrays behave similar to lists, yet there are also some important differences:

  1. Arrays can only contain elements that are of the same type
  2. Arrays are the only data type supported by more advanced mathematical functions
  3. Most operations on arrays are much faster than on lists

There are some more specific differences underlying this (such that arrays have constant memory size), and that explain the superiority of arrays over lists, yet we do not want to bother with them right now.

For you, point 1 above is probably the practically most relevant distinction for the moment.

Creating arrays

Lets see how we can create arrays. There are several options.

First, we can take an existing container, e.g. a list or a tuple and transform it into an array:

In [9]:
l_1 = [1,2,3,4]
t_1 = (5,6,7,8)

a_1 = np.array(l_1)
a_2 = np.array(t_1)
<class 'list'>
<class 'tuple'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>

Note that building arrays from sets is a bit more problematic and requires more steps, which we do not cover here.

Indexing works for arrays just as for tuples and lists:

In [15]:
[1 2 3 4]

It is also straightforward to create multidimensional arrays, such as matrices.

For example, to create the following matrix:

$\boldsymbol{A}=\begin{pmatrix} 1 & 3 \\ 6 & 8 \end{pmatrix}$

Just type:

In [16]:
a_3 = np.array([[1,3], [6,8]])
[[1 3]
 [6 8]]

At this point it is important to look at the very precise syntax of np.array: it only takes one argument: a container. In case you want to specify a multidimensional array, this needs to be encolosed in brackets:

In [19]:
a_4 = np.array(([1,2], [3,4], [5,6]) )
a_5 = np.array([[1,2], [3,4], [5,6]] ) # This is equivalent
[[1 2]
 [3 4]
 [5 6]]
[[1 2]
 [3 4]
 [5 6]]

The following, however, will not work:

In [20]:
a_6 = np.array([1,2], [3,4], [5,6])
ValueError                                Traceback (most recent call last)
<ipython-input-20-0b1550b4b15d> in <module>()
----> 1 a_6 = np.array([1,2], [3,4], [5,6])

ValueError: only 2 non-keyword arguments accepted

The reasons why the brackets are needed is the following: np.array creates an array from the first object it receives.

If you want to create a matrix you must provide more than one containes (one for each row). So the two (or more containers) must be summarized into one arbitrary container before getting passsed to np.array as its first argument.

If there were no brackts, np.array would receive more than one argument, and it does not know how to handle this!

Aside from creating arrays from tuples, there are a number of convenience functions to create typical arrays.

If you want an array full of zeros you can use np.zeros with the desired shape:

In [5]:
print( np.zeros( (3,4) ) )  # for a 3 * 4 matrix

print("\n") # print new line

print( np.zeros(4) ) # for a 4-element vector
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

[0. 0. 0. 0.]

Exactly the same logic applies to ones:

In [7]:
print( np.ones( (3,4) ) )  # for a 3 * 4 matrix

print("\n") # print new line

print( np.ones( (4) )  ) # for a 4-element vector
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]

[1. 1. 1. 1.]

If you want to get a sequence of numbers from $x_0$ to $x_n$, you can use the np.arrange function: it takes $x_0$ and $x_n$ as the first two arguments, and a third argument that specifies the step length. Note that $x_n$ will not be included by default:

In [67]:
np.arange(10, 30, 5 ) # returns a sequence from 10 to 30 in 5 steps, excluding 30
array([10, 15, 20, 25])

If you want to specify the number of elements in the array, and let Python determine the step length, use np.linspace instead:

In [68]:
np.linspace(5, 12, 11 ) # returns an evenly-spaced array with 11 elements from 5 to 12
array([ 5. ,  5.7,  6.4,  7.1,  7.8,  8.5,  9.2,  9.9, 10.6, 11.3, 12. ])

These functions are useful when you want to evaluate functions at many steps:

In [69]:
x = np.linspace(-5, 5, 10)
y = np.sin(x)
array([ 0.95892427,  0.67965796, -0.35584199, -0.99540796, -0.52741539,
        0.52741539,  0.99540796,  0.35584199, -0.67965796, -0.95892427])

Note that np.linspace is recommended when using floats.

Array attributes

We learned about attributes in the previous sessions. We know that there are two main types of attributes:

  1. Attribute properties
  2. Attribute functions

Typical properties of an array are, among others:

In [70]:
a = np.array(([1,2], [3,4]))
(2, 2)

Thus, the shape gives you rows and columns of the array. Since a in the example corresponds to the following matrix:

$\boldsymbol{A}=\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$

it is of shape $2\times2$.

Similar, the ndim attribute gives us the dimension of the array. For the case of a this is obviously 2:

In [22]:

To get the number of elements in the array we can check its size:

In [23]:

Typical attribute functions, i.e. methods of the array instantiations, are among others, the sum of the columns:

In [24]:
array([4, 6])

or the sum of the rows:

In [25]:
array([3, 7])

Similarly, we can get the minimum of the rows and colums:

In [27]:
a.min(axis=1) # for the rows  
array([1, 3])
In [28]:
a.min(axis=0) # for the columns  
array([1, 2])

Finally, the cumulative sum of an array is given by the following:

In [29]:
a.cumsum(axis=0) # for the columns
array([[1, 2],
       [4, 6]])
In [30]:
a.cumsum(axis=1) # for the rows
array([[1, 3],
       [3, 7]])

I leave it to you to figure out what the cumulative sum is;)

We will learn more attributes in the next section.

Common mathematical operations

We now introduce those mathematical operations that we are concerned with most of the time. Many of them will be familiar to you from the 'Einführung in Wirtschaftsmathematik', but here we learn about some very simple tricks to do the calculations in no time!

Transposing matrices

Consider a matrix $\boldsymbol{A}=\begin{pmatrix} a_{11} a_{12} a_{13}\\ a_{21} a_{22} a_{23}\end{pmatrix}$. We can the directly get the shape of the matrix:

In [21]:
A = np.array([[1,2,3], [4,5,6]])
(2, 3)

Often we would like to transpose the matrix: if $\boldsymbol{A}$ is a $2\times 3$ matrix, then its transpose $\boldsymbol{A}^T$ is a $3\times 2$ matrix.

In other words, if

$$\boldsymbol{A}=\begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\end{pmatrix},$$

then (abusing matrix notation a bit)

$$\boldsymbol{A}^T=\begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23}\end{pmatrix},$$
In [22]:
A = np.array([[1,2,3], [4,5,6]])
A_t = A.T
(2, 3)
(3, 2)
array([[1, 4],
       [2, 5],
       [3, 6]])

Addition and substraction

Scalar addition/substraction functions like this:

$$\boldsymbol{A} - 5 = \begin{pmatrix} a_{11}-5 & a_{12}-5 & a_{13}-5\\ a_{21}-5 & a_{22}-5 & a_{23}-5\end{pmatrix}$$

It works for all kind of matrices and is really simple to compute:

In [23]:
A - 5
array([[-4, -3, -2],
       [-1,  0,  1]])

If you want to add or substract matrices you need to make sure they are of the same dimension. Then for $\boldsymbol{A}=\begin{pmatrix} a_{11} a_{12}\\ a_{21} a_{22}\end{pmatrix}$ and $\boldsymbol{B}=\begin{pmatrix} b_{11} b_{12}\\ b_{21} b_{22}\end{pmatrix}$ we get the sum (or difference) by:

$$\boldsymbol{A} + \boldsymbol{B} = \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12}\\ a_{21}+b_{21} & a_{22}+b_{22}\end{pmatrix}$$
In [24]:
A = np.array([[1,2], [3,4]])
B = np.array([[5,6], [7,8]])
A + B
array([[ 6,  8],
       [10, 12]])


Suppose you want to multiply a matrix with some scalar $b$. If $\boldsymbol{A}=\begin{pmatrix} a_{11} a_{12}\\ a_{21} a_{22}\end{pmatrix}$ then:

$$b\boldsymbol{A}=\begin{pmatrix} ba_{11} & ba_{12}\\ ba_{21} & ba_{22}\end{pmatrix}$$
In [25]:
A = np.array([[1,2], [3,4]])
b = 2
array([[2, 4],
       [6, 8]])

For matrix multiplication to be defined, the number of columns in the first operand must equal the number of rows of the second operand.

This is because matrix multiplication is defined as follows: if If $\boldsymbol{A}=\begin{pmatrix} a_{11} a_{12}\\ a_{21} a_{22}\end{pmatrix}$ (i.e. a $2\times 2$ matrix) and $\boldsymbol{b}=\begin{pmatrix} b_{1} b_{2}\end{pmatrix}$ (i.e. a $1\times 2$ vector):

$$\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix} \begin{pmatrix} b_{1} & b_{2}\end{pmatrix} = \begin{pmatrix} a_{11} b_{1} + a_{12} b_2\\ a_{21} b_{1} + a_{22} b_2 \end{pmatrix} = \begin{pmatrix} c_{1} & c_{2}\end{pmatrix}$$

or, in other words, if $\boldsymbol{A}\boldsymbol{b}=\boldsymbol{C}$, and if $\boldsymbol{A}$ is $n\times m$ and $\boldsymbol{b}$ is $m\times p$, then $\boldsymbol{C}$ will be $n\times p$ with:

$$c_{ij}=a_{i1}b_{1j} + ... + a_{im}b_{mj} = \sum_{k=1}^m a_{ik}b_{kj}.$$

There are several ways to do this computationally:

In [26]:
A = np.array([[1,2], [3,4]])
B = np.array([4,5])

print(, B))
[14 32]
[14 32]
[14 32]

Many ABM somehow relate to game theoretic concepts, where matrix multiplication is convenient to compute payoffs (but we will learn about this using examples later on).

Matrix inverse and solution to linear systems of equations

The inverse of a square matrix $\boldsymbol{A}$ is $\boldsymbol{A}^{-1}$ such that


where $\boldsymbol{I}$ is the identity matrix, i.e. the matrix consisting of ones on its main diagonal, and zeros outside the diagonal:

In [27]:
import scipy.linalg
A = np.array([[1,2], [3,4]])

[[-2.   1. ]
 [ 1.5 -0.5]]

[[1.00000000e+00 0.00000000e+00]
 [1.11022302e-16 1.00000000e+00]]

A standard problem in linear algebra is to solve the equation:

$$\boldsymbol{A}x = b$$

for which the key challenge is to find the inverse of $\boldsymbol{A}$.

Aside from computing $\boldsymbol{A}^{-1}$ as described above, this problem can be solved directly:

In [28]:
A = np.array([[1,2], [3,4]])
b = np.array([5,6])
x = scipy.linalg.solve(A,b)

You can verify the solution:

In [29]:
[5 6]
[5. 6.]

There are many more things to be done with matrices and once you understood the basics of Python, all these are straightforward to implement. The help page of numpy provides a nice overview:

For those of you interested I also upload an optional script that explains you how to do the following:

  • Finding roots for equations
  • Compute fixed points
  • Solve constrained and unconstrained optimization problems

Here we skip this because it is not immediately needed for building ABMs - but it might come in handy in some more mathematical courses during your studies.

The numpy random submodule

The second major application of numpy is the creation and manipulation of random numbers. There is much functionality provided by the numpy submodule numpy.random.

Here we introduce the most important concepts frequently used when using ABM. For a complete documentation of all objects, classes and functions provided by numpy.random see here.

Common random number tasks

If simply want to obtain a few random integers, where each integer appears with equal probability we can use the randint function:

In [59]:
np.random.randint(10, 20, 5) # Creates five random integers in the range of 10 and 20
array([10, 16, 12, 17, 15])

To get random floats between zero and one, we can use the function random:

In [55]:
size_of_sample = 10
array([0.66116787, 0.04909713, 0.7922993 , 0.51871659, 0.42586769,
       0.78818717, 0.41156922, 0.48102628, 0.18162884, 0.3213189 ])

Since we can modify the resulting floats, this is a very commonly used method. It is easy to combine it with if/else statements, e.g. to simulate the toss of a fair coin we could do:

In [56]:
if np.random.random(1) > 0.5:

We might also choose some elements of a list randomly. If all elemens shall be chosen with the same probability:

In [63]:
x = np.arange(1, 5, 1)
np.random.choice(x, 2) # choose two elements out of x
[1 2 3 4]
array([1, 3])

The probabilities can also be specified using a vector of probabilities:

In [66]:
x_probs = [0.1, 0.25, 0.35, 0.3]
np.random.choice(x, 2, p=x_probs)
array([3, 2])

There are several variants, such as choosing with or without replacement. Its best to have a look at the documentation of the function using help(np.random.choice).

Another common task is to shuffle the elements of a list of an array. The function shuffle does this (and thereby modifies the container):

In [44]:
x = np.arange(1, 10)
[1 2 3 4 5 6 7 8 9]
array([8, 9, 3, 6, 4, 5, 7, 1, 2])

If you do not want to change the original container, but return a new object, you can use the function permutation (which, since it does not modifies the container, also works with tuples):

In [49]:
x = np.arange(1, 10)
[1 2 3 4 5 6 7 8 9]
array([5, 8, 1, 4, 3, 2, 6, 9, 7])
In [47]:
print(x) # x has not been changed
[1 2 3 4 5 6 7 8 9]

Common distributions

Often we want to draw random numbers from a particular distribution. There are many distributions pre-defined in numpy, and they are all used following the same syntax:

To draw 100 random numbers from the Normal distribution with $\mu=1$ and $\sigma=2$:

In [38]:
np.random.normal(1, 2, 20)
array([-1.09822121,  0.07033124,  0.01888023,  3.00149139,  5.29702838,
        3.86481851,  0.73329333,  1.01154811, -0.33524162, -0.88123783,
        2.86161962,  1.09268534,  3.67347448,  1.56052056,  3.99326093,
       -0.36830327,  1.670602  , -1.25711053,  0.73254576, -0.57007851])

Similarly, we can draw 10 random numbers from the Poisson distribution with $\lambda=2$:

In [39]:
np.random.poisson(2, 10)
array([1, 1, 3, 2, 2, 2, 1, 4, 2, 4])

The basic principle is always the same. See the official documentation for a list of all available distributions.

A remark on Pseudo random numbers

Numpy uses the Mersenne Twister pseudo-random number generator to generate random numbers. The results are not actual random numbers since the process is deterministic, but for basically all purposes they simulate true random processes sufficiently closely. For example, you must generate $2^{19937}-1$ random numbers before the sequence starts repeating itself...

While the process as such is super complicated, think of it as a linear function that starts from some initial conditions and generates numbers that are hard to predict. The process is deterministic in the sense that for the same initial conditions, the same sequence of pseudo random numbers gets created.

This actually comes with a huge advantage: by setting the initial conditions for the Mersenne Twister, we can make sure that two random processes yield the same result:

In [31]:
array([0.88595682, 0.75826991, 0.3651358 , 0.52368633, 0.47492774])
In [32]:
array([0.61199834, 0.20761612, 0.8518246 , 0.61184611, 0.96438283])

The two results are obviously different...

But if we set the initial conditions (the 'seed') explicitly...

In [33]:
array([0.69646919, 0.28613933, 0.22685145, 0.55131477, 0.71946897])
In [34]:
array([0.69646919, 0.28613933, 0.22685145, 0.55131477, 0.71946897])

...we get exactly the same result!

This is useful when we want to verify ABM: if we run the model twice and the results differ we do not know whether the difference is due to different realizations of random processes, or due to a design change we made. By using the same seed for both runs we are able to isolate the effect of design changes.

By the way, 'real' random numbers is hard. An attempt is made by the quantumrandom project:

It interacts with the ANU Quantum Random Numbers Server which generates random numbers in real-time by measuring the quantum fluctuations of the vacuum.

In [ ]: