This page was generated from source/notebooks/day-1/NumPy.ipynb. Interactive online version: Binder badge

Introduction to NumPy

Introducing NumPy

NumPy is a library for Python designed for efficient scientific (numerical) computing. It is an essential library in Python that is used under the hood in many other modules. Here, we will get a sense of a few things NumPy can do.

1. To start using the NumPy module we will need to import it.

In [1]:
import numpy as np

The import library as syntax can be used to give the library a different name in memory. Since we may want to use NumPy many time, shortening numpy to np is helpful.

2. A common NumPy task is to create your own arrays to make a variable that has a range from one value to another.

A NumPy array is similar in concept to a Python list, but only contains data of one type. If we wanted to calculate the sin() of a variable x at 10 points from zero to 2 * pi, we could do the following.

In [2]:
x = np.linspace(0., 2 * np.pi, 10)
print(x)
[0.         0.6981317  1.3962634  2.0943951  2.7925268  3.4906585
 4.1887902  4.88692191 5.58505361 6.28318531]

In this case, x starts at zero and goes to 2 * pi in 10 increments. Alternatively, if we wanted to specify the size of the increments for a new variable x2, we could use the np.arange() function.

In [3]:
x2 = np.arange(0.0, 2 * np.pi, 0.5)
print(x2)
[0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5 5.  5.5 6. ]

In this case, x2 starts at zero and goes to the largest value that is smaller than 2 * pi by increments of 0.5. Both of these types of array options are useful in different situations.

3. To calculate the sine function values, we can simply use the np.sin() function.

In [4]:
sine = np.sin(x)
print(sine)
[ 0.00000000e+00  6.42787610e-01  9.84807753e-01  8.66025404e-01
  3.42020143e-01 -3.42020143e-01 -8.66025404e-01 -9.84807753e-01
 -6.42787610e-01 -2.44929360e-16]

Note that performing calculations on NumPy arrays will produce output in an array.

4. As before, we can check out the type of data in our arrays x and x2 using the type() function.

In [5]:
type(x)
Out[5]:
numpy.ndarray

OK, so we have something new here. NumPy has its own data types that are part of the module. In this case, our data is stored in an NumPy n-dimensional array.

5. How much data do we have in our x variable?

In [6]:
print(x.shape)
(10,)

10 rows of data, 1 column. In this case the single column value is suppressed. shape is a member or attribute of x, and is part of any NumPy ndarray. Printing x.shape tells us the size of the array.

6. We can also check the data type of our data columns by using x.dtype

In [7]:
print(x.dtype)
float64

OK, so it seems that all the data in our file is float data type, i.e., decimal numbers (stored with a precision of 64 bytes).

7. Like lists, we can find any value in an array by using it’s indices.

We can also extract parts of an array using index slicing. Perhaps we only want the first three values out of array x.

In [8]:
x[0:3]
Out[8]:
array([0.       , 0.6981317, 1.3962634])

Nice! Note that in this case, the range of index values for the first 3 rows is 0-3. The data extracted will start at 0 and go up to, but not include 3.

Useful functions

1. Like normal variables, array variables can also be used for various mathematical operations.

In [9]:
doublex = x * 2.0
print(doublex)
[ 0.          1.3962634   2.7925268   4.1887902   5.58505361  6.98131701
  8.37758041  9.77384381 11.17010721 12.56637061]

2. In addition to the attributes we saw prevously for NumPy ndarray variables, there are also many methods that are part of the ndarray data type.

In [10]:
print(x.mean())
print(doublex.mean())
3.141592653589793
6.283185307179586

No surprises here. If we think of variables as nouns, methods are verbs, actions for the variable values. NOTE: When using methods, you always include the parentheses () to be clear we are referring to a method and not an attribute. There are many other useful ndarray methods, such as x.min(), x.max(), and x.std() (standard deviation).

3. Methods can also act on part of an array.

In [11]:
print(x[0:5].mean())
1.3962634015954634

4. Zeros and ones.

It is pretty common that you will need to create arrays full of zeros or ones to store output from calculations. NumPy includes well-named functions for doing this.

In [12]:
zeros = np.zeros(10)
print(zeros)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
In [13]:
ones = np.ones(10)
print(ones)
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

5. A word of caution, and the need for copies of arrays.

Unlike many data types in Python, assigning an existing NumPy array to a new variable does not create a copy of the array, but rather simply creates pointer to the original array. Consider the example below.

In [14]:
a = np.ones(10)
b = a
a += 4
print(a)
[5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
In [15]:
print(b)
[5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]

Oh no! Here, we can see that even after assigning the values of array a to array b changes to a will affect b. This is because array b is simply a reference to a. But what if we want to save the values of a to another array without having them change when a changes? For this we need to use np.copy().

In [16]:
c = np.copy(a)
a += 3
print(a)
[8. 8. 8. 8. 8. 8. 8. 8. 8. 8.]
In [17]:
print(c)
[5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]

np.copy() creates a complete copy of the referenced array that is independent of its source. This is less efficient, so NumPy defaults to using pointers instead of making complete copies of arrays.

Exercise - Mean of the cosine

  • Create a NumPy array x3 with a range of -π to +π (inclusive) with 20 increments
  • Calculate the cosine of x3 and store it as cosine
  • What is the mean value of cosine?
  • Is this value what you expect?
  • What happens if you use a larger number of increments for x3?
In [18]:
# As was the case before, use this cell to complete the exercise