Introduction to NumPy
Introducing NumPy
NumPy is a library for Python designed for efficient scientific (numerical) computing. It is an essential library in Python that is used under the hood in many other modules. Here, we will get a sense of a few things NumPy can do.
1. To start using the NumPy module, we need to import it.
In [1]:
import numpy as np
The import library as name syntax binds the library to a different name in the current namespace. Since we may want to use NumPy many times, shortening numpy to np is helpful.
2. A common NumPy task is to create an array of values that range from one value to another.
A NumPy array is similar in concept to a Python list, but only contains data of one type. If we wanted to calculate the sin() of a variable x at 10 points from zero to 2 * pi, we could do the following.
In [2]:
x = np.linspace(0., 2 * np.pi, 10)
print(x)
[0. 0.6981317 1.3962634 2.0943951 2.7925268 3.4906585
4.1887902 4.88692191 5.58505361 6.28318531]
In this case, x contains 10 evenly spaced values from zero to 2 * pi, with both endpoints included (so there are 9 equal increments between them).
Alternatively, if we wanted to specify the size of the increments for a new variable x2, we could use the np.arange() function.
In [3]:
x2 = np.arange(0.0, 2 * np.pi, 0.5)
print(x2)
[0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. 5.5 6. ]
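In this case, x2 starts at zero and increases in increments of 0.5, stopping at the largest value that is smaller than 2 * pi (the stop value itself is excluded). Both np.linspace() and np.arange() are useful in different situations: the first when the number of points matters, the second when the spacing matters.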
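To make the difference between the two functions concrete, here is a minimal sketch that builds both arrays side by side. The retstep=True argument is an optional feature of np.linspace() that is not used elsewhere in this lesson; it also returns the spacing that was chosen.

```python
import numpy as np

# linspace: we fix the number of points; the endpoint is included by default.
x = np.linspace(0.0, 2 * np.pi, 10)

# retstep=True additionally returns the spacing between neighbouring points.
_, step = np.linspace(0.0, 2 * np.pi, 10, retstep=True)

# arange: we fix the step size; the interval is half-open, so the stop
# value is excluded (floating-point rounding can blur this for some steps).
x2 = np.arange(0.0, 2 * np.pi, 0.5)

print(len(x), step)     # 10 points, spacing of 2*pi/9
print(len(x2), x2[-1])  # number of points depends on the step size
```

With 10 points there are only 9 gaps between them, which is why the spacing is 2 * pi / 9 rather than 2 * pi / 10.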
3. To calculate the sine function values, we can simply use the np.sin() function.
In [4]:
sine = np.sin(x)
print(sine)
[ 0.00000000e+00 6.42787610e-01 9.84807753e-01 8.66025404e-01
3.42020143e-01 -3.42020143e-01 -8.66025404e-01 -9.84807753e-01
-6.42787610e-01 -2.44929360e-16]
Note that performing calculations on NumPy arrays will produce output in an array.
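As a rough illustration of what "output in an array" means, the sketch below compares np.sin() applied to the whole array with an element-by-element loop using the standard library's math.sin(). The loop is included only for comparison; the name sine_loop is made up for this example.

```python
import math
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 10)

# Vectorized: a single call computes the sine of every element.
sine = np.sin(x)

# Pure-Python equivalent, one element at a time (for comparison only).
sine_loop = [math.sin(value) for value in x]

print(type(sine))                    # the result is again an ndarray
print(sine.shape == x.shape)         # same shape as the input
print(np.allclose(sine, sine_loop))  # both approaches agree
```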
4. As before, we can check the type of our arrays x and x2 using the type() function.
In [5]:
type(x)
Out[5]:
numpy.ndarray
OK, so we have something new here. NumPy has its own data types that are part of the module. In this case, our data is stored in a NumPy n-dimensional array.
5. How much data do we have in our x variable?
In [6]:
print(x.shape)
(10,)
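x is a one-dimensional array with 10 elements. The shape is a tuple with one entry per dimension, so a one-dimensional array has a single entry followed by a trailing comma; there is no separate column dimension. shape is an attribute of x, and is part of any NumPy ndarray. Printing x.shape tells us the size of the array along each dimension.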
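The following sketch shows shape next to two related ndarray attributes, ndim and size, and what the shape looks like once the same data is reshaped into an explicit column. The variable name column is only illustrative.

```python
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 10)

print(x.ndim)   # number of dimensions
print(x.shape)  # length along each dimension, as a tuple
print(x.size)   # total number of elements

# Reshaping into 10 rows and 1 column gives a two-dimensional array.
column = x.reshape(10, 1)
print(column.shape)
```

Only the reshaped array has a second entry in its shape; the original x has a single dimension of length 10.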
6. We can also check the data type of the values in our array by using x.dtype
In [7]:
print(x.dtype)
float64
OK, so all the data in our array is of the float data type, i.e., decimal (floating-point) numbers, each stored with 64 bits (8 bytes) of precision.
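To check the storage size directly, the sketch below uses the itemsize and nbytes attributes, and converts the array to a smaller float type with astype(). The name x32 is made up for this example.

```python
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 10)

print(x.dtype)     # element type of the array
print(x.itemsize)  # bytes per element (8 bytes = 64 bits)
print(x.nbytes)    # total bytes used by the data: size * itemsize

# astype() returns a new array converted to the requested type.
x32 = x.astype(np.float32)
print(x32.dtype, x32.itemsize)
```

Converting to float32 halves the memory used, at the cost of fewer significant digits.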
7. Like lists, we can find any value in an array by using its index.
We can also extract parts of an array using index slicing. Perhaps we only want the first three values out of array x.
In [8]:
x[0:3]
Out[8]:
array([0. , 0.6981317, 1.3962634])
Nice! Note that in this case, the index range for the first 3 values is written 0:3. The data extracted will start at index 0 and go up to, but not include, index 3.
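A few more indexing and slicing forms follow the same rules as Python lists. This is a small sketch; the variable names on the left are only illustrative.

```python
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 10)

first = x[0]          # single element at index 0
last = x[-1]          # negative indices count from the end
first_three = x[:3]   # omitting the start is the same as x[0:3]
every_other = x[::2]  # start:stop:step, here every second element

print(first, last)
print(first_three)
print(every_other)
```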
Useful functions
1. Like normal variables, array variables can also be used for various mathematical operations.
In [9]:
doublex = x * 2.0
print(doublex)
[ 0. 1.3962634 2.7925268 4.1887902 5.58505361 6.98131701
8.37758041 9.77384381 11.17010721 12.56637061]
2. In addition to the attributes we saw previously for NumPy ndarray variables, there are also many methods that are part of the ndarray data type.
In [10]:
print(x.mean())
print(doublex.mean())
3.141592653589793
6.283185307179586
No surprises here. If we think of variables as nouns, methods are verbs, actions for the variable values. NOTE: When using methods, you always include the parentheses () to be clear we are referring to a method and not an attribute. There are many other useful ndarray methods, such as x.min(), x.max(), and x.std() (standard deviation).
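The methods mentioned above can be tried on the same array. The sketch also includes x.sum() and x.argmax(), two further ndarray methods not covered in this lesson, and ends with an attribute for contrast.

```python
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 10)

# Methods: called with parentheses.
print(x.min(), x.max())
print(x.std())
print(x.sum())
print(x.argmax())  # index of the largest value

# Attribute: accessed without parentheses.
print(x.shape)
```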
4. Zeros and ones.
It is pretty common that you will need to create arrays full of zeros or ones to store output from calculations. NumPy includes well-named functions for doing this.
In [12]:
zeros = np.zeros(10)
print(zeros)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
In [13]:
ones = np.ones(10)
print(ones)
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
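As one possible way to "store output from calculations", the sketch below creates an array of zeros first and then fills it in a loop. The calculation itself (the squared sine) and the names result and grid are chosen only for illustration.

```python
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 10)

# Create the output array up front, then fill it one element at a time.
result = np.zeros(len(x))
for i, value in enumerate(x):
    result[i] = np.sin(value) ** 2

print(result)

# Passing a tuple as the shape creates a multi-dimensional array.
grid = np.ones((2, 3))
print(grid.shape)
```

For a calculation this simple, np.sin(x) ** 2 gives the same values without a loop; filling a preallocated array is mainly useful when each step depends on more involved logic.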
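5. A word of caution, and the need for copies of arrays.
As with other mutable Python objects such as lists, assigning an existing NumPy array to a new variable does not create a copy of the array, but rather simply creates a second reference to the original array. Because operations such as += modify an array in place, the change is visible through every name that refers to it. Consider the example below.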
In [14]:
a = np.ones(10)
b = a
a += 4
print(a)
[5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
In [15]:
print(b)
[5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
Oh no! Here, we can see that even after assigning array a to array b, changes to a will affect b. This is because b is simply a reference to the same array as a. But what if we want to save the values of a to another array without having them change when a changes? For this we need to use np.copy().
In [16]:
c = np.copy(a)
a += 3
print(a)
[8. 8. 8. 8. 8. 8. 8. 8. 8. 8.]
In [17]:
print(c)
[5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
np.copy() creates a complete copy of the referenced array that is independent of its source. Copying costs extra memory and time, so it only happens when we ask for it explicitly; plain assignment just adds another reference to the same array.
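One way to check whether two variables refer to the same data is sketched below, using Python's is operator and np.shares_memory(). It also shows the method form a.copy(), and that a basic slice is a view onto the original data rather than a copy, which goes slightly beyond the example above. The names c and v are only illustrative.

```python
import numpy as np

a = np.ones(10)
b = a         # second name for the same array
c = a.copy()  # method form of np.copy(a): independent data
v = a[:3]     # basic slicing returns a view onto a's data

print(b is a)                  # True: same object
print(np.shares_memory(a, c))  # False: the copy has its own data
print(np.shares_memory(a, v))  # True: the view shares a's data

a += 4        # modifies a in place
print(v)      # the view sees the change
print(c[:3])  # the copy does not
```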
Exercise - Mean of the cosine
- Create a NumPy array x3 with a range of -π to +π (inclusive) with 20 increments
- Calculate the cosine of x3 and store it as cosine
- What is the mean value of cosine?
- Is this value what you expect?
- What happens if you use a larger number of increments for x3?
In [18]:
# As was the case before, use this cell to complete the exercise