Introduction to NumPy
Introducing NumPy
NumPy is a library for Python designed for efficient scientific (numerical) computing. It is an essential library in Python that is used under the hood in many other modules. Here, we will get a sense of a few things NumPy can do.
To start using the NumPy library we will need to import it.

>>> import numpy as np
The "import library as" syntax can be used to give the library a different name in memory. Since we may want to use NumPy many times, shortening numpy to np is helpful.

A common NumPy task is to create an array whose values span a range from one value to another. If we wanted to calculate the sin() of a variable x at 10 points from zero to 2 * pi, we could do the following.

>>> x = np.linspace(0., 2 * np.pi, 10)
>>> print(x)
[ 0. 0.6981317 1.3962634 2.0943951 2.7925268 3.4906585 4.1887902 4.88692191 5.58505361 6.28318531]
>>> y = np.sin(x)
>>> print(y)
[ 0.00000000e+00 6.42787610e-01 9.84807753e-01 8.66025404e-01 3.42020143e-01 -3.42020143e-01 -8.66025404e-01 -9.84807753e-01 -6.42787610e-01 -2.44929360e-16]
In this case, x contains 10 evenly spaced values that start at zero and end at 2 * pi. Alternatively, if we wanted to specify the size of the increments for a new variable x2, we could use the np.arange() function.

>>> x2 = np.arange(0.0, 2 * np.pi, 0.5)
>>> print(x2)
[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. 5.5 6. ]
In this case, x2 starts at zero and goes to the largest value that is smaller than 2 * pi by increments of 0.5. Both of these ways of creating arrays are useful in different situations.

To calculate the sine function values, we can simply use the np.sin() function.

>>> sine = np.sin(x)
>>> print(sine)
[ 0.00000000e+00 6.42787610e-01 9.84807753e-01 8.66025404e-01 3.42020143e-01 -3.42020143e-01 -8.66025404e-01 -9.84807753e-01 -6.42787610e-01 -2.44929360e-16]
Note that performing calculations on NumPy arrays will produce output in an array.
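As a quick check on the two array-building functions above, the sketch below asks np.linspace() to report the spacing it used (through its retstep argument) and confirms that np.arange() stops short of the end value. The name step is ours, introduced only for illustration.

```python
import numpy as np

# linspace: we choose the number of points; both end points are included.
x, step = np.linspace(0.0, 2 * np.pi, 10, retstep=True)
print(len(x))                           # 10 points, so 9 gaps between them
print(np.isclose(step, 2 * np.pi / 9))  # spacing is the range divided by 9

# arange: we choose the step; the array stops before the end value.
x2 = np.arange(0.0, 2 * np.pi, 0.5)
print(x2[-1] < 2 * np.pi)

# Functions such as np.sin() act on every element and return a new array.
y = np.sin(x)
print(type(y), y.shape)
```

Which function to reach for depends on whether the number of points or the spacing between them matters more for the task at hand.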
As before, we can check the type of our arrays x and x2 using the type() function or the %whos magic command in IPython.

>>> type(x)
numpy.ndarray
OK, so we have something new here. NumPy has its own data types that are part of the module. In this case, our data is stored in a NumPy n-dimensional array (ndarray).
How much data do we have in our x variable?

>>> print(x.shape)
(10,)
The array holds 10 values. It is one-dimensional, so the shape tuple has a single entry and there is no separate column count. Here, shape is a member, or attribute, of x, and is part of any NumPy ndarray. Printing x.shape tells us the size of the array along each dimension.

We can also check the data type of the values in the array by using x.dtype.

>>> print(x.dtype)
float64
OK, so all the values in our array have the float data type, i.e., decimal numbers (each stored with a precision of 64 bits).
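The attributes discussed above can be inspected side by side. The following is a minimal sketch; the two-dimensional array grid is a hypothetical example added only to show how shape changes when a second dimension is present.

```python
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 10)

print(x.ndim)      # number of dimensions: 1
print(x.shape)     # length along each dimension: (10,)
print(x.size)      # total number of elements: 10
print(x.dtype)     # data type of each element: float64
print(x.itemsize)  # bytes per element: 8 (8 bytes = 64 bits)

# A two-dimensional array has one entry in shape per dimension.
grid = np.zeros((10, 1))
print(grid.shape)  # (10, 1)
```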
Like lists, we can find any value in an array by using its indices. We can also extract parts of an array using index slicing. Perhaps we only want the first three values out of array x.

>>> x[0:3]
array([ 0. , 0.6981317, 1.3962634])
Nice! Note that in this case, the index range for the first 3 values is written 0:3. The data extracted will start at index 0 and go up to, but not include, index 3.
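A few more slices follow the same start-inclusive, stop-exclusive rule. This is a small sketch; the variable names are ours and chosen only for illustration.

```python
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 10)

first_three = x[0:3]   # indices 0, 1 and 2
same_thing = x[:3]     # a missing start means "from the beginning"
last_two = x[-2:]      # negative indices count back from the end
every_other = x[::2]   # a third number sets the step between indices

print(len(first_three), len(last_two), len(every_other))  # 3 2 5
```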
Useful functions
Like normal variables, array variables can also be used for various mathematical operations.
>>> doublex = x * 2.0
>>> print(doublex)
[ 0. 1.3962634 2.7925268 4.1887902 5.58505361 6.98131701 8.37758041 9.77384381 11.17010721 12.56637061]
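Arithmetic on an array is applied to every element, which differs from how Python lists behave. The sketch below contrasts the two, using a short hypothetical list called values that is converted with np.array().

```python
import numpy as np

values = [1.0, 2.0, 3.0]
arr = np.array(values)

print(values * 2)   # a list is repeated: six items
print(arr * 2)      # an array is scaled: each element doubled
print(arr + arr)    # arrays of the same shape combine element by element
print(arr ** 2)     # each element squared
```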
In addition to the attributes we saw previously for NumPy ndarray variables, there are also many methods that are part of the ndarray data type.

>>> print(x.mean())
3.14159265359
>>> print(doublex.mean())
6.28318530718
No surprises here. If we think of variables as nouns, methods are verbs: actions performed on the variable values. NOTE: When using methods, you always include the parentheses (), which call the method and make clear that we are referring to a method and not an attribute. There are many other useful ndarray methods, such as x.min(), x.max(), and x.std() (standard deviation).

Methods can also act on part of an array.
>>> print(x[0:5].mean())
1.3962634016
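The methods mentioned above can be tried together. This is a minimal sketch that also shows what happens when the parentheses are left off a method name.

```python
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 10)

print(x.min(), x.max())             # smallest and largest values
print(x.std())                      # standard deviation
print(x.sum() / x.size)             # the mean, computed by hand
print(np.isclose(x.mean(), np.pi))  # True: the values are symmetric about pi

# Without parentheses we get the method object itself, not a number.
print(callable(x.mean))             # True
```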
Zeros and ones. It is pretty common that you will need to create arrays full of zeros or ones to store output from calculations. NumPy includes well-named functions for doing this.
>>> zeros = np.zeros(10)
>>> ones = np.ones(10)
>>> print(zeros)
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
>>> print(ones)
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
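As a sketch of the "store output from calculations" use case, the example below creates an array of zeros and then overwrites each entry with a result. The names n_steps, squares and table are hypothetical and used only for illustration.

```python
import numpy as np

n_steps = 5                   # hypothetical number of results to store
squares = np.zeros(n_steps)   # placeholder array of floats

for i in range(n_steps):
    squares[i] = i ** 2       # overwrite each zero with a result

print(squares)

# Passing a tuple as the shape gives a multi-dimensional array.
table = np.ones((2, 3))
print(table.shape)            # (2, 3)
```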
A word of caution, and the need for copies of arrays. As with other mutable Python objects such as lists, assigning an existing NumPy array to a new variable does not create a copy of the array; it simply creates a second reference to the original array. Consider the example below.
>>> a = np.ones(10)
>>> b = a
>>> a += 4
>>> print(a)
[ 5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
>>> print(b)
[ 5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
Oh no! Here, we can see that even after assigning array a to array b, changes to a will affect b. This is because b is simply a reference to the same array as a. But what if we want to save the values of a to another array without having them change when a changes? For this we need to use np.copy().

>>> c = np.copy(a)
>>> a += 3
>>> print(a)
[ 8. 8. 8. 8. 8. 8. 8. 8. 8. 8.]
>>> print(c)
[ 5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
np.copy() creates a complete copy of the referenced array that is independent of its source. Copying takes extra time and memory, so plain assignment shares the existing array by reference instead of making a complete copy.
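One way to check whether two names refer to the same array is sketched below, using Python's is operator and NumPy's np.shares_memory() function. The a.copy() method is used here in place of np.copy(a); both produce an independent copy.

```python
import numpy as np

a = np.ones(3)
b = a            # a second name for the same array
c = a.copy()     # an independent copy of the values

print(b is a)                   # True
print(c is a)                   # False
print(np.shares_memory(a, c))   # False

a += 4           # in-place change to the array that a and b both refer to
print(b[0], c[0])               # 5.0 1.0
```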
Exercise - Mean of the cosine
- Create a NumPy array x3 with a range of -π to +π (inclusive) with 20 increments
- Calculate the cosine of x3 and store it as cosine
- What is the mean value of cosine?
- Is this value what you expect?
- What happens if you use a larger number of increments for x3?