Welcome to the absolute beginner’s guide to NumPy! A comprehensive introduction to NumPy for complete beginners.
NumPy (Numerical Python) is an open source Python library that’s widely used in science and engineering. The NumPy library contains multidimensional array data structures, such as the homogeneous, N-dimensional ndarray, and a large library of functions that operate efficiently on these data structures.
After installing NumPy (see Installation), it should be imported into Python code like:
import numpy as np
This widespread convention allows access to NumPy features with a short, recognizable prefix (np.) while distinguishing NumPy features from others that have the same name.
Text preceded by >>> or ... is input, the code that you would enter in a script or at a Python prompt. Everything else is output, the results of running your code. Note that >>> and ... are not part of the code and may cause an error if entered at a Python prompt.
Python lists are excellent, general-purpose containers. They can be “heterogeneous”, meaning that they can contain elements of a variety of types, and they are quite fast when used to perform individual operations on a handful of elements.
Depending on the characteristics of the data and the types of operations that need to be performed, other containers may be more appropriate. By exploiting these characteristics, we can improve speed, reduce memory consumption, and offer a high-level syntax for performing a variety of common processing tasks.
NumPy shines when there are large quantities of “homogeneous” (same-type) data to be processed on the CPU.
Speed
Operations on NumPy arrays are executed in compiled C code, making them much faster than equivalent loops written in pure Python.
Memory Efficiency
NumPy arrays store same-type elements in a compact block, so they use less memory than Python lists for large datasets.
Convenient Syntax
Express complex operations concisely without explicit loops.
Ecosystem Integration
NumPy is the foundation for pandas, scikit-learn, and most scientific Python libraries.
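As a rough illustration of the syntax and memory points above, the sketch below compares a Python list with an array holding the same values. The variable names are made up for this example, and the byte count of the list depends on the Python build and platform, so only the comparison (not the exact list size) should be relied on.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)  # explicit dtype: 8 bytes per element

# Convenient syntax: one expression instead of an explicit loop
doubled_list = [v * 2 for v in values]
doubled_arr = arr * 2

# Memory: the array stores raw 8-byte integers, while the list stores
# pointers to separate Python int objects
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(arr.nbytes)               # 8000
print(list_bytes > arr.nbytes)  # True
```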
In computer programming, an array is a structure for storing and retrieving data. We often talk about an array as if it were a grid in space, with each cell storing one element of the data.
A three-dimensional array would be like a set of tables, perhaps stacked as though they were printed on separate pages. In NumPy, this idea is generalized to an arbitrary number of dimensions, and so the fundamental array class is called ndarray: it represents an “N-dimensional array”.
Most NumPy arrays have some restrictions:
All elements of the array must be of the same type of data
Once created, the total size of the array can’t change
The shape must be “rectangular”, not “jagged”; e.g., each row of a two-dimensional array must have the same number of columns
When these conditions are met, NumPy exploits these characteristics to make the array faster, more memory efficient, and more convenient to use than less restrictive data structures.
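The three restrictions above can be observed directly. The following is a minimal sketch with illustrative variable names, not an exhaustive demonstration:

```python
import numpy as np

# Same type: mixing ints and floats converts every element to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)              # float64

# Fixed size: np.append does not grow the array in place; it returns a new one
longer = np.append(mixed, 4.0)
print(mixed.size, longer.size)  # 3 4

# Rectangular: every row has the same number of columns
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)               # (2, 3)
```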
# Assumes a holds the values 10, 2, 3, 4, 5, 6, consistent with the outputs shown
print(a[:3])   # First three elements: [10 2 3]
print(a[2:5])  # Elements at indices 2, 3, 4: [3 4 5]
print(a[-2:])  # Last two elements: [5 6]
Important Difference: Slice indexing of a list copies the elements into a new list, but slicing an array returns a view: an object that refers to the data in the original array. The original array can be mutated using the view:
b = a[3:]
print(b)   # [4 5 6]
b[0] = 40
print(a)   # [10 2 3 40 5 6] - original array changed!
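When an independent set of values is needed instead of a view, one common approach is to call the array's copy method on the slice. The sketch below uses its own array and illustrative names, and uses np.shares_memory to check whether two arrays refer to the same data:

```python
import numpy as np

a = np.array([10, 2, 3, 4, 5, 6])

view = a[3:]                # slicing returns a view onto a's data
independent = a[3:].copy()  # .copy() allocates separate storage

print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False

independent[0] = 99         # does not touch a
print(a.tolist())           # [10, 2, 3, 4, 5, 6]
```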
An element of a 2D array can be accessed by specifying the index along each axis within a single set of square brackets, separated by commas:
print(a[1, 3]) # Row 1, Column 3: 8
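For a self-contained version of this kind of indexing, the sketch below assumes a hypothetical 3x4 array chosen so that a[1, 3] is 8. It also shows that slices can be used along each axis in the same way:

```python
import numpy as np

# Hypothetical 3x4 array, chosen so that a[1, 3] is 8
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

print(a[1, 3])               # 8 (row 1, column 3)
print(a[1].tolist())         # [5, 6, 7, 8] (all of row 1)
print(a[:, 3].tolist())      # [4, 8, 12] (all of column 3)
print(a[0:2, 1:3].tolist())  # [[2, 3], [6, 7]] (rows 0-1, columns 1-2)
```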
In NumPy, a dimension of an array is sometimes referred to as an “axis”. This terminology helps disambiguate between the dimensionality of an array and the dimensionality of the data represented by the array.
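The distinction can be made concrete with the ndim and shape attributes. In this small sketch (names are illustrative), a point in three-dimensional space is stored in an array that has only one axis:

```python
import numpy as np

# Three coordinates describe a point in 3D space,
# but the array holding them has a single axis
point = np.array([1.0, 2.0, 3.0])
print(point.ndim)    # 1
print(point.shape)   # (3,)

# A 2x3 array has two axes: axis 0 has length 2, axis 1 has length 3
table = np.array([[0, 1, 2],
                  [3, 4, 5]])
print(table.ndim)    # 2
print(table.shape)   # (2, 3)
```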
# Create an empty array (contents are uninitialized)
np.empty(5)
# array([3.14, 42., 1.5, 2.8, 9.1])  # values will vary
The function empty creates an array without initializing its elements, so the initial content is arbitrary and depends on the state of the memory. Use it only when you plan to fill every element afterwards.
Range of Elements
# Create an array with a range of elements
np.arange(5)
# array([0, 1, 2, 3, 4])
np.arange(2, 9, 2)  # start, stop, step
# array([2, 4, 6, 8])
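Note that arange excludes the stop value. When a fixed number of evenly spaced values including the endpoint is wanted, np.linspace is a common alternative. A short sketch comparing the two:

```python
import numpy as np

# arange: the stop value (9) is never included
evens = np.arange(2, 9, 2)
print(evens.tolist())   # [2, 4, 6, 8]

# linspace: num evenly spaced values, endpoint included by default
spaced = np.linspace(0, 10, num=5)
print(spaced.tolist())  # [0.0, 2.5, 5.0, 7.5, 10.0]
```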
NumPy understands that the operation should happen with each cell. This concept is called broadcasting and is a powerful feature for working with arrays of different shapes.
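A minimal sketch of broadcasting, using illustrative arrays: a scalar is applied to every element, and a one-dimensional array is applied to every row of a two-dimensional array whose trailing dimension matches.

```python
import numpy as np

# Scalar broadcast: 1.6 is applied to each element
data = np.array([1.0, 2.0])
scaled = data * 1.6
print(scaled)            # elements are 1.6 and 3.2

# Shape (3, 2) plus shape (2,): offsets is applied to every row
table = np.array([[1, 2], [3, 4], [5, 6]])
offsets = np.array([10, 20])
shifted = table + offsets
print(shifted.tolist())  # [[11, 22], [13, 24], [15, 26]]
```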
data = np.array([[1, 2], [5, 3], [4, 6]])
print(data.max())        # 6 (max of all elements)
print(data.max(axis=0))  # [5 6] (max of each column)
print(data.max(axis=1))  # [2 5 6] (max of each row)
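The axis argument works the same way for other aggregation methods. The sketch below reuses the same values with sum and min: axis=0 collapses the rows (one result per column) and axis=1 collapses the columns (one result per row).

```python
import numpy as np

data = np.array([[1, 2], [5, 3], [4, 6]])

print(data.sum())                 # 21 (sum of all elements)
print(data.sum(axis=0).tolist())  # [10, 11] (sum of each column)
print(data.sum(axis=1).tolist())  # [3, 8, 10] (sum of each row)
print(data.min(axis=1).tolist())  # [1, 3, 4] (min of each row)
```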
Random number generation is important for many numerical and machine learning algorithms:
rng = np.random.default_rng()  # Create a random number generator
# Outputs below are examples; values vary between runs when no seed is given
# Random floats in [0, 1)
print(rng.random(3))
# [0.63696169 0.26978671 0.04097352]
# 2D array of random floats
print(rng.random((3, 2)))
# [[0.01652764 0.81327024]
#  [0.91275558 0.60663578]
#  [0.72949656 0.54362499]]
# Random integers
print(rng.integers(5, size=(2, 4)))  # Integers from 0 to 4
# [[2 1 1 0]
#  [0 0 0 4]]
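When results need to be repeatable, for example in tests, a seed can be passed to default_rng. The seed value 12345 below is arbitrary and chosen only for illustration; the sketch checks that two generators created with the same seed produce the same numbers.

```python
import numpy as np

# Two generators created with the same (arbitrary) seed
rng_a = np.random.default_rng(12345)
rng_b = np.random.default_rng(12345)

first = rng_a.random(3)
second = rng_b.random(3)
print(np.array_equal(first, second))  # True

# integers(5, ...) draws from 0 up to, but not including, 5
ints = rng_a.integers(5, size=(2, 4))
in_range = bool(ints.min() >= 0 and ints.max() <= 4)
print(in_range)                       # True
```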
# Save as CSV
csv_arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
np.savetxt('new_file.csv', csv_arr)
# Load from CSV
loaded = np.loadtxt('new_file.csv')
print(loaded)
# [1. 2. 3. 4. 5. 6. 7. 8.]
The .npy and .npz files are smaller and faster to read than text files. Use text files when you need human-readable output or interoperability with other tools.
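A minimal sketch of the binary formats, assuming np.save for a single array and np.savez for several named arrays. The file names and keyword names are made up for this example, and a temporary directory is used so nothing is left on disk:

```python
import os
import tempfile
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
other = np.array([[1.5, 2.5], [3.5, 4.5]])

with tempfile.TemporaryDirectory() as tmp:
    # .npy holds a single array
    npy_path = os.path.join(tmp, "single.npy")
    np.save(npy_path, arr)
    restored = np.load(npy_path)

    # .npz holds several arrays, retrieved by the names given when saving
    npz_path = os.path.join(tmp, "bundle.npz")
    np.savez(npz_path, first=arr, second=other)
    with np.load(npz_path) as bundle:
        first = bundle["first"]
        second = bundle["second"]

print(restored.dtype == arr.dtype)   # True (dtype is preserved, unlike loadtxt)
print(np.array_equal(first, arr))    # True
```

Unlike the text round trip, which returns floats, the binary formats keep the original dtype and shape.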
The ease of implementing mathematical formulas is one of the things that make NumPy widely used in the scientific Python community. For example, the mean square error formula translates directly into array operations.
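As a minimal sketch, assuming the usual definition of mean square error, MSE = (1/n) * sum((prediction_i - label_i)^2), the formula can be written almost term by term. The predictions and labels arrays hold made-up values for illustration:

```python
import numpy as np

# Made-up values for illustration
predictions = np.array([1.0, 1.0, 1.0])
labels = np.array([1.0, 2.0, 3.0])

n = labels.size
# Subtract element-wise, square each difference, sum, then divide by n
error = (1 / n) * np.sum(np.square(predictions - labels))
print(error)  # approximately 1.667 (5 / 3)
```

Because the subtraction and squaring are element-wise, the same line works whether the arrays hold three values or three million.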