Understanding NumPy Arrays

The ndarray (N-dimensional array) is the fundamental data structure in NumPy. It provides a powerful, efficient way to work with homogeneous multidimensional data.

What is an ndarray?

An ndarray is a multidimensional container for items of the same type and size. The number of dimensions and items in an array is defined by its shape, which is a tuple of N non-negative integers that specify the sizes of each dimension.
import numpy as np

# Create a 1-D array
a = np.array([1, 2, 3, 4, 5, 6])
print(a)
# Output: [1 2 3 4 5 6]

# Create a 2-D array
b = np.array([[1, 2, 3],
              [4, 5, 6]])
print(b.shape)
# Output: (2, 3)

Key Characteristics

NumPy arrays have several important restrictions:
  • All elements must be of the same data type
  • Once created, the total size cannot change
  • The shape must be “rectangular”, not “jagged”
These constraints allow NumPy to optimize memory usage and computation speed significantly compared to Python lists.
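
As a small illustration of the first constraint, the sketch below assumes NumPy's default type promotion: mixed inputs end up in one common dtype, and a float assigned into an integer array is truncated to fit. The variable names are illustrative only.

```python
import numpy as np

# Mixed ints and floats are promoted to a single common dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Assigning a float into an integer array truncates it to fit the dtype
ints = np.array([1, 2, 3])
ints[0] = 2.7
print(ints[0])  # 2
```

The silent truncation in the second half is a common source of bugs, so it can be worth choosing a float dtype up front when fractional values are expected.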

Array Attributes

Every ndarray has several important attributes that describe its structure:
Attribute          Description
ndarray.ndim       Number of dimensions (axes)
ndarray.shape      Tuple indicating the size of each dimension
ndarray.size       Total number of elements
ndarray.dtype      Data type of the elements
ndarray.itemsize   Size in bytes of each element
ndarray.data       Buffer containing the actual array elements
import numpy as np

x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

print(f"Dimensions: {x.ndim}")
# Output: Dimensions: 2

print(f"Shape: {x.shape}")
# Output: Shape: (3, 4)

print(f"Size: {x.size}")
# Output: Size: 12

print(f"Data type: {x.dtype}")
# Output: Data type: int64 (platform-dependent; may be int32 on Windows with NumPy < 2)

print(f"Item size: {x.itemsize} bytes")
# Output: Item size: 8 bytes (4 bytes if the dtype is int32)
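
The attributes are related to one another, which the following sketch tries to make explicit. It fixes the dtype to int32 (an arbitrary choice for this example) so the byte counts do not depend on the platform, and it uses the `nbytes` attribute, which is not listed in the table above.

```python
import numpy as np

# Fix the dtype explicitly so the byte counts are the same on every platform
x = np.zeros((3, 4), dtype=np.int32)

print(x.ndim == len(x.shape))             # True: one shape entry per axis
print(x.size == x.shape[0] * x.shape[1])  # True: size is the product of the shape
print(x.nbytes == x.size * x.itemsize)    # True: 12 elements * 4 bytes = 48
```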

Creating Arrays

From Python Sequences

The most straightforward way to create an array is from a Python list or tuple:
import numpy as np

# From a list
a = np.array([1, 2, 3, 4, 5])

# From nested lists (2-D)
b = np.array([[1, 2], [3, 4], [5, 6]])

# Explicitly specify the data type
c = np.array([1, 2, 3], dtype=np.float64)
print(c)
# Output: [1. 2. 3.]

Using Built-in Functions

NumPy provides many functions for creating arrays:
import numpy as np

# Create array of zeros
zeros = np.zeros((3, 4))

# Create array of ones
ones = np.ones((2, 3, 4))

# Create uninitialized array (faster, but contains garbage values)
empty = np.empty((2, 2))

# Create array with a range of values
arange = np.arange(0, 10, 2)  # Start, stop, step
# Output: array([0, 2, 4, 6, 8])

# Create array with evenly spaced values
linspace = np.linspace(0, 1, 5)  # Start, stop, num_points
# Output: array([0.  , 0.25, 0.5 , 0.75, 1.  ])

# Create identity matrix
identity = np.eye(3)
# Output:
# array([[1., 0., 0.],
#        [0., 1., 0.],
#        [0., 0., 1.]])
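
A few more creation helpers may be useful alongside the ones above. This sketch uses `np.full` and `np.zeros_like`, which are not covered in the original list, and contrasts how `arange` and `linspace` treat the stop value.

```python
import numpy as np

# Constant-filled array of a given shape
sevens = np.full((2, 3), 7)

# New zero-filled array matching the shape and dtype of an existing one
template = np.array([[1.5, 2.5], [3.5, 4.5]])
blank = np.zeros_like(template)

# arange excludes the stop value; linspace includes it by default
stepped = np.arange(0, 1, 0.25)
spaced = np.linspace(0, 1, 5)
print(len(stepped), len(spaced))  # 4 5
```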

Array Dimensions Explained

Think of array dimensions as nested containers:
  • 1-D array: A list of values
  • 2-D array: A table (rows and columns)
  • 3-D array: A stack of tables
  • N-D array: Even more nested structures

Visualizing Dimensions

1-D Array (vector):
a = np.array([1, 5, 2, 0])
# Shape: (4,)
[1, 5, 2, 0]
2-D Array (matrix):
b = np.array([[1, 5, 2, 0],
              [8, 3, 6, 1],
              [1, 7, 2, 9]])
# Shape: (3, 4)
┌─────────────┐
│ 1  5  2  0 │
│ 8  3  6  1 │
│ 1  7  2  9 │
└─────────────┘
3-D Array (stack of matrices):
c = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]])
# Shape: (2, 2, 2)
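
One way to read the "stack of tables" picture is through indexing: each index supplied removes one level of nesting. The sketch below reuses the 3-D array from above; the variable names are illustrative.

```python
import numpy as np

c = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]])

first_table = c[0]     # one index  -> a 2-D table: [[1, 2], [3, 4]]
row = c[1, 0]          # two indices -> a 1-D row of the second table: [5, 6]
value = c[1, 0, 1]     # three indices -> a single element

print(first_table.ndim, row.ndim)  # 2 1
print(value)                       # 6
```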

Reshaping Arrays

You can change the shape of an array without changing its data:
import numpy as np

a = np.arange(12)
print(a)
# Output: [ 0  1  2  3  4  5  6  7  8  9 10 11]

# Reshape to 2-D
b = a.reshape(3, 4)
print(b)
# Output:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Reshape to 3-D
c = a.reshape(2, 3, 2)
print(c.shape)
# Output: (2, 3, 2)
The new shape must be compatible with the original shape: the total number of elements must remain the same. For example, you cannot reshape an array of 12 elements into shape (3, 5) because 3 × 5 = 15 ≠ 12.
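
A minimal sketch of what happens when the element counts do not match: NumPy raises a ValueError rather than padding or truncating. The `failed` flag is only there to make the outcome easy to check.

```python
import numpy as np

a = np.arange(12)

failed = False
try:
    a.reshape(3, 5)  # 15 slots requested, but only 12 elements exist
except ValueError as exc:
    failed = True
    print(f"Reshape failed: {exc}")
```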

Using -1 for Automatic Dimension Calculation

Pass -1 for at most one dimension, and NumPy infers its size from the total number of elements:
import numpy as np

a = np.arange(12)

# NumPy automatically calculates the missing dimension
b = a.reshape(3, -1)  # Becomes (3, 4)
c = a.reshape(-1, 2)  # Becomes (6, 2)

print(b.shape)  # Output: (3, 4)
print(c.shape)  # Output: (6, 2)
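
Two limits on -1 are worth knowing, sketched here under the same setup: only one dimension may be left unknown, and the known dimensions must divide the element count evenly. The `rejected` list is a hypothetical helper for collecting the shapes NumPy refuses.

```python
import numpy as np

a = np.arange(12)

# A lone -1 flattens any array back to 1-D
flat = a.reshape(3, 4).reshape(-1)
print(flat.shape)  # (12,)

# Two unknowns are ambiguous, and 5 does not divide 12
rejected = []
for shape in [(-1, -1), (5, -1)]:
    try:
        a.reshape(shape)
    except ValueError as exc:
        rejected.append(shape)
        print(f"{shape} rejected: {exc}")
```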

Views vs Copies

Understanding the difference between views and copies is crucial for efficient NumPy programming.

Views (Shallow Copy)

A view is a new array object that looks at the same data. Modifying the view modifies the original array:
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = a[1:4]  # Slicing creates a view

print(b)  # Output: [2 3 4]

b[0] = 99
print(a)  # Output: [ 1 99  3  4  5]
Basic slicing always creates views, not copies. This is different from Python lists!
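
To check whether a result is a view or a copy, one option is to inspect `.base` or call `np.shares_memory`. The sketch below also shows that indexing with a list of integers (advanced indexing), unlike basic slicing, produces a copy.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

view = a[1:4]          # basic slice -> view
picked = a[[1, 2, 3]]  # integer-list (advanced) indexing -> copy

print(view.base is a)               # True
print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, picked))  # False
```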

Copies (Deep Copy)

A copy is a new array with a copy of the data:
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = a.copy()  # Explicit copy

b[0] = 99
print(a)  # Output: [1 2 3 4 5] - unchanged!
print(b)  # Output: [99  2  3  4  5]

When to Use Copy

Always use .copy() when:
  • Extracting a small portion of a large array when the large array itself is no longer needed (a view would keep the entire buffer in memory)
  • You want to modify data without affecting the original
  • Working with array subsets that need independent lifecycles
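
The first case in the list can be sketched as follows. The array size and the names `window` and `snapshot` are arbitrary choices for the example; the point is that a view holds a reference to the full buffer, while a copy owns only its own elements.

```python
import numpy as np

big = np.zeros(1_000_000)    # about 8 MB of float64 data

window = big[:10]            # view: keeps all of big's buffer alive
snapshot = big[:10].copy()   # copy: owns just its own 10 elements

print(window.base is big)     # True
print(snapshot.base is None)  # True
print(snapshot.nbytes)        # 80
```

If `big` were deleted afterwards, `snapshot` would let the large buffer be freed, whereas `window` would not.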

Memory Layout

NumPy arrays use a C-order (row-major) memory layout by default:
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.flags['C_CONTIGUOUS'])  # Output: True

# Elements are stored in memory as: [1, 2, 3, 4, 5, 6]
C-order vs Fortran-order:
  • C-order (row-major): Last index changes fastest. Default in NumPy.
  • Fortran-order (column-major): First index changes fastest. Used in Fortran and MATLAB.
You can specify the order when creating arrays:
a = np.array([[1, 2], [3, 4]], order='F')  # Fortran order
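
The two layouts can be compared through the `strides` attribute, which gives the number of bytes to step along each axis. This sketch fixes the dtype to int64 (8 bytes per element) so the numbers are predictable, and uses `np.asfortranarray` to build a column-major copy.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=np.int64)
f = np.asfortranarray(a)  # same values, column-major layout

print(a.strides)  # (24, 8): next row is 3 elements away, next column is 1
print(f.strides)  # (8, 16): next row is 1 element away, next column is 2

# Reading the values in column-major order
print(a.ravel(order='F').tolist())  # [1, 4, 2, 5, 3, 6]
```

Both arrays compare equal element by element; only the order of the values in memory differs.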

Flattening Arrays

Convert multidimensional arrays to 1-D:
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

# Flatten (returns a copy)
flat = a.flatten()
print(flat)
# Output: [1 2 3 4 5 6]

# Ravel (returns a view if possible)
ravel = a.ravel()
print(ravel)
# Output: [1 2 3 4 5 6]
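
The practical difference between the two shows up when the result is modified. In this sketch, writing through the flattened copy leaves the original alone, while writing through the raveled view changes it. The last line illustrates the "if possible" caveat: a transposed array is not C-contiguous, so ravel has to copy.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

flat = a.flatten()
rav = a.ravel()

flat[0] = 100  # copy: a is unaffected
rav[1] = 200   # view: a changes too

print(a[0, 0], a[0, 1])  # 1 200

# Raveling the transpose cannot reuse the original buffer
print(np.shares_memory(a, a.T.ravel()))  # False
```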

Practical Example: Image Data

Arrays are commonly used to represent images:
import numpy as np

# Create a simple 4x4 grayscale image (values 0-255)
image = np.array([[  0,  64, 128, 192],
                  [ 32,  96, 160, 224],
                  [ 64, 128, 192, 255],
                  [ 96, 160, 224, 255]], dtype=np.uint8)

print(f"Image shape: {image.shape}")  # (4, 4)
print(f"Data type: {image.dtype}")    # uint8

# RGB image would be 3-D: height × width × channels
rgb_image = np.zeros((100, 100, 3), dtype=np.uint8)
print(f"RGB image shape: {rgb_image.shape}")  # (100, 100, 3)
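
A few typical operations on such image arrays, as a rough sketch: cropping with slices, inverting intensities, and filling one color channel. It assumes channel 0 is red, which depends on the image library in use.

```python
import numpy as np

image = np.array([[  0,  64, 128, 192],
                  [ 32,  96, 160, 224],
                  [ 64, 128, 192, 255],
                  [ 96, 160, 224, 255]], dtype=np.uint8)

center = image[1:3, 1:3]  # crop: a view of the middle 2x2 block
inverted = 255 - image    # invert intensities; result stays uint8

rgb_image = np.zeros((100, 100, 3), dtype=np.uint8)
rgb_image[..., 0] = 255   # fill channel 0 (assumed to be red)

print(center.shape)               # (2, 2)
print(inverted[0, 0])             # 255
print(rgb_image[0, 0].tolist())   # [255, 0, 0]
```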

Next Steps

Now that you understand arrays, explore:
  • Data Types: Learn about NumPy’s rich type system
  • Indexing: Master array selection and slicing
  • Broadcasting: Understand how operations work on different shapes

Performance Tips

Best Practices:
  1. Pre-allocate arrays when possible (use zeros, ones, or empty)
  2. Use vectorized operations instead of loops
  3. Be mindful of views vs copies to avoid unnecessary memory usage
  4. Use appropriate data types (don’t use float64 when float32 suffices)
  5. Keep memory layout in mind for cache efficiency
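
Tips 1, 2, and 4 can be sketched together. The example below computes squares with a loop over a pre-allocated array and with a single vectorized expression, then compares the memory footprint of float64 and float32. No timings are shown, since they vary by machine; the array length of 1000 is arbitrary.

```python
import numpy as np

values = np.arange(1000, dtype=np.float64)

# Loop version: pre-allocate the result, then fill it element by element
squares_loop = np.empty_like(values)
for i in range(values.size):
    squares_loop[i] = values[i] ** 2

# Vectorized version: one expression, evaluated in compiled code
squares_vec = values ** 2

print(np.array_equal(squares_loop, squares_vec))  # True

# float32 needs half the memory of float64 for the same element count
print(values.nbytes)                     # 8000
print(values.astype(np.float32).nbytes)  # 4000
```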
