
Overview

NumPy uses pytest as its testing framework. Every module and package should have thorough unit tests that exercise full functionality and handle edge cases.

Why Testing Matters

Well-designed tests with good coverage:
  • Catch bugs before they reach users
  • Make refactoring safer
  • Document expected behavior
  • Enable confident code changes

Running Tests

The spin CLI is the easiest way to run tests:
# Run all tests
spin test -v

# Run specific module
spin test numpy/random

# Run specific file
spin test -t numpy/core/tests/test_multiarray.py

# Run specific test
spin test -t numpy/core/tests/test_multiarray.py::test_array_empty_like

# Run tests in parallel
spin test -p auto
The first spin test run will build NumPy if needed. Subsequent runs are faster.

Using pytest Directly

After building NumPy:
# From outside NumPy directory
pytest --pyargs numpy

# Specific module
pytest --pyargs numpy.random

# With verbose output
pytest --pyargs numpy -v
Don’t run pytest from the NumPy source directory - this can cause import errors. Use spin test or run from a different directory.

From Python

import numpy as np

# Run all tests
np.test()

# Run with verbose output
np.test(verbose=2)

# Include slow tests
np.test(label='full')

# Test specific module
np.random.test()

Test Organization

NumPy’s test structure:
numpy/
├── core/
│   ├── src/
│   ├── tests/
│   │   ├── test_multiarray.py
│   │   ├── test_numeric.py
│   │   └── test_*.py
│   └── __init__.py
├── random/
│   ├── tests/
│   │   ├── test_random.py
│   │   └── test_*.py
│   └── __init__.py
└── ...

Test File Convention

  • Tests live in tests/ subdirectories
  • Test files named test_<module>.py
  • Test functions named test_<description>()
  • Test classes named Test<Description>
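A minimal file following these conventions might look like this (the file name test_window.py and the tests themselves are illustrative):

```python
import numpy as np
from numpy.testing import assert_allclose

# tests/test_window.py -- hypothetical file name following the conventions

def test_hanning_is_symmetric():
    # A Hanning window reads the same forwards and backwards
    w = np.hanning(51)
    assert_allclose(w, w[::-1])

class TestHanning:
    def test_peak_is_at_center(self):
        # The window peaks at its midpoint
        w = np.hanning(51)
        assert w.argmax() == 25
```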

Writing Tests

Basic Test Structure

import numpy as np
import pytest
from numpy.testing import assert_equal, assert_allclose

class TestMyFeature:
    def test_basic_functionality(self):
        # Basic case
        result = np.myfunction([1, 2, 3])
        expected = np.array([2, 4, 6])
        assert_equal(result, expected)

    def test_edge_cases(self):
        # Empty input
        result = np.myfunction([])
        assert_equal(result, [])

        # Single element
        result = np.myfunction([1])
        assert_equal(result, [2])

    def test_error_handling(self):
        # Appropriate errors should be raised for invalid input
        with pytest.raises(ValueError, match="must be positive"):
            np.myfunction([-1])
Don’t add docstrings to test methods - they make test output harder to read. Use comments instead.

NumPy Testing Utilities

Comparing Arrays

from numpy.testing import (
    assert_equal,           # Exact equality
    assert_allclose,        # Close enough (with tolerance)
    assert_array_equal,     # Array equality
    assert_array_less,      # Element-wise less than
)

# Exact comparison
assert_equal(result, expected)

# Floating point comparison
assert_allclose(result, expected, rtol=1e-7, atol=1e-9)

# Check array properties too
assert_allclose(result, expected, strict=True)  # Also checks dtype and shape

# Element-wise less than
assert_array_less(a, b)  # All a[i] < b[i]
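For reference, assert_allclose passes when |actual - desired| <= atol + rtol * |desired|, with atol defaulting to 0; a quick illustration:

```python
import numpy as np
from numpy.testing import assert_allclose

# assert_allclose passes when |actual - desired| <= atol + rtol * |desired|
assert_allclose(1.0 + 1e-8, 1.0, rtol=1e-7)      # within relative tolerance

raised = False
try:
    assert_allclose(1.0 + 1e-6, 1.0, rtol=1e-7)  # outside tolerance
except AssertionError:
    raised = True
assert raised
```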

Testing Exceptions

# Modern pytest style (preferred)
with pytest.raises(ValueError, match="invalid value"):
    np.myfunction(bad_input)

# Legacy style (avoid for new tests)
from numpy.testing import assert_raises
assert_raises(ValueError, np.myfunction, bad_input)
Always use the match parameter with pytest.raises to verify the error message. Without it, a test can pass when the expected exception type is raised for a completely different reason.

Testing Warnings

# Modern pytest style (preferred)
with pytest.warns(DeprecationWarning, match="deprecated"):
    result = np.old_function()

# Legacy style (avoid for new tests)
from numpy.testing import assert_warns
assert_warns(DeprecationWarning, np.old_function)

Parametric Tests

Test multiple cases efficiently:
@pytest.mark.parametrize('dtype', [np.float32, np.float64])
@pytest.mark.parametrize('size', [3, 10, 100])
def test_solve(dtype, size):
    """Test solve for different sizes and dtypes."""
    rng = np.random.RandomState(42)
    A = rng.random((size, size)).astype(dtype)
    b = rng.random(size).astype(dtype)
    
    x = np.linalg.solve(A, b)
    
    # Check solution
    eps = np.finfo(dtype).eps
    assert_allclose(A @ x, b, rtol=eps*1e2, atol=0)
    assert x.dtype == dtype
This creates 6 test cases (2 dtypes × 3 sizes).
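Individual cases can also carry readable ids (or marks such as xfail) via pytest.param; a sketch:

```python
import numpy as np
import pytest
from numpy.testing import assert_allclose

# pytest.param attaches a readable id (or a mark) to a single case,
# so failures report as test_ones_mean[single] / test_ones_mean[double]
@pytest.mark.parametrize('dtype', [
    pytest.param(np.float32, id='single'),
    pytest.param(np.float64, id='double'),
])
def test_ones_mean(dtype):
    arr = np.ones(10, dtype=dtype)
    assert_allclose(arr.mean(), 1.0)
    assert arr.mean().dtype == dtype
```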

Random Test Data

Always seed random generators for reproducibility:
def test_with_random_data():
    """Test with seeded random data."""
    rng = np.random.RandomState(12345)  # Fixed seed
    data = rng.random(100)
    
    result = np.myfunction(data)
    assert result.shape == (100,)
Never use non-deterministic random data in tests. A test that passes sometimes and fails other times is worse than no test.
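The same rule applies to the newer Generator API (np.random.default_rng), which NumPy now recommends for new code; a seeded sketch:

```python
import numpy as np

def test_with_generator():
    # default_rng is the newer Generator API; a fixed seed makes the
    # stream reproducible, just like a seeded RandomState
    rng = np.random.default_rng(12345)
    data = rng.random(100)

    # Two generators built from the same seed produce identical streams
    rng2 = np.random.default_rng(12345)
    assert np.array_equal(data, rng2.random(100))
```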

Using Hypothesis

For more sophisticated random testing:
import hypothesis
from hypothesis import given
from hypothesis import strategies as st
from hypothesis.extra import numpy as npst

@given(arr=npst.arrays(dtype=np.float64, shape=st.integers(1, 100)))
def test_sum_properties(arr):
    """Test properties that should hold for any input."""
    result = np.sum(arr)
    
    # Sum should be finite for finite inputs
    if np.all(np.isfinite(arr)):
        assert np.isfinite(result)
    
    # Sum of positive numbers is positive
    if np.all(arr >= 0):
        assert result >= 0

Setup and Teardown

For tests that need setup/cleanup:
import os
import tempfile

class TestWithSetup:
    def setup_method(self):
        # Called before each test method
        self.data = np.random.RandomState(42).random(100)
        fd, self.temp_file = tempfile.mkstemp()
        os.close(fd)

    def teardown_method(self):
        # Called after each test method
        if os.path.exists(self.temp_file):
            os.remove(self.temp_file)

    def test_something(self):
        # Uses the data created in setup_method
        result = np.myfunction(self.data)
        assert len(result) == 100
For thread-safe tests, call setup/teardown explicitly instead of using fixtures. See thread safety section.

Testing C Extensions

For testing NumPy’s C API:
from numpy.testing import extbuild

def test_c_extension(tmp_path):
    # extbuild takes a list of (name, calling convention, body) fragments
    functions = [
        ("my_function", "METH_O", """
            if (!PyArray_Check(args)) {
                PyErr_SetString(PyExc_TypeError, "expected an ndarray");
                return NULL;
            }
            /* ... implementation ... */
            Py_RETURN_NONE;
        """),
    ]

    module = extbuild.build_and_import_extension(
        'test_module',
        functions,
        prologue='#include <numpy/arrayobject.h>',
        include_dirs=[np.get_include()],
        build_dir=tmp_path,
        more_init='import_array();'
    )

    arr = np.array([1, 2, 3])
    module.my_function(arr)

Test Markers

Slow Tests

Mark time-consuming tests:
@pytest.mark.slow
def test_expensive_operation():
    """This test takes a long time."""
    large_array = np.random.random((10000, 10000))
    result = np.linalg.svd(large_array)
Run slow tests:
# Skip slow tests (default)
spin test

# Include slow tests
spin test -m full

Skip and XFail

# Skip test unconditionally
@pytest.mark.skip(reason="Not implemented yet")
def test_future_feature():
    pass

# Skip conditionally
@pytest.mark.skipif(sys.platform == 'win32',
                    reason="Windows not supported")
def test_unix_feature():
    pass

# Expected to fail
@pytest.mark.xfail(reason="Known bug gh-12345")
def test_known_issue():
    assert broken_function() == expected

Platform-Specific Tests

@pytest.mark.skipif(not sys.platform.startswith('linux'),
                    reason="Linux-only test")
def test_linux_specific():
    pass

Thread-Safe Tests

NumPy CI uses pytest-run-parallel to test thread safety:
# Run tests in parallel threads
spin test -p auto

# Run with specific thread count
spin test -p 4

# Skip thread-unsafe tests
spin test -p auto -- --skip-thread-unsafe=true

Writing Thread-Safe Tests

Do:
class TestThreadSafe:
    def test_with_explicit_setup(self):
        """Thread-safe test."""
        # Create fresh state for each test
        rng = np.random.RandomState(42)
        data = rng.random(100)
        
        result = np.myfunction(data)
        assert result.shape == (100,)
Don’t:
# Not thread-safe - uses class-level fixture
class TestNotThreadSafe:
    @pytest.fixture(autouse=True)
    def setup(self):
        self.data = np.random.random(100)
    
    def test_something(self):
        # Multiple threads could access self.data simultaneously
        result = np.myfunction(self.data)

Thread-Unsafe Tests

Some tests are inherently thread-unsafe:
import io
import sys

@pytest.mark.thread_unsafe(reason="Modifies sys.stdout")
def test_output_capture():
    # Swaps out sys.stdout, so it must not run in parallel threads
    old_stdout = sys.stdout
    try:
        sys.stdout = io.StringIO()
        np.myfunction()
        output = sys.stdout.getvalue()
    finally:
        sys.stdout = old_stdout
Thread-unsafe categories:
  • Modifying sys.stdout, sys.stderr
  • Mutating global state (docstrings, modules)
  • Tests requiring lots of memory
  • Using thread-unsafe pytest fixtures (monkeypatch, capsys)
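pytest-run-parallel exercises the whole suite, but you can also stress a single operation by hand with only the standard library; a minimal sketch:

```python
import threading
import numpy as np

def test_sum_is_thread_safe():
    # Call np.sum from several threads at once; a Barrier releases all
    # threads together to maximize interleaving
    data = np.arange(1000)
    expected = data.sum()
    n_threads = 4
    barrier = threading.Barrier(n_threads)
    results = [None] * n_threads

    def worker(i):
        barrier.wait()
        for _ in range(100):
            results[i] = np.sum(data)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Every thread should observe the correct, uncorrupted result
    assert all(r == expected for r in results)
```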

Test Coverage

Measure Coverage

# Run tests with coverage
spin test --coverage

# View HTML report
firefox build/coverage/index.html

Coverage Goals

Target: 100% Statement Coverage

Tests should cover:
  • All code paths (if/else branches)
  • Success cases
  • Error conditions
  • Edge cases (empty inputs, boundary values)
  • Type variations

Example: Comprehensive Coverage

def test_function_coverage():
    """Comprehensive test covering all paths."""
    # Success case
    result = np.myfunction([1, 2, 3])
    assert_equal(result, [2, 4, 6])
    
    # Edge case: empty
    result = np.myfunction([])
    assert_equal(result, [])
    
    # Edge case: single element
    result = np.myfunction([1])
    assert_equal(result, [2])
    
    # Different dtypes
    for dtype in [np.int32, np.float32, np.float64]:
        arr = np.array([1, 2, 3], dtype=dtype)
        result = np.myfunction(arr)
        assert result.dtype == dtype
    
    # Error cases
    with pytest.raises(ValueError, match="negative"):
        np.myfunction([-1])
    
    with pytest.raises(TypeError, match="unsupported type"):
        np.myfunction("string")

Running Doctests

Doctests are code examples in docstrings:
def add(a, b):
    """
    Add two numbers.
    
    Examples
    --------
    >>> add(1, 2)
    3
    >>> add(-1, 1)
    0
    """
    return a + b
Run doctests:
# Install scipy-doctest
pip install scipy-doctest

# Run all doctests
spin check-docs -v

# Run for specific module
spin check-docs numpy/linalg

# Run with pattern matching
spin check-docs -- -k 'det and not slogdet'
Doctests run in a clean environment with import numpy as np already executed.
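For a quick local check you can also drive the standard library's doctest machinery directly (the function and names here are illustrative):

```python
import doctest
import numpy as np

def double(a):
    """
    >>> double(np.array([1, 2, 3]))
    array([2, 4, 6])
    """
    return a * 2

# Parse and run the examples in double's docstring
parser = doctest.DocTestParser()
test = parser.get_doctest(double.__doc__, {'double': double, 'np': np},
                          'double', None, None)
runner = doctest.DocTestRunner()
runner.run(test)
assert runner.failures == 0
```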

Advanced Testing

Testing Type Annotations

# Run mypy type checks
spin mypy
Type tests are in typing/tests/.

Memory Testing

For testing memory leaks:
import tracemalloc

def test_no_memory_leak():
    """Test that repeated calls don't leak memory."""
    tracemalloc.start()
    
    # Warmup
    for _ in range(10):
        np.myfunction(np.random.random(1000))
    
    # Measure
    tracemalloc.clear_traces()
    snapshot1 = tracemalloc.take_snapshot()
    
    for _ in range(100):
        np.myfunction(np.random.random(1000))
    
    snapshot2 = tracemalloc.take_snapshot()
    
    # Compare
    stats = snapshot2.compare_to(snapshot1, 'lineno')
    total_diff = sum(stat.size_diff for stat in stats)
    
    # Allow some growth, but not proportional to iterations
    assert total_diff < 1_000_000  # 1 MB

Benchmark Tests

import timeit

def test_performance():
    """Test that operation meets performance target."""
    setup = "import numpy as np; a = np.random.random((1000, 1000))"
    time = timeit.timeit("np.sum(a)", setup=setup, number=100)
    
    # Should complete in less than 1 second for 100 iterations
    assert time < 1.0
For comprehensive benchmarking, use asv:
pip install asv
cd numpy
asv run --python=same
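An asv benchmark is a plain class: asv times every method whose name starts with time_, calling setup() first. A hypothetical sketch (file and class names are illustrative):

```python
import numpy as np

# benchmarks/bench_reduce.py -- hypothetical file; asv discovers benchmark
# classes and times every method whose name starts with time_
class SumBench:
    def setup(self):
        # setup() runs before each timed method and is excluded from timing
        self.a = np.random.random((1000, 1000))

    def time_sum(self):
        np.sum(self.a)
```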

Best Practices

Test Before Fixing

Write a failing test that reproduces the bug, then fix it. This prevents regressions.

One Assert Per Test

Keep tests focused. If a test needs many assertions, split it into multiple tests.

Descriptive Names

Test names should describe what they test: test_empty_array_returns_empty_result

Test the Interface

Test public APIs, not internal implementation details. This allows refactoring.

Avoid Testing NumPy with NumPy

Use simple Python operations in assertions when possible, not NumPy functions that might also have bugs.
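For example, check a simple reduction against a hand-computed Python list rather than a second NumPy computation:

```python
import numpy as np

# Compare against a hand-computed Python list, not another NumPy call
# that could share the same bug as the code under test
result = np.cumsum([1, 2, 3, 4])
assert result.tolist() == [1, 3, 6, 10]
```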

Clean Test Data

Clean up temporary files, reset global state, and avoid test interdependencies.

Fast by Default

Keep most tests fast (less than 0.1s). Mark slow tests with @pytest.mark.slow.

Use Fixtures Sparingly

Prefer explicit setup methods for thread-safety. Use fixtures only when necessary.

Continuous Integration

NumPy runs tests automatically on:
  • Multiple Python versions (3.10, 3.11, 3.12, etc.)
  • Multiple platforms (Linux, macOS, Windows)
  • Multiple architectures (x86-64, ARM, etc.)
  • With different compilers (GCC, Clang, MSVC)
Your PR must pass all CI checks before merging.

Troubleshooting

Tests fail in CI but pass locally:
  • Check Python version differences
  • Look for platform-specific issues
  • Verify random seeds are set
  • Check for race conditions (thread safety)

Import errors when running tests:
  • Don’t run pytest from the NumPy source directory
  • Use spin test or change directories first
  • Rebuild NumPy: spin build

Coverage gaps:
  • Run spin test --coverage to see what’s missing
  • Check that error paths are tested
  • Verify different input types are tested

Crashes or segfaults:
  • Run under a debugger: spin gdb -t path/to/test.py
  • Build with debug symbols: spin build --clean -- -Dbuildtype=debug
  • Check for array dimension mismatches

Next Steps

Contributing Guide

Learn the full contribution workflow

Building NumPy

Set up your development environment

pytest Documentation

Learn more about pytest features

NumPy Testing Guide

Official NumPy testing reference
