
Overview

NumPy uses pytest as its testing framework. Every module and package should have thorough unit tests that exercise full functionality and handle edge cases.

Why Testing Matters

Well-designed tests with good coverage:
  • Catch bugs before they reach users
  • Make refactoring safer
  • Document expected behavior
  • Enable confident code changes

Running Tests

The spin CLI is the easiest way to run tests:
# Run all tests
spin test -v

# Run specific module
spin test numpy/random

# Run specific file
spin test -t numpy/core/tests/test_multiarray.py

# Run specific test
spin test -t numpy/core/tests/test_multiarray.py::test_array_empty_like

# Run tests in parallel
spin test -p auto
The first spin test run will build NumPy if needed. Subsequent runs are faster.

Using pytest Directly

After building NumPy:
# From outside NumPy directory
pytest --pyargs numpy

# Specific module
pytest --pyargs numpy.random

# With verbose output
pytest --pyargs numpy -v
Don’t run pytest from the NumPy source directory - this can cause import errors. Use spin test or run from a different directory.

From Python

import numpy as np

# Run all tests
np.test()

# Run with verbose output
np.test(verbose=2)

# Include slow tests
np.test(label='full')

# Test specific module
np.random.test()

Test Organization

NumPy’s test structure:
numpy/
├── core/
│   ├── src/
│   ├── tests/
│   │   ├── test_multiarray.py
│   │   ├── test_numeric.py
│   │   └── test_*.py
│   └── __init__.py
├── random/
│   ├── tests/
│   │   ├── test_random.py
│   │   └── test_*.py
│   └── __init__.py
└── ...

Test File Convention

  • Tests live in tests/ subdirectories
  • Test files named test_<module>.py
  • Test functions named test_<description>()
  • Test classes named Test<Description>
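A minimal file following these conventions might look like this (the file name test_window.py and the tests themselves are illustrative):

```python
import numpy as np
from numpy.testing import assert_allclose

# tests/test_window.py -- hypothetical file name following the conventions

def test_hanning_is_symmetric():
    # A Hanning window reads the same forwards and backwards
    w = np.hanning(51)
    assert_allclose(w, w[::-1])

class TestHanning:
    def test_peak_is_at_center(self):
        # The window peaks at its midpoint
        w = np.hanning(51)
        assert w.argmax() == 25
```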

Writing Tests

Basic Test Structure

import numpy as np
import pytest
from numpy.testing import assert_equal, assert_allclose

class TestMyFeature:
    def test_basic_functionality(self):
        # Basic case
        result = np.myfunction([1, 2, 3])
        expected = np.array([2, 4, 6])
        assert_equal(result, expected)

    def test_edge_cases(self):
        # Empty input
        result = np.myfunction([])
        assert_equal(result, [])

        # Single element
        result = np.myfunction([1])
        assert_equal(result, [2])

    def test_error_handling(self):
        # Appropriate errors should be raised for invalid input
        with pytest.raises(ValueError, match="must be positive"):
            np.myfunction([-1])
Don’t add docstrings to test methods - they make test output harder to read. Use comments instead.

NumPy Testing Utilities

Comparing Arrays

from numpy.testing import (
    assert_equal,           # Exact equality
    assert_allclose,        # Close enough (with tolerance)
    assert_array_equal,     # Array equality
    assert_array_less,      # Element-wise less than
)

# Exact comparison
assert_equal(result, expected)

# Floating point comparison
assert_allclose(result, expected, rtol=1e-7, atol=1e-9)

# Check array properties too
assert_allclose(result, expected, strict=True)  # Also checks dtype and shape

# Element-wise less than
assert_array_less(a, b)  # All a[i] < b[i]
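For reference, assert_allclose passes when |actual - desired| <= atol + rtol * |desired|, with atol defaulting to 0; a quick illustration:

```python
import numpy as np
from numpy.testing import assert_allclose

# assert_allclose passes when |actual - desired| <= atol + rtol * |desired|
assert_allclose(1.0 + 1e-8, 1.0, rtol=1e-7)      # within relative tolerance

raised = False
try:
    assert_allclose(1.0 + 1e-6, 1.0, rtol=1e-7)  # outside tolerance
except AssertionError:
    raised = True
assert raised
```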

Testing Exceptions

# Modern pytest style (preferred)
with pytest.raises(ValueError, match="invalid value"):
    np.myfunction(bad_input)

# Legacy style (avoid for new tests)
from numpy.testing import assert_raises
assert_raises(ValueError, np.myfunction, bad_input)
Always use the match parameter with pytest.raises to verify the error message. Without it, a test can pass when the expected exception type is raised for a completely different reason.

Testing Warnings

# Modern pytest style (preferred)
with pytest.warns(DeprecationWarning, match="deprecated"):
    result = np.old_function()

# Legacy style (avoid for new tests)
from numpy.testing import assert_warns
assert_warns(DeprecationWarning, np.old_function)

Parametric Tests

Test multiple cases efficiently:
@pytest.mark.parametrize('dtype', [np.float32, np.float64])
@pytest.mark.parametrize('size', [3, 10, 100])
def test_solve(dtype, size):
    """Test solve for different sizes and dtypes."""
    rng = np.random.RandomState(42)
    A = rng.random((size, size)).astype(dtype)
    b = rng.random(size).astype(dtype)
    
    x = np.linalg.solve(A, b)
    
    # Check solution
    eps = np.finfo(dtype).eps
    assert_allclose(A @ x, b, rtol=eps*1e2, atol=0)
    assert x.dtype == dtype
This creates 6 test cases (2 dtypes × 3 sizes).
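Individual cases can also carry readable ids (or marks such as xfail) via pytest.param; a sketch:

```python
import numpy as np
import pytest
from numpy.testing import assert_allclose

# pytest.param attaches a readable id (or a mark) to a single case,
# so failures report as test_ones_mean[single] / test_ones_mean[double]
@pytest.mark.parametrize('dtype', [
    pytest.param(np.float32, id='single'),
    pytest.param(np.float64, id='double'),
])
def test_ones_mean(dtype):
    arr = np.ones(10, dtype=dtype)
    assert_allclose(arr.mean(), 1.0)
    assert arr.mean().dtype == dtype
```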

Random Test Data

Always seed random generators for reproducibility:
def test_with_random_data():
    """Test with seeded random data."""
    rng = np.random.RandomState(12345)  # Fixed seed
    data = rng.random(100)
    
    result = np.myfunction(data)
    assert result.shape == (100,)
Never use non-deterministic random data in tests. A test that passes sometimes and fails other times is worse than no test.
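The same rule applies to the newer Generator API (np.random.default_rng), which NumPy now recommends for new code; a seeded sketch:

```python
import numpy as np

def test_with_generator():
    # default_rng is the newer Generator API; a fixed seed makes the
    # stream reproducible, just like a seeded RandomState
    rng = np.random.default_rng(12345)
    data = rng.random(100)

    # Two generators built from the same seed produce identical streams
    rng2 = np.random.default_rng(12345)
    assert np.array_equal(data, rng2.random(100))
```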

Using Hypothesis

For more sophisticated random testing:
import hypothesis
from hypothesis import given
from hypothesis import strategies as st
from hypothesis.extra import numpy as npst

@given(arr=npst.arrays(dtype=np.float64, shape=st.integers(1, 100)))
def test_sum_properties(arr):
    """Test properties that should hold for any input."""
    result = np.sum(arr)
    
    # Sum should be finite for finite inputs
    if np.all(np.isfinite(arr)):
        assert np.isfinite(result)
    
    # Sum of positive numbers is positive
    if np.all(arr >= 0):
        assert result >= 0

Setup and Teardown

For tests that need setup/cleanup:
import os
import tempfile

class TestWithSetup:
    def setup_method(self):
        # Called before each test method
        self.data = np.random.RandomState(42).random(100)
        fd, self.temp_file = tempfile.mkstemp()
        os.close(fd)

    def teardown_method(self):
        # Called after each test method
        if os.path.exists(self.temp_file):
            os.remove(self.temp_file)

    def test_something(self):
        # Uses the data created in setup_method
        result = np.myfunction(self.data)
        assert len(result) == 100
For thread-safe tests, call setup/teardown explicitly instead of using fixtures. See thread safety section.

Testing C Extensions

For testing NumPy’s C API:
from numpy.testing import extbuild

def test_c_extension(tmp_path):
    # extbuild takes a list of (name, calling convention, body) fragments
    functions = [
        ("my_function", "METH_O", """
            if (!PyArray_Check(args)) {
                PyErr_SetString(PyExc_TypeError, "expected an ndarray");
                return NULL;
            }
            /* ... implementation ... */
            Py_RETURN_NONE;
        """),
    ]

    module = extbuild.build_and_import_extension(
        'test_module',
        functions,
        prologue='#include <numpy/arrayobject.h>',
        include_dirs=[np.get_include()],
        build_dir=tmp_path,
        more_init='import_array();'
    )

    arr = np.array([1, 2, 3])
    module.my_function(arr)

Test Markers

Slow Tests

Mark time-consuming tests:
@pytest.mark.slow
def test_expensive_operation():
    """This test takes a long time."""
    large_array = np.random.random((10000, 10000))
    result = np.linalg.svd(large_array)
Run slow tests:
# Skip slow tests (default)
spin test

# Include slow tests
spin test -m full

Skip and XFail

# Skip test unconditionally
@pytest.mark.skip(reason="Not implemented yet")
def test_future_feature():
    pass

# Skip conditionally
@pytest.mark.skipif(sys.platform == 'win32',
                    reason="Windows not supported")
def test_unix_feature():
    pass

# Expected to fail
@pytest.mark.xfail(reason="Known bug gh-12345")
def test_known_issue():
    assert broken_function() == expected

Platform-Specific Tests

@pytest.mark.skipif(not sys.platform.startswith('linux'),
                    reason="Linux-only test")
def test_linux_specific():
    pass

Thread-Safe Tests

NumPy CI uses pytest-run-parallel to test thread safety:
# Run tests in parallel threads
spin test -p auto

# Run with specific thread count
spin test -p 4

# Skip thread-unsafe tests
spin test -p auto -- --skip-thread-unsafe=true

Writing Thread-Safe Tests

Do:
class TestThreadSafe:
    def test_with_explicit_setup(self):
        """Thread-safe test."""
        # Create fresh state for each test
        rng = np.random.RandomState(42)
        data = rng.random(100)
        
        result = np.myfunction(data)
        assert result.shape == (100,)
Don’t:
# Not thread-safe - uses class-level fixture
class TestNotThreadSafe:
    @pytest.fixture(autouse=True)
    def setup(self):
        self.data = np.random.random(100)
    
    def test_something(self):
        # Multiple threads could access self.data simultaneously
        result = np.myfunction(self.data)

Thread-Unsafe Tests

Some tests are inherently thread-unsafe:
import io
import sys

@pytest.mark.thread_unsafe(reason="Modifies sys.stdout")
def test_output_capture():
    # Swaps out sys.stdout, so it must not run in parallel threads
    old_stdout = sys.stdout
    try:
        sys.stdout = io.StringIO()
        np.myfunction()
        output = sys.stdout.getvalue()
    finally:
        sys.stdout = old_stdout
Thread-unsafe categories:
  • Modifying sys.stdout, sys.stderr
  • Mutating global state (docstrings, modules)
  • Tests requiring lots of memory
  • Using thread-unsafe pytest fixtures (monkeypatch, capsys)
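pytest-run-parallel exercises the whole suite, but you can also stress a single operation by hand with only the standard library; a minimal sketch:

```python
import threading
import numpy as np

def test_sum_is_thread_safe():
    # Call np.sum from several threads at once; a Barrier releases all
    # threads together to maximize interleaving
    data = np.arange(1000)
    expected = data.sum()
    n_threads = 4
    barrier = threading.Barrier(n_threads)
    results = [None] * n_threads

    def worker(i):
        barrier.wait()
        for _ in range(100):
            results[i] = np.sum(data)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Every thread should observe the correct, uncorrupted result
    assert all(r == expected for r in results)
```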

Test Coverage

Measure Coverage

# Run tests with coverage
spin test --coverage

# View HTML report
firefox build/coverage/index.html

Coverage Goals

Target: 100% Statement Coverage

Tests should cover:
  • All code paths (if/else branches)
  • Success cases
  • Error conditions
  • Edge cases (empty inputs, boundary values)
  • Type variations

Example: Comprehensive Coverage

def test_function_coverage():
    """Comprehensive test covering all paths."""
    # Success case
    result = np.myfunction([1, 2, 3])
    assert_equal(result, [2, 4, 6])
    
    # Edge case: empty
    result = np.myfunction([])
    assert_equal(result, [])
    
    # Edge case: single element
    result = np.myfunction([1])
    assert_equal(result, [2])
    
    # Different dtypes
    for dtype in [np.int32, np.float32, np.float64]:
        arr = np.array([1, 2, 3], dtype=dtype)
        result = np.myfunction(arr)
        assert result.dtype == dtype
    
    # Error cases
    with pytest.raises(ValueError, match="negative"):
        np.myfunction([-1])
    
    with pytest.raises(TypeError, match="unsupported type"):
        np.myfunction("string")

Running Doctests

Doctests are code examples in docstrings:
def add(a, b):
    """
    Add two numbers.
    
    Examples
    --------
    >>> add(1, 2)
    3
    >>> add(-1, 1)
    0
    """
    return a + b
Run doctests:
# Install scipy-doctest
pip install scipy-doctest

# Run all doctests
spin check-docs -v

# Run for specific module
spin check-docs numpy/linalg

# Run with pattern matching
spin check-docs -- -k 'det and not slogdet'
Doctests run in a clean environment with import numpy as np already executed.
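For a quick local check you can also drive the standard library's doctest machinery directly (the function and names here are illustrative):

```python
import doctest
import numpy as np

def double(a):
    """
    >>> double(np.array([1, 2, 3]))
    array([2, 4, 6])
    """
    return a * 2

# Parse and run the examples in double's docstring
parser = doctest.DocTestParser()
test = parser.get_doctest(double.__doc__, {'double': double, 'np': np},
                          'double', None, None)
runner = doctest.DocTestRunner()
runner.run(test)
assert runner.failures == 0
```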

Advanced Testing

Testing Type Annotations

# Run mypy type checks
spin mypy
Type tests are in typing/tests/.

Memory Testing

For testing memory leaks:
import tracemalloc

def test_no_memory_leak():
    """Test that repeated calls don't leak memory."""
    tracemalloc.start()
    
    # Warmup
    for _ in range(10):
        np.myfunction(np.random.random(1000))
    
    # Measure
    tracemalloc.clear_traces()
    snapshot1 = tracemalloc.take_snapshot()
    
    for _ in range(100):
        np.myfunction(np.random.random(1000))
    
    snapshot2 = tracemalloc.take_snapshot()
    
    # Compare
    stats = snapshot2.compare_to(snapshot1, 'lineno')
    total_diff = sum(stat.size_diff for stat in stats)
    
    # Allow some growth, but not proportional to iterations
    assert total_diff < 1_000_000  # 1 MB

Benchmark Tests

import timeit

def test_performance():
    """Test that operation meets performance target."""
    setup = "import numpy as np; a = np.random.random((1000, 1000))"
    time = timeit.timeit("np.sum(a)", setup=setup, number=100)
    
    # Should complete in less than 1 second for 100 iterations
    assert time < 1.0
For comprehensive benchmarking, use asv:
pip install asv
cd numpy
asv run --python=same
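An asv benchmark is a plain class: asv times every method whose name starts with time_, calling setup() first. A hypothetical sketch (file and class names are illustrative):

```python
import numpy as np

# benchmarks/bench_reduce.py -- hypothetical file; asv discovers benchmark
# classes and times every method whose name starts with time_
class SumBench:
    def setup(self):
        # setup() runs before each timed method and is excluded from timing
        self.a = np.random.random((1000, 1000))

    def time_sum(self):
        np.sum(self.a)
```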

Best Practices

Test Before Fixing

Write a failing test that reproduces the bug, then fix it. This prevents regressions.

One Assert Per Test

Keep tests focused. If a test needs many assertions, split it into multiple tests.

Descriptive Names

Test names should describe what they test: test_empty_array_returns_empty_result

Test the Interface

Test public APIs, not internal implementation details. This allows refactoring.

Avoid Testing NumPy with NumPy

Use simple Python operations in assertions when possible, not NumPy functions that might also have bugs.
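For example, check a simple reduction against a hand-computed Python list rather than a second NumPy computation:

```python
import numpy as np

# Compare against a hand-computed Python list, not another NumPy call
# that could share the same bug as the code under test
result = np.cumsum([1, 2, 3, 4])
assert result.tolist() == [1, 3, 6, 10]
```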

Clean Test Data

Clean up temporary files, reset global state, and avoid test interdependencies.

Fast by Default

Keep most tests fast (less than 0.1s). Mark slow tests with @pytest.mark.slow.

Use Fixtures Sparingly

Prefer explicit setup methods for thread-safety. Use fixtures only when necessary.

Continuous Integration

NumPy runs tests automatically on:
  • Multiple Python versions (3.10, 3.11, 3.12, etc.)
  • Multiple platforms (Linux, macOS, Windows)
  • Multiple architectures (x86-64, ARM, etc.)
  • With different compilers (GCC, Clang, MSVC)
Your PR must pass all CI checks before merging.

Troubleshooting

Tests fail in CI but pass locally:
  • Check Python version differences
  • Look for platform-specific issues
  • Verify random seeds are set
  • Check for race conditions (thread safety)

Import errors when running tests:
  • Don’t run pytest from the NumPy source directory
  • Use spin test or change directories first
  • Rebuild NumPy: spin build

Coverage gaps:
  • Run spin test --coverage to see what’s missing
  • Check that error paths are tested
  • Verify different input types are tested

Crashes or segfaults:
  • Run under a debugger: spin gdb -t path/to/test.py
  • Build with debug symbols: spin build --clean -- -Dbuildtype=debug
  • Check for array dimension mismatches

Next Steps

Contributing Guide

Learn the full contribution workflow

Building NumPy

Set up your development environment

pytest Documentation

Learn more about pytest features

NumPy Testing Guide

Official NumPy testing reference
