Data

backtesting.py uses standard OHLCV (Open, High, Low, Close, Volume) data stored as a pandas DataFrame. Understanding how this data must be structured, and how a strategy accesses it, is essential for writing strategies that behave correctly.

Required format

Pass a pd.DataFrame with at least these columns:

Column	Description
`Open`	Bar opening price
`High`	Bar high price
`Low`	Bar low price
`Close`	Bar closing price
`Volume`	Trade volume (optional, but recommended)

Column names are case-sensitive. If Volume is unavailable, it defaults to NaN internally. The DataFrame index should be a pd.DatetimeIndex. A monotonic pd.RangeIndex is also accepted but produces a warning.

import pandas as pd

df = pd.DataFrame({
    'Open':   [100.0, 101.0, 102.0],
    'High':   [105.0, 106.0, 107.0],
    'Low':    [99.0,  100.0, 101.0],
    'Close':  [103.0, 104.0, 105.0],
    'Volume': [1000,  1200,  1100],
}, index=pd.date_range('2024-01-01', periods=3, freq='D'))

NaN values in the OHLC columns will cause Backtest.__init__ to raise a ValueError. Strip them with df.dropna() or fill them with df.interpolate() before passing to Backtest.
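You can catch these problems before constructing Backtest with a small pre-flight check. Below is a minimal sketch. check_ohlcv is a hypothetical helper, not part of backtesting.py. It mirrors the rules listed above: required columns, Volume defaulting to NaN, no NaN in OHLC, and a sorted DatetimeIndex.

```python
import warnings

import numpy as np
import pandas as pd

REQUIRED = ['Open', 'High', 'Low', 'Close']


def check_ohlcv(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pre-flight check mirroring the data rules described above."""
    missing = [col for col in REQUIRED if col not in df.columns]
    if missing:  # column names are case-sensitive
        raise ValueError(f'Missing required columns: {missing}')
    df = df.copy()
    if 'Volume' not in df.columns:
        df['Volume'] = np.nan  # Volume is optional; fill it with NaN
    if df[REQUIRED].isnull().values.any():
        raise ValueError('OHLC values contain NaN; use dropna() or interpolate()')
    if not df.index.is_monotonic_increasing:
        warnings.warn('Index is not sorted in ascending order; sorting')
        df = df.sort_index()
    if not isinstance(df.index, pd.DatetimeIndex):
        warnings.warn('Index is not a DatetimeIndex')
    return df
```

A DataFrame that passes this check still goes through Backtest's own validation. The helper only surfaces the same issues earlier, with your own error messages.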

Built-in test data

Three sample datasets are bundled for examples and testing:

from backtesting.test import GOOG, EURUSD, BTCUSD

Dataset	Description
`GOOG`	Daily OHLCV data for Google (Alphabet) stock, 2004–2013
`EURUSD`	Hourly EUR/USD forex data, April 2017–February 2018
`BTCUSD`	Monthly BTC/USD data, 2012–2024

All three are pd.DataFrame objects with a DatetimeIndex, ready to pass directly to Backtest.

Loading custom data

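From CSV

import pandas as pd

df = pd.read_csv(
    'data/AAPL.csv',
    index_col='Date',
    parse_dates=True,
)
# Rename lowercase headers to the capitalized names backtesting.py requires
df = df.rename(columns={
    'open': 'Open',
    'high': 'High',
    'low': 'Low',
    'close': 'Close',
    'volume': 'Volume',
})
df = df.dropna()

Resample to another timeframe

backtesting.lib.OHLCV_AGG maps each column to its aggregation rule: first Open, maximum High, minimum Low, last Close, and summed Volume.

from backtesting.lib import OHLCV_AGG

# Resample hourly data to 4-hour bars; dropna() removes empty periods
# (e.g. weekends) that contain no trades
df_4h = df.resample('4h', label='right').agg(OHLCV_AGG).dropna()

Missing Open/High/Low

# If only Close (and optionally Volume) is available,
# copy Close into the Open, High and Low columns
df['Open'] = df['High'] = df['Low'] = df['Close']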
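The loading and resampling steps can be tried end to end without a data file. The sketch below makes two substitutions. A small hypothetical hourly CSV held in memory via io.StringIO stands in for data/AAPL.csv. The aggregation rules are spelled out in a plain dict, assumed equivalent to backtesting.lib.OHLCV_AGG, so the sketch runs with pandas alone.

```python
import io

import pandas as pd

# Hypothetical hourly CSV with lowercase headers, standing in for a real file
CSV = """Date,open,high,low,close,volume
2024-01-01 00:00,100,101,99,100.5,10
2024-01-01 01:00,100.5,102,100,101.5,12
2024-01-01 02:00,101.5,103,101,102.5,8
2024-01-01 03:00,102.5,104,102,103.0,15
2024-01-01 04:00,103.0,103.5,101,101.5,9
"""

df = pd.read_csv(io.StringIO(CSV), index_col='Date', parse_dates=True)
df = df.rename(columns=str.capitalize)  # 'open' -> 'Open', 'volume' -> 'Volume'
df = df.dropna()

# Per-column rules, assumed to match backtesting.lib.OHLCV_AGG
AGG = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'}

# Four hourly bars (00:00 to 03:00) collapse into one 4-hour bar labelled 04:00
df_4h = df.resample('4h', label='right').agg(AGG).dropna()
```

Renaming with str.capitalize is a shortcut that works when every header is simply lowercase. An explicit mapping, as in the CSV example above, is safer for headers such as "Adj Close".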

Accessing data in a strategy

Inside a strategy, self.data provides access to the OHLCV columns plus the index.

Column access

def next(self):
    open_price  = self.data.Open[-1]
    high_price  = self.data.High[-1]
    low_price   = self.data.Low[-1]
    close_price = self.data.Close[-1]
    volume      = self.data.Volume[-1]

Each column is a NumPy array-like. Index [-1] is the current (most recent) bar. Index [-2] is the prior bar.
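The [-1]/[-2] convention is what most bar-to-bar comparisons rely on. backtesting.lib ships a crossover() function for this purpose. The version below is a hypothetical stand-alone helper, written only to show the indexing.

```python
import numpy as np


def crossed_above(fast, slow) -> bool:
    """Hypothetical helper: True if `fast` moved above `slow` on the current bar.

    Follows the self.data convention: [-1] is the current bar, [-2] the prior bar.
    """
    fast, slow = np.asarray(fast), np.asarray(slow)
    if len(fast) < 2 or len(slow) < 2:
        return False  # not enough history to compare two bars yet
    return bool(fast[-2] <= slow[-2] and fast[-1] > slow[-1])
```

Inside next(), it could be called as crossed_above(self.data.Close, self.sma).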

Index access

def next(self):
    current_time = self.data.index[-1]  # pd.Timestamp or int
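Because the index value is a pd.Timestamp when the data has a DatetimeIndex, it can drive time-based filters. Below is a minimal sketch. in_session and its default hours are hypothetical. Integer bar numbers from a RangeIndex carry no clock time, so the helper lets them through unchanged.

```python
import pandas as pd


def in_session(ts, start_hour: int = 9, end_hour: int = 17) -> bool:
    """Hypothetical filter: True if a bar's timestamp falls within trading hours.

    Non-Timestamp values (e.g. integer bar numbers) always pass.
    """
    if isinstance(ts, pd.Timestamp):
        return start_hour <= ts.hour < end_hour
    return True
```

Usage inside next(): if in_session(self.data.index[-1]): ...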

The `.s` accessor — get a pandas Series

Append .s to any column to get a pd.Series with the proper DatetimeIndex:

def init(self):
    close_series = self.data.Close.s  # pd.Series
    self.sma = self.I(lambda s: s.rolling(20).mean().values, close_series)

The .s accessor is available on all OHLCV columns and on indicator arrays returned by self.I().
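To see what the function handed to self.I() returns, you can run it on an ordinary Series outside a strategy. In this sketch, a short hypothetical Close series stands in for self.data.Close.s, and the window is 3 instead of 20. The result has the same length as the input, and its first window - 1 values are NaN while the rolling window fills up.

```python
import numpy as np
import pandas as pd

# Stand-in for self.data.Close.s: a Series on a DatetimeIndex
close = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0],
    index=pd.date_range('2024-01-01', periods=5, freq='D'),
)


def sma_values(s: pd.Series, n: int) -> np.ndarray:
    """Rolling mean as a NumPy array, same length as the input."""
    return s.rolling(n).mean().values


sma = sma_values(close, 3)
```

Inside init(), the equivalent call would be self.I(sma_values, self.data.Close.s, 20).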

The `.df` accessor — get the full DataFrame

Access the underlying DataFrame, including any extra columns you passed in. In next(), self.data.df is truncated to the current bar, just like the individual arrays:

def init(self):
    df = self.data.df  # full pd.DataFrame
    self.sentiment = self.I(lambda: df['Sentiment'].values)
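The .df accessor is handy when an indicator needs several columns at once. Below is a minimal sketch. typical_price is a hypothetical indicator, not a backtesting.py function. It returns a pd.Series, which self.I() converts to an array.

```python
import pandas as pd


def typical_price(df: pd.DataFrame) -> pd.Series:
    """Hypothetical indicator: (High + Low + Close) / 3 for each bar."""
    return (df['High'] + df['Low'] + df['Close']) / 3
```

Inside init(): self.tp = self.I(lambda: typical_price(self.data.df)).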

The `.pip` property

self.data.pip returns the smallest price unit of change. It is computed as 10 ** -d, where d is the median number of decimal places among the values in the Close column. It is useful for setting stop-loss distances:

def next(self):
    pip = self.data.pip
    self.buy(sl=self.data.Close[-1] - 50 * pip)
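The pip rule and the stop-loss arithmetic can be reproduced in plain NumPy. Below is a sketch of the rule as described above, not backtesting.py's own code. Both estimate_pip and long_stop_loss are hypothetical helpers. For quotes such as 1.1234, 1.1235 and 1.12, the decimal counts are 4, 4 and 2, so the median is 4 and the pip is 0.0001.

```python
import numpy as np


def estimate_pip(close) -> float:
    """Sketch of the rule above: 10 ** -(median number of decimal places in Close)."""
    decimals = [len(s.partition('.')[-1])
                for s in np.asarray(close, dtype=float).astype(str)]
    return float(10.0 ** -np.median(decimals))


def long_stop_loss(entry: float, pip: float, pips: int = 50) -> float:
    """Hypothetical helper: stop-loss price `pips` pips below a long entry."""
    return entry - pips * pip
```

With a pip of 0.0001, a 50-pip stop below an entry at 1.1234 lands at 1.1184.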

Bar-by-bar revelation

Data access behaves differently in init() than in next(). This difference is the key mechanism that prevents look-ahead bias.

In init()

self.data arrays are available at full length — all bars from start to end. This is required so indicator libraries can compute their rolling windows.

def init(self):
    # SMA is assumed imported: from backtesting.test import SMA
    # The full Close array is available here
    self.sma = self.I(SMA, self.data.Close, 20)
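Because init() sees every bar, the function you pass to self.I() must itself look only backward. Slicing in next() hides later elements of the indicator array. It cannot remove future information already baked into the current element. The pandas-only sketch below contrasts a trailing rolling mean with a centered one, which also reads the next bar. It recomputes each on data truncated at bar 3, as if bar 4 had not happened yet.

```python
import numpy as np
import pandas as pd

close = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])

# Causal: each value uses the current bar and earlier ones only
causal_full = close.rolling(3).mean()
causal_cut = close.iloc[:4].rolling(3).mean()  # bar 4 not yet known

# Non-causal: a centered window also reads the following bar
centered_full = close.rolling(3, center=True).mean()
centered_cut = close.iloc[:4].rolling(3, center=True).mean()
```

The causal value at bar 3 is identical in both runs. The centered value at bar 3 exists only when bar 4 is present, so a strategy reading it at bar 3 would be using information it could not have had.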

In next()

self.data arrays are sliced to the current bar. Only the current bar and all prior bars are visible. The last element [-1] is always the current bar.

def next(self):
    # only sees data up to current bar
    current = self.data.Close[-1]
    prior   = self.data.Close[-2]

The same bar-by-bar slicing applies to all indicator arrays declared with self.I(). This ensures the strategy can only act on information that was available at the time, making the simulation realistic.
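The slicing rule can be mimicked in plain NumPy. simulate_bar_by_bar is a hypothetical illustration of the init()/next() split, not backtesting.py's actual engine. Indicators are computed once on full-length arrays, and each step then exposes only the [:i + 1] slices, so [-1] is always bar i.

```python
import numpy as np


def simulate_bar_by_bar(close, indicator) -> list:
    """Hypothetical sketch: return bar indices where close is above the indicator.

    `close` and `indicator` are full-length arrays (the init() view); at step i
    only the slices [:i + 1] are inspected (the next() view).
    """
    close, indicator = np.asarray(close), np.asarray(indicator)
    signals = []
    for i in range(len(close)):
        visible_close = close[:i + 1]  # current bar and all prior bars
        visible_indicator = indicator[:i + 1]
        # Comparisons with NaN (indicator warm-up) evaluate to False
        if visible_close[-1] > visible_indicator[-1]:
            signals.append(i)
    return signals
```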

Extra columns

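You can include additional columns in your DataFrame for use as signals or features. They are accessible via self.data.df or directly as attributes (for example, self.data.Sentiment):

from backtesting import Backtest

# Add a sentiment score column; sentiment_values is your own data,
# with one value per row of df
df['Sentiment'] = sentiment_values

bt = Backtest(df, MyStrategy)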

Inside the strategy:

def init(self):
    self.sentiment = self.I(lambda: self.data.df['Sentiment'].values)

def next(self):
    if self.sentiment[-1] > 0.7:
        self.buy()
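External features rarely arrive with exactly one value per bar, so they usually need aligning by timestamp first. Below is a minimal sketch with hypothetical prices and sentiment scores. It joins on the index, then forward-fills. Forward-filling only copies earlier values forward, so it does not introduce look-ahead.

```python
import pandas as pd

prices = pd.DataFrame(
    {'Open': [1.0, 2.0, 3.0], 'High': [1.5, 2.5, 3.5],
     'Low': [0.5, 1.5, 2.5], 'Close': [1.2, 2.2, 3.2]},
    index=pd.date_range('2024-01-01', periods=3, freq='D'),
)

# Hypothetical sentiment scores, published on some days only
sentiment = pd.Series(
    [0.9, 0.2],
    index=pd.to_datetime(['2024-01-01', '2024-01-03']),
    name='Sentiment',
)

# Align by timestamp, then carry the last known score forward into gaps
df = prices.join(sentiment)
df['Sentiment'] = df['Sentiment'].ffill()
```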

Handling missing data

# Sort by date first, so interpolation fills gaps from true chronological neighbors
df = df.sort_index()

# Either: remove rows with any NaN in the OHLC columns
df = df.dropna(subset=['Open', 'High', 'Low', 'Close'])

# Or: fill gaps with linear interpolation instead
df[['Open', 'High', 'Low', 'Close']] = (
    df[['Open', 'High', 'Low', 'Close']].interpolate()
)

Always sort your data by date before passing it to Backtest. If the index is not monotonically increasing, Backtest will sort it automatically and emit a warning.
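The cleanup steps can be combined into one function. Below is a minimal sketch. clean_ohlc is a hypothetical helper. It sorts before interpolating, because interpolation works on row order, not dates. It then drops any leading rows that interpolate() cannot fill, since those have no earlier value to start from.

```python
import numpy as np
import pandas as pd

OHLC = ['Open', 'High', 'Low', 'Close']


def clean_ohlc(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleanup: sort, interpolate gaps, drop unfillable leading rows."""
    df = df.sort_index()  # interpolation uses row order, so sort by date first
    df[OHLC] = df[OHLC].interpolate()
    # Leading NaNs have no earlier value to interpolate from, so drop them
    return df.dropna(subset=OHLC)
```

If the same unsorted rows were interpolated before sorting, the gaps would be filled from the wrong neighbors.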
