backtesting.py uses standard OHLCV (Open, High, Low, Close, Volume) data stored as a pandas DataFrame. Understanding how data is structured and how it is accessed inside a strategy is essential for building correct strategies.

Required format

Pass a pd.DataFrame with at least these columns:
| Column | Description |
| --- | --- |
| Open | Bar opening price |
| High | Bar high price |
| Low | Bar low price |
| Close | Bar closing price |
| Volume | Trade volume (optional, but recommended) |
Column names are case-sensitive. If Volume is unavailable, it defaults to NaN internally. The DataFrame index should be a pd.DatetimeIndex. A monotonic pd.RangeIndex is also accepted but produces a warning.
import pandas as pd

df = pd.DataFrame({
    'Open':   [100.0, 101.0, 102.0],
    'High':   [105.0, 106.0, 107.0],
    'Low':    [99.0,  100.0, 101.0],
    'Close':  [103.0, 104.0, 105.0],
    'Volume': [1000,  1200,  1100],
}, index=pd.date_range('2024-01-01', periods=3, freq='D'))
NaN values in the OHLC columns will cause Backtest.__init__ to raise a ValueError. Strip them with df.dropna() or fill them with df.interpolate() before passing to Backtest.
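The requirements above can be checked before a Backtest is constructed. The following is a minimal sketch using plain pandas; find_ohlcv_problems is a hypothetical helper, not part of backtesting.py, and it covers only the checks described in this section.

```python
import pandas as pd

REQUIRED_COLUMNS = ["Open", "High", "Low", "Close"]  # Volume is optional


def find_ohlcv_problems(df: pd.DataFrame) -> list:
    """Return human-readable problems; an empty list means the frame looks usable."""
    problems = []
    # Column names are case-sensitive, so 'close' does not count as 'Close'.
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if not isinstance(df.index, pd.DatetimeIndex):
        problems.append("index is not a DatetimeIndex")
    present = [c for c in REQUIRED_COLUMNS if c in df.columns]
    if present and df[present].isna().any().any():
        problems.append("NaN values in OHLC columns")
    return problems


good = pd.DataFrame(
    {"Open": [100.0, 101.0], "High": [105.0, 106.0],
     "Low": [99.0, 100.0], "Close": [103.0, 104.0]},
    index=pd.date_range("2024-01-01", periods=2, freq="D"),
)
# Break two rules on purpose: lowercase column name and a plain integer index.
bad = good.rename(columns={"Close": "close"}).reset_index(drop=True)

print(find_ohlcv_problems(good))
print(find_ohlcv_problems(bad))
```

Returning a list rather than raising lets you report every problem at once instead of fixing them one error at a time.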

Built-in test data

Three sample datasets are bundled for examples and testing:
from backtesting.test import GOOG, EURUSD, BTCUSD
| Dataset | Description |
| --- | --- |
| GOOG | Daily OHLCV data for Google (Alphabet) stock, 2004–2013 |
| EURUSD | Hourly EUR/USD forex data, April 2017–February 2018 |
| BTCUSD | Monthly BTC/USD data, 2012–2024 |
All three are pd.DataFrame objects with a DatetimeIndex, ready to pass directly to Backtest.

Loading custom data

import pandas as pd

df = pd.read_csv(
    'data/AAPL.csv',
    index_col='Date',
    parse_dates=True,
)
# Rename columns to the capitalized names backtesting.py expects
df = df.rename(columns={
    'open': 'Open',
    'high': 'High',
    'low': 'Low',
    'close': 'Close',
    'volume': 'Volume',
})
df = df.dropna()
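Vendor files often mix header casing, so a fixed rename dictionary can miss columns. As a sketch under that assumption, the hypothetical helper normalize_ohlcv_columns below matches headers case-insensitively. The CSV text is made up for illustration and stands in for a real file.

```python
import io

import pandas as pd

CANONICAL = ["Open", "High", "Low", "Close", "Volume"]


def normalize_ohlcv_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename OHLCV columns to the capitalized names, ignoring case and spaces."""
    lookup = {name.lower(): name for name in CANONICAL}
    mapping = {
        col: lookup[col.strip().lower()]
        for col in df.columns
        if col.strip().lower() in lookup
    }
    # Columns that are not OHLCV (extra features) are left untouched.
    return df.rename(columns=mapping)


# Made-up sample standing in for a vendor file with inconsistent headers.
csv_text = """Date,OPEN,high,Low,close,volume
2024-01-02,100.0,105.0,99.0,103.0,1000
2024-01-03,101.0,106.0,100.0,104.0,1200
"""
raw = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
df = normalize_ohlcv_columns(raw)
print(list(df.columns))
```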

Accessing data in a strategy

Inside a strategy, self.data provides access to the OHLCV columns plus the index.

Column access

def next(self):
    open_price  = self.data.Open[-1]
    high_price  = self.data.High[-1]
    low_price   = self.data.Low[-1]
    close_price = self.data.Close[-1]
    volume      = self.data.Volume[-1]
Each column behaves like a NumPy array. Index [-1] is the current (most recent) bar and [-2] is the bar before it.

Index access

def next(self):
    current_time = self.data.index[-1]  # pd.Timestamp or int

The .s accessor — get a pandas Series

Append .s to any column to get a pd.Series with the proper DatetimeIndex:
def init(self):
    close_series = self.data.Close.s  # pd.Series
    self.sma = self.I(lambda s: s.rolling(20).mean().values, close_series)
The .s accessor is available on all OHLCV columns and on indicator arrays returned by self.I().
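To see what the rolling-mean lambda above produces, the same step can be run with plain pandas outside any strategy. This is only an illustration with made-up prices; the Series here is built by hand rather than obtained from self.data.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2024-01-01", periods=5, freq="D")
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], index=index)

# Rolling mean over 3 bars, converted back to a plain NumPy array.
sma3 = close.rolling(3).mean().to_numpy()

# The result keeps the input length; the first two entries are NaN
# because a 3-bar window is not yet full there.
print(sma3)
```

Because the output has one value per bar, it lines up with the price data, which is the shape an indicator function passed to self.I() is expected to return.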

The .df accessor — get the full DataFrame

Access the underlying DataFrame including any extra columns you passed in:
def init(self):
    df = self.data.df  # full pd.DataFrame
    self.sentiment = self.I(lambda: df['Sentiment'].values)  # assumes a 'Sentiment' column was passed in

The .pip property

self.data.pip returns the smallest price unit of change, computed as 10 ** -decimal_places based on the Close column. Useful for setting stop-loss distances:
def next(self):
    pip = self.data.pip
    self.buy(sl=self.data.Close[-1] - 50 * pip)

Bar-by-bar revelation

Data access behaves differently in init() versus next(), which is the key mechanism that prevents look-ahead bias.

In init()

self.data arrays are available at full length — all bars from start to end. This is required so indicator libraries can compute their rolling windows.
from backtesting.test import SMA  # example moving-average helper shipped with the library

def init(self):
    # full Close array available here
    self.sma = self.I(SMA, self.data.Close, 20)

In next()

self.data arrays are sliced to the current bar. Only the current bar and all prior bars are visible. The last element [-1] is always the current bar.
def next(self):
    # only sees data up to current bar
    current = self.data.Close[-1]
    prior   = self.data.Close[-2]
The same bar-by-bar slicing applies to all indicator arrays declared with self.I(). This ensures the strategy can only act on information that was available at the time, making the simulation realistic.
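The slicing described above can be pictured with a plain NumPy loop. This is a conceptual model only, not how backtesting.py is implemented: at step i the strategy sees a view that ends at bar i, so [-1] and [-2] can never reach into the future. The prices are made up.

```python
import numpy as np

close = np.array([10.0, 11.0, 9.0, 12.0, 13.0])
seen_lengths = []
signals = []

for i in range(1, len(close)):        # start at 1 so a prior bar exists
    visible = close[: i + 1]          # bars 0..i only; nothing after bar i
    current, prior = visible[-1], visible[-2]
    seen_lengths.append(len(visible))
    signals.append(bool(current > prior))  # "price rose since the prior bar"

print(seen_lengths)
print(signals)
```

The visible window grows by one bar per step, which is why negative indices are the natural way to address recent bars inside next().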

Extra columns

You can include additional columns in your DataFrame for use as signals or features. They are accessible via self.data.df or directly as attributes (for example, self.data.Sentiment):
# Add a sentiment score column
df['Sentiment'] = sentiment_values

bt = Backtest(df, MyStrategy)
Inside the strategy:
def init(self):
    self.sentiment = self.I(lambda: self.data.df['Sentiment'].values)

def next(self):
    if self.sentiment[-1] > 0.7:
        self.buy()

Handling missing data

# Sort by date first if not already sorted
df = df.sort_index()

# Option 1: remove rows with any NaN in OHLC columns
df = df.dropna(subset=['Open', 'High', 'Low', 'Close'])

# Option 2: fill gaps with linear interpolation instead
df[['Open', 'High', 'Low', 'Close']] = (
    df[['Open', 'High', 'Low', 'Close']].interpolate()
)
Always sort your data by date before passing it to Backtest. If the index is not monotonically increasing, Backtest will sort it automatically and emit a warning.
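The cleaning steps can be combined into one function so the order is always the same: sort first, then either fill or drop. The following is a sketch with a hypothetical helper, clean_ohlcv, and made-up data. Sorting comes first because interpolating rows that are out of order would blend unrelated bars.

```python
import numpy as np
import pandas as pd

OHLC = ["Open", "High", "Low", "Close"]


def clean_ohlcv(df: pd.DataFrame, fill: bool = False) -> pd.DataFrame:
    """Sort by index, optionally interpolate OHLC gaps, then drop leftover NaN rows."""
    out = df.sort_index()  # returns a new frame; the input is not modified
    if fill:
        out[OHLC] = out[OHLC].interpolate()
    return out.dropna(subset=OHLC)


raw = pd.DataFrame(
    {"Open": [103.0, 100.0, np.nan, 106.0],
     "High": [108.0, 105.0, 106.0, 110.0],
     "Low": [102.0, 99.0, 100.0, 105.0],
     "Close": [107.0, 103.0, 104.0, 109.0]},
    index=pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02", "2024-01-04"]),
)

dropped = clean_ohlcv(raw)             # the row with the missing Open is removed
filled = clean_ohlcv(raw, fill=True)   # the missing Open is interpolated
print(len(dropped), len(filled))
```

Whether to drop or fill depends on the data: dropping keeps only observed prices, while interpolation invents values that never traded.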
