to_bass_dataset

pymc_marketing.bass.data.to_bass_dataset(data)

Convert common data formats to the canonical xr.Dataset for the Bass model.

Parameters:
data : xr.Dataset, pd.DataFrame, pd.Series, or np.ndarray

Input data. See the table below for conversion rules.

Returns:
xr.Dataset

Dataset with at least an "observed" variable and a "T" coordinate. Additional coordinates from the input are preserved.

Notes

Conversion rules by input type:

| Input | Behaviour |
| --- | --- |
| xr.Dataset | Returned as-is. "T" coord is auto-generated from np.arange(n) when missing. |
| pd.DataFrame | Column "observed" is treated as a single-product time series. Without that column, every numeric column is treated as a separate product (wide format). A column named "T" is used as the time coordinate. |
| pd.Series | Single-product time series. |
| np.ndarray 1D | Single-product time series. |
| np.ndarray 2D | Multi-product matrix (T, product) with auto-generated product labels "P0", "P1", … |
| np.ndarray 3D+ | Raises ValueError; use xr.Dataset instead. |
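
The np.ndarray rules above can be mirrored by hand with xarray. The sketch below is a hypothetical stand-in, not the library's implementation: the helper name ndarray_to_dataset_sketch, the error message, and the rejection of 0-D input are assumptions made for illustration only.

```python
import numpy as np
import xarray as xr


def ndarray_to_dataset_sketch(values):
    """Hypothetical stand-in that mirrors the documented ndarray rules."""
    values = np.asarray(values)
    if values.ndim not in (1, 2):
        # The docs state that 3-D+ input raises ValueError.
        # Rejecting 0-D input here is an assumption of this sketch.
        raise ValueError("Expected a 1-D or 2-D array; build an xr.Dataset instead.")

    t = np.arange(values.shape[0])  # auto-generated time coordinate
    if values.ndim == 1:
        # 1-D: single-product time series along "T"
        return xr.Dataset({"observed": (("T",), values)}, coords={"T": t})

    # 2-D: (T, product) matrix with generated labels "P0", "P1", ...
    labels = [f"P{i}" for i in range(values.shape[1])]
    return xr.Dataset(
        {"observed": (("T", "product"), values)},
        coords={"T": t, "product": labels},
    )


single = ndarray_to_dataset_sketch([10, 25, 50, 80])
multi = ndarray_to_dataset_sketch(np.ones((5, 3)))
```

Here single carries only the "T" dimension, while multi adds a "product" dimension of length 3. Passing np.ones((2, 2, 2)) to the sketch raises ValueError, matching the last row of the table.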

Examples

import numpy as np
import pandas as pd
from pymc_marketing.bass import to_bass_dataset

# Single product from a 1-D array
ds = to_bass_dataset(np.array([10, 25, 50, 80]))
# <xarray.Dataset>
# Dimensions:   (T: 4)
# Coordinates:  T (T) int64 0 1 2 3
# Data vars:    observed (T) int64 10 25 50 80

# Multiple products from a wide DataFrame
df = pd.DataFrame(
    {
        "product_A": [10, 25, 40],
        "product_B": [15, 30, 45],
    }
)
ds = to_bass_dataset(df)
# <xarray.Dataset>
# Dimensions:   (T: 3, product: 2)
# Coordinates:  T (T) int64 0 1 2, product (product) object ...
# Data vars:    observed (T, product) int64

# From a 2-D matrix
data = np.random.poisson(lam=100, size=(50, 3))
ds = to_bass_dataset(data)
# Dimensions:   (T: 50, product: 3)
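
The examples above do not show the DataFrame rule that uses a column named "T" as the time coordinate. As a rough illustration, the canonical layout described in the Returns section can be built by hand from such a frame. This is a sketch under assumptions, not the library's code: the year values and the variable name wide are made up for the example.

```python
import pandas as pd
import xarray as xr

# Wide DataFrame with an explicit time column (years are illustrative)
df = pd.DataFrame(
    {
        "T": [2019, 2020, 2021],
        "product_A": [10, 25, 40],
        "product_B": [15, 30, 45],
    }
)

# Use "T" as the index and keep the remaining numeric columns as products
wide = df.set_index("T").select_dtypes(include="number")

# Assemble the canonical layout: "observed" indexed by (T, product)
ds = xr.Dataset(
    {"observed": (("T", "product"), wide.to_numpy())},
    coords={"T": wide.index.to_numpy(), "product": list(wide.columns)},
)

# Label-based lookup now works on both coordinates
value = ds["observed"].sel(T=2020, product="product_B").item()
```

With this layout, value holds the 2020 observation for product_B, which is 30 in the frame above.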