Excess deaths due to COVID-19

import logging

import pandas as pd
from pymc_extras.prior import Prior

import causalpy as cp
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'retina'
seed = 42
logging.basicConfig(level=logging.INFO)

Load data

df = (
    cp.load_data("covid")
    .assign(date=lambda x: pd.to_datetime(x["date"]))
    .set_index("date")
)

treatment_time = pd.to_datetime("2020-01-01")
df.head()
            temp  deaths  year  month  t   pre
date
2006-01-01   3.8   49124  2006      1  0  True
2006-02-01   3.4   42664  2006      2  1  True
2006-03-01   3.9   49207  2006      3  2  True
2006-04-01   7.4   40645  2006      4  3  True
2006-05-01  10.7   42425  2006      5  4  True

The columns are:

  • date + year: self-explanatory

  • month: month of the year, numerically encoded; must be treated as a categorical variable

  • temp: average UK temperature (Celsius)

  • t: time index (month number since the start of the series)

  • pre: boolean flag indicating pre or post intervention
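Because seasonality is not linear in the month number, the formula below wraps the column in C(month) so it enters the model as a set of indicator variables. A minimal sketch of that kind of encoding, using pandas (the column names here are illustrative, not necessarily what the formula parser generates):

```python
import pandas as pd

# month is numerically encoded, but month 12 is not "12x" month 1:
# seasonal effects need one coefficient per month, not one slope.
# C(month) in the formula expands it into indicator (dummy) columns;
# a rough pandas equivalent:
months = pd.Series([1, 2, 12], name="month")
dummies = pd.get_dummies(months.astype("category"), prefix="month")
print(list(dummies.columns))
```

Each row then activates exactly one indicator column, giving every month its own additive seasonal effect.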

Run the analysis

In this example we standardize the data, so we have to be careful in how we interpret the inferred regression coefficients: they, like the posterior predictions, are on the standardized scale.
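A minimal sketch of what standardizing a column means, using the sample deaths values from the table above (the exact degrees-of-freedom convention the formula machinery uses may differ):

```python
import pandas as pd

# standardize(x) centers and rescales: (x - mean) / sd, so model
# coefficients are in standard-deviation units of the outcome.
deaths = pd.Series([49124.0, 42664.0, 49207.0, 40645.0, 42425.0])
deaths_z = (deaths - deaths.mean()) / deaths.std()

# After standardization the series has mean ~0 and sd ~1
print(deaths_z.mean(), deaths_z.std())
```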

Note

The random_seed keyword argument for the PyMC sampler is not necessary. We use it here so that the results are reproducible.

model = cp.pymc_models.LinearRegression(
    sample_kwargs={"random_seed": seed},
    priors={
        "beta": Prior(
            "Normal",
            mu=[42_000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            sigma=10_000,
            dims=["treated_units", "coeffs"],
        ),
        "y_hat": Prior(
            "Normal",
            sigma=Prior("HalfNormal", sigma=10_000, dims=["treated_units"]),
            dims=["obs_ind", "treated_units"],
        ),
    },
)
result = cp.InterruptedTimeSeries(
    df,
    treatment_time,
    formula="standardize(deaths) ~ 0 + standardize(t) + C(month) + standardize(temp)",
    model=model,
)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta, y_hat_sigma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 3 seconds.
Sampling: [beta, y_hat, y_hat_sigma]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
fig, ax = result.plot()
fig
INFO:causalpy.plot_utils:HDI intervals in plots use 'expectation' by default (model mean, excluding observation noise). For full predictive uncertainty including observation noise, use hdi_type='prediction'. To annotate plots with this information, use show_hdi_annotation=True.
[result.plot() output: model fit with counterfactual forecast, and the estimated causal impact]
result.summary()
==================================Pre-Post Fit==================================
Formula: standardize(deaths) ~ 0 + standardize(t) + C(month) + standardize(temp)
Model coefficients:
    C(month)[1]        1.6, 94% HDI [1.1, 2]
    C(month)[2]        -0.21, 94% HDI [-0.66, 0.23]
    C(month)[3]        0.26, 94% HDI [-0.12, 0.62]
    C(month)[4]        -0.037, 94% HDI [-0.32, 0.25]
    C(month)[5]        -0.15, 94% HDI [-0.45, 0.14]
    C(month)[6]        -0.2, 94% HDI [-0.61, 0.2]
    C(month)[7]        -0.015, 94% HDI [-0.52, 0.5]
    C(month)[8]        -0.41, 94% HDI [-0.86, 0.068]
    C(month)[9]        -0.44, 94% HDI [-0.81, -0.049]
    C(month)[10]       -0.06, 94% HDI [-0.34, 0.23]
    C(month)[11]       -0.37, 94% HDI [-0.69, -0.042]
    C(month)[12]       0.067, 94% HDI [-0.35, 0.48]
    standardize(t)     0.23, 94% HDI [0.16, 0.31]
    standardize(temp)  -0.45, 94% HDI [-0.75, -0.17]
    y_hat_sigma        0.56, 94% HDI [0.5, 0.62]

Effect Summary Reporting

For decision-making, you often need a concise summary of the causal effect with key statistics. The effect_summary() method provides a decision-ready report with average and cumulative effects, HDI intervals, tail probabilities, and relative effects, without the need for manual post-processing.

Note

In this example the data has been standardized, so the effect estimates are in standardized units. When interpreting the results, keep in mind that the effects are relative to the standardized scale of the outcome variable.
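Because the outcome was standardized, an average effect of about 0.91 is in standard-deviation units of deaths. A hedged sketch of mapping it back to the original scale; the sd here is computed from the five df.head() rows shown earlier, purely for illustration, and a real back-transformation should use the full series and the same scaling convention the formula applied:

```python
import pandas as pd

# Illustrative only: rescale a standardized effect back to deaths.
# The sd below comes from the five sample rows shown above; in
# practice use the full series and match the degrees-of-freedom
# convention used when standardizing.
deaths = pd.Series([49124.0, 42664.0, 49207.0, 40645.0, 42425.0])
effect_std = 0.91                    # average effect, standardized units
effect_deaths = effect_std * deaths.std()
print(f"{effect_deaths:.0f} excess deaths per month (illustrative)")
```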

# Generate effect summary for the full post-period
stats = result.effect_summary()
stats.table
INFO:causalpy.plot_utils:Effect size HDIs use 'expectation' by default (model mean, excluding observation noise). For full predictive uncertainty including observation noise, use hdi_type='prediction'.
                 mean     median  hdi_lower  hdi_upper  p_gt_0  relative_mean  relative_hdi_lower  relative_hdi_upper
average      0.909767   0.909569   0.723887   1.099395     1.0     179.228024           90.209658          289.949146
cumulative  26.383244  26.377487  20.992726  31.882461     1.0     179.228028           90.209659          289.949154
# View the prose summary
print(stats.text)
Post-period (2020-01-01 00:00:00 to 2022-05-01 00:00:00), the average effect was 0.91 (95% HDI [0.72, 1.10]), with a posterior probability of an increase of 1.000. The cumulative effect was 26.38 (95% HDI [20.99, 31.88]); probability of an increase 1.000. Relative to the counterfactual, this equals 179.23% on average (95% HDI [90.21%, 289.95%]).
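The cumulative effect is simply the average effect summed over the post-period observations. With monthly data from 2020-01 through 2022-05 there are 29 observations, which ties the two numbers in the table together:

```python
# Sanity check: cumulative effect ~ average effect * number of
# post-period observations (29 monthly rows, 2020-01 .. 2022-05).
n_months = 12 + 12 + 5               # 2020, 2021, Jan-May 2022
avg_effect = 0.909767                # from stats.table
cum_effect = 26.383244               # from stats.table
print(round(avg_effect * n_months, 3))  # matches cum_effect to ~3 dp
```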
# You can also analyze a specific time window, e.g., the first 6 months of 2020
stats_window = result.effect_summary(
    window=(pd.to_datetime("2020-01-01"), pd.to_datetime("2020-06-30"))
)
stats_window.table
                 mean     median  hdi_lower  hdi_upper  p_gt_0  relative_mean  relative_hdi_lower  relative_hdi_upper
average      1.983175   1.984556   1.788543   2.180481     1.0     291.671745          192.699140          416.684473
cumulative  11.899051  11.907338  10.731261  13.082884     1.0     291.671749          192.699142          416.684480