In search of the best thermos for mate with Thomas Bayes

Published

August 14, 2025

NOTE: This blogpost was automatically translated from this versión en español.

Code

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import preliz as pz
import pymc as pm
import xarray as xr

random_seed = sum(map(ord, "cooling"))


def compute_temperature(time, r, T_0, T_env):
    return T_env + (T_0 - T_env) * np.exp(-r * time)


def plot_estimated_curves(idata, x, y, T_env, color="C0", kind="mean", axes=None):
    if axes is None:
        fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    else:
        fig = axes[0].figure

    t_grid = np.linspace(0, 9, num=100)
    t_grid_xr = xr.DataArray(t_grid, coords={"__idx__": list(range(100))}, dims="__idx__")

    mu_transformed = idata.posterior["alpha"] - idata.posterior["beta"] * t_grid_xr

    if kind == "outcome":
        coords = mu_transformed.coords
        dims = mu_transformed.dims
        draws = np.random.normal(
            loc=mu_transformed, scale=idata.posterior["sigma"].to_numpy()[..., np.newaxis]
        )

        mu_transformed = xr.DataArray(
            draws,
            coords=coords,
            dims=dims
        )

    mu_original = np.exp(mu_transformed) + T_env
    mu_transformed_mean = mu_transformed.mean(("chain", "draw")).to_numpy()
    mu_transformed_ci50 = mu_transformed.quantile((0.25, 0.75), ("chain", "draw")).to_numpy()
    mu_transformed_ci90 = mu_transformed.quantile((0.05, 0.95), ("chain", "draw")).to_numpy()

    mu_original_mean = mu_original.mean(("chain", "draw")).to_numpy()
    mu_original_ci50 = mu_original.quantile((0.25, 0.75), ("chain", "draw")).to_numpy()
    mu_original_ci90 = mu_original.quantile((0.05, 0.95), ("chain", "draw")).to_numpy()


    axes[0].plot(t_grid, mu_original_mean, color=color)
    axes[0].fill_between(
        x=t_grid, y1=mu_original_ci50[0], y2=mu_original_ci50[1], alpha=0.5, color=color
    )
    axes[0].fill_between(
        x=t_grid, y1=mu_original_ci90[0], y2=mu_original_ci90[1], alpha=0.5, color=color
    )

    axes[1].plot(t_grid, mu_transformed_mean, color=color)
    axes[1].fill_between(
        x=t_grid, y1=mu_transformed_ci50[0], y2=mu_transformed_ci50[1], alpha=0.5, color=color
    )
    axes[1].fill_between(
        x=t_grid, y1=mu_transformed_ci90[0], y2=mu_transformed_ci90[1], alpha=0.5, color=color
    )

    axes[0].scatter(x, y, color="0.33");
    axes[1].scatter(x, np.log(y - T_env), color="0.33");

    axes[0].set(xlabel="Tiempo (horas)", ylabel="Temperatura (°C)");
    axes[1].set(xlabel="Tiempo (horas)", ylabel="$\\log(T - T_\\text{ambiente})$");

    axes[0].grid(ls="--")
    axes[1].grid(ls="--")
    axes[0].set_axisbelow(True)
    axes[1].set_axisbelow(True)

    return fig, axes

Introduction

A few weeks ago, while browsing the internet, I came across this post on the social network X. There, a farmer described an experiment he was about to conduct to measure the heat retention capacity of several thermoses he had on hand.

The experiment involved pouring water at 80 °C into each thermos and recording its temperature several times throughout the day. With the collected data, it would be possible to determine which thermos had the best—and which had the worst—heat retention capacity.

After several measurements, the author of the experiment shared the results obtained:

From these results, it can be concluded that the “nn tapa rosa” thermos was the worst performer, as the water inside lost heat considerably more quickly than in any other thermos.

Now, the inevitable question is: which thermos offers the best heat retention?

To answer this question, we will use a Bayesian model based on Newton’s Law of Cooling.

Newton’s Law of Cooling

Newton’s law of cooling states that the temperature of an object changes at a rate proportional to the difference between its temperature and the ambient temperature:

\[ \frac{dT(t)}{dt} = r \, (T_\text{env} - T(t)) \tag{1}\]

where $r$ is a cooling rate.

One solution to Equation 1 is:

\[ T(t) = T_\text{env} + (T(0)- T_\text{env}) \, e^{-rt} \tag{2}\]

This means that the temperature decays exponentially toward the ambient temperature as time passes.

It can also be noted that the logarithm of the difference between the temperature at time $t$ and the ambient temperature is a linear function:

\[ \log\left(T(t) - T_\text{env}\right) = \log\left(T(0) - T_\text{env}\right) - rt \]

which can be written more concisely as:

\[ \log\left(T(t) - T_\text{env}\right) = \alpha - \beta t \]

where $\alpha = \log\left(T(0) - T_\text{env}\right)$ and $\beta = r$.

The figure below shows the shape taken by Equation 2 for different cooling rates $r$ alongside its corresponding transformed version.

Code

T_0, T_env = 80, 24
time_grid = np.linspace(0, 24, num=200)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

for r in (0.02, 0.05, 0.1, 0.2, 0.5, 1):
    axes[0].plot(
        time_grid,
        compute_temperature(time=time_grid, r=r, T_0=T_0, T_env=T_env),
        label=f"$r={r}$"
    );
    axes[1].plot(
        time_grid,
        np.log(compute_temperature(time=time_grid, r=r, T_0=T_0, T_env=T_env) - T_env),
        label=f"$r={r}$"
    );

axes[0].axhline(y=T_env, color="0.2", lw=1.5, ls="--")
axes[0].grid(zorder=-99, ls="--")
axes[1].grid(zorder=-99, ls="--")
axes[1].set(ylim=(-5, 4.4))

axes[0].legend()
axes[0].set(xlabel="Tiempo", ylabel="Temperatura");
axes[1].set(xlabel="Tiempo", ylabel="$\\log(T(t) - T_\\text{ambiente})$");

The higher the value of $r$, the worse the thermos’ ability to retain temperature. In other words, the best thermos will be the one with the lowest value of $r$ (assuming that $r > 0$).

Prior elicitation

In this article, we will work with models in the following way:

\[ \begin{aligned} \log(T(t_i) - T_\text{env}) \mid t_i &\sim \text{Normal}(\mu_i, \sigma^2) \\ \mu_i &= \alpha - \beta t_i \end{aligned} \]

That is, for a given time $t_i$, we assume that the logarithm of the difference between the water temperature and the ambient temperature follows a normal distribution.

The parameter of greatest interest is $\beta$, which represents the cooling rate of the water in the thermos. First, we know that its value must be positive, since the initial water temperature drops to the ambient temperature. Furthermore, based on the curves shown in Figure 1, we can assume that a reasonable range for this parameter is in the interval $(0, 1)$. This range implies that the water in the thermos would reach room temperature, at worst, about 5 hours after filling it.

Using PreliZ, we can obtain the parameters of a gamma distribution that satisfy our requirements.

pz.maxent(pz.Gamma(), lower=0.001, upper=1, mass=0.99);

Another unknown parameter in our model is $\sigma$, the conditional standard deviation. It is important to note that this deviation is not expressed in degrees Celsius, as it describes the variability of $\log(T(t_i) - T_\text{env})$, and not the variability of $T(t_i)$.

Looking at the right panel of Figure 1, we can see that the range of variation in the response covers only a few units. Therefore, in this case, we will opt for a moderately informative gamma distribution, which concentrates a high probability in the interval $(0.05, 0.3)$.

pz.maxent(pz.Gamma(), lower=0.05, upper=0.3, mass=0.95);

Finally, we could elicitate a prior distribution for $\alpha$. However, it is not a parameter with an intuitive interpretation.

What we can do is establish a prior for the initial temperature, which implicitly determines a prior for $\alpha$.

Since in our case we know the water temperature at $t = 0$, we will consider two approaches:

Fixed value for $T(0)$: in this case, $\alpha$ is fixed to $\log(T(0) - T_\text{env})$.
Informative distribution for $T(0)$: centered on the observed value. We use a normal distribution with mean $T(0)$ and standard deviation of $0.3$ °C, which is equivalent to saying that the initial temperature differs by at most one degree from the measured temperature.

Data

From the photo shared in X’s post, the following time and temperature values can be obtained:

T_env = 24
time = np.array([0, 180, 320, 500]) / 60 # en horas
data = {
    "stanley": np.array([80.0, 70.0, 65.0, 60.0]),
    "aluminio": np.array([80.0, 62.5, 57.5, 50.0]),
    "lumilagro": np.array([75.0, 65.0, 60.0, 55.0]),
    "nn-rosa": np.array([80.0, 47.5, 37.5, 30.0])
}

It should be noted that the value of the ambient temperature (T_env) is an assumption, as it does not appear in any of the posts.

Code

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
fig.subplots_adjust(bottom=0.2)

for i, (brand, temperatures) in enumerate(data.items()):
    axes[0].plot(time, temperatures, color=f"C{i}", lw=1)
    axes[1].plot(time, np.log(temperatures - T_env), color=f"C{i}", lw=1)
    axes[0].scatter(time, temperatures, color=f"C{i}", label=brand)
    axes[1].scatter(time, np.log(temperatures - T_env), color=f"C{i}", label=brand)

axes[0].set(xlabel="Tiempo (horas)", ylabel="Temperatura (°C)");
axes[1].set(xlabel="Tiempo (horas)", ylabel="$\\log(T(t) - T_\\text{ambiente})$");
axes[0].grid(ls="--")
axes[1].grid(ls="--")

axes[0].axhline(y=T_env, color="0.2", lw=1.5, ls="--")

axes[1].legend(
    loc="lower center",
    ncol=4,
    bbox_to_anchor=(0.5, 0.025),
    bbox_transform=fig.transFigure,
    handletextpad=0.1,
    columnspacing=1
);

At first glance, the “stanley” thermos maintains the highest temperatures at all times, while the one with the “nn tapa rosa” stands out for its poor performance. The “lumilagro”, on the other hand, performs better than the “aluminio” one: although it started with a lower initial temperature, it cooled down more slowly. Finally, it cannot be said with certainty whether the “stanley” thermos really outperforms the “lumilagro”, since, although its measurements were always higher, it also started with a higher temperature.

On the other hand, the right panel of Figure 2 shows a linear trend for each thermos, which is consistent with the use of a linear model on $\log(T(t) - T_\text{env})$.

Modelos

Model 1: A thermos + known intercept

Before we start working with a model that considers all brands together, let’s work with a model for the brand “stanley” only.

with pm.Model() as model_1:
1    T_0 = pm.Data("T_0", data["stanley"].item(0))
2    alpha = pm.Deterministic("alpha", np.log(T_0 - T_env))
    beta = pm.Gamma("beta", alpha=3.3, beta=9)
    sigma = pm.Gamma("sigma", alpha=6.2, beta=37)
    mu = alpha - beta * time
    pm.Normal("log(T - T_env)", mu=mu, sigma=sigma, observed=np.log(data["stanley"] - T_env))

display(model_1.to_graphviz())

with model_1:
3    idata_1 = pm.sample(random_seed=random_seed, target_accept=0.95)

1: It registers the initial temperature as data within the model so that it displays the corresponding node in the graph.
2: It registers alpha with a Deterministic so that its values are stored in the posterior group of InferenceData.
3: The same random_seed and a high target_accept value are always used to reduce the chance of divergence.

Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta, sigma]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.

Next, we use az.summary() to obtain a summary of the marginal posteriors:

az.summary(idata_1, var_names=["beta", "sigma"])

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
beta	0.059	0.010	0.041	0.081	0.000	0.000	1421.0	1350.0	1.0
sigma	0.095	0.047	0.024	0.179	0.001	0.001	921.0	899.0	1.0

For both parameters, the effective sample size is large enough, and the chains converge and mix correctly. ¹

fig, axes = plot_estimated_curves(idata=idata_1, x=time, y=data["stanley"], T_env=T_env);

In the right panel of Figure 3, the fitted regression line is shown along with the 50% and 95% credible intervals on the transformed data scale, while the left panel shows the results on the original scale. The regression line fits the points well, although the uncertainty associated with its estimate increases over time. On the other hand, the absence of uncertainty at $t=0$ is explained by the fact that the intercept has a fixed value.

Model 2: One thermos + unknown intercept

In this second model, we still work with a single thermos. The difference is that instead of fixing the initial temperature to the observed value, we assign it a highly informative prior distribution. This way, we continue incorporating the information we already have, but we don’t force the regression line to pass through a fixed point.

with pm.Model() as model_2:
    T_0 = pm.Normal("T_0", mu=80, sigma=0.5)
    alpha = pm.Deterministic("alpha", np.log(T_0 - T_env))
    beta = pm.Gamma("beta", alpha=3.34, beta=12.8)
    sigma = pm.Gamma("sigma", alpha=6.2, beta=37)
    mu = pm.Deterministic("mu", alpha - beta * time)
    pm.Normal("log(T - T_env)", mu=mu, sigma=sigma, observed=np.log(data["stanley"] - T_env))

display(model_2.to_graphviz())

with model_2:
    idata_2 = pm.sample(random_seed=random_seed, target_accept=0.95)

Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [T_0, beta, sigma]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.

az.summary(idata_2, var_names=["T_0", "alpha", "beta", "sigma"])

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
T_0	79.993	0.508	79.056	80.967	0.010	0.009	2720.0	2448.0	1.0
alpha	4.025	0.009	4.008	4.042	0.000	0.000	2720.0	2448.0	1.0
beta	0.059	0.010	0.040	0.079	0.000	0.000	1891.0	1432.0	1.0
sigma	0.096	0.049	0.022	0.187	0.001	0.001	1399.0	1500.0	1.0

fig, axes = plot_estimated_curves(idata=idata_2, x=time, y=data["stanley"], T_env=T_env);

az.plot_forest([idata_1, idata_2], model_names=["Modelo 1", "Modelo 2"], var_names=["beta", "sigma"], combined=True, figsize=(6, 4));

The marginal posteriors for beta and sigma are practically identical to those from the first model. In other words, the conclusions we can draw about beta and sigma are similar to those obtained when using a fixed initial temperature. In addition, the effective sample sizes are consistently higher than those obtained previously.

Model 3: All brands

Now that we are familiar with the model for a single brand, we can extend it to work with all brands.

Instead of having a single T_0, alpha, and beta, we will have one for each brand. In PyMC, this is achieved using dims, which allows us to work with vectors of random variables instead of scalars.

1y = np.log(np.concatenate([temps for temps in data.values()]) - T_env)
times = np.tile(time, 4)
brand_idx = np.repeat(np.arange(4), 4)
coords = {
    "brand": list(data)
}

with pm.Model(coords=coords) as model_3:
    T_0 = pm.Normal("T_0", mu=[v.item(0) for v in data.values()], sigma=0.5, dims="brand")
    alpha = pm.Deterministic("alpha", np.log(T_0 - T_env), dims="brand")
    beta = pm.Gamma("beta", alpha=3.3, beta=12.8, dims="brand")
    sigma = pm.Gamma("sigma", alpha=6.2, beta=37)
    mu = pm.Deterministic("mu", alpha[brand_idx] - beta[brand_idx] * times)
    pm.Normal("log(T - T_env)", mu=mu, sigma=sigma, observed=y)

display(model_3.to_graphviz())

with model_3:
    idata_3 = pm.sample(random_seed=random_seed, target_accept=0.95)

1: One-dimensional arrays are created with the temperatures, times, and brand index for all brands. In addition, a coordinate dictionary is prepared so that dims="brand" can be used in the model.

Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [T_0, beta, sigma]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 2 seconds.

Let’s look at the posterior summary returned by az.summary().

az.summary(idata_3, var_names=["T_0", "beta", "sigma"])

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
T_0[stanley]	79.971	0.487	79.047	80.883	0.008	0.008	3452.0	2791.0	1.0
T_0[aluminio]	79.895	0.493	79.028	80.855	0.008	0.008	3578.0	3149.0	1.0
T_0[lumilagro]	74.971	0.483	74.097	75.889	0.008	0.007	4030.0	3046.0	1.0
T_0[nn-rosa]	79.933	0.487	79.015	80.797	0.008	0.008	3840.0	3018.0	1.0
beta[stanley]	0.056	0.005	0.047	0.067	0.000	0.000	2920.0	2203.0	1.0
beta[aluminio]	0.096	0.005	0.086	0.105	0.000	0.000	2860.0	2534.0	1.0
beta[lumilagro]	0.063	0.005	0.053	0.073	0.000	0.000	3685.0	2653.0	1.0
beta[nn-rosa]	0.269	0.005	0.259	0.278	0.000	0.000	3425.0	2750.0	1.0
sigma	0.050	0.014	0.028	0.075	0.000	0.000	2049.0	2109.0	1.0

The first thing that stands out is that the posterior means of the initial temperatures fall into two groups: one with values close to 80 and another around 75. This result makes sense, since the initial temperature was 80 °C for all thermoses except the Lumilagro, which was 75 °C.

The beta values for each thermos also show some heterogeneity. In this case, we can conclude that the “nn-rosa” thermos has the highest heat loss (largest beta value), although it’s not possible to determine with certainty which one offers the best heat retention.

Finally, we observe that the effective sample sizes range from 2000 to 4000, exceeding in all cases those obtained in the previous models. This suggests that the posterior has a geometry more accessible to the NUTS sampler used by PyMC.

Using the az.plot_forest() function, we can obtain a summary of the marginal distribution of beta for each brand:

az.plot_forest(idata_3, var_names="beta", combined=True, figsize=(6, 4));

As mentioned earlier, the “nn-rosa” shows the highest heat loss, followed by the “aluminio” thermos, and finally the “lumilagro” and “stanley” thermoses. At first glance, it might seem that the “stanley” has a better ability to retain heat, although this plot alone does not allow us to draw definitive conclusions.

Although the sigma parameter is not inherently relevant, it is interesting to explore its posterior distribution, as it provides a measure of the random uncertainty around our regression line.

az.plot_forest(
    [idata_2, idata_3],
    model_names=["Modelo 2", "Modelo 3"],
    var_names="sigma",
    combined=True,
    figsize=(6, 4)
);

By using the data from all thermoses in this third model, we obtain an estimate of sigma with considerably less uncertainty.

Finally, we can visualize the estimated curves for each thermos, both on their original scale and on the transformed scale:

for i, brand in enumerate(data):
    fig, axes = plot_estimated_curves(
        idata=idata_3.sel(brand=brand),
        x=time,
        y=data[brand],
        T_env=T_env,
        color=f"C{i}",
    )
    fig.suptitle(f"Marca: {brand}")

First, we can see that the uncertainty in the regression line for the “stanley” brand has decreased. In addition, it seems that for those thermoses whose water temperature approached the ambient temperature more quickly (i.e., those with lower heat retention capacity), the uncertainty is smaller. The clearest example of this behavior is the “nn-rosa” thermos, whose temperature was almost equal to the ambient temperature by the end of the experiment.

Model 4: All brands + partial pooling for $$

Finally, we will create a hierarchical model in which we assume that the cooling rates $\beta$ belong to a common population. By sharing information across thermoses in this way, we expect to obtain more precise and stable posteriors, reflecting both the individual characteristics of each thermos and the overall trend of the population.

with pm.Model(coords=coords) as model_4:
    T_0 = pm.Normal("T_0", mu=[v.item(0) for v in data.values()], sigma=0.5, dims="brand")
    alpha = pm.Deterministic("alpha", np.log(T_0 - T_env), dims="brand")

1    beta_mu = pm.Normal("beta_mu", mu=3.3 / 12.8, sigma=0.05)
2    beta_sigma = pm.InverseGamma("beta_sigma", alpha=4.2, beta=0.3)
    beta = pm.Gamma(
        "beta",
        alpha=(beta_mu / beta_sigma)**2,
        beta=beta_mu / (beta_sigma**2),
        dims="brand"
    )

    sigma = pm.Gamma("sigma", alpha=6.2, beta=37)
    mu = pm.Deterministic("mu", alpha[brand_idx] - beta[brand_idx] * times)
    pm.Normal("log(T - T_env)", mu=mu, sigma=sigma, observed=y)


with model_4:
3    idata_4 = pm.sample_prior_predictive(draws=1000, random_seed=random_seed)
    idata_4.extend(pm.sample(random_seed=random_seed, target_accept=0.95))

1: The beta parameters still have a Gamma prior with a common mean and standard deviation. In this case, however, these parameters are unknown and determined by the data.
2: The function pz.maxent(pz.InverseGamma(), lower=0.01, upper=0.2, mass=0.95) is used.
3: Samples from the prior are drawn to generate Figure 4.

Sampling: [T_0, beta, beta_mu, beta_sigma, log(T - T_env), sigma]
Initializing NUTS using jitter+adapt_diag...
/home/tomas/miniconda3/envs/pymc-env/lib/python3.12/site-packages/pytensor/tensor/elemwise.py:710: RuntimeWarning: invalid value encountered in log
  variables = ufunc(*ufunc_args, **ufunc_kwargs)
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [T_0, beta_mu, beta_sigma, beta, sigma]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 3 seconds.

Below are the marginal posteriors of $\beta$ and $\sigma$ for both the non-hierarchical and hierarchical models:

az.plot_forest(
    [idata_3, idata_4],
    model_names=["Modelo 3", "Modelo 4"],
    var_names=["beta", "sigma"],
    combined=True,
    figsize=(6, 4)
);

Visually, we can conclude that there are no differences between the marginal posteriors of the two models. In other words, the partial pooling provided by the hierarchical model is practically nonexistent.

To understand why no differences are observed between the models, we can examine the prior and posterior of the population mean and standard deviation of $\beta$ (beta_mu and beta_sigma).

az.plot_dist_comparison(idata_4, var_names=["beta_mu", "beta_sigma"], figsize=(12, 15));

In both cases, the posterior is very similar to the prior. This is because the available information to estimate beta_mu and beta_sigma is insufficient to obtain posteriors with low uncertainty. This result is expected, since the number of groups is very small (only 4). In situations like this, unless strong prior information is available, the hierarchical approach will not yield appreciable differences compared to a model with independent parameters.

Conclusions

Which is the best thermos?

The best thermos is the one with the lowest cooling rate $$.

Based on our model, we can obtain a probabilistic result as follows:

\[ P(\beta_j = \min\left\{\beta_1, \beta_2, \beta_3, \beta_4 \right\} ) \qquad \forall j \in {1, 2, 3, 4} \]

beta_argmin = idata_4.posterior["beta"].argmin(axis=-1)
p_min = [(beta_argmin == j).mean().item() for j in range(4)]
dict(zip(data, p_min))

{'stanley': 0.826, 'aluminio': 0.0, 'lumilagro': 0.174, 'nn-rosa': 0.0}

According to our model, there is an 82% probability that the “stanley” thermos is the one that best retains heat.

In practice, it is up to us to decide whether that probability is sufficient to conclude that “stanley” is indeed superior to “lumilagro”. For example, we could also consider the difference in degrees that the “stanley” manages to maintain above the “lumilagro” over time.

Although the result is not particularly surprising, it is also possible to determine probabilistically which is the worst thermos:

\[ P(\beta_j = \max\left\{\beta_1, \beta_2, \beta_3, \beta_4 \right\} ) \qquad \forall j \in {1, 2, 3, 4} \]

beta_argmax = idata_4.posterior["beta"].argmax(axis=-1)
p_max = [(beta_argmax == j).mean().item() for j in range(4)]
dict(zip(data, p_max))

{'stanley': 0.0, 'aluminio': 0.0, 'lumilagro': 0.0, 'nn-rosa': 1.0}

Appendix

Inference based on uniform priors

It is reasonable to ask whether the effort involved in specifying priors is worthwhile. Below, we fit the multi-brand model using uniform priors and compare the results with those obtained previously.

with pm.Model(coords=coords) as model_5:
    alpha = pm.Flat("alpha", dims="brand")
    beta = pm.Flat("beta", dims="brand")
    sigma = pm.Uniform("sigma", lower=0, upper=100)
    mu = pm.Deterministic("mu", alpha[brand_idx] - beta[brand_idx] * times)
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)

    idata_5 = pm.sample(random_seed=random_seed, target_accept=0.99, progressbar=False)

Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta, sigma]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4 seconds.

az.plot_forest(
    [idata_4, idata_5],
    model_names=["Priors informativos", "Priors uniformes"],
    var_names=["alpha"],
    combined=True,
    figsize=(6, 4)
);

ax = az.plot_forest(
    [idata_4, idata_5],
    model_names=["Priors informativos", "Priors uniformes"],
    var_names=["beta"],
    combined=True,
    figsize=(6, 4)
)
ax[0].grid(axis="x", ls="--");

ax = az.plot_forest(
    [idata_4, idata_5],
    model_names=["Priors informativos", "Priors uniformes"],
    var_names=["sigma"],
    combined=True, figsize=(6, 4)
)
ax[0].grid(axis="x", ls="--");

beta_argmin = idata_5.posterior["beta"].argmin(axis=-1)
p_min = [(beta_argmin == j).mean().item() for j in range(4)]
dict(zip(data, p_min))

{'stanley': 0.74575, 'aluminio': 0.001, 'lumilagro': 0.25325, 'nn-rosa': 0.0}

beta_argmax = idata_5.posterior["beta"].argmax(axis=-1)
p_max = [(beta_argmax == j).mean().item() for j in range(4)]
dict(zip(data, p_max))

{'stanley': 0.0, 'aluminio': 0.0, 'lumilagro': 0.0, 'nn-rosa': 1.0}

Based on a model with uniform priors, we reach conclusions in the same direction, but with a higher level of uncertainty.

That’s all folks

Footnotes

En este blog no hacemos uso de los traceplots porque las cadenas siempre se mezclan bien y resulta suficiente usar el tamaño efectivo de muestra y el $\hat{R}$.↩︎