Fitting Survival Regression Models via Stan • survstan

The R package survstan can be used to fit right-censored survival data under independent censoring. The implemented models allow the fitting of survival data in the presence/absence of covariates. All inferential procedures are currently based on the maximum likelihood (ML) approach.

Installation

You can install the released version of survstan from CRAN with:

install.packages("survstan")

You can install the development version of survstan from GitHub with:

# install.packages("devtools")
devtools::install_github("fndemarqui/survstan")

Inference procedures

Let (t_i,δ_i) be the observed survival time and its corresponding failure indicator, i = 1, ⋯, n, and θ be a k × 1 vector of parameters. Then, the likelihood function for right-censored survival data under independent censoring can be expressed as:

$$ L(\boldsymbol{\theta}) = \prod_{i=1}^{n}f(t_{i}|\boldsymbol{\theta})^{\delta_{i}}S(t_{i}|\boldsymbol{\theta})^{1-\delta_{i}}. $$

The maximum likelihood estimate (MLE) of θ is obtained by directly maximization of log (L(θ)) using the rstan::optimizing() function. The function rstan::optimizing() further provides the hessian matrix of log (L(θ)), needed to obtain the observed Fisher information matrix, which is given by:

$$ \mathscr{I}(\hat{\boldsymbol{\theta}}) = -\frac{\partial^2}{\partial \boldsymbol{\theta}\boldsymbol{\theta}'} \log L(\boldsymbol{\theta})\mid_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}}, $$

Inferences on θ are then based on the asymptotic properties of the MLE, $\hat{\boldsymbol{\theta}}$, that state that:

$$ \hat{\boldsymbol{\theta}} \asymp N_{k}(\boldsymbol{\theta}, \mathscr{I}^{-1}(\hat{\boldsymbol{\theta}})). $$

Baseline Distributions

Some of the most popular baseline survival distributions are implemented in the R package survstan. Such distributions include:

Exponential
Weibull
Lognormal
Loglogistic
Gamma,
Generalized Gamma (original Stacy’s parametrization)
Generalized Gamma (alternative Prentice’s parametrization)
Gompertz
Rayleigh
Birnbaum-Saunders (fatigue)

The parametrizations adopted in the package survstan are presented next.

Exponential Distribution

If T ∼ Exp(λ), then

f(t|λ) = λexp {−λt}I_[0, ∞)(t), where λ > 0 is the rate parameter.

The survival and hazard functions in this case are given by:

S(t|λ) = exp {−λt} and h(t|λ) = λ.

Weibull Distribution

If T ∼ Weibull(α,γ), then

$$ f(t|\alpha, \gamma) = \frac{\alpha}{\gamma^{\alpha}}t^{\alpha-1}\exp\left\{-\left(\frac{t}{\gamma}\right)^{\alpha}\right\}I_{[0, \infty)}(t), $$ where α > 0 and γ > 0 are the shape and scale parameters, respectively.

The survival and hazard functions in this case are given by:

$$ S(t|\alpha, \gamma) = \exp\left\{-\left(\frac{t}{\gamma}\right)^{\alpha}\right\} $$ and $$ h(t|\alpha, \gamma) = \frac{\alpha}{\gamma^{\alpha}}t^{\alpha-1}. $$

Lognormal Distribution

If T ∼ LN(μ,σ), then

$$ f(t|\mu, \sigma) = \frac{1}{\sqrt{2\pi}t\sigma}\exp\left\{-\frac{1}{2}\left(\frac{log(t)-\mu}{\sigma}\right)^2\right\}I_{[0, \infty)}(t), $$ where − ∞ < μ < ∞ and σ > 0 are the mean and standard deviation in the log scale of T.

The survival and hazard functions in this case are given by:

$$S(t|\mu, \sigma) = \Phi\left(\frac{-log(t)+\mu}{\sigma}\right)$$ and $$h(t|\mu, \sigma) = \frac{f(t|\mu, \sigma)}{S(t|\mu, \sigma)},$$ where Φ(⋅) is the cumulative distribution function of the standard normal distribution.

Loglogistic Distribution

If T ∼ LL(α,γ), then

$$ f(t|\alpha, \gamma) = \frac{\frac{\alpha}{\gamma}\left(\frac{t}{\gamma}\right)^{\alpha-1}}{\left[1 + \left(\frac{t}{\gamma}\right)^{\alpha}\right]^2}I_{[0, \infty)}(t), ~ \alpha>0, \gamma>0, $$

where α > 0 and γ > 0 are the shape and scale parameters, respectively.

The survival and hazard functions in this case are given by:

$$S(t|\alpha, \gamma) = \frac{1}{1+ \left(\frac{t}{\gamma}\right)^{\alpha}}$$ and $$ h(t|\alpha, \gamma) = \frac{\frac{\alpha}{\gamma}\left(\frac{t}{\gamma}\right)^{\alpha-1}}{1 + \left(\frac{t}{\gamma}\right)^{\alpha}}. $$

Gamma Distribution

If T ∼ Gamma(α,λ), then

$$f(t|\alpha, \lambda) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}t^{\alpha-1}\exp\left\{-\lambda t\right\}I_{[0, \infty)}(t),$$

where Γ(α) = ∫₀^∞u^α − 1exp { − u}du is the gamma function.

The survival function is given by

$$S(t|\alpha, \lambda) = 1 - \frac{\gamma^{*}(\alpha, \lambda t)}{\Gamma(\alpha)},$$ where γ^*(α,λt) is the lower incomplete gamma function, which is available only numerically. Finally, the hazard function is expressed as:

$$h(t|\alpha, \lambda) = \frac{f(t|\alpha, \lambda)}{S(t|\alpha, \lambda)}.$$

Generalized Gamma Distribution (original Stacy’s parametrization)

If T ∼ ggstacy(α,γ,κ), then

$$f(t|\alpha, \gamma, \kappa) = \frac{\kappa}{\gamma^{\alpha}\Gamma(\alpha/\kappa)}t^{\alpha-1}\exp\left\{-\left(\frac{t}{\gamma}\right)^{\kappa}\right\}I_{[0, \infty)}(t),$$ for α > 0, γ > 0 and κ > 0.

It can be show that the survival function can be expressed as:

S(t|α,γ,κ) = S_G(x|ν,1), where $x = \displaystyle\left(\frac{t}{\gamma}\right)^\kappa$, and F_G(⋅|ν,1) corresponds to the distribution function of a gamma distribution with shape parameter ν = α/γ and scale parameter equals to 1.

Finally, the hazard function is expressed as:

$$h(t|\alpha, \gamma, \kappa) = \frac{f(t|\alpha, \gamma, \kappa)}{S(t|\alpha, \gamma, \kappa)}.$$

Generalized Gamma Distribution (alternative Prentice’s parametrization)

If T ∼ ggprentice(μ,σ,φ), then

$$f(t | \mu, \sigma, \varphi) = \begin{cases} \frac{|\varphi|(\varphi^{-2})^{\varphi^{-2}}}{\sigma t\Gamma(\varphi^{-2})}\exp\{\varphi^{-2}[\varphi w - \exp(\varphi w)]\}I_{[0, \infty)}(t), & \varphi \neq 0 \\ \frac{1}{\sqrt{2\pi}t\sigma}\exp\left\{-\frac{1}{2}\left(\frac{log(t)-\mu}{\sigma}\right)^2\right\}I_{[0, \infty)}(t), & \varphi = 0 \end{cases} $$ where $w = \frac{\log(t) - \mu}{\sigma}$, for − ∞ < μ < ∞, σ > 0 and − ∞ < φ < ∞$.

It can be show that the survival function can be expressed as:

$$ S(t|\mu, \sigma, \varphi) = \begin{cases} S_{G}(x|1/\varphi^2, 1), & \varphi > 0 \\ 1-S_{G}(x|1/\varphi^2, 1), & \varphi < 0 \\ S_{LN}(x|\mu, \sigma), & \varphi = 0 \end{cases} $$ where $x = \frac{1}{\varphi^2}\exp\{\varphi w\}$, S_G(⋅|1/φ²,1) is the distribution function of a gamma distribution with shape parameter 1/φ² and scale parameter equals to 1, and S_LN(x|μ,σ) corresponds to the survival function of a lognormal distribution with location parameter μ and scale parameter σ.

Finally, the hazard function is expressed as:

$$h(t|\alpha, \gamma, \kappa) = \frac{f(t|\alpha, \gamma, \kappa)}{S(t|\alpha, \gamma, \kappa)}.$$

Gompertz Distribution

If T ∼ Gamma(α,γ), then

$$f(t|\alpha, \lambda) = \alpha\exp\left\{\gamma x-\frac{\alpha}{\gamma}\left(e^{\gamma x} - 1\right)\right\}I_{[0, \infty)}(t).$$

The survival and hazard functions are given, respectively, by

$$S(t|\alpha, \lambda) = \exp\left\{-\frac{\alpha}{\gamma}\left(e^{\gamma x} - 1\right)\right\}.$$ and

$$h(t|\alpha, \lambda) = \alpha\exp\{\gamma x}.$$

Rayleigh Distribution

Let T ∼ rayleigh(σ), where σ > 0 is a scale parameter. Then, the density, survival and hazard functions are respectively given by:

$$f(t|\sigma) = \frac{x}{\sigma^2}\exp\left\{-\frac{x^2}{2\sigma^2}\right\},$$ $$S(t|\sigma) = \exp\left\{-\frac{x^2}{2\sigma^2}\right\}$$ and

$$h(t|\sigma) = \frac{x}{\sigma^2}.$$

Birnbaum-Saunders (fatigue) Distribution

If T ∼ fatigue(α,γ), then

$$ f(t|\alpha, \gamma) = \frac{\sqrt{\frac{t}{\gamma}}+\sqrt{\frac{\gamma}{t}}}{2 \alpha t}\phi\left(\sqrt{\frac{t}{\gamma}}+\sqrt{\frac{\gamma}{t}}\right)(t), ~ \alpha>0, \gamma>0, $$

where ϕ(⋅) is the probability density function of a standard normal distribution, α > 0 and γ > 0 are the shape and scale parameters, respectively.

The survival function in this case is given by:

$$ S(t|\alpha, \gamma) =\Phi\left(\sqrt{\frac{t}{\gamma}}-\sqrt{\frac{\gamma}{t}}\right)(t) $$,

where Φ(⋅) is the cumulative distribution function of a standard normal distribution. The hazard function is given by $$h(t|\mu, \sigma) = \frac{f(t|\alpha, \gamma)}{S(t|\alpha, \gamma)}. $$

Regression models

When covariates are available, it is possible to fit six different regression models with the R package survstan:

accelerated failure time (AFT) models;
proportional hazards (PH) models;
proportional odds (PO) models;
accelerated hazard (AH) models.
Yang and Prentice (YP) models.
extended hazard (EH) models.

The regression survival models implemented in the R package survstan are briefly described in the sequel. Denote by x a 1 × p vector of covariates, and let β and ϕ be p × 1 vectors of regression coefficients, and θ a vector of parameters associated with some baseline survival distribution. To ensure identifiability of the implemented regression models, we shall assume that the linear predictors xβ and xϕ do not include an intercept term.

Accelerate Failure Time Models

Accelerated failure time (AFT) models are defined as

T = exp {xβ}ν, where ν follows a baseline distribution with survival function S₀(⋅|θ) so that

f(t|θ,β,x) = e^−xβf₀(te^−xβ|θ) and

S(t|θ,β,x) = S₀(te^−xβ|θ).

Proportional Hazards Models

Proportional hazards (PH) models are defined as

h(t|θ,β,x) = h₀(t|θ)exp {xβ}, where h₀(t|θ) is a baseline hazard function so that

f(t|θ,β,x) = h₀(t|θ)exp {xβ−H₀(t|θ)e^xβ}, and

S(t|θ,β,x) = exp {−H₀(t|θ)e^xβ}.

Proportional Odds Models

Proportional Odds (PO) models are defined as

R(t|θ,β,x) = R₀(t|θ)exp {xβ}, where $\displaystyle R_{0}(t|\boldsymbol{\theta}) = \frac{1-S_{0}(t|\boldsymbol{\theta})}{S_{0}(t|\boldsymbol{\theta})} = \exp\{H_{0}(t|\boldsymbol{\theta})\}-1$ is a baseline odds function so that

$$ f(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = \frac{h_{0}(t|\boldsymbol{\theta})\exp\{\mathbf{x} \boldsymbol{\beta} + H_{0}(t|\boldsymbol{\theta})\}}{[1 + R_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}]^2}. $$

and

$$ S(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = \frac{1}{1 + R_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}}. $$

Accelerated Hazard Models

Accelerated hazard (AH) models can be defined as

h(t|θ,β,x) = h₀(t/e^xβ|θ)

so that

S(t|θ,β,x) = exp {−H₀(t/e^xβ|θ)e^xβ} and f(t|θ,β,x) = h₀(t/e^xβ|θ)exp {−H₀(t/e^xβ|θ)e^xβ}.

Extended hazard Models

The survival function of the extended hazard (EH) model is given by:

S(t|θ,β,ϕ) = exp {−H₀(t/e^xβ|θ)exp(x(β+ϕ))}.

The hazard and the probability density functions are then expressed as:

h(t|θ,β,ϕ) = h₀(t/e^xβ|θ)exp {xϕ} and

f(t|θ,β,ϕ) = h₀(t/e^xβ|θ)exp {xβ}exp {−H₀(t/e^xβ|θ)exp(x(β+ϕ))},

respectively.

The EH model includes the AH, AFT and PH models as particular cases when ϕ = 0, ϕ = − β, and β = 0, respectively.

Yang and Prentice Models

The survival function of the Yang and Prentice (YP) model is given by:

$$S(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]^{-\kappa_{L}}. $$

The hazard and the probability density functions are then expressed as:

$$h(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \frac{\kappa_{S}h_{0}(t|\boldsymbol{\theta})\exp\{H_{0}(t|\boldsymbol{\theta})\}}{\left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]} $$ and

$$f(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \kappa_{S}h_{0}(t|\boldsymbol{\theta})\exp\{H_{0}(t|\boldsymbol{\theta})\}\left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]^{-(1+\kappa_{L})}, $$

respectively, where κ_S = exp {xβ} and κ_L = exp {xϕ}.

The YO model includes the PH and PO models as particular cases when ϕ = β and ϕ = 0, respectively.