The R package survstan can be used to fit right-censored survival data under independent censoring. The implemented models allow the fitting of survival data in the presence/absence of covariates. All inferential procedures are currently based on the maximum likelihood (ML) approach.

Installation

You can install the released version of survstan from CRAN with:

install.packages("survstan")

You can install the development version of survstan from GitHub with:

# install.packages("devtools")
devtools::install_github("fndemarqui/survstan")

Inference procedures

Let $(t_{i}, \delta_{i})$ be the observed survival time and its corresponding failure indicator, $i=1, \cdots, n$, and $\boldsymbol{\theta}$ be a $k \times 1$ vector of parameters. Then, the likelihood function for right-censored survival data under independent censoring can be expressed as:

$$L(\boldsymbol{\theta}) = \prod_{i=1}^{n}f(t_{i}|\boldsymbol{\theta})^{\delta_{i}}S(t_{i}|\boldsymbol{\theta})^{1-\delta_{i}}.$$

The maximum likelihood estimate (MLE) of $\boldsymbol{\theta}$ is obtained by direct maximization of $\log(L(\boldsymbol{\theta}))$ using the rstan::optimizing() function. The function rstan::optimizing() also returns the Hessian matrix of $\log(L(\boldsymbol{\theta}))$, which is needed to obtain the observed Fisher information matrix, given by:

$$\mathscr{I}(\hat{\boldsymbol{\theta}}) = -\frac{\partial^2}{\partial \boldsymbol{\theta}\partial\boldsymbol{\theta}'} \log L(\boldsymbol{\theta})\mid_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}}.$$

Inferences on $\boldsymbol{\theta}$ are then based on the asymptotic properties of the MLE, $\hat{\boldsymbol{\theta}}$, which state that:

$$\hat{\boldsymbol{\theta}} \asymp N_{k}(\boldsymbol{\theta}, \mathscr{I}^{-1}(\hat{\boldsymbol{\theta}})).$$
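
As a rough illustration of the procedure above, the following sketch maximizes the right-censored log-likelihood of an exponential model and inverts the Hessian to obtain a Wald interval. It uses base R's optim() in place of rstan::optimizing(), a substitution made here only to keep the example free of dependencies. The simulated data and all object names are assumptions for illustration and are not part of survstan.

```r
# Simulated right-censored exponential data (illustrative values only).
set.seed(1)
n     <- 200
t_lat <- rexp(n, rate = 0.5)         # latent failure times
cens  <- rexp(n, rate = 0.2)         # independent censoring times
time  <- pmin(t_lat, cens)           # observed time t_i
delta <- as.numeric(t_lat <= cens)   # failure indicator delta_i

# Negative log-likelihood: sum of delta * log f + (1 - delta) * log S,
# parametrized by log(lambda) so the optimizer is unconstrained.
negloglik <- function(par) {
  lambda <- exp(par)
  -sum(delta * dexp(time, rate = lambda, log = TRUE) +
       (1 - delta) * pexp(time, rate = lambda, lower.tail = FALSE, log.p = TRUE))
}

fit <- optim(par = 0, fn = negloglik, method = "BFGS", hessian = TRUE)

lambda_hat <- exp(fit$par)
# fit$hessian is the observed information for log(lambda); its inverse
# estimates the asymptotic variance of log(lambda_hat).
se_log    <- sqrt(1 / fit$hessian[1, 1])
ci_lambda <- exp(fit$par + c(-1, 1) * qnorm(0.975) * se_log)

# Closed-form MLE of the exponential rate, used as a sanity check.
lambda_closed <- sum(delta) / sum(time)
```

For this model the numerical and closed-form estimates should agree closely, and the observed information for $\log\lambda$ at the MLE equals the number of observed failures.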

Baseline Distributions

Some of the most popular baseline survival distributions are implemented in the R package survstan. Such distributions include:

  • Exponential
  • Weibull
  • Lognormal
  • Loglogistic
  • Gamma
  • Generalized Gamma (original Stacy’s parametrization)
  • Generalized Gamma (alternative Prentice’s parametrization)
  • Gompertz
  • Rayleigh
  • Birnbaum-Saunders (fatigue)

The parametrizations adopted in the package survstan are presented next.

Exponential Distribution

If $T \sim \mbox{Exp}(\lambda)$, then

$$f(t|\lambda) = \lambda\exp\left\{-\lambda t\right\}I_{[0, \infty)}(t),$$

where $\lambda>0$ is the rate parameter.

The survival and hazard functions in this case are given by:

$$S(t|\lambda) = \exp\left\{-\lambda t\right\}$$

and

$$h(t|\lambda) = \lambda.$$

Weibull Distribution

If $T \sim \mbox{Weibull}(\alpha, \gamma)$, then

$$f(t|\alpha, \gamma) = \frac{\alpha}{\gamma^{\alpha}}t^{\alpha-1}\exp\left\{-\left(\frac{t}{\gamma}\right)^{\alpha}\right\}I_{[0, \infty)}(t),$$

where $\alpha>0$ and $\gamma>0$ are the shape and scale parameters, respectively.

The survival and hazard functions in this case are given by:

$$S(t|\alpha, \gamma) = \exp\left\{-\left(\frac{t}{\gamma}\right)^{\alpha}\right\}$$

and

$$h(t|\alpha, \gamma) = \frac{\alpha}{\gamma^{\alpha}}t^{\alpha-1}.$$
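
The shape/scale parametrization above matches the one used by base R's Weibull functions (shape = $\alpha$, scale = $\gamma$). The sketch below codes the survival and hazard functions directly and compares them with pweibull() and dweibull(). The function names and parameter values are hypothetical and chosen only for illustration.

```r
# Weibull survival and hazard in the shape/scale parametrization above.
# `gam` stands for the scale parameter (named to avoid masking base::gamma).
S_weibull <- function(t, alpha, gam) exp(-(t / gam)^alpha)
h_weibull <- function(t, alpha, gam) alpha / gam^alpha * t^(alpha - 1)
# The density follows from f = h * S.
f_weibull <- function(t, alpha, gam) h_weibull(t, alpha, gam) * S_weibull(t, alpha, gam)

times <- c(0.5, 1, 2, 4)
alpha <- 1.5
gam   <- 2

# Agreement with base R's Weibull functions.
same_S <- all.equal(S_weibull(times, alpha, gam),
                    pweibull(times, shape = alpha, scale = gam, lower.tail = FALSE))
same_f <- all.equal(f_weibull(times, alpha, gam),
                    dweibull(times, shape = alpha, scale = gam))
```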

Lognormal Distribution

If $T \sim \mbox{LN}(\mu, \sigma)$, then

$$f(t|\mu, \sigma) = \frac{1}{\sqrt{2\pi}t\sigma}\exp\left\{-\frac{1}{2}\left(\frac{\log(t)-\mu}{\sigma}\right)^2\right\}I_{[0, \infty)}(t),$$

where $-\infty < \mu < \infty$ and $\sigma>0$ are the mean and standard deviation of $T$ on the log scale.

The survival and hazard functions in this case are given by:

$$S(t|\mu, \sigma) = \Phi\left(\frac{-\log(t)+\mu}{\sigma}\right)$$

and

$$h(t|\mu, \sigma) = \frac{f(t|\mu, \sigma)}{S(t|\mu, \sigma)},$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution.

Loglogistic Distribution

If $T \sim \mbox{LL}(\alpha, \gamma)$, then

$$f(t|\alpha, \gamma) = \frac{\frac{\alpha}{\gamma}\left(\frac{t}{\gamma}\right)^{\alpha-1}}{\left[1 + \left(\frac{t}{\gamma}\right)^{\alpha}\right]^2}I_{[0, \infty)}(t),$$

where $\alpha>0$ and $\gamma>0$ are the shape and scale parameters, respectively.

The survival and hazard functions in this case are given by:

$$S(t|\alpha, \gamma) = \frac{1}{1+ \left(\frac{t}{\gamma}\right)^{\alpha}}$$

and

$$h(t|\alpha, \gamma) = \frac{\frac{\alpha}{\gamma}\left(\frac{t}{\gamma}\right)^{\alpha-1}}{1 + \left(\frac{t}{\gamma}\right)^{\alpha}}.$$
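
Base R has no built-in loglogistic distribution, so a quick numerical check of the formulas above can be useful. The sketch below codes the density and survival function, verifies that the density integrates to $1 - S(t)$, and confirms that the scale parameter $\gamma$ is the median. The function names and parameter values are assumptions made for illustration.

```r
# Loglogistic density and survival function (shape alpha, scale gam).
f_ll <- function(t, alpha, gam) {
  (alpha / gam) * (t / gam)^(alpha - 1) / (1 + (t / gam)^alpha)^2
}
S_ll <- function(t, alpha, gam) 1 / (1 + (t / gam)^alpha)
# Hazard as the ratio f / S, which simplifies to the closed form above.
h_ll <- function(t, alpha, gam) f_ll(t, alpha, gam) / S_ll(t, alpha, gam)

alpha <- 2
gam   <- 1.5

# The integral of f over (0, 3) should equal F(3) = 1 - S(3).
F_num <- integrate(f_ll, lower = 0, upper = 3, alpha = alpha, gam = gam)$value
F_cf  <- 1 - S_ll(3, alpha, gam)

# The scale parameter is the median: S(gam) = 1/2.
S_at_scale <- S_ll(gam, alpha, gam)
```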

Gamma Distribution

If $T \sim \mbox{Gamma}(\alpha, \lambda)$, then

$$f(t|\alpha, \lambda) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}t^{\alpha-1}\exp\left\{-\lambda t\right\}I_{[0, \infty)}(t),$$

where $\Gamma(\alpha) = \int_{0}^{\infty}u^{\alpha-1}\exp\{-u\}du$ is the gamma function.

The survival function is given by

$$S(t|\alpha, \lambda) = 1 - \frac{\gamma^{*}(\alpha, \lambda t)}{\Gamma(\alpha)},$$

where $\gamma^{*}(\alpha, \lambda t)$ is the lower incomplete gamma function, which is available only numerically. Finally, the hazard function is expressed as:

$$h(t|\alpha, \lambda) = \frac{f(t|\alpha, \lambda)}{S(t|\alpha, \lambda)}.$$

Generalized Gamma Distribution (original Stacy’s parametrization)

If $T \sim \mbox{ggstacy}(\alpha, \gamma, \kappa)$, then

$$f(t|\alpha, \gamma, \kappa) = \frac{\kappa}{\gamma^{\alpha}\Gamma(\alpha/\kappa)}t^{\alpha-1}\exp\left\{-\left(\frac{t}{\gamma}\right)^{\kappa}\right\}I_{[0, \infty)}(t),$$

for $\alpha>0$, $\gamma>0$ and $\kappa>0$.

It can be shown that the survival function can be expressed as:

$$S(t|\alpha, \gamma, \kappa) = S_{G}(x|\nu, 1),$$

where $x = \displaystyle\left(\frac{t}{\gamma}\right)^\kappa$, and $S_{G}(\cdot|\nu, 1)$ corresponds to the survival function of a gamma distribution with shape parameter $\nu = \alpha/\kappa$ and scale parameter equal to 1.

Finally, the hazard function is expressed as:

$$h(t|\alpha, \gamma, \kappa) = \frac{f(t|\alpha, \gamma, \kappa)}{S(t|\alpha, \gamma, \kappa)}.$$
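
The gamma representation of the survival function can be checked numerically. The sketch below takes the gamma shape to be $\alpha/\kappa$, which is what the change of variables $x = (t/\gamma)^{\kappa}$ applied to the density yields, and compares pgamma() with direct numerical integration of the density. The function names and parameter values are hypothetical and are not taken from survstan.

```r
# Generalized gamma density (Stacy's parametrization); `gam` is the scale.
d_ggstacy <- function(t, alpha, gam, kappa) {
  kappa / (gam^alpha * gamma(alpha / kappa)) * t^(alpha - 1) * exp(-(t / gam)^kappa)
}

# Survival function through the gamma representation:
# x = (t / gam)^kappa follows a gamma distribution with shape alpha / kappa.
S_ggstacy <- function(t, alpha, gam, kappa) {
  x <- (t / gam)^kappa
  pgamma(x, shape = alpha / kappa, rate = 1, lower.tail = FALSE)
}

# Check against numerical integration of the density over (1, Inf).
S_num <- integrate(d_ggstacy, lower = 1, upper = Inf,
                   alpha = 2, gam = 1.5, kappa = 1.2)$value
S_cf  <- S_ggstacy(1, alpha = 2, gam = 1.5, kappa = 1.2)

# Special case alpha = kappa: the model reduces to a Weibull distribution.
times     <- c(0.5, 1, 2)
S_special <- S_ggstacy(times, alpha = 1.2, gam = 1.5, kappa = 1.2)
S_weib    <- pweibull(times, shape = 1.2, scale = 1.5, lower.tail = FALSE)
```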

Generalized Gamma Distribution (alternative Prentice’s parametrization)

If $T \sim \mbox{ggprentice}(\mu, \sigma, \varphi)$, then

$$f(t | \mu, \sigma, \varphi) = \begin{cases} \frac{|\varphi|(\varphi^{-2})^{\varphi^{-2}}}{\sigma t\Gamma(\varphi^{-2})}\exp\{\varphi^{-2}[\varphi w - \exp(\varphi w)]\}I_{[0, \infty)}(t), & \varphi \neq 0 \\ \frac{1}{\sqrt{2\pi}t\sigma}\exp\left\{-\frac{1}{2}\left(\frac{\log(t)-\mu}{\sigma}\right)^2\right\}I_{[0, \infty)}(t), & \varphi = 0 \end{cases}$$

where $w = \frac{\log(t) - \mu}{\sigma}$, for $-\infty < \mu < \infty$, $\sigma>0$ and $-\infty < \varphi < \infty$.

It can be shown that the survival function can be expressed as:

$$S(t|\mu, \sigma, \varphi) = \begin{cases} S_{G}(x|1/\varphi^2, 1), & \varphi > 0 \\ 1-S_{G}(x|1/\varphi^2, 1), & \varphi < 0 \\ S_{LN}(t|\mu, \sigma), & \varphi = 0 \end{cases}$$

where $x = \frac{1}{\varphi^2}\exp\{\varphi w\}$, $S_{G}(\cdot|1/\varphi^2, 1)$ is the survival function of a gamma distribution with shape parameter $1/\varphi^2$ and scale parameter equal to 1, and $S_{LN}(t|\mu, \sigma)$ corresponds to the survival function of a lognormal distribution with location parameter $\mu$ and scale parameter $\sigma$.

Finally, the hazard function is expressed as:

$$h(t|\mu, \sigma, \varphi) = \frac{f(t|\mu, \sigma, \varphi)}{S(t|\mu, \sigma, \varphi)}.$$

Gompertz Distribution

If $T \sim \mbox{Gompertz}(\alpha, \gamma)$, then

$$f(t|\alpha, \gamma) = \alpha\exp\left\{\gamma t-\frac{\alpha}{\gamma}\left(e^{\gamma t} - 1\right)\right\}I_{[0, \infty)}(t).$$

The survival and hazard functions are given, respectively, by

$$S(t|\alpha, \gamma) = \exp\left\{-\frac{\alpha}{\gamma}\left(e^{\gamma t} - 1\right)\right\}$$

and

$$h(t|\alpha, \gamma) = \alpha\exp\{\gamma t\}.$$
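
The Gompertz survival function is the exponential of minus the integrated hazard. The sketch below verifies this relationship numerically for one arbitrary parameter choice. The function names and values are assumptions for illustration only.

```r
# Gompertz hazard and survival functions; `gam` plays the role of gamma.
h_gomp <- function(t, alpha, gam) alpha * exp(gam * t)
S_gomp <- function(t, alpha, gam) exp(-alpha / gam * (exp(gam * t) - 1))
# The density follows from f = h * S.
f_gomp <- function(t, alpha, gam) h_gomp(t, alpha, gam) * S_gomp(t, alpha, gam)

alpha <- 0.1
gam   <- 0.5

# Cumulative hazard on (0, 2) by numerical integration of h ...
H_num <- integrate(h_gomp, lower = 0, upper = 2, alpha = alpha, gam = gam)$value
# ... and from the closed-form survival function, H(t) = -log S(t).
H_cf  <- -log(S_gomp(2, alpha, gam))
```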

Rayleigh Distribution

Let $T \sim \mbox{rayleigh}(\sigma)$, where $\sigma>0$ is a scale parameter. Then, the density, survival and hazard functions are respectively given by:

$$f(t|\sigma) = \frac{t}{\sigma^2}\exp\left\{-\frac{t^2}{2\sigma^2}\right\},$$

$$S(t|\sigma) = \exp\left\{-\frac{t^2}{2\sigma^2}\right\}$$

and

$$h(t|\sigma) = \frac{t}{\sigma^2}.$$

Birnbaum-Saunders (fatigue) Distribution

If $T \sim \mbox{fatigue}(\alpha, \gamma)$, then

$$f(t|\alpha, \gamma) = \frac{\sqrt{\frac{t}{\gamma}}+\sqrt{\frac{\gamma}{t}}}{2 \alpha t}\phi\left(\frac{1}{\alpha}\left[\sqrt{\frac{t}{\gamma}}-\sqrt{\frac{\gamma}{t}}\right]\right)I_{[0, \infty)}(t),$$

where $\phi(\cdot)$ is the probability density function of a standard normal distribution, and $\alpha>0$ and $\gamma>0$ are the shape and scale parameters, respectively.

The survival function in this case is given by:

$$S(t|\alpha, \gamma) = \Phi\left(-\frac{1}{\alpha}\left[\sqrt{\frac{t}{\gamma}}-\sqrt{\frac{\gamma}{t}}\right]\right),$$

where $\Phi(\cdot)$ is the cumulative distribution function of a standard normal distribution. The hazard function is given by

$$h(t|\alpha, \gamma) = \frac{f(t|\alpha, \gamma)}{S(t|\alpha, \gamma)}.$$

Regression models

When covariates are available, it is possible to fit six different regression models with the R package survstan:

  • accelerated failure time (AFT) models;
  • proportional hazards (PH) models;
  • proportional odds (PO) models;
  • accelerated hazard (AH) models;
  • Yang and Prentice (YP) models;
  • extended hazard (EH) models.

The regression survival models implemented in the R package survstan are briefly described below. Denote by $\mathbf{x}$ a $1\times p$ vector of covariates, and let $\boldsymbol{\beta}$ and $\boldsymbol{\phi}$ be $p \times 1$ vectors of regression coefficients, and $\boldsymbol{\theta}$ a vector of parameters associated with some baseline survival distribution. To prevent identifiability issues, it is assumed that the linear predictors $\mathbf{x} \boldsymbol{\beta}$ and $\mathbf{x}\boldsymbol{\phi}$ do not include an intercept term.

Accelerated Failure Time Models

Accelerated failure time (AFT) models are defined as

$$T = \exp\{\mathbf{x} \boldsymbol{\beta}\}\nu,$$

where $\nu$ follows a baseline distribution with survival function $S_{0}(\cdot|\boldsymbol{\theta})$ so that

$$f(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = e^{-\mathbf{x} \boldsymbol{\beta}}f_{0}(te^{-\mathbf{x} \boldsymbol{\beta}}|\boldsymbol{\theta})$$

and

$$S(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = S_{0}(t e^{-\mathbf{x} \boldsymbol{\beta}}|\boldsymbol{\theta}).$$
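
To make the AFT construction concrete, the sketch below uses a Weibull baseline and shows that rescaling time by $e^{-\mathbf{x}\boldsymbol{\beta}}$ is the same as multiplying the Weibull scale parameter by $e^{\mathbf{x}\boldsymbol{\beta}}$. The baseline parameters, covariate values and coefficients are arbitrary assumptions, and the code relies only on base R rather than on survstan.

```r
# Weibull baseline with shape 1.5 and scale 2 (arbitrary illustrative values).
S0 <- function(t) pweibull(t, shape = 1.5, scale = 2, lower.tail = FALSE)
f0 <- function(t) dweibull(t, shape = 1.5, scale = 2)

x    <- c(1, 0.5)        # covariate vector (no intercept)
beta <- c(0.4, -0.2)     # regression coefficients
lp   <- sum(x * beta)    # linear predictor x %*% beta

# AFT survival and density obtained by rescaling time.
S_aft <- function(t) S0(t * exp(-lp))
f_aft <- function(t) exp(-lp) * f0(t * exp(-lp))

times <- c(0.5, 1, 2, 4)

# Equivalent Weibull model with scale multiplied by exp(lp).
S_direct <- pweibull(times, shape = 1.5, scale = 2 * exp(lp), lower.tail = FALSE)
f_direct <- dweibull(times, shape = 1.5, scale = 2 * exp(lp))

# The baseline median is stretched by the acceleration factor exp(lp).
median_x <- qweibull(0.5, shape = 1.5, scale = 2) * exp(lp)
```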

Proportional Hazards Models

Proportional hazards (PH) models are defined as

$$h(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = h_{0}(t|\boldsymbol{\theta})\exp\{\mathbf{x} \boldsymbol{\beta}\},$$

where $h_{0}(t|\boldsymbol{\theta})$ is a baseline hazard function so that

$$f(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = h_{0}(t|\boldsymbol{\theta})\exp\left\{\mathbf{x} \boldsymbol{\beta} - H_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}\right\}$$

and

$$S(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = \exp\left\{ - H_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}\right\}.$$
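
The defining property of the PH model is that the hazard ratio between two covariate profiles does not depend on time. The sketch below illustrates this with a Weibull baseline. The baseline parameters and the two linear predictor values are arbitrary assumptions chosen for illustration.

```r
# Weibull baseline (shape 1.5, scale 2): cumulative hazard and hazard.
H0 <- function(t) (t / 2)^1.5
h0 <- function(t) 1.5 / 2^1.5 * t^0.5

# PH hazard and survival functions for a given linear predictor lp = x %*% beta.
h_ph <- function(t, lp) h0(t) * exp(lp)
S_ph <- function(t, lp) exp(-H0(t) * exp(lp))

times <- c(0.5, 1, 2, 4)
lp_a  <- 0.3     # linear predictor of individual A
lp_b  <- -0.2    # linear predictor of individual B

# The hazard ratio is constant in t and equals exp(lp_a - lp_b).
hr <- h_ph(times, lp_a) / h_ph(times, lp_b)

# Equivalently, S(t | x) is the baseline survival raised to the power exp(lp).
S_power <- exp(-H0(times))^exp(lp_a)
```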

Proportional Odds Models

Proportional odds (PO) models are defined as

$$R(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = R_{0}(t|\boldsymbol{\theta})\exp\{\mathbf{x} \boldsymbol{\beta}\},$$

where $\displaystyle R_{0}(t|\boldsymbol{\theta}) = \frac{1-S_{0}(t|\boldsymbol{\theta})}{S_{0}(t|\boldsymbol{\theta})} = \exp\{H_{0}(t|\boldsymbol{\theta})\}-1$ is a baseline odds function so that

$$f(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = \frac{h_{0}(t|\boldsymbol{\theta})\exp\{\mathbf{x} \boldsymbol{\beta} + H_{0}(t|\boldsymbol{\theta})\}}{[1 + R_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}]^2}$$

and

$$S(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = \frac{1}{1 + R_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}}.$$
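
In the PO model it is the ratio of the odds of failure, rather than the ratio of hazards, that stays constant over time. The sketch below checks this with a Weibull baseline. The baseline parameters and linear predictor values are arbitrary assumptions made for illustration.

```r
# Weibull baseline (shape 1.5, scale 2): cumulative hazard and odds function.
H0 <- function(t) (t / 2)^1.5
R0 <- function(t) expm1(H0(t))          # exp(H0) - 1 = (1 - S0) / S0

# PO survival function for a given linear predictor lp = x %*% beta.
S_po <- function(t, lp) 1 / (1 + R0(t) * exp(lp))
# Odds of failure by time t.
odds <- function(t, lp) (1 - S_po(t, lp)) / S_po(t, lp)

times <- c(0.5, 1, 2, 4)
lp_a  <- 0.3
lp_b  <- -0.2

# The odds ratio is constant in t and equals exp(lp_a - lp_b).
or <- odds(times, lp_a) / odds(times, lp_b)
```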

Accelerated Hazard Models

Accelerated hazard (AH) models can be defined as

$$h(t|\boldsymbol{\theta}, \boldsymbol{\beta},\mathbf{x}) = h_{0}\left(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta}\right)$$

so that

$$S(t|\boldsymbol{\theta}, \boldsymbol{\beta},\mathbf{x}) = \exp\left\{- H_{0}\left(t/ e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta}\right)e^{\mathbf{x}\boldsymbol{\beta}} \right\}$$

and

$$f(t|\boldsymbol{\theta}, \boldsymbol{\beta}, \mathbf{x}) = h_{0}\left(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta}\right)\exp\left\{- H_{0}\left(t/ e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta}\right)e^{\mathbf{x}\boldsymbol{\beta}} \right\}.$$

Extended Hazard Models

The survival function of the extended hazard (EH) model is given by:

$$S(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \exp\left\{-H_{0}(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta})\exp(\mathbf{x}(\boldsymbol{\beta} + \boldsymbol{\phi}))\right\}.$$

The hazard and the probability density functions are then expressed as:

$$h(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = h_{0}(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta})\exp\{\mathbf{x}\boldsymbol{\phi}\}$$

and

$$f(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = h_{0}(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta})\exp\{\mathbf{x}\boldsymbol{\phi}\}\exp\left\{-H_{0}(t/e^{\mathbf{x}\boldsymbol{\beta}}|\boldsymbol{\theta})\exp(\mathbf{x}(\boldsymbol{\beta}+ \boldsymbol{\phi}))\right\},$$

respectively.

The EH model includes the AH, AFT and PH models as particular cases when $\boldsymbol{\phi} = \mathbf{0}$, $\boldsymbol{\phi} = -\boldsymbol{\beta}$, and $\boldsymbol{\beta} = \mathbf{0}$, respectively.
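
The three special cases can be verified numerically from the survival functions alone. The sketch below uses a Weibull baseline and scalar linear predictors `lp_beta` and `lp_phi` standing for $\mathbf{x}\boldsymbol{\beta}$ and $\mathbf{x}\boldsymbol{\phi}$. All names and values are hypothetical and chosen only for illustration.

```r
# Weibull baseline cumulative hazard (shape 1.5, scale 2).
H0 <- function(t) (t / 2)^1.5

# EH survival function; lp_beta = x %*% beta and lp_phi = x %*% phi.
S_eh <- function(t, lp_beta, lp_phi) {
  exp(-H0(t / exp(lp_beta)) * exp(lp_beta + lp_phi))
}

# Survival functions of the nested models, written as in their own sections.
S_ah  <- function(t, lp) exp(-H0(t / exp(lp)) * exp(lp))
S_aft <- function(t, lp) exp(-H0(t * exp(-lp)))
S_ph  <- function(t, lp) exp(-H0(t) * exp(lp))

times <- c(0.5, 1, 2, 4)
b <- 0.3    # value of x %*% beta
p <- -0.4   # value of x %*% phi

is_ah  <- all.equal(S_eh(times, b, 0),  S_ah(times, b))    # phi = 0
is_aft <- all.equal(S_eh(times, b, -b), S_aft(times, b))   # phi = -beta
is_ph  <- all.equal(S_eh(times, 0, p),  S_ph(times, p))    # beta = 0
```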

Yang and Prentice Models

The survival function of the Yang and Prentice (YP) model is given by:

$$S(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]^{-\kappa_{L}}.$$

The hazard and the probability density functions are then expressed as:

$$h(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \frac{\kappa_{S}h_{0}(t|\boldsymbol{\theta})\exp\{H_{0}(t|\boldsymbol{\theta})\}}{\left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]}$$

and

$$f(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \kappa_{S}h_{0}(t|\boldsymbol{\theta})\exp\{H_{0}(t|\boldsymbol{\theta})\}\left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]^{-(1+\kappa_{L})},$$

respectively, where $\kappa_{S} = \exp\{\mathbf{x}\boldsymbol{\beta}\}$ and $\kappa_{L} = \exp\{\mathbf{x}\boldsymbol{\phi}\}$.

The YP model includes the PH and PO models as particular cases when $\boldsymbol{\phi} = \boldsymbol{\beta}$ and $\boldsymbol{\phi} = \mathbf{0}$, respectively.
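
The two special cases can be checked numerically in the same spirit. The sketch below uses a Weibull baseline and scalar linear predictors `lp_beta` and `lp_phi` standing for $\mathbf{x}\boldsymbol{\beta}$ and $\mathbf{x}\boldsymbol{\phi}$. All names and values are assumptions made for illustration.

```r
# Weibull baseline (shape 1.5, scale 2): cumulative hazard and odds function.
H0 <- function(t) (t / 2)^1.5
R0 <- function(t) expm1(H0(t))          # exp(H0) - 1

# YP survival function with kappa_S = exp(x beta) and kappa_L = exp(x phi).
S_yp <- function(t, lp_beta, lp_phi) {
  kS <- exp(lp_beta)
  kL <- exp(lp_phi)
  (1 + kS / kL * R0(t))^(-kL)
}

# Survival functions of the nested models.
S_ph <- function(t, lp) exp(-H0(t) * exp(lp))
S_po <- function(t, lp) 1 / (1 + R0(t) * exp(lp))

times <- c(0.5, 1, 2, 4)
b <- 0.3    # value of x %*% beta

is_ph <- all.equal(S_yp(times, b, b), S_ph(times, b))   # phi = beta
is_po <- all.equal(S_yp(times, b, 0), S_po(times, b))   # phi = 0
```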