The R package survstan can be used to fit right-censored survival data under independent censoring. The implemented models allow the fitting of survival data in the presence/absence of covariates. All inferential procedures are currently based on the maximum likelihood (ML) approach.
Installation
You can install the released version of survstan from CRAN with:
install.packages("survstan")
You can install the development version of survstan from GitHub with:
# install.packages("devtools")
devtools::install_github("fndemarqui/survstan")
Inference procedures
Let (ti,δi) be the observed survival time and its corresponding failure indicator, i = 1, ⋯, n, and θ be a k × 1 vector of parameters. Then, the likelihood function for right-censored survival data under independent censoring can be expressed as:
$$ L(\boldsymbol{\theta}) = \prod_{i=1}^{n}f(t_{i}|\boldsymbol{\theta})^{\delta_{i}}S(t_{i}|\boldsymbol{\theta})^{1-\delta_{i}}. $$
The maximum likelihood estimate (MLE) of θ is obtained by directly maximization of log (L(θ)) using the rstan::optimizing()
function. The function rstan::optimizing()
further provides the hessian matrix of log (L(θ)), needed to obtain the observed Fisher information matrix, which is given by:
$$ \mathscr{I}(\hat{\boldsymbol{\theta}}) = -\frac{\partial^2}{\partial \boldsymbol{\theta}\boldsymbol{\theta}'} \log L(\boldsymbol{\theta})\mid_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}}, $$
Inferences on θ are then based on the asymptotic properties of the MLE, $\hat{\boldsymbol{\theta}}$, that state that:
$$ \hat{\boldsymbol{\theta}} \asymp N_{k}(\boldsymbol{\theta}, \mathscr{I}^{-1}(\hat{\boldsymbol{\theta}})). $$
Baseline Distributions
Some of the most popular baseline survival distributions are implemented in the R package survstan. Such distributions include:
- Exponential
- Weibull
- Lognormal
- Loglogistic
- Gamma
- Rayleigh
- Birnbaum-Saunders (fatigue)
The parametrizations adopted in the package survstan are presented next.
Exponential Distribution
If T ∼ Exp(λ), then
f(t|λ) = λexp {−λt}I[0, ∞)(t), where λ > 0 is the rate parameter.
The survival and hazard functions in this case are given by:
S(t|λ) = exp {−λt} and h(t|λ) = λ.
Weibull Distribution
If T ∼ Weibull(α,γ), then
$$ f(t|\alpha, \gamma) = \frac{\alpha}{\gamma^{\alpha}}t^{\alpha-1}\exp\left\{-\left(\frac{t}{\gamma}\right)^{\alpha}\right\}I_{[0, \infty)}(t), $$ where α > 0 and γ > 0 are the shape and scale parameters, respectively.
The survival and hazard functions in this case are given by:
$$ S(t|\alpha, \gamma) = \exp\left\{-\left(\frac{t}{\gamma}\right)^{\alpha}\right\} $$ and $$ h(t|\alpha, \gamma) = \frac{\alpha}{\gamma^{\alpha}}t^{\alpha-1}. $$
Lognormal Distribution
If T ∼ LN(μ,σ), then
$$ f(t|\mu, \sigma) = \frac{1}{\sqrt{2\pi}t\sigma}\exp\left\{-\frac{1}{2}\left(\frac{log(t)-\mu}{\sigma}\right)^2\right\}I_{[0, \infty)}(t), $$ where − ∞ < μ < ∞ and σ > 0 are the mean and standard deviation in the log scale of T.
The survival and hazard functions in this case are given by:
$$S(t|\mu, \sigma) = \Phi\left(\frac{-log(t)+\mu}{\sigma}\right)$$ and $$h(t|\mu, \sigma) = \frac{f(t|\mu, \sigma)}{S(t|\mu, \sigma)},$$ where Φ(⋅) is the cumulative distribution function of the standard normal distribution.
Loglogistic Distribution
If T ∼ LL(α,γ), then
$$ f(t|\alpha, \gamma) = \frac{\frac{\alpha}{\gamma}\left(\frac{t}{\gamma}\right)^{\alpha-1}}{\left[1 + \left(\frac{t}{\gamma}\right)^{\alpha}\right]^2}I_{[0, \infty)}(t), ~ \alpha>0, \gamma>0, $$
where α > 0 and γ > 0 are the shape and scale parameters, respectively.
The survival and hazard functions in this case are given by:
$$S(t|\alpha, \gamma) = \frac{1}{1+ \left(\frac{t}{\gamma}\right)^{\alpha}}$$ and $$ h(t|\alpha, \gamma) = \frac{\frac{\alpha}{\gamma}\left(\frac{t}{\gamma}\right)^{\alpha-1}}{1 + \left(\frac{t}{\gamma}\right)^{\alpha}}. $$
Gamma Distribution
If T ∼ Gamma(α,γ), then
$$f(t|\alpha, \lambda) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}t^{\alpha-1}\exp\left\{-\lambda t\right\}I_{[0, \infty)}(t),$$
where Γ(α) = ∫0∞uα − 1exp { − u}du is the gamma function.
The survival function is given by
$$S(t|\alpha, \lambda) = 1 - \frac{\gamma^{*}(\alpha, \gamma t)}{\Gamma(\alpha)},$$ where γ*(α,γt) is the lower incomplete gamma function, which is available only numerically. Finally, the hazard function is expressed as:
$$h(t|\alpha, \lambda) = \frac{f(t|\alpha, \gamma)}{S(t|\alpha, \gamma)}.$$
Rayleigh Distribution
Let T ∼ rayleigh(σ), where σ > 0 is a scale parameter. Then, the density, survival and hazard functions are respectively given by:
$$f(t|\sigma) = \frac{x}{\sigma^2}\exp\left\{-\frac{x^2}{2\sigma^2}\right\},$$ $$S(t|\sigma) = \exp\left\{-\frac{x^2}{2\sigma^2}\right\}$$ and
$$h(t|\sigma) = \frac{x}{\sigma^2}.$$
Birnbaum-Saunders (fatigue) Distribution
If T ∼ fatigue(α,γ), then
$$ f(t|\alpha, \gamma) = \frac{\sqrt{\frac{t}{\gamma}}+\sqrt{\frac{\gamma}{t}}}{2 \alpha t}\phi\left(\sqrt{\frac{t}{\gamma}}+\sqrt{\frac{\gamma}{t}}\right)(t), ~ \alpha>0, \gamma>0, $$
where ϕ(⋅) is the probability density function of a standard normal distribution, α > 0 and γ > 0 are the shape and scale parameters, respectively.
The survival function in this case is given by:
$$ S(t|\alpha, \gamma) =\Phi\left(\sqrt{\frac{t}{\gamma}}-\sqrt{\frac{\gamma}{t}}\right)(t) $$,
where Φ(⋅) is the cumulative distribution function of a standard normal distribution. The hazard function is given by $$h(t|\mu, \sigma) = \frac{f(t|\alpha, \gamma)}{S(t|\alpha, \gamma)}. $$
Regression models
When covariates are available, it is possible to fit four different regression models with the R package survstan:
- accelerated failure time (AFT) models;
- proportional hazards (PH) models;
- proportional odds (PO) models;
- accelerated hazard (AH) models.
- Yang and Prentice (YP) models.
Let x be a 1 × p vector of covariates, β a p × 1 of regression coefficients, and θ a vector of parameters associated with some baseline survival distribution, and denote by Θ = (θ,β)T the full vector of parameters. Here, to ensure identifiability, in all regression structures the linear predictor xβ does not include a intercept term.
The regression survival models implemented in the R package survstan are briefly described in the sequel.
Accelerate Failure Time Models
Accelerated failure time (AFT) models are defined as
T = exp {xβ}ν, where ν follows a baseline distribution with survival function S0(⋅|θ) so that
f(t|Θ,x) = e−xβf0(te−xβ|θ) and
S(t|Θ,x) = S0(te−xβ|θ).
Proportional Hazards Models
Proportional hazards (PH) models are defined as
h(t|Θ,x) = h0(t|θ)exp {xβ}, where h0(t|θ) is a baseline hazard function so that
f(t|Θ,x) = h0(t|θ)exp {xβ−H0(t|θ)exβ}, and
S(t|Θ,x) = exp {−H0(t|θ)exβ}.
Proportional Odds Models
Proportional Odds (PO) models are defined as
R(t|Θ,x) = R0(t|θ)exp {xβ}, where $\displaystyle R_{0}(t|\boldsymbol{\theta}) = \frac{1-S_{0}(t|\boldsymbol{\theta})}{S_{0}(t|\boldsymbol{\theta})} = \exp\{H_{0}(t|\boldsymbol{\theta})\}-1$ is a baseline odds function so that
$$ f(t|\boldsymbol{\Theta}, \mathbf{x}) = \frac{h_{0}(t|\boldsymbol{\theta})\exp\{\mathbf{x} \boldsymbol{\beta} + H_{0}(t|\boldsymbol{\theta})\}}{[1 + R_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}]^2}. $$
and
$$ S(t|\boldsymbol{\Theta}, \mathbf{x}) = \frac{1}{1 + R_{0}(t|\boldsymbol{\theta})e^{\mathbf{x} \boldsymbol{\beta}}}. $$
Accelerated Hazard Models
Accelerated hazard (AH) models are defined as
h(t|Θ,x) = h0(texβ|θ)
so that
S(t|Θ,x) = exp {−H0(texβ|θ)e−xβ} and f(t|θ,x) = h0(texβ|θ)exp {−H0(texβ|θ)e−xβ}.
Yang and Prentice Models
The survival function of the Yang and Prentice (YP) model is given by:
$$S(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]^{-\kappa_{L}}. $$
The hazard and the probability density functions are then expressed as:
$$h(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \frac{\kappa_{S}h_{0}(t|\boldsymbol{\theta})\exp\{H_{0}(t|\boldsymbol{\theta})\}}{\left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]} $$ and
$$f(t|\boldsymbol{\theta},\boldsymbol{\beta}, \boldsymbol{\phi}) = \kappa_{S}h_{0}(t|\boldsymbol{\theta})\exp\{H_{0}(t|\boldsymbol{\theta})\}\left[1+\frac{\kappa_{S}}{\kappa_{L}}R_{0}(t|\boldsymbol{\theta})\right]^{-(1+\kappa_{L})}, $$
respectively, where κS = exp {xβ} and κL = exp {xϕ}.