Survival analysis was designed for longitudinal data on the occurrence of events. An event can be defined as a qualitative change, or a transition from one discrete state to another, that can be situated in time. For example, a marriage is a transition from the state of being unmarried to the state of being married. Likewise, a promotion consists of the transition from a job at one level to a job at a higher level.
To apply survival analysis, you need to know more than just who is married and who is not married, for example. You also need to know when the change occurred. That is, you need to be able to situate the event in time. Ideally, the transitions occur almost instantaneously and you know the exact times at which they occur. Some events or transitions may take a little time, however, and the onset may be unknown or ambiguous. For example, if the event you are studying is a political revolution, you may not know the exact date it began, but you know the year. This is acceptable as long as the interval in which the event occurred is short relative to the overall duration of the observation period.
For survival analysis, the best observation plan is prospective. That is, you begin observing a set of individuals at some well-defined point in time and you then follow them for some substantial period of time, recording the times at which the events of interest occur.
Survival analysis is frequently used with retrospective data, however, in which people are asked to recall the dates of events such as marriages, childbirths, and promotions. This is fine as long as you recognize the potential limitations. First, people may make substantial errors in recalling the time of events, and they may forget some events entirely. Second, they may have trouble providing accurate information on time-dependent covariates. Finally, the sample of people who are actually interviewed may be a biased subsample of those who were at risk of the event. For example, people who have died or moved away will not be included.
Why Use Survival Analysis?
Survival data have two features that are difficult to handle with other statistical methods: censoring and time-dependent covariates. Censoring is a form of missing data in which the survival times of some subjects are not fully observed, usually because the event of interest does not take place for these subjects before the study ends. Say, for example, that you are looking at a group of inmates who were released from prison and followed for one year to determine factors that contribute to re-arrest. The former inmates who were not arrested during the one-year follow-up are referred to as censored: all you know is that their time to re-arrest, if it ever occurs, exceeds one year. Time-dependent covariates are explanatory variables whose values can change during follow-up, such as employment status. While other statistical methods make it difficult to incorporate censored cases and time-dependent covariates, survival analysis methods are designed to handle censoring, and many of them also allow for time-dependent covariates.
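To make the censoring problem concrete, the sketch below encodes a small, entirely hypothetical version of the re-arrest study as pairs of (months observed, whether a re-arrest occurred). The records, the variable names, and the constant-hazard assumption used for the second estimate are illustrative choices, not part of any real study. It compares the mean time to re-arrest among arrested subjects only, which discards the censored cases, with an estimate that counts every subject's time at risk.

```python
# Hypothetical follow-up records: (months observed, re-arrested during follow-up?).
# Subjects with False were still free at the 12-month cutoff, so they are censored.
records = [
    (3, True), (7, True), (12, False), (5, True),
    (12, False), (12, False), (10, True), (12, False),
]

durations = [months for months, _ in records]
events = [arrested for _, arrested in records]

n_events = sum(events)                 # number of observed re-arrests
n_censored = len(events) - n_events    # number of censored subjects

# Naive approach: average only the observed event times, discarding censored cases.
observed_times = [months for months, arrested in records if arrested]
naive_mean = sum(observed_times) / len(observed_times)

# Censoring-aware approach, ASSUMING a constant hazard (exponential survival times):
# the maximum-likelihood estimate of the mean is total time at risk / number of events.
mean_time_constant_hazard = sum(durations) / n_events

print(f"censored subjects: {n_censored} of {len(records)}")
print(f"naive mean months to re-arrest: {naive_mean:.2f}")
print(f"constant-hazard mean months to re-arrest: {mean_time_constant_hazard:.2f}")
```

In this toy data the naive mean is 6.25 months, while the estimate that keeps the censored subjects in the denominator of exposure is 18.25 months. The gap shows how dropping censored cases can badly understate survival time.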
Methods Used In Survival Analysis
There are many different methods used to conduct survival analyses, which can be a little confusing. Sometimes the methods are complementary and can be used in conjunction with each other. At other times, two or more methods may seem attractive for a given analysis, and the researcher may be hard-pressed to find a good reason for choosing one over another. Some of the methods for conducting a survival analysis include:
- Life tables: A life table records the pattern of mortality by age for a population and provides a basis for calculating the expectation of life at various ages. Life tables are a good place to start the study of survival analysis because they are largely straightforward.
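- Kaplan-Meier estimator: This is the most common method of estimating the survival function. Also known as the product-limit estimator, it is simple and intuitive when there are no censored data, because it reduces to the proportion of subjects who survive beyond each time point. When some observations are censored, it adjusts the number of subjects at risk at each event time.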
- Exponential regression: When survival times are skewed and can plausibly be treated as exponentially distributed, the exponential regression model is a good starting point. This model assumes that survival time follows an exponential distribution, which implies a constant hazard rate, and that the rate depends on the values of a set of independent variables.
- Log-normal regression: In this model, it is assumed that the log survival times come from a normal distribution. When there are no censored observations, the resulting model is essentially the ordinary multiple regression model applied to log survival time.
- Cox proportional-hazards regression: The proportional-hazards model is the most general of these regression models because it makes no assumptions about the nature or shape of the underlying survival distribution. The model treats the hazard rate (rather than survival time) as a function of the covariates and leaves the baseline hazard function unspecified. It does assume, however, that the covariates act multiplicatively on the hazard, so that the hazards for any two subjects stay proportional over time.
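As an illustration of the Kaplan-Meier (product-limit) estimator listed above, here is a minimal pure-Python sketch. The function name `kaplan_meier` and the sample data are hypothetical. The code follows the usual convention that a subject censored at time t is still counted as at risk at time t. It is a teaching sketch, not a substitute for a tested statistical package.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, estimated survival probability)] at each distinct event time.

    durations: observed time for each subject (event time or censoring time)
    events:    True/1 if the event was observed, False/0 if the subject was censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    # Number of observed events at each time, and number of subjects leaving
    # the risk set at each time (through an event or through censoring).
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Product-limit step: multiply by the share of at-risk subjects surviving t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone whose observation ended at t leaves the risk set afterwards.
        at_risk -= exit_counts[t]
    return curve


# Hypothetical data: months observed and whether the event occurred.
durations = [2, 3, 3, 5, 8, 12, 12, 12]
events = [1, 1, 0, 1, 1, 0, 0, 0]

for t, s in kaplan_meier(durations, events):
    print(f"t={t:>2}  S(t)={s:.3f}")
# t= 2  S(t)=0.875
# t= 3  S(t)=0.750
# t= 5  S(t)=0.600
# t= 8  S(t)=0.450
```

The subject censored at month 3 contributes to the risk set at month 3 but not afterwards, and the three subjects censored at month 12 never pull the curve down. That is how the estimator uses censored cases without treating them as events.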
Allison, P.D. (1995). Survival Analysis Using SAS: A Practical Guide. North Carolina: SAS Institute.
Hosmer, D.W. and Lemeshow, S. (1999). Applied Survival Analysis: Regression Modeling of Time to Event Data. New York, NY: John Wiley & Sons.