Event History Analysis: Understanding the Concept, Types, Applications, and Models, Lecture notes of History

An overview of event history analysis, also known as survival analysis or duration analysis. It covers the methods and applications of event history analysis, including types of event history data, special features, and censoring. The document also introduces continuous-time models and event history modelling with covariates. An example of gender effects on age at first partnership is provided.

Typology: Lecture notes

2021/2022

Uploaded on 09/07/2022

nabeel_kk
Discrete-time Event History Analysis
LECTURES
Fiona Steele and Elizabeth Washbrook
Centre for Multilevel Modelling, University of Bristol
16-17 July 2013

What is event history analysis?

Methods for the analysis of length of time until the occurrence of some event. The dependent variable is the duration until event occurrence.

Event history analysis also known as:
- Survival analysis (especially in biostatistics and when events are not repeatable)
- Duration analysis
- Hazard modelling

Examples of applications

- Health. Age at death; duration of hospital stay
- Demography. Time to first birth (from when?); time to first marriage; time to divorce; time living in same house or area
- Economics. Duration of an episode of employment or unemployment
- Education. Time to leaving full-time education (from end of compulsory schooling); time to exit from teaching profession

Types of event history data

- Dates of start of exposure period and events, e.g. dates of start and end of an employment spell
  - Usually collected retrospectively
  - Sources include panel and cohort studies (partnership, birth, employment and housing histories)
- Current status data from panel study, e.g. current employment status at each year
  - Collected prospectively

Types of censoring

Figure: one line per individual. Line starts when individual becomes at risk of event. Arrowhead indicates time that event occurs.

- i = 1: start and end time known
- i = 2: end time outside observation period, i.e. right-censored
- i = 3: start time outside observation period, i.e. left-truncated
- i = 4: start and end time outside observation period

Right-censoring is the most common form of incomplete observation, and is straightforward to deal with using EHA.

Right-censoring

- Right-censoring is the most common form of censoring.
- Durations are right-censored if the event has not occurred by the end of the observation period.
  - E.g. in a study of divorce, most respondents will still be married when last observed
- Excluding right-censored observations (e.g. still married) leads to bias and may drastically reduce sample size
- Usually assume censoring is non-informative

Right-censoring: Non-informative assumption

We retain right-censored observations under the assumption that censoring is non-informative, i.e. event times are independent of the censoring mechanism (like the ‘missing at random’ assumption).

Assume individuals are not selectively withdrawn from the sample because they are more or less likely to experience an event. May be questionable in experimental research, e.g. if more susceptible individuals were selectively withdrawn (or dropped out) from a ‘treatment’ group.

The hazard function

A key quantity in EHA is the hazard function:

$$h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

where the numerator is the probability that an event occurs during a very small interval of time $[t, t + \Delta t)$, given that no event occurred before time $t$. We divide by the width of the interval, $\Delta t$, to get a rate.

$h(t)$ is also known as the transition rate, the instantaneous risk, or the failure rate.

The survivor function

Another useful quantity in EHA is the survivor function:

$$S(t) = \Pr(T \ge t)$$

the probability that an individual does not have the event before $t$, or ‘survives’ until at least $t$.

Its complement is the cumulative distribution function:

$$F(t) = 1 - S(t) = \Pr(T < t)$$

the probability that an individual has the event before $t$.

Non-parametric estimation of h(t)

Group time so that $t$ is now an interval of time (duration may already be grouped, e.g. in months or years).
- $r(t)$ is number at ‘risk’ of experiencing event at start of interval $t$
- $d(t)$ is number of events (‘deaths’) observed during $t$
- $w(t)$ is number of censored cases (‘withdrawals’) in interval $t$

The life table (or actuarial) estimator of $h(t)$ is

$$\hat{h}(t) = \frac{d(t)}{r(t) - w(t)}$$

Note. Assumes censoring times are spread uniformly across interval $t$. Some estimators have $r(t) - 0.5\,w(t)$ as the denominator, or ignore censored cases.

Example of interpretation

Event is partnering for the first time. ‘Survival’ here is remaining single.

- $h(16) = 0.02$, so 2% partnered before age 17
- $h(20) = 0.13$, so, of those who were unpartnered at their 20th birthday, 13% partnered before age 21
- $S(20) = 0.77$, so 77% had not partnered by age 20

Hazard of 1st partnership

If an individual has not partnered by their late 20s, their chance of partnering declines thereafter.

Survivor function: Probability of remaining unpartnered

Note that the survivor function will always decrease with time. The hazard function may go up and down.

The Cox proportional hazards model

The most commonly applied model is the Cox model which:
- Makes no assumptions about the shape of the hazard function
- Treats time as continuous
- Assumes that the effects of covariates are constant over time (although this can be modified)

The Cox proportional hazards model

- $h_i(t)$ is the hazard for individual $i$ at time $t$
- $x_i$ is a vector of covariates (for now assumed fixed over time) with coefficients $\beta$
- $h_0(t)$ is the baseline hazard, i.e. the hazard when $x_i = 0$

The Cox model can be written:

$$h_i(t) = h_0(t) \exp(\beta x_i)$$

or sometimes as:

$$\log h_i(t) = \log h_0(t) + \beta x_i$$

An individual’s hazard depends on $t$ through $h_0(t)$, which is left unspecified, so no need to make assumptions about the shape of the hazard.

Cox model: Interpretation (1)

$$h_i(t) = h_0(t) \exp(\beta x_i)$$

Covariates have a multiplicative effect on the hazard. For each 1-unit increase in $x$ the hazard is multiplied by $\exp(\beta)$.
To see this, consider a binary $x$ coded 0 and 1:

$$x_i = 0 \implies h_i(t) = h_0(t)$$
$$x_i = 1 \implies h_i(t) = h_0(t) \exp(\beta)$$

So $\exp(\beta)$ is the ratio of the hazard for $x = 1$ to the hazard for $x = 0$, called the relative risk or hazard ratio.

The proportional hazards assumption

Consider a model with a single covariate $x$ and two individuals with different values denoted by $x_1$ and $x_2$. The proportional hazards model is written:

$$h_i(t) = h_0(t) \exp(\beta x_i)$$

So the ratio of the hazards for individual 1 to individual 2 is:

$$\frac{h_1(t)}{h_2(t)} = \frac{\exp(\beta x_1)}{\exp(\beta x_2)}$$

which does not depend on $t$, i.e. the effect of $x$ is the same at all durations $t$.

Figure: Example of (a) proportional and (b) non-proportional hazards for binary x

Estimation of the Cox model

All statistical software packages have in-built procedures for estimating the Cox model. The input data are each individual’s duration $y_i$ and censoring indicator $\delta_i$.

The data are restructured before estimation (although this is hidden from the user), and the Cox model is then estimated using Poisson regression.

We will look at this data restructuring to better understand the model and its relationship with the discrete-time approach. But note that you do not have to do this restructuring yourself!

Results from fitting Cox model

- Hazard of partnering at age $t$ is $(1.48 - 1) \times 100 = 48\%$ higher for women than for men (i.e. women partner quicker than men)
- Being in full-time education decreases the hazard by $(1 - 0.36) \times 100 = 64\%$

Discrete-time Models

Discrete-time data

In social research, event history data are usually collected:
- retrospectively in a cross-sectional survey, where dates are recorded to the nearest month or year, OR
- prospectively in waves of a panel study (e.g. annually)

Both give rise to discretely-measured durations.

Also called interval-censored because we only know that an event occurred at some point during an interval of time.
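
Discretely-measured durations of this kind are analysed after restructuring each individual's duration and censoring indicator into one record per interval at risk. The following is a minimal sketch of that expansion, not the authors' own code; the function name `expand_person_period` and the two example records are hypothetical.

```python
def expand_person_period(records):
    """Expand (person_id, duration, event) tuples into one row per interval at risk.

    duration: number of discrete intervals observed for the person (y_i)
    event:    1 if the event occurred in the last interval, 0 if right-censored (delta_i)
    Returns a list of (person_id, t, y_t) tuples, where y_t is the binary response.
    """
    rows = []
    for person_id, duration, event in records:
        for t in range(1, duration + 1):
            # The response is 1 only in the final interval, and only if uncensored.
            y_t = 1 if (t == duration and event == 1) else 0
            rows.append((person_id, t, y_t))
    return rows


# Hypothetical data: person 1 has the event in interval 2;
# person 2 is right-censored after 3 intervals.
person_period = expand_person_period([(1, 2, 1), (2, 3, 0)])
for row in person_period:
    print(row)
```

Each person contributes as many rows as intervals at risk, so a right-censored person simply contributes a run of zeros. A binary response model can then be fitted to the `y_t` column.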
Discrete-time hazard function

Denote by $p_{ti}$ the probability that individual $i$ has an event during interval $t$, given that no event has occurred before the start of $t$.

$$p_{ti} = \Pr(y_{ti} = 1 \mid y_{t-1,i} = 0)$$

$p_{ti}$ is a discrete-time approximation to the continuous-time hazard function $h_i(t)$. Call $p_{ti}$ the discrete-time hazard function.

Discrete-time logit model

After expanding the data, fit a binary response model to $y_{ti}$, e.g. a logit model:

$$\log\left(\frac{p_{ti}}{1 - p_{ti}}\right) = \alpha D_{ti} + \beta x_{ti}$$

- $p_{ti}$ is the probability of an event during interval $t$
- $D_{ti}$ is a vector of functions of the cumulative duration by interval $t$ with coefficients $\alpha$
- $x_{ti}$ is a vector of covariates (time-varying or constant over time) with coefficients $\beta$

Modelling the time-dependency of the hazard

Changes in $p_{ti}$ with $t$ are captured in the model by $\alpha D_{ti}$, the baseline hazard function. $D_{ti}$ has to be specified by the user. Options include:

- Polynomial of order $p$: $\alpha D_{ti} = \alpha_0 + \alpha_1 t + \ldots + \alpha_p t^p$
- Step function: $\alpha D_{ti} = \alpha_1 D_1 + \alpha_2 D_2 + \ldots + \alpha_q D_q$, where $D_1, \ldots, D_q$ are dummies for time intervals $t = 1, \ldots, q$ and $q$ is the maximum observed event time. If $q$ is large, categories may be grouped to give a piecewise constant hazard model.

Comparison of Cox and logit estimates for age at 1st partnership

                 Cox                  Logit
Variable         β̂        se(β̂)      β̂        se(β̂)
Female           0.394    0.093       0.468    0.102
Fulltime(t)     −1.031    0.190      −1.133    0.197

Same substantive conclusions, but:
- Cox estimates are effects on the log hazard scale, and exp(β) are hazard ratios (relative risks)
- Logit estimates are effects on the log-odds scale, and exp(β) are hazard-odds ratios

When will Cox and logit estimates be similar?

In general, Cox and logit estimates will get closer as the hazard function becomes smaller because:

$$\log(h(t)) \approx \log\left(\frac{h(t)}{1 - h(t)}\right) \quad \text{as } h(t) \to 0.$$

The discrete-time hazard will get smaller as the width of the time intervals becomes smaller.
A discrete-time model with a complementary log-log link, $\log(-\log(1 - p_t))$, is an approximation to the Cox proportional hazards model, and the coefficients are directly comparable.

Duration effects fitted as a quadratic

Approximating step function by a quadratic leads to little change in estimated covariate effects. Estimates from step function model were 0.468 (SE = 0.102) for Female and −1.133 (SE = 0.197) for Fulltime.

Figure: Predicted log-odds of partnering: Proportional gender effects

Figure: Predicted log-odds of partnering: Non-proportional gender effects

2. Multilevel models for recurrent events and unobserved heterogeneity

Consequences of unobserved heterogeneity

If there are individual-specific unobserved factors that affect the hazard, the observed form of the hazard function at the aggregate population level will tend to be different from the individual-level hazards.

For example, even if the hazards of individuals in a population are constant over time, the population hazard (averaged across individuals) will be time-dependent, typically decreasing. This may be explained by a selection effect operating on individuals.

Selection effect of unobserved heterogeneity

If a population is heterogeneous in its susceptibility to experiencing an event, high risk individuals will tend to have the event first, leaving behind lower risk individuals.

Therefore as $t$ increases the population is increasingly depleted of those individuals most likely to experience the event, leading to a decrease in the population hazard.

Because of this selection, we may see a decrease in the population hazard even if individual hazards are constant (or even increasing).
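
The selection effect described above can be reproduced with a few lines of arithmetic. This is an illustrative sketch only: it assumes a population made up of two equally sized subgroups with constant discrete-time hazards of 0.4 and 0.1 (hypothetical values, not taken from the lectures), and the function name `population_hazard` is likewise an assumption.

```python
def population_hazard(hazards, shares, n_intervals):
    """Population-level discrete-time hazard when each subgroup has a constant hazard.

    hazards: constant per-interval hazard for each subgroup
    shares:  initial share of the population in each subgroup
    """
    surviving = list(shares)  # share of the original population still at risk, per subgroup
    result = []
    for _ in range(n_intervals):
        at_risk = sum(surviving)
        events = sum(s * h for s, h in zip(surviving, hazards))
        result.append(events / at_risk)  # hazard among everyone still at risk
        # High-hazard subgroups are depleted faster than low-hazard ones.
        surviving = [s * (1 - h) for s, h in zip(surviving, hazards)]
    return result


# Hypothetical population: half high-risk (0.4), half low-risk (0.1).
pop_hazard = population_hazard(hazards=[0.4, 0.1], shares=[0.5, 0.5], n_intervals=6)
for t, h in enumerate(pop_hazard, start=1):
    print(t, round(h, 3))
```

Although neither subgroup's hazard changes, the printed population hazard starts at the average of the two and falls towards the low-risk value as the high-risk subgroup leaves the risk set.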
Figure: Illustration of selection for constant individual hazards

Estimation of discrete-time model with unobserved heterogeneity

- We can view the person-period dataset as a 2-level structure with time intervals ($t$) nested within individuals ($i$)
- The discrete-time logit model with a random effect $u_i$ to capture unobserved heterogeneity between individuals is an example of a 2-level random intercept logit model
- The model can be fitted using routines/software for multilevel binary outcomes, e.g. Stata xtlogit

Results from analysis of 1st partnership without (1) and with (2) unobserved heterogeneity

             Model 1              Model 2
             Est.     (SE)        Est.     (SE)
t            0.367    (0.056)     0.494    (0.122)
t²          −0.018    (0.003)    −0.020    (0.004)
Female       0.469    (0.102)     0.726    (0.215)
Fulltime    −1.128    (0.186)    −1.187    (0.208)
Cons        −3.368    (0.237)    −4.134    (0.646)
σ_u          −         −          0.920    (0.400)

Likelihood-ratio test statistic for test of $H_0: \sigma_u = 0$ is 3.74 on 1 d.f., p = 0.027 (one-sided test as $\sigma_u$ must be non-negative).

More on comparing coefficients from random effects and single-level logit models

In our analysis of age at 1st partnership, we saw that the positive effect of age (‘duration’) was understated if unobserved heterogeneity was ignored (as in Model 1).

Note also, however, that the effects of Female and Fulltime have also changed. In both cases, the magnitude of the coefficients has increased after accounting for unobserved heterogeneity.

This can be explained by a scaling effect.

Scaling Effect of Introducing $u_i$ (3)

Denote by $\beta^{RE}$ the coefficient from a random effects model, and $\beta^{SL}$ the coefficient from the corresponding single-level model.

The approximate relationship between these coefficients (for a logit model) is:

$$\beta^{RE} = \beta^{SL} \sqrt{\frac{\sigma_u^2 + 3.29}{3.29}}$$

Replace 3.29 by 1 to get expression for relationship between probit coefficients.

Note that the same relationship would hold for duration effects $\alpha$ if there was no selection effect. In general, both selection and scaling effects will operate on $\alpha$.
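
The scaling relationship above is easy to evaluate numerically. The sketch below is an illustration under assumptions: the function name `scaling_factor` is hypothetical, and the inputs are the Model 1 Female coefficient (0.469) and the Model 2 estimate of σ_u (0.920) from the results table.

```python
import math

# Level-1 residual variance of the standard logistic distribution (pi^2 / 3, about 3.29),
# the constant used in the notes; use 1 instead for a probit model.
LOGIT_RESIDUAL_VARIANCE = 3.29


def scaling_factor(sigma_u, level1_variance=LOGIT_RESIDUAL_VARIANCE):
    """Approximate ratio of a random-effects coefficient to its single-level counterpart."""
    return math.sqrt((sigma_u ** 2 + level1_variance) / level1_variance)


factor = scaling_factor(0.920)           # sigma_u from Model 2
beta_sl_female = 0.469                   # Female coefficient from Model 1
approx_beta_re_female = beta_sl_female * factor

print(round(factor, 3), round(approx_beta_re_female, 3))
```

With σ_u = 0.920 the factor is roughly 1.12. The relationship is only approximate, so fitted random-effects estimates need not equal the scaled single-level values.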
Time to 1st partnership: Interpretation of coefficients from the frailty model

- For a given individual the odds of entering a partnership at age $t$ when in FT education are exp(−1.19) = 0.30 times the odds when not in FT education.
  - This interpretation is useful because Fulltime is time-varying within an individual
- For 2 individuals with the same random effect value the odds for a woman are exp(0.73) = 2.08 times the odds for a man
  - This interpretation is less useful, but we can ‘average out’ the random effect to obtain population-averaged predicted probabilities

Population-averaged predicted probabilities

The probability of an event in interval $t$ for individual $i$ is:

$$p_{ti} = \frac{\exp(\alpha D_{ti} + \beta x_{ti} + u_i)}{1 + \exp(\alpha D_{ti} + \beta x_{ti} + u_i)}$$

where we substitute estimates of $\alpha$, $\beta$, and $u_i$ to get predicted probabilities.

Rather than calculating probabilities for each record $ti$, however, we often want predictions for specific values of $x$. We do this by ‘averaging out’ the individual unobservables $u_i$.

Multilevel event history data

Multilevel event history data arise when events are repeatable (e.g. births, partnership dissolution) or individuals are organised in groups.

Suppose events are repeatable, and define an episode as a continuous period for which an individual is at risk of experiencing an event, e.g.

Event                  Episode duration
Birth                  Duration between birth k − 1 and birth k
Marital dissolution    Duration of marriage

Denote by $y_{ij}$ the duration of episode $i$ of individual $j$, which is fully observed if an event occurs ($\delta_{ij} = 1$) and right-censored if not ($\delta_{ij} = 0$).

Data structure: the person-period-episode file

individual j    episode i    y_ij    δ_ij
1               1            2       1
1               2            3       0

↓

individual j    episode i    t    y_tij
1               1            1    0
1               1            2    1
1               2            1    0
1               2            2    0
1               2            3    0

Problem with analysing recurrent events

We cannot assume that the durations of episodes from the same individual are independent.

There may be unobserved individual-specific factors (i.e. constant across episodes) which affect the hazard of an event for all episodes, e.g. ‘taste for stability’ may influence risk of leaving a job.

The presence of such unobservables, and failure to account for them in the model, will lead to correlation between durations of episodes from the same individual.

Example: Women’s employment transitions

- Analyse duration of non-employment (unemployed or out of labour market) episodes
  - Event is entry (1st episode) or re-entry (2nd + episodes) into employment
- Data are subsample from British Household Panel Study (BHPS): 1399 women and 2284 episodes
- Durations grouped into years ⇒ 15,297 person-year records
- Baseline hazard is step function with yearly dummies for durations up to 9 years, then single dummy for 9+ years
- Covariates include time-varying indicators of number and age of children, age, marital status and characteristics of previous job (if any)

Multilevel logit results for transition to employment: Baseline hazard and unobserved heterogeneity

Variable                                   Est.       (se)
Duration non-employed (ref is < 1 year)
  [1,2) years                             −0.646*    (0.104)
  [2,3)                                   −0.934*    (0.135)
  [3,4)                                   −1.233*    (0.168)
  [4,5)                                   −1.099*    (0.184)
  [5,6)                                   −0.944*    (0.195)
  [6,7)                                   −1.011*    (0.215)
  [7,8)                                   −1.238*    (0.249)
  [8,9)                                   −1.339*    (0.274)
  ≥ 9 years                               −1.785*    (0.175)
σ_u (SD of woman random effect)            0.662*    (0.090)
* p < 0.05

Multilevel logit results for transition to employment: Presence and age of children

Variable                                   Est.       (se)
Imminent birth (within 1 year)            −0.842*    (0.125)
No. children age ≤ 5 yrs (ref=0)
  1 child                                 −0.212*    (0.097)
  ≥ 2                                     −0.346*    (0.143)
No. children age > 5 yrs (ref=0)
  1 child                                  0.251     (0.118)
  ≥ 2                                      0.446*    (0.117)
* p < 0.05

Analysing grouped intervals

If we have grouped time intervals, we need to allow for different lengths of exposure time within these intervals.

e.g. for any 6-month interval some individuals will have the event or be censored after the 1st month while others will be exposed for the full 6 months.
Denote by $n_{tij}$ the exposure time in grouped interval $t$ of episode $i$ for individual $j$. (Note: Intervals do not need to be the same width.)

Fit binomial logit model for grouped binary data, with response $y_{tij}$ and denominator $n_{tij}$ (e.g. using the binomial() option in the Stata xtmelogit command)

Example of grouped time intervals

Suppose an individual is observed to have an event during the 17th month of exposure, and we group durations into six-month intervals ($t$).

Instead of 17 monthly records we would have three six-monthly records:

j    i    t    n_tij    y_tij
1    1    1    6        0
1    1    2    6        0
1    1    3    5        1

Software for Recurrent Events

Essentially multilevel models for binary responses
- Mainstream software: e.g. Stata (xtlogit), SAS (PROC NLMIXED)
- Specialist multilevel modelling software: e.g. MLwiN (also via runmlwin in Stata), SABRE, aML

Examples of Multiple States

Usually individuals will move in and out of different states over time, and we wish to model these transitions.

Examples:
- Employment states: employed full-time, employed part-time, unemployed, out of the labour market
- Partnership states: marriage, cohabitation, single (not in co-residential union)

We will begin with models for transitions between two states, e.g. non-employment (NE) ↔ employment (E)

Transition Probabilities for Two States

Suppose there are two states indexed by $s$ ($s = 1, 2$), and $S_{tij}$ indicates the state occupied by individual $j$ during interval $t$ of episode $i$.

Denote by $y_{tij}$ a binary variable indicating whether any transition has occurred during interval $t$, i.e. from state 1 to 2 or from state 2 to 1.

The probability of a transition from state $s$ during interval $t$, given that no transition has occurred before the start of $t$, is:

$$p_{stij} = \Pr(y_{tij} = 1 \mid y_{t-1,ij} = 0, S_{tij} = s), \quad s = 1, 2$$

Call $p_{stij}$ a transition probability or discrete-time hazard for state $s$.
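
Returning to the grouped-interval example above (an event in the 17th month, with durations grouped into six-month intervals), the collapse from monthly to grouped records can be sketched as follows. This is a minimal illustration, not the lecturers' code: the function name `group_intervals` is hypothetical, and it assumes all grouped intervals have the same width, although the notes point out that widths may differ.

```python
def group_intervals(duration, event, width):
    """Collapse a duration in fine units (e.g. months) into grouped intervals.

    duration: number of fine units of exposure, including the unit of the event
    event:    1 if the event occurred in the final unit, 0 if right-censored
    width:    number of fine units per grouped interval (assumed equal here)
    Returns a list of (t, n_t, y_t): interval index, exposure time, response.
    """
    rows = []
    t = 1
    remaining = duration
    while remaining > 0:
        exposure = min(width, remaining)  # the last interval may be only partly exposed
        remaining -= exposure
        # The response is 1 only in the interval containing the event.
        y_t = 1 if (remaining == 0 and event == 1) else 0
        rows.append((t, exposure, y_t))
        t += 1
    return rows


# Event in the 17th month, six-month intervals.
grouped = group_intervals(duration=17, event=1, width=6)
for row in grouped:
    print(row)
```

The exposure times (6, 6, 5) play the role of the binomial denominators, and the response column marks the interval in which the event occurred.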
Event History Model for Transitions between 2 States

Multilevel two-state logit model:

$$\log\left(\frac{p_{stij}}{1 - p_{stij}}\right) = \alpha_s D_{stij} + \beta_s x_{stij} + u_{sj}$$

- $p_{stij}$ is the probability of a transition from state $s$ during interval $t$
- $D_{stij}$ is a vector of functions of cumulative duration in state $s$ by interval $t$ with coefficients $\alpha_s$
- $x_{stij}$ is a vector of covariates affecting the transition from state $s$ with coefficients $\beta_s$
- $u_{sj}$ allows for unobserved heterogeneity between individuals in their probability of moving from state $s$. Assume $u_j = (u_{1j}, u_{2j}) \sim$ bivariate normal.

Data Structure for Two-State Model (2)

Convert episode-based file to discrete-time format with one record per interval $t$:

t    y_tij    E_ij    NE_ij    E_ij·Age_ij    NE_ij·Age_ij
1    0        1       0        16             0
2    0        1       0        16             0
3    1        1       0        16             0
1    0        0       1        0              19
2    0        0       1        0              19

Note: $E_{ij}$ is a dummy for employment, $NE_{ij}$ a dummy for non-employment.

Example: Non-Employment ↔ Employment

- corr($u_{1j}$, $u_{2j}$) = 0.59, se = 0.13, so large positive residual correlation between E → NE and NE → E
  - Women with high chance of entering E tend to have a high chance of leaving E
  - Women with low chance of entering E tend to have a low chance of leaving E
- Positive correlation arises from two sub-groups: short spells of E and NE, and longer spells of both types

Comparison of Selected Coefficients for NE → E

Only coefficients of covariates relating to employment history change:

                          Single-state    Multistate
Ever worked                2.936           2.677
Previous job part-time    −0.441          −0.460

So positive effect of ‘ever worked’ has weakened, and negative effect of ‘part-time’ has strengthened.

Autoregressive Models for Two States

An alternative way of modelling transitions between states is to include the lagged response as a predictor rather than the duration in the current state.

The response $y_{tij}$ now indicates the state occupied at the start of interval $t$ rather than whether a transition has occurred, i.e.

$$y_{tij} = \begin{cases} 1 & \text{if in state 1} \\ 0 & \text{if in state 2} \end{cases}$$

1st Order Autoregressive Model

An AR(1) model for the probability that individual $j$ is in state 1 at $t$, $p_{tj}$, is:

$$\log\left(\frac{p_{tj}}{1 - p_{tj}}\right) = \alpha + \beta x_{tj} + \gamma y_{t-1,j} + u_j$$

- $\alpha$ is an intercept term
- $\gamma$ is the effect of the state occupied at $t - 1$ on the log-odds of being in state 1 at $t$
- $u_j \sim N(0, \sigma_u^2)$ is an individual-specific random effect

Interpretation of AR(1) Model

Suppose states are employment and unemployment. Common to find those who have been unemployed in the past are more likely to be unemployed in the future. Three potential explanations:

- A causal effect of unemployment at $t - 1$ on being unemployed at $t$ (state dependence $\gamma$)
- Unobserved heterogeneity, i.e. unmeasured individual characteristics affecting unemployment probability at all $t$ (stable traits $u_j$)
- Non-stationarity, e.g. seasonality (not in current model)

The AR(1) model is commonly referred to as a state dependence model.
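
To make the roles of γ and u_j in the AR(1) model concrete, the sketch below evaluates the model's probability of being in state 1 for the two possible lagged states. All parameter values are hypothetical and chosen only for illustration, with a single scalar covariate; the function name `ar1_probability` is an assumption, not part of the lectures.

```python
import math


def ar1_probability(alpha, beta, x, gamma, y_prev, u):
    """P(state 1 at t) under the AR(1) logit model:
    logit(p) = alpha + beta * x + gamma * y_prev + u.
    """
    linear_predictor = alpha + beta * x + gamma * y_prev + u
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical parameter values (not estimates from the lectures).
params = dict(alpha=-1.0, beta=0.5, x=1.0, gamma=1.5, u=0.0)

p_if_prev_state1 = ar1_probability(y_prev=1, **params)  # was in state 1 at t - 1
p_if_prev_state2 = ar1_probability(y_prev=0, **params)  # was in state 2 at t - 1

print(round(p_if_prev_state1, 3), round(p_if_prev_state2, 3))
```

With a positive γ, an individual who was in state 1 at t − 1 has a higher probability of being in state 1 at t than an otherwise identical individual who was in state 2. Varying `u` instead shifts both probabilities together, which is how stable traits can mimic state dependence.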