Symposium on Implication Analysis 

Implication Analysis: A Pragmatic Proposal for Linking Theory and Data in the Social Sciences


Stanley Lieberson and Joel Horwich 


Sociology and other social sciences struggle to emulate a model of scientific evidence that is often inappropriate. Not only do social researchers encounter special limits, but they are also handicapped by a distorted and idealized picture of practices in the hard sciences. Ironically, while often obliged to use data of lower quality, sociology employs standards for evaluating a theory that are not attained in the hard sciences. After a brief review of these obstacles, we describe a set of procedures for using empirical data to rigorously evaluate theories and hypotheses without resorting to the mimicking of “hard science.” The interaction between theory and evidence normally involves deriving implications from the theory (usually referred to as hypotheses) and then ascertaining how closely the empirical evidence meets these implications. The appropriateness of the implications is a key factor in the entire operation, linking as they do the data and the theory. The evaluation of a theory is no better than the theory’s implications (as generated by the investigator) coupled with the quality and appropriateness of the evidence. It is our impression, however, that because this step is insufficiently addressed, there are unnecessary problems in the evaluation of theories. We use the term “Implication Analysis” to describe our efforts to review and improve current procedures. 
Data Collection and Data Quality 

Measurement Error in Stylized and Diary Data on Time Use 


Man Yee Kan and Stephen Pudney


We investigate the nature of measurement error in time use data. Analysis of “stylised” recall questionnaire estimates and diary-based estimates of housework time from the same respondents of a British survey gives evidence of systematic biases in the stylised estimates and large random errors in both types of estimates. We examine the effect of these measurement problems on three common types of statistical analyses in which the time use variable is used as: (i) a dependent variable, (ii) an explanatory variable, and (iii) a basis for cross-tabulations. We develop methods to correct the biases induced by these measurement errors.

Validation of a Diary Measure of Children's Physical Activities 


Sandra Hofferth, Gregory J. Welk, Margarita S. Treuth, Suzanne M. Randolph, Sally C. Curtin, and Richard Valliant


This study compares levels of physical activity of 9- to 14-year-old children from a self-reported time diary with those measured using an accelerometer. Children (N=92) wore an accelerometer for one weekend day and completed a 24-hour time diary for that day. The time children spent in moderate to vigorous physical activity from time diaries was moderately highly correlated with the measured results. Self-reported and objective measures of intensity were correlated, but the correlation varied substantially by activity and characteristics of the family. This study provides empirical evidence to support the validity of time-diary estimates when accelerometer data are not available.
Methods for Analyzing Social Network Data

A Relational Event Framework for Social Action 


Carter T. Butts 


Social behavior over short time scales is frequently understood in terms of actions, which can be thought of as discrete events in which one individual emits a behavior directed at one or more other entities in his or her environment (possibly including himself or herself). Here, we introduce a highly flexible framework for modeling actions within social settings, which permits likelihood-based inference for behavioral mechanisms with complex dependence. Examples are given for the parameterization of base activity levels, recency, persistence, preferential attachment, transitive/cyclic interaction, and participation shifts within the relational event framework. Parameter estimation is discussed both for data in which an exact history of events is available, and for data in which only event sequences are known. The utility of the framework is illustrated via an application to dynamic modeling of responder radio communications during the early hours of the World Trade Center disaster.

Modeling Diffusion of Multiple Innovations via Multilevel Diffusion Curves: Payola in Pop Music Radio 


Gabriel Rossman, Ming Ming Chiu, and Joeri Mol 


Diffusion curve analysis can estimate whether an innovation spreads endogenously (indicated by a characteristic "s-curve") or exogenously (indicated by a characteristic negative exponential curve). Current techniques for pooling information across multiple innovations require a two-stage analysis. In this paper, we develop multilevel diffusion curve analysis, which is more statistically efficient and allows for more flexible specifications than do existing methods. To substantively illustrate this technique we use data on bribery in pop radio as an example of exogenous influence on diffusion.
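
The two curve shapes named in this abstract can be illustrated with a minimal sketch. It is not the authors' multilevel model. It assumes the textbook forms only: a negative exponential curve for exogenous spread and a logistic s-curve for endogenous spread. The function names, rates, and midpoint below are hypothetical values chosen for illustration.

```python
import math


def exogenous_adoption(t, rate):
    """Cumulative share adopted under a constant outside influence.

    Negative exponential form: fast at first, then flattening.
    """
    return 1.0 - math.exp(-rate * t)


def endogenous_adoption(t, rate, midpoint):
    """Cumulative share adopted when adopters recruit non-adopters.

    Logistic form: slow start, fast middle, slow finish (the s-curve).
    """
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))


def increments(curve):
    """New adoptions in each period, given a cumulative curve."""
    return [later - earlier for earlier, later in zip(curve, curve[1:])]


# Hypothetical parameters, chosen only to make the shapes visible.
times = list(range(0, 11))
exo_curve = [exogenous_adoption(t, rate=0.5) for t in times]
endo_curve = [endogenous_adoption(t, rate=1.0, midpoint=5.0) for t in times]

exo_steps = increments(exo_curve)    # largest in the first period
endo_steps = increments(endo_curve)  # largest near the midpoint
```

Under these assumptions, per-period adoptions decline from the start in the exogenous case and peak in the middle in the endogenous case, which is the contrast that diffusion curve analysis exploits.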
Statistical Methods 

A Diagnostic Routine for the Detection of Consequential Heterogeneity of Causal Effects 


Stephen L. Morgan and Jennifer J. Todd


Least-squares regression estimates of causal effects are conditional-variance-weighted estimates of individual-level causal effects. In this article, we extract from the literature on counterfactual causality a simple nine-step routine to determine whether or not the implicit weighting of regression has generated a misleading estimate of the average causal effect. The diagnostic routine is presented along with a detailed and original demonstration, using data from the 2002 and 2004 waves of the Education Longitudinal Study, for a contested but important causal effect in educational research: the effect of Catholic schooling, in comparison to public schooling, on the achievement of high school students in the United States.
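
The opening claim of this abstract can be checked numerically in a simple special case. The sketch below is not the authors' nine-step routine. It assumes a binary treatment and a regression that controls for a full set of stratum indicators, in which case the treatment coefficient equals an average of stratum-specific effects weighted by n_s * p_s * (1 - p_s), where p_s is the share treated in stratum s. The data and function names are hypothetical.

```python
def regression_coefficient(strata):
    """OLS coefficient on treatment with stratum dummies as controls.

    Computed by demeaning treatment within each stratum
    (the Frisch-Waugh-Lovell result), so no matrix algebra is needed.
    """
    numerator = 0.0
    denominator = 0.0
    for rows in strata.values():
        share_treated = sum(d for d, _ in rows) / len(rows)
        numerator += sum((d - share_treated) * y for d, y in rows)
        denominator += sum((d - share_treated) ** 2 for d, _ in rows)
    return numerator / denominator


def variance_weighted_effect(strata):
    """Average of stratum effects weighted by n * p * (1 - p)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for rows in strata.values():
        treated = [y for d, y in rows if d == 1]
        control = [y for d, y in rows if d == 0]
        p = len(treated) / len(rows)
        effect = sum(treated) / len(treated) - sum(control) / len(control)
        weight = len(rows) * p * (1 - p)
        weighted_sum += weight * effect
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical data: (treatment, outcome) pairs in two strata of 10 units.
# Stratum "a": half treated, effect 2. Stratum "b": one treated, effect 10.
strata = {
    "a": [(1, 3.0)] * 5 + [(0, 1.0)] * 5,
    "b": [(1, 15.0)] * 1 + [(0, 5.0)] * 9,
}

ols = regression_coefficient(strata)
weighted = variance_weighted_effect(strata)
unit_average = (10 * 2.0 + 10 * 10.0) / 20  # average effect over all units
```

In this constructed example the regression coefficient matches the variance-weighted average and falls below the unit-level average effect of 6, because the stratum with the larger effect has little variation in treatment and so receives less weight.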

Four Useful Finite Mixture Models for Regression Analyses of Panel Data with a Categorical Dependent Variable


Kazuo Yamaguchi 


This paper describes and contrasts two useful ways to employ a latent class variable as a mixture variable in regression analyses of panel data with a categorical dependent variable. One way is to model unobserved heterogeneity in the trajectory, or change in the distribution, of the dependent variable. Two models that accomplish this are the latent trajectory model and latent growth curve model for a categorical dependent variable having ordered categories. Each latent class here represents a distinct trajectory of the dependent variable. The latent trajectory model introduces covariate effects on both the "intercept" and the "slope" of growth in logit which may vary among latent classes. The other useful way is to model unobserved heterogeneity in the state dependence of the dependent variable. Two models that accomplish this are introduced for a simultaneous analysis of response probability and response stability, and the latent class variable is employed to distinguish two latent populations that differ in the stability of responses over time. One of them is the switching multinomial logit model with a time-lagged dependent variable as its separation indicator, and the other is the mover-stayer regression model. By applying these four models to empirical data, this paper demonstrates the usefulness of these models for panel-data analyses. Example programs for specifying these models based on the program LEM are also provided.

Outliers, Leverage Observations and Influential Cases in Factor Analysis: Minimizing Their Effect Using Robust Procedures 


Ke-Hai Yuan and Xiaoling Zhong


Parallel to the development in regression diagnosis, this paper defines good and bad leverage observations in factor analysis. Outliers are observations that deviate from the factor model, not from the center of the data cloud. The effect of each kind of outlying observation on the normal-distribution-based maximum-likelihood estimator and the associated likelihood ratio statistic is studied through analysis. The distinction between outliers and leverage observations also clarifies the roles of three robust procedures based on different Mahalanobis distances. All the robust procedures are designed to minimize the effect of certain outlying observations. Only the robust procedure with a residual-based distance properly controls the effect of outliers. Empirical results illustrate the strength/weakness of each procedure and support those obtained in analysis. The relevance of the results to general structural equation models is discussed and formulas are provided.


Multiple Imputation of Incomplete Categorical Data Using Latent Class Analysis


Jeroen K. Vermunt, Joost R. van Ginkel, L. Andries van der Ark, and Klaas Sijtsma


We propose using latent class analysis as an alternative to log-linear analysis for the multiple imputation of incomplete categorical data. Similar to log-linear models, latent class models can be used to describe complex association structures between the variables used in the imputation model. However, unlike log-linear models, latent class models can be used to build large imputation models containing more than a few categorical variables. To obtain imputations reflecting uncertainty about the unknown model parameters, we use a nonparametric bootstrap procedure as an alternative to the more common full Bayesian approach. The proposed multiple imputation method, which is implemented in the Latent GOLD software for latent class analysis, is illustrated with two examples. In a simulated data example, we compare the new method to well-established methods such as maximum likelihood estimation with incomplete data and multiple imputation using a saturated log-linear model. This example shows that the proposed method yields unbiased parameter estimates and standard errors. The second example concerns an application using a typical social sciences data set. It contains 79 variables which are all included in the imputation model. The proposed method is especially useful for such large data sets because standard methods for dealing with missing data in categorical variables break down when the number of variables is so large.