A great many applications of statistics in the natural and social sciences aim at finding or confirming causal hypotheses. While the theory of experimental design is well developed, much of the data in epidemiology, economics and policy studies is a mix of observational and experimental data, often with little of the latter. Although causal inferences from observational data have been the practical basis for a great deal of quantitative social science, theorists have dismissed the reliability of such inferences without much study. It seems likely that work in Philosophy at Carnegie Mellon is slowly helping to change both theory and practice. For fifteen years, Richard Scheines, Clark Glymour, and Peter Spirtes (known as "the TETRAD Group", sometimes including Kelly) have worked together and with students on the representation of causal hypotheses by "Bayes nets," on methods for finding such representations from data, and on the limits to the reliability of any possible method. Their work has been controversial but, at last, is gaining notice in economics, biology, space physics, educational research and elsewhere. Much of the recent philosophical writing on causation (outside of Princeton) either addresses the TETRAD framework and results critically, or elaborates on one or another of its assumptions or theorems. In part because of their influence, the annual conference on Bayes networks, Uncertainty in Artificial Intelligence, has changed from principally discussing data compaction through conditional independence to include work on causal representation, inference and prediction.
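To make the idea behind such search methods a little more concrete, here is a minimal sketch, assuming a hypothetical linear Gaussian chain X → Y → Z with made-up coefficients of 0.8. A Bayes net with this structure implies that X and Z are correlated but become independent once Y is held fixed; in the linear case that appears as a vanishing partial correlation, the kind of constraint a search procedure can check against data. The function names are illustrative only and are not taken from any TETRAD software.

```python
import math
import random


def corr(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(a, b, c):
    """Correlation of a and b after linearly controlling for c."""
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def simulate_chain(n, seed=0):
    """Draw n samples from the hypothetical linear chain X -> Y -> Z."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 0.8 * x + rng.gauss(0, 1)  # Y depends only on X (plus noise)
        z = 0.8 * y + rng.gauss(0, 1)  # Z depends only on Y (plus noise)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


xs, ys, zs = simulate_chain(20000)
print("corr(X, Z)     =", round(corr(xs, zs), 2))              # clearly nonzero, about 0.45
print("corr(X, Z | Y) =", round(partial_corr(xs, zs, ys), 2))  # close to zero
```

The reversed chain Z → Y → X and the common-cause structure X ← Y → Z imply exactly the same independence, so this test alone cannot distinguish them; data of this kind identify only a class of equivalent causal structures, which is one reason the limits of reliable inference are a research topic in their own right.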
The members of the TETRAD Group continue to work closely together, but have also evolved separate projects. The theoretical projects include work by Spirtes and Thomas Richardson, of the University of Washington, on novel representations of models in which there are unmeasured variables. The aim of this work is to improve methods of searching for such structures. Using quite different ideas, and assuming linear models, Spirtes is also studying, with Greg Cooper, how well the causal inferences of automated procedures applied to epidemiological data correspond to the opinions of expert physicians. Scheines has developed an algorithm for dividing a large set of variables into subsets, or clusters, each with a common unobserved cause. The algorithm is being implemented and tested by Wimberly. By combining it with previous algorithms, Scheines aims to find a more reliable alternative to factor analysis.
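One classical signature of a cluster of measured variables sharing a single unobserved cause, going back to Spearman's work on factor analysis, is the vanishing tetrad difference: if each indicator is X_i = λ_i L + e_i with independent errors, then σ_ij σ_kl − σ_ik σ_jl = 0 for any four distinct indicators. The text does not say which constraints Scheines's algorithm relies on, so the following is only a minimal sketch of the idea, with hypothetical loadings and helper names, contrasting four indicators of one latent variable with two pairs of indicators of two correlated latents.

```python
import math
import random


def cov_matrix(samples):
    """Sample covariance matrix for a list of equal-length variable lists."""
    n = len(samples[0])
    means = [sum(v) / n for v in samples]
    return [
        [
            sum((a - means[i]) * (b - means[j]) for a, b in zip(samples[i], samples[j])) / (n - 1)
            for j in range(len(samples))
        ]
        for i in range(len(samples))
    ]


def tetrad_differences(s):
    """The three tetrad differences among variables 0..3 of covariance matrix s."""
    return (
        s[0][1] * s[2][3] - s[0][2] * s[1][3],
        s[0][2] * s[1][3] - s[0][3] * s[1][2],
        s[0][1] * s[2][3] - s[0][3] * s[1][2],
    )


def simulate(n, two_clusters, seed=0):
    """Draw n samples of four indicators X0..X3 (hypothetical loadings).

    two_clusters=False: all four indicators measure one latent L1.
    two_clusters=True: X0, X1 measure L1; X2, X3 measure L2, with corr(L1, L2) = 0.5.
    """
    rng = random.Random(seed)
    loadings = (0.9, 0.8, 0.7, 0.9)
    xs = [[] for _ in range(4)]
    for _ in range(n):
        l1 = rng.gauss(0, 1)
        if two_clusters:
            # Unit-variance latent with correlation 0.5 to l1.
            l2 = 0.5 * l1 + math.sqrt(0.75) * rng.gauss(0, 1)
        else:
            l2 = l1
        for i, latent in enumerate((l1, l1, l2, l2)):
            xs[i].append(loadings[i] * latent + rng.gauss(0, 0.5))
    return xs


for clusters in (False, True):
    diffs = tetrad_differences(cov_matrix(simulate(20000, clusters)))
    print("two clusters" if clusters else "one factor", [round(d, 2) for d in diffs])
```

In the one-factor case all three differences should come out near zero. In the two-cluster case only the second should, because every covariance in it links an indicator of L1 with an indicator of L2; the other two are roughly λ0λ1λ2λ3(1 − 0.5²) ≈ 0.34. The pattern of which differences vanish is the kind of evidence a clustering procedure could use to group indicators by their common cause.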
Spirtes is collaborating with Larry Wasserman (in Statistics at Carnegie Mellon), Jamie Robins (at Harvard Public Health), and Scheines on a paper stating theorems concerning the possibilities and limits of causal inference from non-experimental data. They are also performing simulation studies of various non-Bayesian principles of causal inference and asking what class of priors would lead Bayesians to the same conclusions.
Kelly's theory of causation, described in Chapter 14 of his book The Logic of Reliable Inquiry, involves no probability, and conceives of the world as a trajectory of states, where a state assigns a value to every variable needed to describe the world at a given time. A causal structure constrains the set of possible trajectories one might see. Kelly and others in the TETRAD Group are studying the relations of these ideas to the probabilistic representations used elsewhere.
Partly because of its applicability, partly because of the need to test ideas with concrete data, and partly because of the need for financial support, the TETRAD work has been tied closely to applications, including studies of the effects of alternative treatments of pneumonia patients, studies of naval readiness, studies of the effects of low-level lead exposure on children's intelligence, studies of relations between spectra and mineral composition, and other topics. Some of the "applications" are simply popularizations; for example, in collaboration with Steve Fienberg of Statistics, Glymour and Scheines are writing an illustrative case for the federal judiciary on the use of statistics for causal inference in legal disputes.
The Computational Systems Biology Group is an association of statisticians, computer scientists and biologists at Carnegie Mellon University, the University of Pittsburgh and the University of West Florida Institute for Human and Machine Cognition, investigating statistical, algorithmic, experimental design and biological issues surrounding the interpretation of expression data, especially with SAGE and microarray techniques.