Predicting Cognitive Performance in Open-ended Dynamic Tasks
A Modeling Comparison Challenge
Organized by: Christian Lebiere, Cleotilde Gonzalez, & Walter Warwick

Supported in part by the Army Research Laboratory

Submission deadline: May 20, 2009
Register ASAP
Winners announced electronically by June 1, 2009
Concluding Symposium to be held at the ICCM conference on July 24-26, 2009



Update: Special issue of the Journal of Artificial General Intelligence

For those planning to submit to the special issue of the Journal of Artificial General Intelligence on model comparison for cognitive architectures and AGI, please note that the submission deadline has been postponed from December 1st, 2009 to December 18th, 2009.

Submissions should be sent by December 18th, 2009 to Manuscripts should conform to the JAGI formatting guidelines that can be found at the journal's web site, and should not exceed 20 pages in total length. Manuscripts will undergo a traditional anonymous peer-review process, with publication of accepted contributions expected by summer 2010. Authors will be required to provide the final camera-ready, formatted and copy-edited manuscript. Inquiries regarding this special issue can be sent to or directly to any of the special issue editors at the addresses below.

Special Issue Editors

Christian Lebiere
Psychology Department, Carnegie Mellon University

Cleotilde Gonzalez
Social and Decision Sciences Department, Carnegie Mellon University

Walter Warwick
MA&D Operation, Alion Science and Technology

Announcement: Winners

Congratulations to the top 3 models of our comparison challenge!

Reitter, Iglesias, & Halbruegge

Thank you to all who took the time to design and implement your models, hook them up to the simulation, and write up some very interesting descriptions of your results. The models exhibited both a comforting set of common principles and a fascinating diversity of approaches and emphases. We feel that this kind of comparative work toward a common goal using the same constraints is essential to the future of cognitive modeling.

Each model was tested under five new conditions, grouped into two broad sets of manipulations:

1) a sequence condition under which the environmental inputs/outputs followed a non-smooth but largely predictable sequence. This condition was inspired by experiments in the field of sequence learning in cognitive psychology. This general condition was instantiated into three separate conditions to test the model's robustness to noise and scalability to sequence complexity:

  1. a repeating sequence of length 2
  2. the same sequence of length 2 with binary noise added
  3. a repeating sequence of length 4

2) a feedback delay condition in which the user inputs/outputs were delayed in their effect. This condition was inspired by well-known results in the field of control systems. Again, this general condition was instantiated into two separate conditions to test the model's scalability to feedback delay:

  1. a delay of 2 in the application of user inputs/outputs (as compared to the delay of 1 in the standard condition, i.e. user i/o taking effect in the next cycle)
  2. a delay of 3 in user input/outputs
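The two manipulations above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the DSF simulation itself: the function names, the example sequence values, the reading of "binary noise" as a random shift of plus or minus a fixed magnitude, and the simplified tank update are all hypothetical.

```python
import random
from collections import deque
from itertools import cycle, islice


def repeating_sequence(pattern, n_cycles):
    """Repeat `pattern` (e.g. length 2 or 4) until `n_cycles` values exist."""
    return list(islice(cycle(pattern), n_cycles))


def add_binary_noise(values, magnitude, rng):
    """Assumed reading of 'binary noise': shift each value by +/- magnitude."""
    return [v + rng.choice((-magnitude, magnitude)) for v in values]


def simulate_tank(user_actions, env_flows, delay=1, start_amount=0.0):
    """Return the tank amount after each cycle (hypothetical dynamics).

    user_actions: list of (user_inflow, user_outflow), one pair per cycle.
    env_flows:    list of (env_inflow, env_outflow), one pair per cycle.
    delay:        cycles before a user action takes effect
                  (1 = next cycle, as in the standard condition).
    """
    # Pre-fill the queue with `delay` empty actions so that the action
    # entered at cycle t is only applied at cycle t + delay.
    pending = deque([(0.0, 0.0)] * delay)
    amount = start_amount
    history = []
    for (user_in, user_out), (env_in, env_out) in zip(user_actions, env_flows):
        pending.append((user_in, user_out))
        applied_in, applied_out = pending.popleft()
        amount += env_in - env_out + applied_in - applied_out
        history.append(amount)
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    inflow = repeating_sequence([1, 5], 6)       # sequence of length 2
    noisy_inflow = add_binary_noise(inflow, 1, rng)
    longer = repeating_sequence([1, 5, 2, 7], 6)  # sequence of length 4
    print(inflow, noisy_inflow, longer)
    # One user inflow of 5 entered at cycle 0, no environmental flows.
    actions = [(5, 0), (0, 0), (0, 0), (0, 0)]
    no_env = [(0, 0)] * 4
    for d in (1, 2, 3):
        print(d, simulate_tank(actions, no_env, delay=d))
```

The queue makes the scaling of the delay conditions visible: the same user action shows up in the tank amount one, two, or three cycles after it was entered, which is what a model must anticipate to keep the discrepancy to the goal small.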

Fairly conventional measures were used to determine the top 3 participants, with the expectation that more interesting and original analyses can be presented at the symposium and in follow-up papers. The Root Mean Square Error (RMSE) and correlation (R²) measures were computed over a range of potential values (bonus, user input and output, tank amount, discrepancy to goal) at the individual run level. Because these values were strongly correlated with one another, a single one, the discrepancy to the goal, was selected as the focus. This resulted in 10 quantitative measures (i.e. RMSE and R² over 5 distinct conditions). The models were rank-ordered on each of those 10 measures, with 1 being best and 9 worst, and each model's rankings were summed over all 10 measures to establish an overall ranking.
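The scoring procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the organizers' actual scoring code: the function names are hypothetical, the correlation measure is assumed to be the squared Pearson correlation, and ties are simply broken by input order.

```python
import math


def rmse(model, human):
    """Root mean square error between paired model and human values."""
    return math.sqrt(sum((m - h) ** 2 for m, h in zip(model, human)) / len(human))


def r_squared(model, human):
    """Squared Pearson correlation (assumed definition of the R2 measure)."""
    n = len(human)
    mean_m = sum(model) / n
    mean_h = sum(human) / n
    cov = sum((m - mean_m) * (h - mean_h) for m, h in zip(model, human))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_h = sum((h - mean_h) ** 2 for h in human)
    return cov * cov / (var_m * var_h)


def rank_models(scores, higher_is_better):
    """Map each model name to its rank on one measure (1 = best)."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: position for position, name in enumerate(ordered, start=1)}


def overall_ranking(measures):
    """Sum per-measure ranks; the lowest total ranks first overall.

    measures: list of (scores_by_model, higher_is_better) pairs, one per
    measure (here, RMSE and R2 for each of the 5 conditions).
    """
    totals = {}
    for scores, higher_is_better in measures:
        for name, position in rank_models(scores, higher_is_better).items():
            totals[name] = totals.get(name, 0) + position
    return sorted(totals, key=totals.get), totals


if __name__ == "__main__":
    # Made-up scores for three hypothetical models on one condition.
    measures = [
        ({"A": 1.0, "B": 2.0, "C": 3.0}, False),  # RMSE: lower is better
        ({"A": 0.9, "B": 0.5, "C": 0.7}, True),   # R2: higher is better
    ]
    order, totals = overall_ranking(measures)
    print(order, totals)
```

Summing ranks rather than raw scores keeps RMSE and R², which live on different scales, from dominating one another, at the cost of discarding how large the gaps between models are.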

Personal invitations have been extended to the top 3 participants to present their models and any other thoughts about model comparison and evaluation in the DSF Challenge at the ICCM symposium on July 25. There are also plans to secure a special issue of a cognitive modeling journal focused on model evaluation and comparison in the context of the DSF and similar challenges. The call for papers will be open and extended especially to all of you who contributed models to this modeling challenge.

Thank you all again for your submissions. We hope to see many of you at the symposium in Manchester.

Best regards,
The DSF Challenge Organizers


Model comparison is becoming increasingly common in computational cognitive modeling. The methodology is straightforward: model comparisons invite the independent development of distinct computational approaches to simulate human performance on a well-defined task. Typically, the benchmarks of the comparison are goodness-of-fit measures to human data that are calculated for the various models. Although the quantitative measures might suggest that model comparisons produce "winners," the real focus of model comparison is, or at least should be, on understanding in some detail how the different modeling "architectures" have been applied to the common task, which constraints each architecture imposed upon the task model, and in turn how the human performance data validated or invalidated the theoretical assumptions of the architecture.

In this spirit, we are pleased to announce a new model comparison effort that will illuminate the general features of cognitive models as they are applied to control problems in dynamic environments. The simulated task, the Dynamic Stocks and Flows (DSF), was developed by Cleotilde Gonzalez and Varun Dutt at the Dynamic Decision Making Laboratory and has been used to investigate human performance under conditions of dynamic complexity (Dutt & Gonzalez, 2007; Gonzalez & Dutt, 2007). The task integrates a number of basic cognitive requirements, such as action selection, learning in its control aspect, decisions from experience, prediction and projection, and learning of event sequences in its incorporation of external information streams. Thus, this task has broad implications for understanding the nature of complex cognition.

Participants in this challenge are invited to develop computational models that simulate human performance on the DSF task in a variety of conditions. The goal here is not to produce a model of optimal behavior but, rather, a model that can predict the actual behavior of human participants, including mistakes and limitations, as they learn to control the DSF. Moreover, as we describe in the Methods section, modelers will only have access to a subset of the observed data in developing their model, meaning that the predictions made by a model must generalize to new conditions that will test the model’s generality and scalability as a function of task complexity.

On the basis of the quantitative goodness-of-fit to these transfer conditions, the single best-fitting model for the novel conditions of the task will be identified, and a representative of the development team will be invited to present their work as part of a model comparison symposium at the 2009 International Conference on Cognitive Modeling. In addition, the authors of two other models that capture important aspects of human performance on the task will also be invited to the symposium. The exact criteria we use to select those models for presentation are given in the Methods section. We will cover basic travel expenses to that conference for all three presenters, up to $2,000 per person. Finally, participants may be invited to prepare manuscripts for publication in a Special Issue of a journal devoted to the topic of model comparison.

Click here to see our letter of invitation to participate.


Navigating this site

1. The registration page will provide you with all the information you need to register for and to submit to the challenge.

2. The DSF task documentation page includes information about the DSF task itself.

3. The methods page presents the overall process for participating in this challenge.

4. The data page contains both the human performance data and protocols for the calibration of your model.

5. The references page contains papers that provide additional details of the task or modeling comparison methodology.

6. The FAQ page contains answers to frequently asked questions from participants.

7. The contact page provides the contact information for asking questions or reporting problems with the web site or with the task environment.

Organized in part by the Dynamic Decision Making Laboratory, a part of the Social and Decision Sciences Department at Carnegie Mellon University. For updates and comments regarding this website, please email