The End of Theory

There’s an excellent article over on Wired right now with interesting implications for our field. The End of Theory reads in part:

“All models are wrong, but some are useful.” So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don’t have to settle for wrong models. Indeed, they don’t have to settle for models at all…

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves…

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise…

There is now a better way. Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

Let’s temporarily assume Chris Anderson, the article’s author, is right, for the sake of argument. Could it be that educational research is finally on the brink of making an inch of forward progress? Do mediating educational technologies provide us with the opportunities to capture enough data that we could eventually do this “new kind of research”? Could access to this kind of data finally be the killer app for high technology in education?

Elsewhere in the article, Chris says,

Google’s founding philosophy is that we don’t know why this page is better than that one: If the statistics of incoming links say it is, that’s good enough. No semantic or causal analysis is required.

Amazon, of course, doesn’t ever ask you to explicitly state your preferences for genres of book. Netflix doesn’t ask for explicit information about your taste in movies. And Google doesn’t need semantic analysis to determine which page is better than another. Is there a time coming when access to a sufficient quantity of educational activity and performance data will finally stomp out the petri dish of poorly informed opinion that is the vast majority of educational research? Would you care if I couldn’t classify your learning style or aptitude a la Cronbach and Snow or your intelligence type a la Gardner if I could consistently give you educational experiences that you found enjoyable and effective? I suspect not.

5 thoughts on “The End of Theory”

  1. I like this idea of using data to design improved educational experiences for learners. Like others, I find that guessing about what might or might not work seems like educational malpractice at times.

    Your closing question is intriguing. Learners probably don’t care, initially, about why a particular educational experience is effective or enjoyable, just that it is. But, at some point, is it important (for both the learner and the designer) to understand why a particular pedagogy, tool, experience, etc. is working?

    I don’t know for sure, but intentionality seems to have some value here, particularly if we want learners to become self-directed and develop the ability to drive their own learning in meaningful ways.

    I would argue for a balanced approach: more emphasis on using data to drive good design, but without abandoning theory and intentionality. I love Google, but I don’t think the “we don’t know why. . .and we don’t care” principle has perfect application to educational design.

  2. This is really about the power of massive amounts of data, where all correlations become significant. For marketing purposes, understanding may not matter because the mistakes will be predictably small.

    Nevertheless, when my husband accidentally used Amazon’s “one click” purchasing while I was the one logged in, Amazon suddenly saw me as a person who not only reads literature about culture and education but also buys parts for power tools. I was barraged for months with recommendations for power drills, saws, and carpentry equipment. Interesting book titles were included too, so in a way they can’t lose, though I did feel somewhat misidentified as a customer.

    Some questions remain for me:

    Are educational goals as clear-cut as sales and marketing goals?
    Does data-driven education imply endless testing for students?
    Does data-driven education mean that we are more concerned with the quantitative result of the group than with the depth of the learning experience for an individual?

  3. Kia ora David.

    “Petabytes allow us to say: ‘Correlation is enough.’ We can stop looking for models.”

    It depends entirely on what we want to use the model for. ‘The model’ is used for a vast number of purposes, all of which have different and valid reasons. I think one has to define the use of the model as well as the model itself, to have a valid reason for appraising or criticising it.

    In considering the learner, for instance, the cognitive apprenticeship theory allows us to be aware that though some learned experts may not need the model any more, the learner is still coming to grips with it. Even a poor model can be used appropriately when it comes to pedagogy; the test is that it works well enough for the next learning step to be approached.

    So I wonder at your generic statement about models. Even considered in context, I suspect that by eschewing the model out of hand, we could be losing the plot or part of it.

    Catchya later
    from Middle-earth

  4. “Amazon, of course, doesn’t ever ask you to explicitly state your preferences for genres of book. Netflix doesn’t ask for explicit information about your taste in movies. And Google doesn’t need semantic analysis to determine which page is better than another.”

    I think we should collect both: raw data and people’s explicit preferences. And people are probably doing so. After all, it’s more data to collect, just a different “type”, so to speak.

    Your mental model of what you want is important information, whether it’s correct or not.

  5. For some reason this doesn’t sit well with me. Please tell me if I’m just being paranoid…

    If we begin focusing on correlation only, don’t we risk empowering those who understand and work with these patterns of correlation and disempowering everyone else?

    I was just thinking of television when it was first developed. Everyone thought that TV would revolutionize education. In a matter of a few decades, commercial interests were able to manipulate the content and medium to ends that served their bottom line first.
