Computer Program Self-Discovers Laws of Physics

Image: A double pendulum

In just over a day, a powerful computer program accomplished a feat that took physicists centuries to complete: extrapolating the laws of motion from a pendulum's swings.

Developed by Cornell researchers, the program deduced the natural laws without a shred of knowledge about physics or geometry.

The research is being heralded as a potential breakthrough for science in the Petabyte Age, where computers try to find regularities in massive datasets that are too big and complex for the human mind and its standard computational tools.

"One of the biggest problems in science today is moving forward and finding the underlying principles in areas where there is lots and lots of data, but there's a theoretical gap. We don't know how things work," said Hod Lipson, the Cornell University computational researcher who co-wrote the program. "I think this is going to be an important tool."

Condensing rules from raw data has long been considered the province of human intuition, not machine intelligence. The research could foreshadow an age in which scientists and programs work as equals to decipher datasets too complex for human analysis.

Lipson's program, co-designed with Cornell computational biologist Michael Schmidt and described in a paper published Thursday in Science, may represent a breakthrough in the old, unfulfilled quest to use artificial intelligence to discover mathematical theorems and scientific laws:

  • Half a century ago, IBM's Herbert Gelernter authored a program that purportedly rediscovered Euclid's geometry theorems, but critics said it relied too much on programmer-supplied rules.
  • In the 1970s, Douglas Lenat's Automated Mathematician automatically generated mathematical theorems, but they proved largely useless.
  • Stanford University's Dendral project, started in 1965 and used for two decades, extrapolated possible structures for organic molecules from chemical measurements gathered by NASA spacecraft. But it was ultimately unable to assess the likelihood of the various answers it generated.
  • The $100,000 Leibniz Prize, established in the 1980s, was promised to the first program to discover a theorem that "profoundly affects" math. It was never claimed.

But now artificial intelligence experts say Lipson and Schmidt may have fulfilled the field's elusive promise.

Unlike the Automated Mathematician and its heirs, their program is primed only with a set of simple, basic mathematical functions and the data it's asked to analyze. Unlike Dendral and its counterparts, it can winnow possible explanations into a likely few. And it comes at an opportune moment — scientists have vastly more data than theories to describe it.

Lipson and Schmidt designed their program to identify linked factors within a dataset fed to the program, then generate equations to describe their relationship. The dataset described the movements of simple mechanical systems like spring-loaded oscillators, single pendulums and double pendulums — mechanisms used by professors to illustrate physical laws.

The program started with near-random combinations of basic mathematical processes — addition, subtraction, multiplication, division and a few algebraic operators.

Initially, the equations generated by the program failed to explain the data, but some failures were slightly less wrong than others. Using a genetic algorithm, the program modified the most promising failures, tested them again, chose the best, and repeated the process until a set of equations evolved to describe the systems. Turns out, some of these equations were very familiar: the law of conservation of momentum, and Newton's second law of motion.
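The search loop described above — random equation trees built from basic operations, scored against the data, with the least-wrong candidates mutated and re-tested — can be sketched in a few dozen lines. This is a hypothetical illustration of the general technique (genetic-programming symbolic regression), not the researchers' actual code; the primitive set, mutation scheme, population sizes and toy dataset are all assumptions chosen for demonstration.

```python
import random
import operator

# Candidate equations are expression trees: a leaf is the variable 'x' or a
# constant; an internal node is (op, left_subtree, right_subtree).
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}
TERMINALS = ['x', 1.0, 2.0]

def random_tree(depth=3):
    """Build a near-random combination of the basic operations."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    """Compute the value of an expression tree at a given x."""
    if tree == 'x':
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def error(tree, data):
    """Sum of squared mismatches between the equation and the measurements."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data)

def mutate(tree):
    """Modify a promising failure by replacing a random subtree."""
    if random.random() < 0.3 or not isinstance(tree, tuple):
        return random_tree(2)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left), right)
    return (op, left, mutate(right))

def evolve(data, pop_size=60, generations=40):
    """Keep the least-wrong equations, mutate them, and repeat."""
    pop = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: error(t, data))
        survivors = pop[:pop_size // 3]  # the most promising failures
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=lambda t: error(t, data))

random.seed(0)
# Toy "measurements" from a hidden law y = x**2 + x; the search sees only
# the raw (x, y) pairs and the primitive operations above.
data = [(x, x * x + x) for x in [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]]
best = evolve(data)
print(best, error(best, data))
```

Selection plus mutation is the essence of the genetic algorithm: equations are never told the answer, but those that fit the data slightly better are preferentially kept and varied, so fit improves over generations.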

"It's a powerful approach," said University of Michigan computer scientist Martha Pollack, with "the potential to apply to any type of dynamical system." As possible fields of application, Pollack named environmental systems, weather patterns, population genetics, cosmology and oceanography. "Just about any natural science has the type of structure that would be amenable," she said.

Compared to laws likely to govern the brain or genome, the laws of motion discovered by the program are extremely simple. But the principles of Lipson and Schmidt's program should work at higher scales.

The researchers have already applied the program to recordings of individuals' physiological states and their levels of metabolites, the small molecules that collectively fuel and regulate our bodies but remain, molecule by molecule, largely uncharacterized — a perfect example of data lacking a theory.

Their results are still unpublished, but "we've found some interesting laws already, some laws that are not known," said Lipson. "What we're working on now is the next step — ways in which we can try to explain these equations, correlate them with existing knowledge, try to break these things down into components for which we have clues."

Lipson likened the quest to a "detective story" — a hint of the changing role of researchers in hybridized computer-human science. Programs produce sets of equations — describing the effect of rainfall on a desert plateau, of air pollution on asthma rates, or of multitasking on cognitive function. Researchers test the equations, determine whether they're still incomplete or based on flawed data, use them to identify new questions, and apply them to messy reality.

The Human Genome Project, for example, produced a dataset largely impervious to traditional analysis. The function of nearly every gene depends on the function of other genes, which depend on still more genes, which change with time and place. The same level of complexity confronts researchers studying the body's myriad proteins, the human brain and even ecosystems.

"The rules are mathematical formulae that capture regularities in the system," said Pollack, "but the scientist needs to interpret those regularities. They need, for example, to explain" why an animal population is affected by changes in rainfall, and what might be done to protect it.

Michael Atherton, a cognitive scientist who recently predicted that computer intelligence would not soon supplant human artistic and scientific insight, said that the program "could be a great tool, in the same way visualization software is: It helps to generate perspectives that might not be intuitive."

However, said Atherton, "the creativity, expertise, and the recognition of importance is still dependent on human judgment. The main problem remains the same: how to codify a complex frame of reference."

"In the end, we still need a scientist to look at this and say, this is interesting," said Lipson.

Humans are, in other words, still important.

Citations: "Distilling Free-Form Natural Laws from Experimental Data." By Michael Schmidt and Hod Lipson. Science, Vol. 324, April 3, 2009.

"Automating Science." By David Waltz and Bruce Buchanan. Science, Vol. 324, April 3, 2009.

Image: Science