Tuesday, March 6, 2018

Artificial Intelligence vs Data Science: A Parallel between Chomsky's Cognitive Science and Skinner's Behavioral Science

AI today faces the same challenge from data science that Chomsky's cognitive psychology faced when it confronted Skinner's behavioral psychology in the early 1950s. The basic issue is whether "intelligence" can be understood simply by observing the input-output relations over a black-box representation of the mind. Is the success of Google's data science in usefully predicting outputs, through deep statistical analysis of big data comprising billions of similar input-output relations, a sign of artificial intelligence? Can the accuracy with which machine-generated conjectures are proven through big-data analytics actually replace the need for humans to do science, i.e., to develop hypotheses, design experiments, observe results, and develop theories? This talk focuses on some of the questions raised by Chomsky about recent advancements in AI and draws a parallel with the clash between cognitive psychology and behavioral psychology in the 1950s. Through this parallel, the talk illustrates the need for AI to understand and develop internal representations of how human intelligence works.

[Work in Progress]

Today there are predictions of a Singularity event in the 2040s, in which robots would overtake humans in intelligence as well as in speech and locomotion. Would this event really signify robots acquiring intelligence as we humans understand it?

This post is triggered by Chomsky's critique of data scientists' adoption of AI, as exemplified in the recent Chomsky vs Norvig (Director of Research at Google) debate [1], [2]. It is also motivated by the prediction that there will be no need for any theory building and testing by humans [3].

AI vs Data Science: Chomsky's Cognitive Science vs Skinner's Behavioral Science

  1. Presented at ICOSST 2017, KICS UET Lahore, http://icosst.kics.edu.pk/2017/ 
  2. A modified form of this post was also presented as a keynote speech at the 2nd International Conference on Computer and Information Sciences at PAF KIET on March 26, 2018

Who is Noam Chomsky

"Noam Chomsky is an American linguist, philosopher, cognitive scientist, historian, social critic, and political activist. Sometimes described as "the father of modern linguistics," Chomsky is also a major figure in analytic philosophy and one of the founders of the field of cognitive science. He holds a joint appointment as Institute Professor Emeritus at the Massachusetts Institute of Technology (MIT) and laureate professor at the University of Arizona,[22][23] and is the author of over 100 books on topics such as linguistics, war, politics, and mass media. One of the most cited scholars in history, Chomsky has influenced a broad array of academic fields. He is widely recognized as a paradigm shifter who helped spark a major revolution in the human sciences, contributing to the development of a new cognitivistic framework for the study of language and the mind. In addition to his continued scholarly research, he remains a leading critic of U.S. foreign policy, neoliberalism and contemporary state capitalism, the Israeli–Palestinian conflict, and mainstream news media. His ideas have proved highly significant within the anti-capitalist and anti-imperialist movements." [From Wikipedia]

His ideas have found wide application in diverse fields, and he switches effortlessly across them.

Who is BF Skinner

"Skinner considered free will an illusion and human action dependent on consequences of previous actions. If the consequences are bad, there is a high chance the action will not be repeated; if the consequences are good, the probability of the action being repeated becomes stronger.[7] Skinner called this the principle of reinforcement. To strengthen behavior, Skinner used operant conditioning, and he considered the rate of response to be the most effective measure of response strength.

Skinner developed behavior analysis, the philosophy of that science he called radical behaviorism,[12] and founded a school of experimental research psychology—the experimental analysis of behavior. He imagined the application of his ideas to the design of a human community in his utopian novel, Walden Two,[13] and his analysis of human behavior culminated in his work, Verbal Behavior.[14] Skinner was a prolific author who published 21 books and 180 articles.[15][16] Contemporary academia considers Skinner a pioneer of modern behaviorism, along with John B. Watson and Ivan Pavlov. A June 2002 survey listed Skinner as the most influential psychologist of the 20th century. " [From Wikipedia]

Why they are important

Chomsky vs Skinner

Chomsky’s Cognitive Psychology

  • Intrinsic Motivation
    • Remove external demotivators (Deming)
  • Natural creativity
    • Self-learning, self-expression
  • Internal Human Mind Working
    • Language ability
  • Impulse; Not Habit
  • Cognitive Processor
    • Output is function of input, internal representations
    • Compiler program structures

BF Skinner’s Behavioral Psychology

  • Extrinsic Motivation
    • Carrot and stick
  • Conditioning 
    • Behavioral control
  • Non-Human Organism Models
    • Pigeons, rats, dogs
  • Habit
  • Behavior modification
    • Output is function of input
    • Based on history
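The contrast in the two lists above can be sketched in code. The sketch below is a toy illustration with hypothetical names, not a claim about how either theory is formally modeled: the behavioral agent's output is a function only of input and reinforcement history, while the cognitive agent's output also depends on an internal representation (here, an explicit rule) that lets it handle inputs it has never been conditioned on.

```python
# Behaviorist view: output = f(input, reinforcement history).
class BehavioralAgent:
    def __init__(self):
        self.strengths = {}  # (stimulus, response) -> reinforcement total

    def respond(self, stimulus, options):
        # Emit the response with the strongest reinforcement history.
        return max(options, key=lambda r: self.strengths.get((stimulus, r), 0))

    def reinforce(self, stimulus, response, reward):
        key = (stimulus, response)
        self.strengths[key] = self.strengths.get(key, 0) + reward

# Cognitivist view: output = f(input, internal representation).
class CognitiveAgent:
    def __init__(self, rule):
        self.rule = rule  # internal representation, e.g. a grammar rule

    def respond(self, stimulus):
        return self.rule(stimulus)

behavioral = BehavioralAgent()
behavioral.reinforce("light", "press lever", reward=3)
print(behavioral.respond("light", ["press lever", "ignore"]))  # "press lever"

# The cognitive agent generalizes to a novel, never-reinforced input
# via its internal rule (a toy stand-in for a grammar rule).
pluralize = lambda noun: noun + "s"
cognitive = CognitiveAgent(pluralize)
print(cognitive.respond("wug"))  # "wugs" -- a word it was never trained on
```

The last line is the crux of Chomsky's objection: the cognitive agent produces correct output for an input with no reinforcement history at all, which a purely history-based mapping cannot do.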

Chomsky vs Skinner

Chomsky’s Language Conception

  • Complex internal representations
  • Encoding in genome
  • Maturation with right data into complex computational system
  • Cannot be usefully broken down into a set of associations
  • Language faculty
    • A genetic endowment like visual, immune, circulatory systems 
    • Approach: Similar to other more down-to-earth biological systems

BF Skinner's Behaviorist Conception

  • Historical associations 
    • Stimulus => animal's response 
  • Empirical statistical analysis
  • Predicting future 
    • as a function of the past
  • Behaviorist associations fail to explain 
    • Richness of linguistic knowledge 
    • Endless creative use of language 
    • How children acquire it with exposure to only minimal language in the environment

Impact of Behavioral Psychology

  • 20th Century management
    • Org behavior, HRM, Control, monitoring
  • 20th Century schooling
    • Text books, curriculum, grading, pedagogy
  • Problems: 
    • The value of a book is inversely proportional to how often the word "behavior" occurs in it (Alfie Kohn)
    • Lack of creativity, innovation
    • Dumbing down of students, employees

According to Chomsky:

Data science's heavy use of statistical techniques to identify patterns in big data will not yield explanatory scientific insights about intelligence.

E.g., the Google search "physicist Sir Isaac Newton"

New AI is unlikely to yield

“general principles about the nature of intelligent beings or about cognition”


  • Why learn anything if you can do a lookup!
  • Is human understanding necessary for making successful predictions? 
    • If “no,” then predictions are best made by churning mountains of data through powerful algorithms
    • The role of the scientist may fundamentally change forever
  • AI's attempt to use data science is like 
    • Students googling answers to math homework
    • Will such answers serve them well in the long term?
  • AI algorithms can successfully predict planets’ motion 
    • without ever discovering Kepler's laws
    • Google can store all recorded positions of stars and planets in big data
  • Is science more than the accumulation of facts, producing predictions?
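The Kepler point above can be made concrete. The sketch below fits a plain least-squares line to the logarithms of real orbital data (semi-major axis in AU, period in years) and recovers the exponent 3/2 behind Kepler's third law (T² ∝ a³) purely statistically, without ever stating the law; this is exactly the kind of prediction-without-theory the bullets describe.

```python
# Recover the pattern behind Kepler's third law from data alone.
import math

# (semi-major axis in AU, orbital period in years) for six planets
planets = [(0.387, 0.241), (0.723, 0.615), (1.000, 1.000),
           (1.524, 1.881), (5.203, 11.862), (9.537, 29.457)]

xs = [math.log(a) for a, _ in planets]
ys = [math.log(t) for _, t in planets]
n = len(planets)
mx, my = sum(xs) / n, sum(ys) / n

# Ordinary least-squares slope of log(T) on log(a).
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)

print(round(slope, 3))  # ~1.5, i.e. T ~ a^(3/2), found purely from the data
```

The fit "knows" nothing about gravity or ellipses; whether the exponent 1.5 it produces counts as understanding is precisely the question at issue.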

Fundamental Issues

  • Why vs How
    • If we know "how," why do we want to know "why"?
    • "The Closing of the American Mind"
    • Simon Sinek's TED talk on how great leaders inspire action
  • Death of theorizing? Death of the scientific method?
    • Who needs theory if the data can itself generate conjectures, and prove or disprove them through statistical analysis of big data?
  • Just because modeling the internals of the mind is difficult, we should find a workaround!

References (links to be added)

[1] Chomsky on AI vs data science: "Noam Chomsky on Where Artificial Intelligence Went Wrong," an extended conversation with the legendary linguist, The Atlantic.

[2] Norvig's rebuttal: Peter Norvig, "On Chomsky and the Two Cultures of Statistical Learning." Do go through the comments section of this link.


[3] End of Theory: Chris Anderson, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete," Wired (excerpted below).


"All models are wrong, but some are useful."

So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don't have to settle for wrong models. Indeed, they don't have to settle for models at all.

Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition. They are the children of the Petabyte Age.

The Petabyte Age is different because more is different. Kilobytes were stored on floppy disks. Megabytes were stored on hard disks. Terabytes were stored in disk arrays. Petabytes are stored in the cloud. As we moved along that progression, we went from the folder analogy to the file cabinet analogy to the library analogy to — well, at petabytes we ran out of organizational analogies.

At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later. For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right.

Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough. No semantic or causal analysis is required. That's why Google can translate languages without actually "knowing" them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German). And why it can match ads to content without any knowledge or assumptions about the ads or the content.
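The "statistics of incoming links" idea in the paragraph above can be sketched as a tiny power-iteration ranking over a hypothetical three-page web. This is a toy illustration of the link-analysis principle, not Google's actual implementation: a page is ranked high because highly ranked pages link to it, with no semantic analysis anywhere.

```python
# Rank pages purely from link statistics, with no knowledge of content.
def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}"""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            share = rank[p] / len(outs) if outs else 0
            for q in outs:
                # Each page passes its rank, damped, to the pages it links to.
                new[q] += damping * share
        rank = new
    return rank

# Hypothetical web: B and C both link to A; A links only to B.
web = {"A": ["B"], "B": ["A"], "C": ["A"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # "A" -- it attracts the most link weight
```

Nothing in the computation asks why page A is better; the incoming-link statistics say it is, and as the paragraph notes, that is good enough.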

Speaking at the O'Reilly Emerging Technology Conference this past March, Peter Norvig, Google's research director, offered an update to George Box's maxim: "All models are wrong, and increasingly you can succeed without them."

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

The big target here isn't advertising, though. It's science. The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.
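The coincidence warning above is easy to demonstrate numerically. In the synthetic sketch below a hidden common cause z drives both x and y; the two variables end up strongly correlated even though neither has any causal influence on the other, which is exactly what correlation alone cannot distinguish.

```python
# A hidden confounder makes x and y correlate with no x -> y mechanism.
import random

random.seed(0)
z = [random.gauss(0, 1) for _ in range(10_000)]    # hidden common cause
x = [zi + random.gauss(0, 0.3) for zi in z]        # x is driven by z
y = [zi + random.gauss(0, 0.3) for zi in z]        # y is driven by z, not x

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

print(round(pearson(x, y), 2))  # high (~0.9) despite no causal link x -> y
```

A model of the data-generating process (the existence of z) is what separates the correlation from a causal claim; the numbers alone cannot.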

But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the "beautiful story" phase of a discipline starved of data) is that we don't know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.

Now biology is heading in the same direction. The models we were taught in school about "dominant" and "recessive" genes steering a strictly Mendelian process have turned out to be an even greater simplification of reality than Newton's laws. The discovery of gene-protein interactions and other aspects of epigenetics has challenged the view of DNA as destiny and even introduced evidence that environment can influence inheritable traits, something once considered a genetic impossibility.

In short, the more we learn about biology, the further we find ourselves from a model that can explain it.

There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.

If the words "discover a new species" call to mind Darwin and drawings of finches, you may be stuck in the old way of doing science. Venter can tell you almost nothing about the species he found. He doesn't know what they look like, how they live, or much of anything else about their morphology. He doesn't even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.

This sequence may correlate with other sequences that resemble those of species we do know more about. In that case, Venter can make some guesses about the animals — that they convert sunlight into energy in a particular way, or that they descended from a common ancestor. But besides that, he has no better model of this species than Google has of your MySpace page. It's just data. By analyzing it with Google-quality computing resources, though, Venter has advanced biology more than anyone else of his generation.
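Venter's "statistical blip" can be caricatured in a few lines. The sketch below is a deliberately toy version of the idea (hypothetical sequences, simple k-mer overlap rather than real alignment): a read is flagged as a "new species" solely because its sequence profile is unlike everything in a reference database, with no biological model involved.

```python
# Flag a sequence as novel purely by dissimilarity to a database.
def kmers(seq, k=3):
    """The set of overlapping k-length substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b)

# Hypothetical reference database of known sequences.
database = {
    "sp1": "ATGGCATTACGGATCC",
    "sp2": "ATGGCATTACGGATGG",
}

def is_novel(read, db, threshold=0.5):
    profile = kmers(read)
    return all(jaccard(profile, kmers(s)) < threshold for s in db.values())

print(is_novel("ATGGCATTACGGATCA", database))  # False -- close to sp1
print(is_novel("TTTCCGGAAACTGTAC", database))  # True -- unlike anything known
```

The second read is "a new species" only in the statistical sense the paragraph describes: the code knows nothing about what the organism looks like or how it lives, only that its sequence matches nothing in the database.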

This kind of thinking is poised to go mainstream. In February, the National Science Foundation announced the Cluster Exploratory, a program that funds research designed to run on a large-scale distributed computing platform developed by Google and IBM in conjunction with six pilot universities. The cluster will consist of 1,600 processors, several terabytes of memory, and hundreds of terabytes of storage, along with the software, including IBM's Tivoli and open source versions of Google File System and MapReduce. Early CluE projects will include simulations of the brain and the nervous system and other biological research that lies somewhere between wetware and software.

Learning to use a "computer" of this scale may be challenging. But the opportunity is great: The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.

There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?

Chris Anderson (canderson@wired.com) is the editor in chief of Wired.
