Greg Chaitin wrote in 1982: “… develop formal definitions of intelligence and measures of its various components [using algorithmic information theory, a.k.a. Kolmogorov complexity]”.   We make a case for the relationship between information theory and intelligence.  More specifically, we begin by introducing the Bayesian information-theoretic notion of Minimum Message Length (MML) machine learning,
and then we repeat our earlier claim from 1997 that the inductive learning part of intelligence is the same as doing a two-part compression of data – inferring a relatively simple model that fits the observed data relatively well.  We then outline how independent work by the presenter and Hernandez-Orallo in the 1990s was extended in the mid-2000s by Legg & Hutter to attempt a definition of intelligence, again based on algorithmic information theory.  Their proposal considers how an agent might perform across infinitely many environments, giving the environments different weights depending upon their complexity (a weighting sketched below).  We raise some concerns with this proposed definition, and then outline subsequent work by Hernandez-Orallo and the presenter in the Artificial Intelligence journal in 2010 which we believe to be the first universal test of intelligence – a test which can be interrupted at any time and return a finite estimate of intelligence.
Such tests should help us quantify progress as computer programs become increasingly intelligent, as we approach and perhaps pass the technological singularity.
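To make the complexity-weighting just mentioned concrete, here is a minimal sketch, assuming a toy finite set of environments and using the length of a stand-in description string as a crude proxy for the uncomputable Kolmogorov complexity; the function name universal_score and the example environments are hypothetical illustrations, not Legg & Hutter's formal definition (which sums over all computable environments).

```python
# Toy sketch of a Legg & Hutter style complexity-weighted score.
# Each environment's contribution is weighted by 2**(-K), with K here
# approximated by the length of a stand-in description string, since
# Kolmogorov complexity itself is uncomputable.

def universal_score(agent_scores, environment_descriptions):
    """agent_scores[i]: expected reward of the agent in environment i (in [0, 1]).
    environment_descriptions[i]: a stand-in program/description for environment i."""
    total = 0.0
    for score, description in zip(agent_scores, environment_descriptions):
        complexity = len(description)      # crude proxy for K(environment)
        weight = 2.0 ** (-complexity)      # simpler environments count for more
        total += weight * score
    return total

# Hypothetical example: doing well in the simply-described environment
# contributes far more than doing well in the complex one.
envs = ["ab", "abracadabra"]               # short vs long description
print(universal_score([1.0, 0.0], envs))   # 0.25
print(universal_score([0.0, 1.0], envs))   # ~0.0005
```

The choice of this weighting, and of the distribution over environments it implies, is one of the places where such a proposed definition can be questioned.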

Time permitting, we will attempt to address some of the following issues from 2011 work by Hernandez-Orallo, the presenter and colleagues – primarily from the 4th Artificial General Intelligence (AGI) conference held at Google, California, U.S.A., in August 2011.
We attempt to explain the difference between two-part compression under the MML principle of Wallace & Boulton (1968), which learns a single model from data, and the predictive compression regime of Solomonoff (1964), which is slightly better for prediction but typically does not learn a model (a toy contrast is sketched below).  We also outline the Darwin-Wallace distribution over environments, suggesting that this evolution-based distribution (in which agents are more likely to share environments with other agents with whom they might have co-evolved) might be more appropriate than some previously suggested distributions of environments.  Finally, we attempt to say something about using our abovementioned new information-theoretic tests of intelligence to compare human and computer program intelligence – a comparison whose results we expect to change with further technological progress.
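As a hedged illustration of that contrast (a toy discrete hypothesis space of biased coins with a uniform prior, not the full algorithmic-information-theoretic setting of either MML or Solomonoff induction), the sketch below commits on the MML-style route to the single hypothesis with the shortest two-part codelength, while the Solomonoff-style route mixes over all hypotheses to predict the next observation without ever settling on one model; the particular numbers and names are illustrative assumptions.

```python
import math

# Toy contrast on a discrete hypothesis space of biased coins.
# The prior P(H) plays the role of 2**(-length of H); the data is a
# sequence of coin flips (1 = heads, 0 = tails).

hypotheses = [0.1, 0.3, 0.5, 0.7, 0.9]          # candidate heads-probabilities
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}
data = [1, 1, 0, 1, 1, 1, 0, 1]                 # observed flips

def neg_log2_likelihood(h, data):
    """Codelength (in bits) of the data encoded assuming heads-probability h."""
    return -sum(math.log2(h if x == 1 else 1.0 - h) for x in data)

# MML-style two-part compression: pick the single hypothesis minimising
#   (length of hypothesis) + (length of data encoded given that hypothesis).
def mml_best_hypothesis():
    return min(hypotheses,
               key=lambda h: -math.log2(prior[h]) + neg_log2_likelihood(h, data))

# Solomonoff-style prediction: no single model is chosen; predict the next
# symbol by mixing all hypotheses, weighted by prior times likelihood.
def mixture_predict_heads():
    weights = {h: prior[h] * 2.0 ** (-neg_log2_likelihood(h, data)) for h in hypotheses}
    z = sum(weights.values())
    return sum((w / z) * h for h, w in weights.items())

print(mml_best_hypothesis())     # the single inferred model: 0.7
print(mixture_predict_heads())   # a prediction (here ~0.71), with no single model chosen
```

With a uniform prior the two-part codelength above reduces to maximum likelihood over the grid; with a non-uniform prior the first part of the message genuinely penalises more complex hypotheses.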
