Assignment #2 (Solution Sketches)


  1. Comparisons of data mining methods


  2. As mentioned in class, stacking is a way of combining the performances of different classifiers and forming a "composite classifier" from several basic ones. Jeff Bradford has constructed one such classifer from three inducers (a modified decison tree inducer, IB and disc-naive-bayes). Here's his script. His results show that the IB algorithm yields 100% accuracy over the training set, so the ID3 stacking inducer always "selects" only this inducer to perform generalization. In a different case study, it is conceivable that no single classifier would be adjudged best and the decision tree constructed by ID3 would be truly "composite".


  3. This was a kind of open question and we wanted to get a feel for your ideas. The kind of mining depicted here is called "discovery" in AI circles and true enough, AM (Automated Mathematician) was one such program meant to do scientific discovery. To get a feel for the answer, lets revisit AM and how it does what it does. AM used a variety of general purpose techniques to identify concepts that are `interesting'. To this end, it has a set of heuristics that determine if something is worth investigating or not.

    For example, here is one such heuristic:

    If a function f(x,y) is interesting, consider what happens when x=y.

    If function f corresponds to multiplication, then this leads to discovering the concept of squaring. If f corresponds to division, then this leads to discovering that "A number divided by itself equals 1.". If f corresponds to subtraction, then this leads to discovering that "A number subtracted from itself equals nothing.", and so on.

    This para is verbatim from "Artificial Intelligence, by Rich and Kinight." In one run, AM discovered the concept of prime numbers. How did it do that? Having stumbled onto the natural numbers AM exploited operations such as addition, multiplication and their inverses. It created the concept of divisibility and noticed that some numbers had very few divisors. AM has a built-in heuristic that tells it to explore extreme cases. It attempted to list all numbers with zero divisors (finding none), one divisor (finding one: 1), and two divisors. AM was instructed to call the last concept "primes". Before pusrsuing this concept, AM went on to list numbers with three divisors, such as 49. AM tried to relate this property with other properties of 49, such as its being odd and a perfect square. AM generated other odd numbers and other perfect squares to test its hypothesis. A side effect of determining the equivalence of perfect squares with numbers with three divisors was to boost the "interestingness" rating of the divisor concept. This led AM to investigate ways in which a number could be broken down into factors. This led AM to investigate ways in which a number could be broken down into factors. AM then noticed that there was only one way to break a number down into prime factors (known as the Unique Factorization Theorem)......... [and finally, it hit upon] Goldbach's Conjecture which is widely believed to be true, but a proof of it has yet to be found in mathematics.

    Now, we ask ourselves, how do we replicate this behavior in a real life data mining system? First we need to ascertain the reason's for AM's success. One source of power, is obviously, the big lump of heuristics about what constitute interesting things. "The less obvious source of power is the natural relationship between number theoretical concepts and their compact representations in AM. AM worked by syntactically mutating old concept definitions - stored essentially as short LISP programs - in the hope of finding new, interesting concepts. It turns out that a mutation in a short LISP program very likely results in another well-formed LISP program. This accounts for AM's ability to generate so many novel concepts."

    Another way of saying the same is that AM is exploring the space of small possible LISP programs, and moreover concepts in number theory have succinct representation as LISP programs.

    This leads us to ask the question. What is best suited from our point of view? I feel genetic programming is best suited for the reasons stated above (and in fact AM can be viewed as a very primitive genetic programming exercise). Recall that genetic programming is different from genetic algorithms in that we are mutating and "crossing-over" actual programs which are the members of a population at any given point of time. ILP would also be fruitful provided you were able to enumerate all possible interesting concepts. This would also entail some kind of mathematical engine that keeps "feeding" input to your ILP engine.

    Note: These are just indicative solutions. More are possible if you can possibly determine ways of solving the "What is interesting" problem.


Return Home