Assignment 3: Classification and Clustering

Start date: 28 February. Due: 7 March, at the beginning of class.

In real life, the boundaries between clusters are not always clear. For example, when clustering news articles, an article may discuss the relationship between two topics (e.g., the sale of a port management company and terrorism). Such an item may well belong to both clusters.

Devise a clustering approach that takes this into account, putting an item into multiple clusters when necessary. You are welcome (indeed, encouraged) to do this by modifying an existing clustering method rather than developing something new.
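
As one illustration of what "modifying an existing method" might look like (a minimal sketch, not a required or complete solution), the assignment step of a centroid-based method such as k-means could be relaxed so that an item joins every cluster whose centroid is nearly as close as the nearest one. The function name overlapping_assign and the tolerance parameter ratio are hypothetical, introduced here only for the example, and the centroids are assumed to be already computed.

```python
import math

def overlapping_assign(points, centroids, ratio=1.2):
    """Assign each point to every cluster whose centroid is nearly as close
    as the nearest one.

    `ratio` is a hypothetical tolerance: a cluster is kept when its distance
    is at most ratio * (distance to the nearest centroid).
    """
    assignments = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        nearest = min(dists)
        # Keep the nearest cluster plus any cluster within the tolerance.
        members = [k for k, d in enumerate(dists) if d <= ratio * nearest]
        assignments.append(members)
    return assignments

# Toy data: two well-separated items and one item midway between centroids.
points = [(0.0, 0.0), (10.0, 0.0), (5.0, 0.1)]
centroids = [(0.0, 0.0), (10.0, 0.0)]
result = overlapping_assign(points, centroids)
print(result)
```

In this toy example the third item is equidistant from both centroids, so it lands in both clusters, while the other two items each stay in one. A full method would still need to decide how shared items affect the centroid update and how to choose the tolerance.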

Specifically, you should turn in:

  1. A conceptual description of your method,
  2. brief pseudocode for an algorithm implementing your method (this could be just a discussion of changes from a standard method you use as a base),
  3. advantages and limitations (what kinds of data would it work well on, and when would it produce a poor clustering?), and
  4. a brief description of how you could validate the effectiveness of your method.

Exercises from the Book

Complete the following exercises from the book.

  1. 7.7 (b)
  2. 7.22
  3. 8.5
  4. 8.16

Turning in assignment

Electronic submission is preferred; please submit by email. PDF is the safest format for capturing non-text content. Hard copy is also acceptable; please hand it in at the beginning of class.

