This can be useful even if we do not know how social security numbers are assigned. Simply clustering along the high-order bits of a ``unique identifier'' is likely to group similar data elements, even when the nature of the similarity is not known in advance. The similarity may be chronological if the identifiers are assigned sequentially, or it may reflect the source of the data element if individual data sources are each given a ``batch'' of identifiers. The problem is that this grouping allows us to find similarities that would not otherwise be available (e.g., a high cancer rate among people with similar social security numbers), which in turn lead us to look for further information (what else do those people have in common?).
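A minimal sketch of this kind of grouping is given here for illustration. It assumes identifiers are plain integers and that each record carries a 0/1 attribute of interest; the names `group_by_prefix`, `rate_by_prefix`, and `low_bits` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def group_by_prefix(records, low_bits):
    """Group (identifier, value) pairs by the identifier's high-order bits.

    Dropping the `low_bits` least significant bits leaves the high-order
    prefix, so identifiers that are numerically close share a group.
    """
    groups = defaultdict(list)
    for ident, value in records:
        groups[ident >> low_bits].append(value)
    return dict(groups)


def rate_by_prefix(records, low_bits):
    """Return the mean of the 0/1 attribute within each prefix group."""
    grouped = group_by_prefix(records, low_bits)
    return {prefix: sum(vals) / len(vals) for prefix, vals in grouped.items()}


# Illustrative data only: (identifier, has_condition) pairs.
sample = [(0x1000, 1), (0x1001, 1), (0x2000, 0), (0x2003, 0)]
rates = rate_by_prefix(sample, low_bits=12)
```

Nothing in the sketch depends on knowing how the identifiers were assigned. A group whose rate differs sharply from the others is simply a prompt to ask what else its members share.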
This can have security implications. For example, an organization that assigns telephone numbers sequentially based on location within a building could find its ``phone book'' mined to find out who is working on the same projects. Knowledge of the identifier assignment process is not necessary: simply discovering that the people working on a known project fall into one group when clustered by telephone number suggests that the people working on unknown projects can be guessed by grouping on telephone numbers as well.
The solution is to ensure that unique identifiers are assigned randomly, so that they serve only as unique identifiers. This prevents meaningful grouping based on these identifiers, yet does not detract from their intended purpose.
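One way random assignment might be carried out is sketched below. This is an illustration under stated assumptions, not a prescribed mechanism: it assumes integer identifiers of a fixed width, draws them with Python's standard `secrets` module, and rejects collisions so that the identifiers remain unique. The function name `assign_random_ids` and the parameter `id_bits` are hypothetical.

```python
import secrets


def assign_random_ids(items, id_bits=32):
    """Map each item to a distinct, randomly drawn integer identifier.

    Identifiers are drawn uniformly from [0, 2**id_bits) and redrawn on
    collision, so assignment order and item origin leave no trace in
    the identifier's bits.
    """
    items = list(items)
    if len(items) > (1 << id_bits):
        raise ValueError("identifier space too small for the number of items")
    used = set()
    assignment = {}
    for item in items:
        while True:
            candidate = secrets.randbelow(1 << id_bits)
            if candidate not in used:
                break
        used.add(candidate)
        assignment[item] = candidate
    return assignment


# Illustrative use: neighbouring employees no longer get neighbouring numbers.
ids = assign_random_ids(["alice", "bob", "carol"])
```

Because each identifier is drawn independently of where or when the item was entered, grouping on high-order bits yields groups that carry no information about the underlying records. The identifiers still distinguish the records from one another.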
This requires that we augment the data in non-obvious ways (otherwise it would be simple to reconstruct the original database).
This is just a start. Further work is needed to determine just what we can do to prevent unwanted data mining.