Purdue University Computer Science

Christopher W. Clifton

Professor of Computer Science
Professor of Statistics (by courtesy)


Biographical Information


    Purdue University
    Department of Computer Sciences
    305 N. University Street
    West Lafayette, Indiana, 47907-2107
    Office Phone: +1 765 494-6005
    FAX:          +1 765 494-0739
    Office:       LWSN 2142F

For biographical and historical information see my Curriculum Vitae (also in pdf and postscript).

Notes to students pursuing a Ph.D., Master's, Independent Study, desiring admission to Purdue, or interested in a summer internship.

Office Hours

Due to the number of irregular demands on my time, I find that I often have to cancel office hours, sometimes on short notice or with none at all. I am generally in from 8:30 to 17:00, except during class (see below) or departmental seminars. To make an appointment, send email with some good times for you, and I'll pick one and reply. You may also drop by my office anytime (no guarantees I'll be in, but I can often take time if I am).


Spring 2016: Data Mining & Machine Learning (CS39000-DM0).


Fall 2017: Web Information Search And Management (CS47300).


Fall 2016: Information Systems (CS34800).
Spring 2016: Information Retrieval (CS54701).
Spring 2013, Fall 2009: Advanced Information Assurance (CS62600).
Spring 2012, Fall 2007, Fall 2002: Database Systems (CS541).
Spring 2011, Spring 2010: Programming I (CS18000).
Fall 2010; Fall 2004; Spring, Fall 2003: Information Security (CS 52600).
Spring 2009: Distributed Database Systems (CS54200).
2007-2008: Honors Program (CS197, CS397).
Spring 2007, Fall 2005: Computer Architecture (CS 250).
Spring 2006, Spring 2005: Data Mining (CS 590D).
Spring 2004: Introduction to Data Mining (CS 490D).
Spring 2002: Advanced Topics in Distributed Systems (CS603).
Fall 2001: Security Issues in Data Mining (CS590M).

(Course evaluations from past semesters)

Warning: This page is somewhat out of date; see my C.V. for more up-to-date information.

Current Students

See C.V. for completed students.

Current Areas of Research

Listings below give only a representative sample of publications; see C.V. for the full publication list.

Privacy in Text and Search

Text and search have been shown to pose particular privacy challenges, as the AOL query log anonymization failure demonstrated. We are developing techniques that allow relevant texts to be identified while controlling disclosure of information, both by those searching for information and by those providing content.

Selected Presentations and Publications

Mummoorthy Murugesan, Wei Jiang, Chris Clifton, Luo Si and Jaideep Vaidya, Efficient Privacy-Preserving Similar Document Detection, The VLDB Journal 19(4):257-275, VLDB Endowment, August 2010.

Wei Jiang, Mummoorthy Murugesan, Chris Clifton and Luo Si, t-Plausibility: Semantic Preserving Text Sanitization, The 2009 IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT-09), Vancouver, Canada, August 29-31, 2009.

Mummoorthy Murugesan and Chris Clifton, Providing Privacy through Plausibly Deniable Search, 2009 SIAM International Conference on Data Mining (SDM09), Sparks, Nevada, April 30-May 2, 2009.

Students and Collaborators

See the NSF-funded project Anonymizing Textual Data and its Impact on Utility, in addition to the following.

This work supported by the Air Force Office of Scientific Research under MURI award FA9550-08-1-0265 and the National Science Foundation under Grant No. 1012208.

Privacy-Preserving Data Mining

Data mining relies on the collection of massive amounts of data - but this often collides with privacy considerations. How do we mine data when privacy concerns limit access to the data? We are developing technology to address this in the distributed case: the data to be mined is contained at multiple sites, but the sites are unable to release the data. The solutions involve algorithms that share some information to calculate correct results, where the shared information can be shown not to disclose private data.
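As a rough illustration of "sharing some information to calculate correct results", the sketch below simulates a ring-based secure sum, a building block often described in the secure multiparty computation literature. It is a minimal sketch, not a description of any specific algorithm from the publications listed here. The function name secure_sum, the modulus parameter, and the sample values are all hypothetical, and the sketch assumes semi-honest sites that do not collude (the two neighbors of a site could otherwise recover its value by comparing messages).

```python
import secrets


def secure_sum(site_values, modulus):
    """Simulate a ring-based secure sum in a single process.

    Illustrative sketch only: each entry of site_values stands for the
    private integer held by one site. Assumes non-negative values whose
    true total is below modulus, and semi-honest, non-colluding sites.
    Returns (total, messages), where messages[i] is what site i passes on.
    """
    if not site_values:
        raise ValueError("need at least one site")
    # In a real deployment no single party could run this check;
    # the bound on the total would have to be agreed on in advance.
    if any(v < 0 for v in site_values) or sum(site_values) >= modulus:
        raise ValueError("values must be non-negative with total < modulus")

    # The initiating site picks a uniformly random mask known only to itself.
    mask = secrets.randbelow(modulus)
    running = mask
    messages = []
    for value in site_values:
        # Each site adds its private value to the masked running total
        # and forwards the result to the next site in the ring.
        running = (running + value) % modulus
        messages.append(running)

    # The message returns to the initiator, which removes its mask.
    total = (running - mask) % modulus
    return total, messages


total, seen_on_wire = secure_sum([12, 7, 30], modulus=1000)
```

Here total equals 49, the true sum, while every intermediate message is offset by the random mask, so a site that sees only its incoming message learns nothing about the values added before it.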

Selected Presentations and Publications:

Jaideep Vaidya, Chris Clifton, and Michael Zhu, Privacy-Preserving Data Mining, Springer-Verlag, 2006.

Tutorial Privacy, Security, and Data Mining, presented at the combined conference 13th European Conference on Machine Learning (ECML'02) and 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'02), Helsinki, Finland, 19-23 August, 2002. A later version concentrating on Privacy Preserving Data Mining was given at The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24, 2003, Washington, D.C.
Tutorial slides are available (PDF), as is a short briefing (PDF).

Workshop on Privacy, Security, and Data Mining held at the International Conference on Data Mining, December 9-12, 2002, Maebashi City, Japan.

Jaideep Vaidya and Chris Clifton, Privacy-Preserving Data Mining: Why, How, and What For?, IEEE Security & Privacy, New York, NY, November/December, 2004.

Jaideep Vaidya and Chris Clifton, Secure Set Intersection Cardinality with Application to Association Rule Mining, Journal of Computer Security 13(4), IOS Press, November 2005.

Murat Kantarcioglu and Chris Clifton, Privacy Preserving Data Mining of Association Rules on Horizontally Partitioned Data, Transactions on Knowledge and Data Engineering 16(9), IEEE Computer Society Press, Los Alamitos, CA, September 2004.

Chris Clifton, Murat Kantarcioglu, Jaideep Vaidya, Xiaodong Lin, and Michael Zhu, Tools for Privacy Preserving Distributed Data Mining, ACM SIGKDD Explorations 4(2), January 2003. Invited paper.

Jaideep Vaidya and Chris Clifton, Privacy-Preserving K-Means Clustering over Vertically Partitioned Data, The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24 - 27, 2003, Washington, D.C. Honorable Mention in Best Paper Competition.

Privacy Preserving Distributed Data Mining, project funded by the National Science Foundation Information Technology Research program, August 2003-August 2006.

Privacy Preserving Distributed Data Mining, project funded by the Purdue Research Foundation, August 2002-August 2004.

Slides from Dr. Rakesh Agrawal's Distinguished Lecture at Purdue, November 11, 2003. (Purdue only)

Students and Collaborators

List of resources in this field courtesy Stanley Oliveira. Stanford PORTIA project reading list.

This work supported by the National Science Foundation under Grant No. 0312357.

Privacy and Anonymity in Data Integration and Sharing

Shared, integrated data sets have considerable value - envision medical researchers having access to worldwide patient records, or transportation planners having access to businesses' strategic plans. Releasing this data raises significant privacy concerns. We are developing techniques to allow integration and anonymization of data from distributed sources, allowing combining and sharing data without exposing sensitive information.
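One widely used anonymity criterion in this area is k-anonymity: every combination of quasi-identifying attribute values in a released table must be shared by at least k records. The sketch below only checks that property on a toy table; it does not perform anonymization and is not the method of any publication listed here. The function name is_k_anonymous, the field names, and the sample records are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs >= k times.

    Illustrative sketch: records is a list of dicts, quasi_identifiers
    is a list of keys treated as potentially identifying. An empty
    table is trivially k-anonymous.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Group records by their tuple of quasi-identifier values.
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized table (ages as ranges, ZIP codes truncated).
released = [
    {"age": "30-39", "zip": "479**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "479**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "479**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "479**", "diagnosis": "diabetes"},
]

two_anonymous = is_k_anonymous(released, ["age", "zip"], k=2)
three_anonymous = is_k_anonymous(released, ["age", "zip"], k=3)
```

With this toy table, two_anonymous is True and three_anonymous is False, since each (age, zip) group contains exactly two records.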

Selected Presentations and Publications:

Hiding the Presence of Individuals from Shared Databases, with Mehmet Ercan Nergiz and Maurizio Atzori, 2007 ACM SIGMOD International Conference on Management of Data, Beijing, China, June 11-14, 2007.

Thoughts on k-Anonymization, with Mehmet Ercan Nergiz, Data and Knowledge Engineering 63(3), Elsevier Science, Amsterdam, December 2007. Invited article, expanded version of PDM06 paper.

Mehmet Ercan Nergiz, Chris Clifton, and Ahmet Erhan Nergiz, MultiRelational k-Anonymity, The 23rd IEEE International Conference on Data Engineering (ICDE 2007), Istanbul, Turkey, April 16-20, 2007.

Wei Jiang and Chris Clifton, A Secure Distributed Framework for Achieving k-Anonymity, The VLDB Journal 15(4): Special Issue on Privacy-Preserving Data Management, VLDB Endowment, November 2006.

Chris Clifton, AnHai Doan, Ahmed Elmagarmid, Murat Kantarcioglu, Gunther Schadow, Dan Suciu, and Jaideep Vaidya, Privacy Preserving Data Integration and Sharing, The 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD'2004), June 13, 2004, Paris, France.

Chris Clifton, AnHai Doan, Ahmed Elmagarmid, Gunther Schadow and Dan Suciu, Privacy-Preserving Data Integration and Sharing, project funded by the National Science Foundation Information Technology Research program, September 2004-August 2007, $1,000,000.

Students and Collaborators

This work supported by the National Science Foundation under Grant No. 0428168.

Data Mining of Text

Text poses new challenges for data mining. The lack of structure makes it more difficult to focus on the goals of a data mining project - the interesting results are lost in a flood of meaningless rules and correlations. Techniques are needed that help focus the text mining process, while still retaining the ability to discover surprising and unexpected results.

TopCat: Data Mining for Topic Identification in a Text Corpus, with Robert Cooley, 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases, Prague, Czech Republic, September 15-18, 1999. Lecture Notes in Artificial Intelligence 1704, Springer-Verlag.

Query Flocks: A Generalization of Association Rule Mining, with Dick Tsur, Jeffrey D. Ullman, Serge Abiteboul, Rajeev Motwani, Svetlozar Nestorov, and Arnon Rosenthal, in Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, June 1-4, 1998, Seattle, WA. (Also in SIGMOD eproceedings (postscript).)

Data Mining on Text, with Rick Steinheiser, The Twenty-Second Annual International Computer Software and Applications Conference, Vienna, Austria, August 19-21, 1998. (Also in IEEE Digital Library.)

Students and Collaborators

Data Mining for Transportation, Distribution and Logistics

Logistics, ensuring that goods are available when and where they are needed, is both a highly fragmented industry and one where information technology is driving significant savings. We are investigating techniques to enable analysis and optimization with shared data, without disclosing sensitive and proprietary information.

Selected Presentations and Publications:

Chris Clifton, Ananth Iyer, Richard Cho, Wei Jiang, Murat Kantarcioglu, and Jaideep Vaidya, An Approach to Identifying Beneficial Collaboration Securely in Decentralized Logistics Systems, Manufacturing & Service Operations Management 10(1), INFORMS, Linthicum, Maryland, Winter 2008.

Wei Jiang, Jaideep Vaidya, Zahir Balaporia, Chris Clifton, and Brett Banich, Knowledge Discovery from Transportation Network Data, The 21st International Conference on Data Engineering (ICDE 2005), April 5-8, 2005, Tokyo, Japan. Best paper, Industrial Track.

A Prototype Integrated Transaction Data Analysis and Visualization Environment for the Transportation, Distribution and Logistics Sector, project funded by the e-Enterprise Center at Discovery Park, August 2002-May 2003.

Students and Collaborators

Data Mining for Healthcare

Healthcare is another area where information technology has potential to bring significant benefit to a large and growing industry. We are investigating data analysis techniques to address a variety of challenges in the healthcare industry.

Selected projects

David Ebert (Director), Alok Chaturvedi, William Cleveland, Chris Clifton, Ahmed Elmagarmid, and Marc Overhage, Purdue University Regional Visualization and Analytics Center, Department of Homeland Security, January 2006-January 2007, $750,000.

Ann Hendrich (Ascension Health), Marilyn Chow (Kaiser Permanente), Nelson Lee (Rapid Modeling), William Cleveland, Chris Clifton, Jason Abrevaya, A Multi-Site Study of How Medical Surgical Nurses Spend Their Time: A Baseline Study in Preparation for an Electronic Health Record and an Evidenced-based Nursing Unit Design, Robert Wood Johnson Foundation, April 2005-August 2006. (Purdue CS portion $91,29).

Clement J. McDonald (PI) et al., Chris Clifton (Purdue PI), A Center of Excellence in Medical Informatics to Provide an Advanced Infrastructure for Human Research: A Catalyst for Indiana Research, Indiana 21st Century Fund, August 2004-August 2006, $3,832,196 (Lead: Regenstrief Institute for Healthcare; Purdue portion $50,000).

Students and Collaborators

Other Data Mining and Security Topics

Privacy in distributed data mining is only one area where data mining and security interact. Other areas of research include security concerns posed by data mining results (the data isn't private, but what might be learned from it is) and applications of data mining to security (e.g., intrusion detection).

Yücel Saygin, Vassilios S. Verykios, and Chris Clifton, Using Unknowns to Prevent Discovery of Association Rules, ACM SIGMOD Record 30(4): Special Section on Data Mining for Intrusion Detection and Threat Analysis, Daniel Barbará, editor, December 2001.

Chris Clifton and Gary Gengo, Developing Custom Intrusion Detection Filters Using Data Mining, 2000 Military Communications International Symposium (MILCOM2000), Los Angeles, California, October 22-25, 2000.

Using Sample Size to Limit Exposure to Data Mining, Journal of Computer Security 8(4), IOS Press, November 2000. Invited paper.

Security and Privacy Implications of Data Mining, with Don Marks, ACM SIGMOD Workshop on Data Mining and Knowledge Discovery, Montreal, Canada, June 2, 1996.

See also the tutorial above.

Identity and Administration for Information-Level Data Access Control

Traditionally, computer system security has concentrated on building a shell around the whole system. Once through the shell, little protection is provided. However, most damage is done by insiders, either maliciously or through inadvertent misuse. The problem is that the capabilities people are granted far exceed their authority. In particular, databases provide complex mechanisms supporting fine-grained access control - but in practice applications are generally given access to everything in the database, and access control is done in application code. This results in duplication of effort and makes it difficult to ensure that access control does what it should. We need to develop practical, administrable methods for managing identity and data access control.

A Privacy Preserving Credentialing System for Health Care, with Ahmet Erhan Nergiz, Proceedings of the Workshop on Secure Knowledge Management (SKM 2008), Dallas, Texas, November 3-4, 2008.

Models and Mechanisms for Data-Level Security, Joint Purdue / MITRE (Co-PI: Dr. Arnon Rosenthal) proposal to the National Science Foundation Digital Governments program (references), July 2001.

Students and Collaborators

Postdoctoral Opportunities

I do not currently have funding to support postdoctoral visitors, but see the Institute for Information Infrastructure Protection (I3P) fellowship program.

Potential Independent Study Projects

If you would like to pursue an independent study with me, please see the instructions on proposing such study. Occasionally I have specific projects that may be of interest, often involving collaboration with corporate partners. These are listed below:

Stampin'Up card making supplies: Shari Evans Williams

This page last modified 10 January, 2012
