Purdue University - Department of Computer Science - GoBoiler Projects

GoBoiler Projects

Purdue's 54 Computer Science faculty members run a PhD program covering 11 broad research areas.  The PhD program currently has around 240 PhD students from over 30 countries.  Several of the faculty propose the following list of potential projects for GoBoiler interns.  Please select one or more of these projects as potential work you would like to do if you are admitted to the program.

Each project link below indicates the Research Area - Faculty mentor name - Project title.

Graphics & Visualization

Prof. Daniel Aliaga

Cities are ecosystems of socio-economic entities which provide concentrated living, working, education, and entertainment options to their inhabitants. Hundreds of years ago, the significantly smaller population and the abundance of natural resources made city design, and even the functioning of cities in relation to their hinterland, quite straightforward. That is not the case today, with over 3.5 billion people living in cities. Cities, and urban spaces of all sizes, are extremely complex, and their modeling is far from solved. In this project, we aim to pull together CS, engineering, agricultural economics, and social science to collectively exploit our unique opportunity to address this emerging problem. Research activities will span many fields and will involve cross-disciplinary work focused on designing and simulating the functioning of existing and future cities. Our desire is also to pool this knowledge, identify our unique strengths, and pursue large and ambitious computing projects.

Graphics & Visualization

Prof. Daniel Aliaga

Appearance editing offers a unique way to view physical objects with visually altered appearances or overlaid visualizations. By carefully controlling how an object is illuminated using digital projectors, we obtain stereoscopic imagery for any number of observers, with everything visible to the naked eye (i.e., no need for head-mounted displays or goggles). Such an ability is useful for various applications, including scientific visualization, virtual restoration of cultural heritage, and display systems. Our previous work has focused on virtual restoration of cultural heritage artifacts, on compensation compliancy, and on improving the resolution and overall quality of appearance editing. Going forward, we are looking to design and integrate mobile robots that carry the projectors and self-locate and self-organize so as to change the appearance of the target object. We have already created prototype robots using in-house 3D printing technology. Next, we need to improve the control logic and various 3D reconstruction algorithms.

Information Security & Assurance

Prof. Jeremiah Blocki

In the last few years, over a billion user passwords have been exposed to the dangerous threat of offline attacks through breaches at organizations like Yahoo!, Dropbox, LinkedIn, LastPass, AdultFriendFinder, and Ashley Madison. Password hashing is a crucial 'last line of defense' against an offline attacker. An attacker who obtains the cryptographic hash of a user's password can validate password guesses offline by comparing the hashes of likely guesses with the stolen hash value. There is no way to lock the adversary out, so the attacker is limited only by the cost of computing the password hash function millions or billions of times. A strong password hashing algorithm should have two properties: (1) it is prohibitively expensive for the attacker to compute the function millions or billions of times, and (2) it can be computed on a standard personal computer quickly enough that users can still authenticate in a reasonable amount of time. Memory-hard functions (MHFs) are a crucial cryptographic primitive in the design of key-derivation functions, which transform a low-entropy secret (e.g., a user password) into a cryptographic key. Data-independent memory-hard functions (iMHFs) are an important variant due to their natural resistance to side-channel attacks. Argon2, the winner of the Password Hashing Competition, initially recommended the data-independent mode Argon2i for password hashing, but this recommendation was later changed in response to a series of space-time tradeoff attacks against Argon2i showing that the amortized area-time complexity of this function was significantly lower than initially believed (CRYPTO 2016). In this project, students will be exposed to cutting-edge research on the design and analysis of memory-hard functions and will have the opportunity to help implement and evaluate state-of-the-art constructions (e.g., EUROCRYPT 2017, CCS 2017, 2018).
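
As an illustration of the security/usability balance described above, Python's standard-library scrypt (an earlier memory-hard KDF; the project itself studies Argon2-style constructions, and the parameters below are purely illustrative) can be used to hash and verify passwords:

```python
import hashlib
import os
import secrets

# Illustrative cost parameters: n=2**14, r=8 forces roughly 16 MB of memory
# per hash, which is what makes large-scale offline guessing expensive.
N, R, P = 2**14, 8, 1

def hash_password(password, salt=None):
    """Hash a password with the memory-hard scrypt KDF; returns (salt, digest)."""
    salt = salt or os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)
    return salt, digest

def verify(password, salt, digest):
    """Recompute the hash and compare in constant time (avoids timing leaks)."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)
    return secrets.compare_digest(candidate, digest)
```

A legitimate user pays this cost once per login, while an offline attacker must pay the same memory and time cost for every single guess.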

An ideal student should have a strong background in mathematics and theoretical computer science (e.g., graph theory, data structures, and algorithms) and should be comfortable writing code (e.g., C, C++, C#, Python). The project can be tailored to the student's strengths. One aspect of the research will involve running intensive computational experiments. Another will involve modifying current implementations of memory-hard functions and evaluating them. For students with an exceptionally strong background in theoretical computer science, there are several challenging open problems to work on.

Information Security & Assurance

Prof. Jeremiah Blocki

The well-documented human tendency to select low-entropy passwords (e.g., 123456, letmein, password) dramatically increases the risk of sustained online password attacks, in which an adversary attempts to break into a user's account by repeatedly trying the most popular passwords from a dictionary. In fact, large-scale online password guessing attacks are widespread and consistently ranked among the top cyber-security risks. One of the standard defenses against online password attacks is to adopt a k-strikes policy, in which a user is temporarily locked out after typing in the wrong password k times in a row. Selecting the value of k induces a classic security/usability tradeoff. Set k too small and you will annoy legitimate users by locking them out of their accounts after they mistype or misremember their password. Set k too large and you increase the risk of an online attack, e.g., a sustained online attacker can now attempt more password guesses in a fixed time interval (e.g., one week, one month, or one year).

In this project students will help implement and evaluate several distribution-aware password lockout mechanisms that we have developed. The key insight behind the new "distribution-aware" approach is that there is usually a vast difference between the mistakes a legitimate user might make when entering his/her password (e.g., typos, capitalization mistakes, mistaken number substitutions, etc.) and the guesses an online attacker would like to attempt (e.g., 123456, password, letmein). Thus, a distribution-aware lockout mechanism might choose not to lock the user out even after a relatively large number of incorrect guesses, provided that none of the corresponding guesses is a popular password. On the other hand, it might lock the user's account relatively quickly when it detects a sustained guessing attack involving multiple different popular password guesses. An ideal student will have a strong background in mathematics and should be comfortable writing code (e.g., C, C++, C#, Python).
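
To make the idea concrete, here is a minimal sketch (our own illustration, not the actual mechanism developed in the project) in which each failed login adds weight proportional to the guessed password's estimated popularity, and the account locks only when the accumulated weight crosses a threshold:

```python
# Hypothetical popularity estimates; a real system would derive these from
# password-frequency data.
POPULARITY = {"123456": 0.01, "password": 0.008, "letmein": 0.003}

class DistributionAwareLockout:
    """Lock out based on the popularity of wrong guesses, not their count."""

    def __init__(self, threshold=0.01):
        self.threshold = threshold
        self.weight = 0.0

    def record_failure(self, guess):
        # Popular guesses (likely an attack) add far more weight than
        # unpopular ones (likely a typo of the user's real password).
        self.weight += POPULARITY.get(guess, 1e-6)

    def locked(self):
        return self.weight >= self.threshold
```

Under this policy, dozens of unpopular typos never lock the account, while a couple of dictionary-style guesses do.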

Networking and Operating Systems

Prof. Doug Comer

The Internet of Things (IoT) refers to the networked interconnection of devices, such as appliances, lighting systems, audio-video entertainment systems, and vehicles.  The motivation is to provide a way for users to interrogate and control such systems from any location (e.g., an owner can use a cell phone to control the lights in their house).  Many IoT devices will use low-power wireless network connections, such as the IEEE 802.15.4 standard.  The Wi-SUN Alliance is developing a new set of protocols for use with IoT devices, and our project is building a full reference implementation of the protocols on the Xinu operating system.  We are designing a system that is powerful as well as small, elegant, and power-aware.  In addition to assessing the protocols, we are looking for ways to make the protocol stack robust and adaptable to electrical noise and other radio interference.  One subproject is investigating a novel testbed design that allows us to assess wireless protocols in a way that makes all measurements reproducible, independent of radio-frequency background noise.

Networking and Operating Systems

Prof. Doug Comer

Data centers are used to provide public and private cloud services.  Companies like Amazon, Microsoft, Google, and Facebook all have data centers.  Computers in a data center are called "servers", and a data center network interconnects all servers with both the data center's storage facility and the global Internet.  Networking is complicated because data centers employ virtualization software, such as VMware and VirtualBox, that allows virtual machines or containers to be moved from one server to another (to balance the computational load across servers).  Moving a VM causes a problem for Internet protocols because the protocols assume an IP address is assigned to a device that does not move.  Our project is exploring new network addressing and routing mechanisms that will allow a VM to retain its IP address when it moves without incurring extra overhead.  The scheme changes the interpretation of addresses inside the data center, but still provides end-to-end addressing when a host outside the data center communicates with a server inside it.

Networking and Operating Systems

Prof. Doug Comer

Many embedded systems are built around a System on Chip (SoC) -- a single VLSI chip that contains a processor, memory, and I/O interfaces.  We are working with software for SoC systems.  At present, we are using the Galileo board that employs Intel's Quark processor to provide an x86 architecture and BeagleBone boards (Black and Green) that each employ an SoC from TI to provide an ARM architecture.  We have ported the Xinu operating system to both platforms, and continue to expand the capabilities of the software.  Specifically, we are exploring external interfaces (such as GPIO capabilities), memory management (including page table management), and networked facilities, such as a remote file system.  We are also considering multicore systems.

Machine Learning

Prof. Dan Goldwasser and Prof. Jean Honorio

Abstract meaning representation (AMR) is a challenging problem in natural language processing, in which one aims to transform a given text into a rich graphical representation of its meaning. AMR is a semantic formalism based on rooted, directed, acyclic graphs, where nodes represent concepts and labeled directed edges represent the relationships between them. When learning in a supervised setting, the learner receives a set of sentence/graph pairs for training. A successful learner then correctly predicts graphs for previously unseen sentences. Our project aims to analyze long texts, for which previously developed heuristics do not work well in practice.
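
For example, the AMR for the sentence "The boy wants to go" is conventionally written as (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b)). A minimal sketch of how such a graph might be encoded (the variable and function names are our own, not part of any AMR toolkit):

```python
# AMR for "The boy wants to go":
#   (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))
nodes = {"w": "want-01", "b": "boy", "g": "go-01"}
edges = [
    ("w", "ARG0", "b"),  # the boy is the one who wants
    ("w", "ARG1", "g"),  # the going is what is wanted
    ("g", "ARG0", "b"),  # reentrancy: the same boy is the one who goes
]

def root(nodes, edges):
    """Return the root concept: the unique node with no incoming edge."""
    targets = {t for _, _, t in edges}
    return next(n for n in nodes if n not in targets)
```

Note that the edge ("g", "ARG0", "b") reuses node b; such reentrancies are what make AMRs graphs rather than trees.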

Students working on this project will gain knowledge of the relevant machine learning and natural language processing techniques used for parsing sentences into AMR. Possible research activities include the design and evaluation of proper feature mappings, which are very important for guiding the learning process. Another activity would be the experimental comparison of the performance of different competing methods on several publicly available datasets.

Information Security & Assurance

Prof. Aniket Kate

Cryptocurrencies such as Bitcoin and Ethereum have emerged as a paradigm shift in the way payment systems work today. Cryptocurrencies rely on the blockchain, a technology that has proven useful in a vast number of applications beyond monetary transactions. Many companies today are tailoring blockchain technology to their business logic and successfully developing applications for credit settlement networks, supply chains, IoT, and beyond. However, these separate efforts are leading to incompatible individual systems. This contrasts with our highly interconnected world, and it is inevitable that these blockchains will soon need to interoperate, effectively forming a network of blockchains where transactions can flow through a sequence of blockchains, similar to how the network of networks (i.e., the Internet) works today.

In this project, we plan to explore how to connect different blockchains in a secure manner, a task that requires research into many interesting and challenging problems. For instance, in such a scenario it is crucial to devise a scalable mechanism to find routes between two users enrolled in different blockchains. Moreover, we require an accountability mechanism for provable guarantees of delivery of funds in a transaction. In a bit more detail, we want a proof that convinces the sender that the funds have been delivered to the receiver, while the receiver cannot falsely claim that the delivery has not occurred. In summary, in this project we will design and evaluate the tools required to move money the same way information moves today, thereby enabling the Internet of Value.
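
One standard building block for such delivery guarantees is a hash-locked payment, as used in payment-channel networks: the receiver can claim the funds only by revealing the preimage of a hash, and the revealed preimage serves as the sender's proof of delivery. A minimal sketch (class and variable names are our own, not any particular blockchain's API):

```python
import hashlib
import secrets

class HashLockedPayment:
    """Funds locked under a hash; claimable only by revealing the preimage."""

    def __init__(self, amount, lock):
        self.amount = amount
        self.lock = lock       # hash of the receiver's secret
        self.proof = None      # filled in when the payment is claimed

    def claim(self, preimage):
        if hashlib.sha256(preimage).hexdigest() != self.lock:
            raise ValueError("wrong preimage")
        # Publishing the preimage proves delivery: the sender can verify it
        # against the lock, and the receiver cannot deny having claimed.
        self.proof = preimage
        return self.amount

# The receiver picks a secret and shares only its hash with the sender.
secret = secrets.token_bytes(32)
payment = HashLockedPayment(10, hashlib.sha256(secret).hexdigest())
```

Chaining such locks across several blockchains with the same hash is the basic trick that lets a transaction hop through intermediaries without trusting them.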

Information Security & Assurance

Prof. Aniket Kate

Today most people are prone to oversharing their personal information on social networks such as Facebook, Twitter, Instagram, and Snapchat. This oversharing raises numerous privacy concerns for these users. Therefore, most online social platforms offer mechanisms allowing users to withdraw their information, and a significant fraction of users exercise this right to be forgotten. However, the existing withdrawal mechanisms leave users more vulnerable to privacy attacks, due to the now-popular "Streisand effect": a phenomenon whereby an attempt to hide some information has the unintended consequence of drawing the public's attention to it. Curious friends, cyberstalkers, and even blackmailers can now focus only on the withdrawals to see which of them were sensitive.

In the past, our team has started to tackle this problem by introducing inactivity-based withdrawals and an even more disruptive solution of intermittently interrupting the availability of non-withdrawn posts to provide differential-privacy-inspired guarantees. Due to the importance of this problem and the lack of much prior work in this field, there is still a great deal of work to be done toward providing privacy for content deletions. For example, we have shown privacy guarantees for social platforms such as Twitter; however, there is also a need for such systems on archival platforms, e.g., archive.org. Another important study that we look forward to performing is a user behavioral study of the introduced systems, to observe their effectiveness and practicality in the eyes of users. In this project, we take the next few steps toward making social platforms truly forget the content their users have withdrawn.

Networking and Operating Systems

Unmanned aircraft (a.k.a. drones) are emerging to expand our activity scope from the ground to the sky. Connected drones will enable a large variety of applications such as infrastructure inspection, targeted surveillance, search-and-rescue operations, public safety, transport and logistics, and so on. These thrilling applications require that drones be able to fly over a wide area, ideally in an autonomous manner. However, drones are currently controlled over WiFi by a controller in the user's hand, which limits the flying range (to perhaps 1-2 km) and is not truly unmanned.

In this project, we aim to make unmanned aircraft truly unmanned. We propose to build cellular-enabled drones that connect with a remote controller service via cellular connectivity, so that they can fly freely over the existing wide-area cellular coverage. To support real-time control, command, and traffic delivery, we are developing enhanced algorithms in both networking (communication) and compute. We will first pursue ultra-low latency and high availability over dynamic radio channels through network protocol and function enhancements, and then leverage edge computing and application optimization to tackle and tolerate latency and availability issues when the network-only solution is not sufficient. The outcome of this project will be an app which empowers unmanned aircraft to fly unmanned (with a real-time monitor and controller operating remotely at the edge).

Networking and Operating Systems

In the past, our team has developed an experimentation and data collection testbed called MI-LAB and has collected a huge amount of data on 4G network operations from a global-scale study over user smartphones. The data records cellular network operations from 35+ 4G operators in over 15 countries and conveys rich information for network analytics. This dataset is a first, invaluable asset to the community: it empowers students, professors, and researchers to study real networks without requiring data and assistance from operators, and thus potentially accelerates innovation in 4G and 5G networks. However, the MI-LAB testbed relies on crowdsourcing of participating smartphones and faces a major challenge in their owners' privacy concerns. This calls for privacy-preserving techniques which extract sufficient information for network analytics without leaking sensitive or private information, including but not limited to personal identity, device identity, location, mobile data traffic, and user activities. In this project, we are going to develop this essential technique and integrate it into the operational MI-LAB for a sustainable ecosystem of data collection, analytics, and sharing.

BioInformatics and Computational Biology

Prof. Alex Pothen

The human brain mapping project seeks to map neurons and synapses in the brain to understand how they are involved in the brain's functions. There are about 100 billion neurons and 100 trillion synapses in the human brain, and current technology cannot map them individually. Nor can current algorithms model the neurons and synapses at this scale. However, coarser models that measure blood flow in the various regions of the brain can identify regions (voxels) that are activated under particular conditions. We are interested in taking two or more brain graphs (of the same individual under different conditions, or of different individuals) created from functional MRI data, and aligning the subnetworks in them to each other. By aligning the subnetworks, we should be able to confidently identify regions of the brain that are responsible for particular functions. This problem involves a sophisticated mathematical formulation (integer linear programming) with matching algorithms in graphs. In this project, we will study an algorithm that we proposed in earlier work (Supercomputing 2012) that is capable of aligning networks with hundreds of thousands of nodes and edges in a few seconds. We will implement this algorithm on a multithreaded desktop computer, create efficient implementations, and evaluate it on several brain graphs.
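
At its core, network alignment seeks a node-to-node mapping that preserves as many edges as possible. A toy brute-force sketch of that objective (the project's actual matching algorithm is far more scalable; all names here are our own):

```python
from itertools import permutations

def align(nodes_a, edges_a, nodes_b, edges_b):
    """Toy brute-force network alignment: try every mapping of nodes_a onto
    nodes_b and return the one preserving the most edges. Exponential in the
    number of nodes; only meant to illustrate the objective being optimized."""
    best_mapping, best_score = None, -1
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        # Count edges of graph A whose endpoints are also adjacent in graph B.
        score = sum(
            (mapping[u], mapping[v]) in edges_b or (mapping[v], mapping[u]) in edges_b
            for u, v in edges_a
        )
        if score > best_score:
            best_mapping, best_score = mapping, score
    return best_mapping, best_score
```

On two isomorphic toy graphs, the best mapping preserves every edge; on real brain graphs the score highlights which subnetworks correspond across conditions or individuals.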

Machine Learning

Prof. Bruno Ribeiro

Deep learning has recently made great strides in solving increasingly complex tasks by simply mapping vector inputs to desired outputs. Still, the fundamental problem of building neural networks that can account for pre-defined invariances of these input vectors remains largely open. For set invariances, the learned model should give the same output regardless of any permutation of the input vector; for graph invariances, the set of all isomorphic graphs is an input invariance. The goal of this project is to make progress on a mathematical framework that will yield novel deep learning methods that can encode invariances based on known relationships in the input data. The project will consider a variety of applications, including language understanding, symbolic reasoning, reasoning with knowledge graphs, and predicting the chemical properties of molecules. (https://arxiv.org/abs/1811.01900)
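
For set invariances in particular, one well-known recipe (the "Deep Sets" construction) guarantees permutation invariance by summing a per-element embedding before a final readout. A tiny sketch with hand-picked coefficients (purely illustrative, not a trained model):

```python
def phi(x):
    """Per-element embedding (here: two hand-picked features of x)."""
    return (x, x * x)

def rho(pooled):
    """Readout applied to the pooled representation."""
    s, sq = pooled
    return 0.5 * s + 0.1 * sq

def set_model(xs):
    """Sum pooling makes the output invariant to the order of elements in xs."""
    embedded = [phi(x) for x in xs]
    pooled = (sum(e[0] for e in embedded), sum(e[1] for e in embedded))
    return rho(pooled)
```

Because addition is commutative, any permutation of the input yields the same pooled representation and hence the same output; in a neural network, phi and rho would be learned.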

Programming Languages & Compilers

Prof. Tiark Rompf

Flare is a big data and machine learning platform developed here at Purdue (https://flaredata.github.io). Flare can transparently accelerate pipelines implemented in Apache Spark and TensorFlow, and provides speedups of 10x-100x, thanks to cutting edge compiler technology. For this summer research project we are looking for students with a strong systems background (databases, distributed systems, compilers). The goal will be to extend Flare in one of several possible dimensions: implement code generation for distributed execution (using MPI or similar), implement streaming abstractions for incremental data processing, implement code generation for GPUs, implement a cost-based query optimizer, implement new internal compiler optimizations, implement case studies based on various workloads.

Programming Languages & Compilers

Prof. Tiark Rompf

Lantern is a machine learning framework developed in Scala (https://github.com/feiwang3311/Lantern), based on two important and well-studied programming language concepts: delimited continuations and multi-stage programming (staging for short). Delimited continuations provide a very concise view of reverse-mode automatic differentiation, which permits implementing reverse-mode AD purely via operator overloading and without any auxiliary data structures. Multi-stage programming leads to a highly efficient implementation that combines the performance benefits of deep learning frameworks based on explicitly reified computation graphs (e.g., TensorFlow) with the expressiveness of pure library approaches (e.g., PyTorch). This project will extend Lantern in various ways, adding new compiler optimizations or implementing case studies on state-of-the-art deep learning models.
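
The flavor of reverse-mode AD via operator overloading can be sketched in a few lines of Python; here we use an explicit tape, which is precisely the auxiliary structure that Lantern's delimited continuations make unnecessary:

```python
tape = []  # records one backprop closure per operation, replayed in reverse

class Var:
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def __add__(self, other):
        out = Var(self.value + other.value)
        def backprop():  # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        tape.append(backprop)
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value)
        def backprop():  # product rule
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad
        tape.append(backprop)
        return out

def backward(result):
    result.grad = 1.0
    for step in reversed(tape):
        step()

x, y = Var(3.0), Var(2.0)
z = x * y + x            # z = x*y + x
backward(z)              # x.grad = y + 1 = 3.0, y.grad = x = 3.0
```

In Lantern, capturing "the rest of the computation" as a delimited continuation plays the role of this tape, and staging then compiles the whole differentiated program to efficient low-level code.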

Programming Languages & Compilers

Prof. Tiark Rompf

Despite the recent successes of deep neural networks in fields such as image recognition, machine translation, and game playing, big challenges remain in applying deep learning techniques to applications that require symbolic reasoning: theorem proving, software verification and synthesis, and various forms of discrete optimization, including compiler optimization and query optimization in databases. This project aims to make progress in this exciting space, in particular by adapting techniques from game playing (AlphaGo, etc.) to symbolic problems. Various concrete tasks are possible; see our recent research seminar for inspiration (http://tiarkrompf.github.io/cs590/2018/).

Programming Languages & Compilers

Prof. Tiark Rompf

We have several ongoing strands of work that explore the foundations of programming languages, in particular compiler transformations, effect systems, approaches for verification and static analysis. To increase the trust in our theoretical models and results, we mechanize most of these artifacts in the Coq proof assistant. For this project, we are seeking students who already have experience with Coq but who want to deepen their expertise and apply their skills to concrete research developments.

Programming Languages & Compilers

Prof. Roopsha Samanta

The goal of this project is to enable construction of provably safe and robust machine learning applications through effective integration of symbolic reasoning, probabilistic inference and continuous optimization. The project will involve development and/or implementation of algorithms for verification/repair/synthesis of safe and robust machine learning applications.  The ideal candidate will have a strong background in formal methods/programming languages and familiarity with machine learning frameworks.

Programming Languages & Compilers

Prof. Roopsha Samanta

This project aims to develop foundations, algorithms and tools for a principled approach to scalable program repair based on abstract interpretation.  The ideal candidate will have a strong background in program analysis and have excellent programming skills. 

Networking & Operating Systems

Prof. He Wang

Consider business analytics: if we are able to understand human locations, postures, and gestures in a shopping mall, we will be able to understand shoppers' behavior. Interestingly, this is already happening extensively on the web: our clicking patterns, mouse movements, etc., all drive a billion-dollar business called web analytics. Our footsteps through indoor environments are much like our clicking patterns. When we look at a cereal box in the grocery store, it is much like right-clicking on an online item. Web and "physical world" analytics are so similar, and yet physical analytics simply does not exist today, because we lack the ability to understand locations, postures, and gestures. If we enable indoor localization and posture and gesture inference, we can enable physical business analytics. In this summer project, we will design and implement physical business analytics systems.

Last Updated: Jan 30, 2019 11:33 AM
