Research Advisor |
Project Title |
|||
|
|
Improving Application Security Using system Call Sandboxing | ||
| Alex Hanna | Dr. Alok Chatuverdi | The Science of Agent-Based Evacuation | ||
| Stan Luban |
Dr Daisuke Kihara | Comparative Study of Small RNA and Small Peptides in Complete Genome Sequences | ||
| Yen Hock Tan | Dr Daisuke Kihara | Comparison of Distantly Related Protein Sequence Alignments with Multiple Amino Acid Similarity Matrices | ||
| Ei Ei Phyu | Dr Daisuke Kihara | Computer Graphics Program in Foreign Language | ||
| Rajasekar Karthik | Dr. Buster Dunsmore | Configurable Interfaces | ||
| Brian Trisler | Dr Daisuke Kihara | Structure-basis of protein-to-protein interaction | ||
| Jonathan Kilroy & Guy Curry | Dr. Rajesh Subramanyan | Model Based Automated GUI Testing | ||
| Ben Hayward & Greg Austin | Dr. Rajesh Subramanyan | Finite System Analysis and Optimization and its Relevance to Blackjack |
Computer viruses are a serious and growing threat in today's computing environment. A common way to get a virus is via the internet and buggy software. A user might run an application which may be vulnerable to malicious attacks through the internet. If a virus exploits that program to gain entry to the computer and goes undetected by anti-virus software, there is very little protection and the virus can have its way with the computer.
Our project aims to add a layer of protection between an exploited program and your computer. The idea would be that while there are lots of ways to exploit a piece of software, once that software is exploited it will have to deviate from its normal operation, in order to cause debilitating damage to the entire computer. The behavior we focused on for our project was the resources that a program uses (mainly files), how it uses those resources (i.e. read, write, etc…) and the programs (processes) that it runs. We made use of a technique known as “sandboxing” to restrict the actions of a program. The basis for this is that if the normal actions of a program are confined to a certain set and on some occasion the program tries to perform an action outside of this set, you would not want to allow this action to go forward without at least taking a close look at it first.
We focused our efforts on implementing such a sandboxing scheme for the Windows operating system. We first had to devise a method to profile what normal behavior was for a given program and once we had this profile we had to develop a mechanism to enforce it. We do this by examining a program's behavior at the system call level, the point at which a users program interfaces with the operating system. Looking at all the system calls a program makes required the use of a driver that replaced the normal table of system calls with itself and then after doing what it needed to, passed control onto the original system call. We were fortunate to have found a freely available driver that intercepted system calls and printed them to the screen. We were able to build on this driver to both glean the needed information to make a profile and enforce the profile. Enforcing the profile consists of taking a given system call that a program has made and comparing it to a list of allowances in the profile. If the call is determined to be allowed then the call is passed along to the real system call, otherwise the driver returns an error. The fact that this driver is in between the OS and program is transparent to the program and denying a program from making a certain call appears to the program as if the call just failed as might happen in the course of normal operation for any of a number of reasons. This automatic denial could also be augmented by a prompt to the user, asking if the blocked behavior should really be blocked or if in fact it should be added to the allowances as legitimate behavior reducing the chances that something that shouldn't be blocked will be.
Our scheme could strengthen the defenses of application programs and provide better integrity for software on Windows systems. While our implementation focused on sandboxing one program at a time, improvements could allow the sandboxing of all vulnerable programs in a system resulting in a greater level of protection against the ill effects of malicious software and the headaches it causes.
The Science of Agent-Based Evacuation
Alexander Hanna, Computer Science/Mathematics
Research Mentor : Alok Chatevurdi, Management
(back to top)
As the complexity of buildings increases, it becomes more and more challenging to provide a satisfactory level of fire safety in the buildings. Even if the building satisfies modern fire safety codes, it does not necessarily guarantee needed safety levels to the occupants. In the recent years it has been a tendency to move away from prescriptive fire codes to a performance-based approach.
In the present work, a state-of-the-art integrated environment to study interaction among fire and agents was created. Evacuation from a typical office building on fire with a large number of rooms, several hallways and exits was studied. For the fire simulations NIST large-eddy simulation code Fire Dynamics Simulator ( FDS ) was used. FDS code provided time resolved temperature, heat release, CO, CO2, and soot distribution in the building. An intelligent path-finding algorithm was developed and implemented to navigate the agents through the geometry of the building while taking into account the effects of temperature, heat release, carbon monoxide, carbon dioxide, and soot, while also having an effect on agent health. The created integrated environment was designed to provide the bridge connection between two simulations for data transfer.
It was shown that fire position, agents' positions, and number of exits available affect significantly agents' health and death toll. The results can be used for better fire safety building design and regulations.
Comparative Study of Small RNA and Small Peptides in Complete Genome Sequences
Stanislav V. Luban, Computer and Biological Sciences
Research Mentor : Daisuke Kihara, Computer and Biological Sciences
(back to top)
In addition to protein-coding genes which produce messenger RNAs, many genes exist that yield non-coding small RNA (sRNA), which have structural, regulatory, or catalytic function. Recently, various sRNA families have been discovered both by experimental and computational approaches from intergenic genome sequence regions. Computational detection of sRNAs is relatively difficult because, unlike protein-coding genes, sRNAs have no easily exploitable statistical predispositions. However, conserved RNA secondary structures can be localized using probabilistic models of expected mutational patterns in pairwise sequence alignments. Here potential sRNAs from 30 microbial genomes were identified using the program QRNA (S. Eddy et al.). QRNA scores and classifies alignments of conserved intergenic regions as potential non-coding structural RNA loci by means of the pair stochastic context-free grammar model and as normal coding regions or position-independent regions using the pair hidden Markov model. To identify conserved intergenic regions, a BLAST search was performed in 30 reference microbe genomes. Total of 29488 putative sRNA regions were localized and classified by clustering algorithms into 3768 families. To verify the results, identified sRNAs were compared with known structural sRNAs in the Rfam database. In addition, results specific to E. coli were verified with another study searching for sRNAs in that organism by localizing regions that contained a specific promoter and terminator in proximity. At a 10 nt threshold, 128 of the 277 (56.4%) sRNAs predicted were localized by the method used in the current research. Future work may include expanding the search to a larger quantity of more diverse genomes and experimental verification of putative sRNAs. Hypothetical coding sequences were also extracted from the intergenic regions. The pair hidden Markov model hypothesis was used in QRNA for the detection of these “small peptides.” If small proteins exist similarly to sRNAs in bacterial genomes, then large-scale sequencing sweeps would have missed them, since they would have been small enough to be disregarded as statistical noise. After extraction, hypothetical coding sequences were filtered using a multiple sequence alignment. An initial list of coding region families has been compiled and candidates have been compared to existing sequences in the Swissprot database. No significant matches have been found, but the search method is being refined. Further research can be used to significantly improve the extraction of the small coding regions from the putatively non-coding areas of genomes. Then the sequences can be clustered using existing clustering algorithms or slightly modified ones. A creation of a “small peptide” family list would be completely novel, and if it could be applied to the genomes of more complex organisms with larger non-coding sequences, it would open new doors in the identification of protein-coding genetic material.
Comparison of Distantly Related Protein Sequence Alignments with Multiple Amino Acid Similarity Matrices
Yen Hock Tan 1 , He Huang 2 , 1 Computer Sciences, 2 Biological Sciences
Research Mentor : Daisuke Kihara *2, 1
(back to top)
Distantly related protein sequence alignments are used in tools for identifying sequence homologies and predicting protein structures. Amino acid similarity matrix is a key in the dynamic programming algorithm that generates such alignments. As of today, there are more than 80 known amino acid similarity matrices. Among these, we selected a set of matrices that we designed to align distantly related sequences. We also included a set of matrices derived from pair wise statistical potential. Calculated alignments are compared to a benchmark set of structurally aligned sequence sequences, prepared for the fold level sequence homology. First, we investigated the accuracy of the optimal alignments using different gap penalties because these are key factors which govern resulting alignments. We observed that our set of derived matrices and a few other known matrices performed significantly well compare to the others. Second, we optimized our derived matrices by combining with several other matrices. We observed that the optimized matrices preformed similar to out derived matrices. Third, we calculated the correlation coefficient of all matrices used. We observed that there are 2 distinct clusters of matrices. Comparison of alignments in the family level and super family level sequence homology will be discussed. The effect of using sub-optimal alignments on the accuracy will also be discussed.
Computer Graphics Program in Foreign Language Instruction
Ei Ei Phyu, Computer Science
Research Mentors: Kazumi Hatasa and Diasuke Kihar, Computer Sciences
(back to top)
The purpose of this work is to enhance the learning process of foreign languages by developing an interactive application. The learner identifies virtually her/himself on the screen with a character that is simulated in real-time and located inside a given room. According to the scenario, instructions are given by another character outside this room to find a particular object and deliver it. This computer-based animation targets the following skills: identify an object, learn directions and understand practical instructions in a foreign language. The animation is cross-platformed, i.e. it can run identically on different platforms. It uses Irrlicht as a high performance, real-time, 3-D engine for the reason that it is open source, free, written and usable in C++ and is also applicable in . NET languages. Virtual reality is implemented with state-of-the-art features to account for character 3-D game-like animation and collision detection.
During the past semester, accomplishments were made to develop the rendering of the scene (room) in the same manner as 3-D cameras, allowing the view to be stationary or controlled. Lightings were used to make the scene more realistic. Collisions with the boundaries were detected to avoid objects passing through others (for instance, it is not allowed for the character to walk through walls). Movements of the character were programmed using MD3 (.md3) format for 3-D model files and models were stored in 3D model file format (.3ds). During this semester, research work was made the program accept a file as an input and follow efficiently the embedded commands.
The overall goal of this application is to produce a real-time environment that is aesthetically accurate enough to ease the learning of a foreign language. Further studies would focus on smooth displacements of the character, develop a user interface and enable full mouse control. Additional diversity in the rooms, objects and proposed activities could be considered.
Configurable Interfaces
Rajasekar Karthik, Computer Sciences/Mathematics
Research Mentor : Buster Dunsmore, Computer Sciences
(back to top)
Through this research, the authors suggest a product realization architecture that allows for rapid exploration of the design space for individually tailored products. Specifically, the product realization methodology focuses on the aspects of product design, and relies on analysis and simulation of physical systems to evaluate the design space from both a feasibility and optimality stand point. Analysis of physical systems is also important as it allows mapping from customer requirements to design variables representing the product. In this method, the subsystems of the product are instantiated by catalog based product configuration or product synthesis software modules. Such a product realization system involves constraint satisfaction techniques, and multi-criteria optimization techniques. In the current business environment, where location of knowledge centers and expertise is distributed, distributed constraint satisfaction becomes important to such a system.
A concept graph representation is used to depict the physical connectivity and relationships between product components and subsystems. The exploration of the design space is enhanced by using graph search techniques to retrieve past designs. Retrieval of past designs prevents duplication of effort, and allows reuse of past designs.
An industrial erector set is used as an example to explain some of the analysis related ideas. ILOG is used to develop a configurator for the industrial erector set. This example is a work in progress.
Structure Basis of Protein-to-Protein Interaction
Brian Trisler, Computer Science/Psychology
Research Mentor : Diasuke Kihara, Biological and Computer Sciences
(back to top)
The goal of this research is to elucidate patterns of the sequence and structure motifs involved in protein-protein interactions. To achieve this goal we will begin by ascertaining orthologous protein-protein interactions (interactome) among several organisms. We will reveal the essential interactome for sustaining life, as well as reduce intrinsic false positives in the interactome data. To identify patterns of interacting domains and motifs responsible for protein-protein interactions, we will identify those pairs which occur frequently in the interacting proteins. Functional domains will be assigned by referring domain databases. Finally, to identify the patterns of interacting structural motifs which are responsible for the protein-protein interactions, we will identify pairs of structural domains and local structures which occur frequently in the interacting proteins. In addition to the avai lab le experimentally determined structures, structures predicted by homology modeling and threading methods will be used. Compiled lists of interacting sequence and structural domains and motifs will be applied to interpret and/or predict protein-protein interactions of the other organisms. This extracted knowledge will facilitate the prediction of novel interactions in other organisms, including humans, in the future.
Model Based Automated GUI Testing
Guy Curry and Jonathan Kilroy, Computer Science
Research Mentors: Dr Rajesh Subramanyan and Dr Aditya Mathur
Computer Science
(back to top)
As software becomes increasingly user-friendly, it relies on the implementation of complex Graphical User Interfaces (GUI's). GUI software is more difficult to test than non-GUI software due to the large number of objects with complex interactions, and the need of human intervention in testing. Existing GUI testing methods have yet to balance scalability, error detection effectiveness, and ease of automation satisfactorily.
We perform a comparative analysis of two model-based testing methods, the W method, developed by TS Chow in the 70's and not originally intended for testing GUIs, and the more recent CIS method, developed by Lee White.
Using state representations defined by W and CIS methods, test generation software for W method developed by a previous student and improved by us, and code obtained for the CIS method, we have studied and analyzed errors for different specification-implementation pairs. Experiments were conducted by programming example GUI applications and introducing real world errors in the code for the purpose of identifying which errors are detected by one or both algorithms. CIS was found to have a simpler state representation, smaller test size, but needs human intervention in test generation and may miss certain errors. The W method is more precise, tests can be generated automatically, but has a larger test size. Further experiments are needed to determine conclusively the error detection capabilities of both methods.
Future directions include an empirical study of the fault detection effectiveness of existing methods and a proposal of an improved model-based GUI testing method that may be automated.
Finite System Analysis and Optimization and its Relevance to Blackjack
Ben Hayward and Greg Austin, Computer Science
Research Mentor : Dr. Rajesh Subramanyan, Computer Science
(back to top)
A mathematical approach to blackjack, this project attempts to relate the randomized but finite system of blackjack into an interpretable and advantageous format for the user. Employing a variety of existing mathematical techniques researchers worked to optimize a user's performance through the use of card counting and betting methodologies.
Using the visible cards, probabilities were calculated to provide success rates across all viable initial situations. By calculating the probability of occurrence for each success rate an overall player win probability was calculated. Using a simulation engine created for this purpose the game was played exhaustively in order to gather data for in game probability variance. Then, using card counting to decrease uncertainty, variations in the overall win probability were used to influence betting amounts. These betting variations were subsequently mapped back to traditional card counting techniques in order to give a standard player a reference for wagering standards. Also comprehensive computer generated card counting was used to detect variations in the decision making probabilities of whether to hit, stay, or double. These disparities were also mapped back to traditional card counting methodologies in order to provide a user with an optimized formula for decision making that takes the changing probabilities of the game into account.
The poster presents detailed information on how the probabilities were calculated as well as displaying the optimized approach and its results as compared to those of the standard playing style.
Undergraduate Research Day is sponsored by the College of Science at Purdue University on April 11, 2005. For current information on UG ResearchDay click here.