C. Aaron Rodgers
Master's in Computer Science - Purdue University - December 2016. GPA: 3.8
Certificate of Completion for Machine Learning - Stanford University via Coursera - June 2016.
Certificate of Completion for Linear Algebra - U.T. Austin via edX - September 2015.
Certificate of Completion for Algorithms I - Stanford University via Coursera - July 2014.
Certificate of Completion for Cryptography I - Stanford University via Coursera - August 2013.
Bachelor of Science in Computer Science - East Central University at Ada, OK - May 2013. GPA: 4.0
Bachelor of Science in Legal Studies, Minor in Mathematics - East Central University at Ada, OK - May 2011. GPA: 4.0
Verbal - 99th %ile. Quantitative - 98th %ile. Analytical - 98th %ile.
Tools and Environments: HDFS, MapReduce, Hive, Spark, MPI, SAS, Netezza, Pthreads, Flex, Bison, Git, Node.js, MySQL, PostgreSQL, MS Visual Studio, MS SQL Server Management Studio, and MS SQL Server Reporting Services.
Baya: Mapping an Ambiguous Schema to XML with Logistic Regression
In the summer of 2016 I completed an internship at IBM's T. J. Watson Research Center in Yorktown Heights, NY. My task for the summer was to apply machine learning to assist Watson Health staff in mapping inconsistently populated Health Level 7 (HL7) messages to XML. I proposed that we use logistic regression to determine which tokens in a given HL7 message were the best candidates for our target schema fields. To test this approach, I implemented Baya, a 4-KLOC command-line tool written in Java. I coded the logistic regression classifier and the gradient descent solver from scratch using matrix operations. Slides demonstrating my approach and my results are available here.
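The general technique named above, logistic regression fitted by batch gradient descent, can be sketched in a few lines. This is a minimal illustration in plain Python, not the Baya source (which was written in Java); the function names, learning rate, epoch count, and single-feature toy data are all assumptions made for the example.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, row):
    # Probability that a feature row belongs to the positive class.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss.

    X is a list of feature rows and y a list of 0/1 labels.
    lr and epochs are illustrative defaults, not tuned values.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for row, label in zip(X, y):
            err = predict_proba(w, b, row) - label  # d(loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical one-feature data: small values are negatives, large are positives.
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0, 0, 1, 1]
w_toy, b_toy = train_logistic_regression(X_toy, y_toy)
```

In a token-ranking setting, each candidate token would be scored with predict_proba and the highest-scoring token proposed for the target field; how features were actually extracted from HL7 messages is not described here.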
Parallelism over HDFS Encryption Zones
This was another project I completed at T. J. Watson. Some members of my team were implementing a service built on HDFS, and because they were storing medical data, they were required to encrypt data stored in HDFS. I suggested that we use HDFS encryption zones and ran a simulation to demonstrate the parallelism benefits of HDFS encryption. A shortened summary of my results is available here.
TCP Reno vs. TCP Vegas
A number of studies have noted that TCP Reno tends to crowd out TCP Vegas when both protocols are sending packets over a network. This happens because TCP Reno uses packet loss to sense network congestion, while TCP Vegas uses packet delay to estimate congestion and is therefore more conservative. A colleague and I collaborated on a Networks class project (FA-15) that sought to replicate these findings. We used Mininet to simulate the networks in our tests. At the end of the semester, we submitted our findings in a report.
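The difference between the two congestion signals can be sketched as per-RTT window updates. This is a simplified textbook model, not the Mininet experiment from the report; the function names and the alpha and beta thresholds are illustrative assumptions.

```python
def reno_update(cwnd, loss):
    # Reno in congestion avoidance: grow by one segment per RTT,
    # and halve the window only when a loss is detected.
    if loss:
        return max(1.0, cwnd / 2.0)
    return cwnd + 1.0


def vegas_update(cwnd, base_rtt, rtt, alpha=2.0, beta=4.0):
    # Vegas compares the throughput it would expect on an empty path
    # with the throughput it actually observes.
    expected = cwnd / base_rtt
    actual = cwnd / rtt
    queued = (expected - actual) * base_rtt  # estimated segments sitting in queues
    if queued < alpha:
        return cwnd + 1.0
    if queued > beta:
        return max(1.0, cwnd - 1.0)
    return cwnd


# With queueing delay but no loss, Vegas backs off while Reno keeps growing.
reno_next = reno_update(20.0, loss=False)
vegas_next = vegas_update(20.0, base_rtt=0.1, rtt=0.2)
```

When both flows share a bottleneck, the rising RTT makes the Vegas sender shrink its window well before any packet is dropped, while the Reno sender keeps filling the queue until a loss occurs.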
Regression Analysis Benchmarks on SAS, Netezza, Spark
In the summer of 2015, I interned with Humana. While there, I learned that Humana's data scientists were using regression analysis to predict health outcomes for the company's customer base. I was interested in the computational costs of these analyses, so I designed and executed some regression benchmarks across three systems: SAS, Netezza, and Spark (on a cluster I rented through AWS). My benchmark results are available here.
P-to-K MapReduce in MPI
As a class project for Parallel Computing (FA-14), I tried to use primitive MPI functions to implement a reduce operation that was more efficient than the existing MPI_Reduce() library call. As an added twist, the operation would reduce to some number of reducers K, with K > 1. Although I was not able to beat the existing library call by much, I did learn a good bit about hypercube network topologies and the MPI library. My results are available here.
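One way a P-to-K reduction could be organized over a hypercube is to fold the active ranks in half along one dimension per round until K ranks remain. The sketch below simulates that pattern with a plain Python list standing in for the ranks; it makes no real MPI calls, it assumes P and K are powers of two, and it is not the project's actual implementation.

```python
def hypercube_reduce(values, k, op=lambda a, b: a + b):
    """Simulate reducing len(values) ranks down to k reducers.

    values[i] is the local value held by rank i. Both the rank count
    and k are assumed to be powers of two (an assumption of this sketch).
    """
    p = len(values)
    if not (0 < k <= p) or p & (p - 1) or k & (k - 1):
        raise ValueError("rank count and k must be powers of two with 0 < k <= p")
    vals = list(values)
    active = p
    while active > k:
        half = active // 2
        for rank in range(half):
            # rank and rank + half differ in exactly one bit, so they are
            # neighbors on the hypercube; the upper rank "sends" to the lower.
            vals[rank] = op(vals[rank], vals[rank + half])
        active = half
    return vals[:k]


partials = hypercube_reduce([1, 2, 3, 4, 5, 6, 7, 8], k=2)
```

Each round halves the number of active ranks, so reducing P ranks to K reducers takes log2(P/K) communication rounds, and every original value ends up in exactly one of the K partial results.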
NAND to Tetris
This self-directed course teaches the student how to build a simple 16-bit computer from the ground up. The student begins by arranging NAND gates in a simulator to implement memory registers and an ALU. The student then builds a CPU, assembler, virtual machine, compiler, and operating system, each component building upon the last. I completed this course in summer 2014, investing around 160 hours total. I implemented the assembler, virtual machine code translator, and compiler in Java. To supplement the course, I implemented a few circuit primitives on a breadboard. Diagrams and a video demonstration are available on YouTube.
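The first step of that build-up, deriving other gates and an adder from NAND alone, can be illustrated as follows. The course itself uses a hardware description language in a simulator; this Python version is only a sketch of the idea, with function names chosen for the example.

```python
def nand(a, b):
    # The only primitive: output is 0 exactly when both inputs are 1.
    return 0 if (a and b) else 1


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def xor(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a, b):
    # Returns (sum bit, carry bit).
    return xor(a, b), and_(a, b)


def full_adder(a, b, c):
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, c)
    return s2, or_(c1, c2)


def add16(x, y):
    # 16-bit ripple-carry adder; the final carry (overflow) is discarded.
    carry, result = 0, 0
    for i in range(16):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result
```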
Dictionary
In this project, I synthesized an interactive English dictionary using lexicographic data from Princeton's WordNet database, n-gram data from Google, and phonetic encodings from Carnegie Mellon's Pronouncing Dictionary. After parsing the source files with Python, I imported them into PostgreSQL, where I combined them and stored them as JSON objects (a nifty feature available in PostgreSQL since version 9.3). A sample of dictionary entries is available here.
Randomly Composing Shakespeare
Will a series of random keystrokes eventually produce one of Shakespeare's works? A co-worker posed this classic question to me recently, and my attempts to grapple with the problem eventually resulted in a pair of Python scripts that can compute, via series or brute force, the probability of a non-overlapping word appearing in a string of n letters. Code is available here and here. My contribution to the solution on StackExchange is here.
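A compact way to check such a probability is sketched below. It is not the linked scripts: it swaps in a counting recurrence in place of the series, and it takes "non-overlapping" to mean that no proper prefix of the word equals a suffix of it (an assumption). Under that condition, the number of length-j strings that avoid the word satisfies avoid(j) = A * avoid(j - 1) - avoid(j - m), where A is the alphabet size and m is the word length. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def has_self_overlap(word):
    # True if some proper prefix of the word is also a suffix of it.
    return any(word[:k] == word[-k:] for k in range(1, len(word)))


def prob_word_appears(word, n, alphabet_size):
    """Exact probability that word occurs in a uniform random string of n letters."""
    if has_self_overlap(word):
        raise ValueError("recurrence only holds for words with no self-overlap")
    m = len(word)
    # avoid[j] = number of length-j strings that do not contain the word.
    avoid = [alphabet_size ** j for j in range(min(m, n + 1))]
    for j in range(m, n + 1):
        avoid.append(alphabet_size * avoid[j - 1] - avoid[j - m])
    return 1 - Fraction(avoid[n], alphabet_size ** n)


def prob_brute_force(word, n, alphabet):
    # Enumerate every string of length n; feasible only for tiny inputs.
    hits = sum(word in "".join(s) for s in product(alphabet, repeat=n))
    return Fraction(hits, len(alphabet) ** n)
```

For small cases the two functions can be compared directly, for example prob_word_appears("ab", 4, 2) against prob_brute_force("ab", 4, "ab"). The recurrence stays cheap for long strings, where enumeration is out of reach.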
This is a novel array sorting algorithm that I developed independently while taking Data Structures at ECU. The algorithm compares favorably with, and is in some instances superior to, insertion sort and binary tree sort. Documentation is available here. Sample performance test results are here. For those interested, the sorter code, together with a test rig and a collection of standard algorithms written by my classmates and myself, is available here.
Optimizing Binary Tree Deletions
This was an independent study research course I completed in spring 2013 with Dr. Bill Walker, professor of computer science at ECU. The topic of my research was the optimization of binary tree node deletion algorithms. The question that I hoped to answer was whether the degeneration of binary trees into linked lists could be prevented by alternately performing left- and right-handed deletions. My results, which I delivered at the Texas-Oklahoma Regional Undergraduate Symposium (TORUS) in February of 2013, are available here. While completing my research, I also coded an interactive applet that allows a user to graphically demonstrate the performance difference between two of the algorithms I studied.
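The idea of alternating deletions can be sketched as a binary search tree that replaces a two-child node with its in-order successor on one deletion and with its in-order predecessor on the next. This is an illustration of the general technique, not the code from the study; the class and function names are hypothetical, and as a simplification the direction toggles on every delete call, including deletions of nodes with fewer than two children.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def _insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = _insert(node.left, key)
    elif key > node.key:
        node.right = _insert(node.right, key)
    return node


def _delete(node, key, use_successor):
    if node is None:
        return None
    if key < node.key:
        node.left = _delete(node.left, key, use_successor)
    elif key > node.key:
        node.right = _delete(node.right, key, use_successor)
    else:
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        if use_successor:
            # Right-handed deletion: pull up the smallest key on the right.
            s = node.right
            while s.left is not None:
                s = s.left
            node.key = s.key
            node.right = _delete(node.right, s.key, use_successor)
        else:
            # Left-handed deletion: pull up the largest key on the left.
            p = node.left
            while p.right is not None:
                p = p.right
            node.key = p.key
            node.left = _delete(node.left, p.key, use_successor)
    return node


class AlternatingBST:
    def __init__(self):
        self.root = None
        self._use_successor = True

    def insert(self, key):
        self.root = _insert(self.root, key)

    def delete(self, key):
        self.root = _delete(self.root, key, self._use_successor)
        self._use_successor = not self._use_successor  # flip for next time

    def inorder(self):
        out, stack, node = [], [], self.root
        while stack or node:
            while node:
                stack.append(node)
                node = node.left
            node = stack.pop()
            out.append(node.key)
            node = node.right
        return out
```

Always deleting from the same side tends to drain one subtree faster than the other over many insert and delete cycles; alternating sides is one candidate remedy, and whether it actually prevents degeneration is the question the research examined.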
Granting Certiorari: A Mathematical Model for Supreme Court Acceptance of Appealed Cases
I completed this research project while finishing up my bachelor's in legal studies. My supervisor was Dr. Anita Walker, professor of mathematics. The question addressed by this research project was whether the rate of acceptance for appeals under a given area of jurisprudence bore any relationship to the incidence of landmark decisions in that same area of jurisprudence. My results, presented at TORUS in February 2011, are here.