Computer Science & Engineering

Research Experiences for Undergraduates


Project #7: Building the ecosystem for developing parallel programs

Faculty: I-Ting Angelina Lee

Existing tool support for debugging and performance engineering of parallel programs is lacking. We are developing a set of tools for debugging and performance engineering of parallel programs, as well as a framework for supporting these tools efficiently. Currently, we are focusing on tools for programs written in Cilk, a C/C++-based multithreaded language. Cilk programs operate differently from ordinary multithreaded programs implemented using pthreads, which means there are unique optimizations one can explore to support these tools efficiently.

Project #7A: Many of the tools require compiler instrumentation to gather detailed information about the execution of the parallel program being debugged. We are designing compiler instrumentation tailored specifically to support Cilk programs. Students will work with the PI to develop the compiler instrumentation and perform experimental studies to validate the design.

Skills required: Familiarity with C/C++; basic understanding of how a compiler works (e.g., having taken a compiler course).

Project #7B: As part of this project, we are also investigating data structures to support the tools efficiently. In order to debug the program, these tools need to log detailed information about the computation. Interesting data structure questions arise in such a setting, since we want to minimize both the time it takes to access the data structure and the space it takes to store the logged information, and there are opportunities for optimization due to how a Cilk computation operates. Students will work with the PI to design and implement different data structures and perform experimental studies to validate the design.

Skills required: Familiarity with C/C++; mathematical maturity including taking an undergraduate algorithms course; experience with parallel programming (in any language or any platform) is a plus but not required.

Project #7C: Finally, we could use more benchmarks! As part of the project, we are looking into converting existing large-scale software into multithreaded Cilk programs. Students will work with the PI to convert large-scale software into Cilk programs and potentially conduct a "user experience" study with the tools we already have in place.

Skills required: Familiarity with C/C++; experience with parallel programming (in any language or any platform) is a plus but not required.

Project #8: Locality-aware concurrency platforms

Faculty: I-Ting Angelina Lee and Kunal Agrawal

We are building a platform for writing cache-efficient parallel programs. As part of that platform, we are designing compiler and runtime transformations that take divide-and-conquer programs written without regard to cache efficiency and convert them into cache-efficient ones. To develop these transformations, we plan to conduct an algorithmic study to understand their potential to improve performance. Students will conduct experimental studies by implementing various algorithms and will work with the PIs to design compiler and runtime transformations. We will first consider relatively simple algorithms such as merge sort, various linear algebra algorithms, and FFT: algorithms with regular data access patterns and a clear recursive structure. For each of these algorithms, we will implement three versions: the standard divide-and-conquer algorithm, which may not be cache-efficient; the cache-aware algorithm, which obtains cache efficiency by using the parameters of the cache; and the cache-oblivious algorithm, which guarantees cache efficiency without using the parameters of the cache. We will compare the performance of these versions to understand the implications of various algorithm design decisions.

We will apply the compiler transformations to the cache-inefficient implementation by hand to understand the potential performance improvement they provide. We will then compare this hand-transformed code with the cache-aware and cache-oblivious algorithms to understand the performance implications of delegating cache management to the compiler and runtime system.
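As a rough illustration of the three versions described above, here is a minimal Python sketch using matrix transpose as a stand-in linear algebra kernel. The choice of transpose, the function names, and the `block` and `cutoff` parameters are assumptions made for this example, not part of the project. Python will not show the cache effects themselves; the sketch only shows how the three access patterns differ in structure.

```python
def transpose_naive(a, n):
    """Standard version: row-by-row transpose of an n x n row-major list."""
    out = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            out[j * n + i] = a[i * n + j]
    return out


def transpose_blocked(a, n, block):
    """Cache-aware version: work on block x block tiles.

    `block` is a hypothetical tuning parameter that would be chosen
    from the cache size of the target machine.
    """
    out = [0] * (n * n)
    for bi in range(0, n, block):
        for bj in range(0, n, block):
            for i in range(bi, min(bi + block, n)):
                for j in range(bj, min(bj + block, n)):
                    out[j * n + i] = a[i * n + j]
    return out


def transpose_oblivious(a, n, cutoff=4):
    """Cache-oblivious version: recursively halve the larger dimension.

    `cutoff` only limits recursion overhead; it is not a cache parameter.
    """
    out = [0] * (n * n)

    def rec(r0, r1, c0, c1):
        rows, cols = r1 - r0, c1 - c0
        if rows <= cutoff and cols <= cutoff:
            for i in range(r0, r1):
                for j in range(c0, c1):
                    out[j * n + i] = a[i * n + j]
        elif rows >= cols:
            mid = r0 + rows // 2
            rec(r0, mid, c0, c1)
            rec(mid, r1, c0, c1)
        else:
            mid = c0 + cols // 2
            rec(r0, r1, c0, mid)
            rec(r0, r1, mid, c1)

    rec(0, n, 0, n)
    return out
```

All three functions compute the same result; in a compiled language such as C/C++ one would time them on large matrices to compare their cache behavior.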

Skills required: Familiarity with C/C++; mathematical maturity including an undergraduate algorithms course

Project #9: Concurrency platforms for streaming applications

Faculty: Kunal Agrawal

Streaming applications are used in many domains such as multimedia, astrophysics, finance, etc.  We are interested in developing scheduling strategies to improve the performance of these applications and implementing and evaluating these strategies using the Autopipe streaming platform.  Students will be involved in both application development and system design.

Skills required: programming experience in C/C++

I learned about new and developing technologies, which was exciting, and my experience will give me an edge over others entering the field.

Projects for Summer 2015

Project #1: Fast Distributed Algorithms for Large Data Analysis Using MapReduce

Faculty: Ben Moseley

The main goal of our project is to develop sound theoretical foundations that lead to fast, efficient methods for large data analysis using distributed computation, and to unlock the underlying power of MapReduce through new algorithmic techniques. Students will be integrated into the project by performing empirical evaluations of algorithms in MapReduce/Hadoop, a distributed computing framework. Students will also be involved in developing and analyzing algorithms for distributed computation. Specifically, the project aims to develop and evaluate state-of-the-art algorithmic methods on real-world data sets for machine learning applications of submodular optimization and clustering, as well as for massive graph analysis using triangle enumeration. Beyond empirical evaluation of algorithms for these problems, the experiments will be used to better understand proposed theoretical models of the MapReduce framework and how accurately they capture practice.
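To give a flavor of triangle enumeration in the MapReduce model, here is a minimal sketch that simulates two map/shuffle/reduce rounds in plain Python. It follows the textbook "node iterator" idea (build two-edge paths, then check for the closing edge) and is not necessarily the method used in the project. The helper `run_round` and all function names are hypothetical, and vertices are assumed to be mutually comparable labels such as integers.

```python
from collections import defaultdict
from itertools import combinations


def run_round(records, mapper, reducer):
    """Simulate one MapReduce round: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def enumerate_triangles(edges):
    """Return each triangle of an undirected graph once, as a sorted tuple."""
    # Normalize: undirected edges as (low, high), no duplicates or self-loops.
    edge_set = {tuple(sorted(e)) for e in edges if e[0] != e[1]}

    # Round 1: group edges under their lower endpoint and emit "wedges",
    # i.e. pairs of higher neighbors that would form a triangle if adjacent.
    def map_by_low_endpoint(edge):
        low, high = edge
        yield low, high

    def reduce_to_wedges(apex, higher_neighbors):
        for u, w in combinations(sorted(higher_neighbors), 2):
            yield ("wedge", apex, u, w)

    wedges = run_round(edge_set, map_by_low_endpoint, reduce_to_wedges)

    # Round 2: key wedges and real edges by the closing pair (u, w);
    # a wedge becomes a triangle only if the edge (u, w) is present.
    def map_by_closing_pair(record):
        if record[0] == "wedge":
            _, apex, u, w = record
            yield (u, w), ("apex", apex)
        else:
            _, u, w = record
            yield (u, w), ("edge",)

    def reduce_to_triangles(pair, values):
        if ("edge",) in values:
            for value in values:
                if value[0] == "apex":
                    yield (value[1], pair[0], pair[1])

    records = wedges + [("edge", u, w) for u, w in edge_set]
    return sorted(run_round(records, map_by_closing_pair, reduce_to_triangles))
```

For example, `enumerate_triangles([(0, 1), (1, 2), (0, 2)])` finds the single triangle `(0, 1, 2)`. On a real cluster each round would be a Hadoop job, and the number of wedges generated in the first round is the quantity one would try to keep small.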

Skills required: mathematical maturity, including an undergraduate algorithms course; programming skills. No prior experience with distributed computing or MapReduce/Hadoop is necessary, but it would be beneficial.

Project #2: Campus-scale Indoor Mapping

Faculty: Yasutaka Furukawa

This project seeks to build an online indoor map of the Washington University campus. The map provides both photorealistic and non-photorealistic views of buildings on our campus. While outdoor digital mapping has made dramatic improvements over the last decade, indoor mapping is still at a very early stage, and most buildings do not have any indoor mapping data. The project consists of many steps: data acquisition, 3D reconstruction of building interiors from massive point clouds, annotation of the reconstructed 3D model, and development of the digital mapping application/viewer. Students will work with a postdoc and/or graduate students in our group in tackling these problems. In addition to digital mapping, campus-wide indoor 3D models would facilitate novel applications in science, engineering, and commerce. The models enable us to 1) assess compliance with building codes, accessibility codes, and energy efficiency levels, 2) understand risks in disaster scenarios such as fire, earthquake, and hostage situations, 3) assist the visually impaired with indoor navigation, and 4) provide a platform to collect and study our indoor social behaviors.

Skills required: C++ coding experience and linear algebra are required. WebGL or Android programming experience is preferred but not required.

Project #3: Strategies for Spontaneous Teamwork

Faculty: Brendan Juba and Sanmay Das

We plan to organize a joint project this year around strategies for agents playing a cooperative exploration game as part of a larger team, without precoordination. The two projects will span theoretical work, systems work, and empirical work, and will accommodate students with a variety of backgrounds and interests. One project is to help develop a simulation platform for modeling agents playing "multi-armed bandit problems." This project will also involve collecting data to test and validate the simulations. The other, more theoretical project is to develop optimal team strategies for these "multi-armed bandit" games. Part of this work will involve an analysis of the fundamental limits of what teams can achieve in these games.
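For readers new to multi-armed bandits, the following is a minimal simulation sketch of the kind of game described above. It assumes Bernoulli-reward arms and uses the standard UCB1 rule as a baseline strategy; each agent learns only from its own pulls, which is one simple reading of "without precoordination." The class and function names, and the choice of UCB1, are assumptions for illustration rather than the project's actual platform or strategies.

```python
import math
import random


class BernoulliBandit:
    """Arms that pay 1 with a fixed (hidden) probability and 0 otherwise."""

    def __init__(self, success_probs, rng):
        self.success_probs = list(success_probs)
        self.rng = rng

    def pull(self, arm):
        return 1 if self.rng.random() < self.success_probs[arm] else 0


class UCB1Agent:
    """Picks the arm with the highest mean reward plus exploration bonus."""

    def __init__(self, num_arms):
        self.counts = [0] * num_arms
        self.totals = [0.0] * num_arms

    def choose(self):
        # Try every arm once before applying the UCB1 formula.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        t = sum(self.counts)

        def upper_bound(arm):
            mean = self.totals[arm] / self.counts[arm]
            return mean + math.sqrt(2 * math.log(t) / self.counts[arm])

        return max(range(len(self.counts)), key=upper_bound)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


def simulate_team(success_probs, num_agents, rounds, seed=0):
    """Run independent agents on the same arms; return agents and team reward."""
    rng = random.Random(seed)
    bandit = BernoulliBandit(success_probs, rng)
    agents = [UCB1Agent(len(success_probs)) for _ in range(num_agents)]
    team_reward = 0
    for _ in range(rounds):
        for agent in agents:
            arm = agent.choose()
            reward = bandit.pull(arm)
            agent.update(arm, reward)
            team_reward += reward
    return agents, team_reward
```

Calling `simulate_team([0.1, 0.5, 0.9], num_agents=3, rounds=2000)` lets one inspect how often each agent pulled each arm. A team strategy that shares observations could be compared against this independent baseline.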

Skills Required: mathematical maturity for the theoretical project (familiarity with game theory and/or machine learning is a plus); proficiency with Java, C/C++, or Python for the simulation project.

I liked having close contact with Ph.D. students and talking to professors.

If there's some topic or aspect of the research process you want to do, don't be afraid to talk to your advisor about it. If you don't, you may lose out on a good opportunity.

I really liked seeing so many new applications of the things I learned in my coursework. As someone who is definitely interested in computer science but isn't sure about where exactly my interests lie, this summer was really helpful.

Interested in working with our faculty?

     Find out more about their research!

           See something that excites you?

               Join us for a semester or a summer!

I enjoyed working on original research in a field about which I'd previously known nothing, with the chance to publish our findings.

My lab was a good environment for research, but the people were what made my summer really enjoyable.

Project #4: Tools and Techniques for Computer Network Offense and Defense

Faculty: Patrick Crowley

It is an unfortunate fact that most of the world’s computer networks experience frequent infiltration attempts. How are such attacks carried out and how can they be detected and defended against?  In this project, summer research assistants will study and develop proficiency in using contemporary methods and tools for both network offense and defense. Students will develop their skills in a controlled laboratory environment; once gained, these skills will be put to practice in a real-world network environment.

Skills required: programming experience in Python, Java or C/C++, basic knowledge of Linux/Unix, prior experience with open source tools such as nmap or Metasploit helpful but not required.

Project #5: Real-Time Security and Monitoring with the Passive Network Appliance

Faculty: Patrick Crowley

The Passive Network Appliance (PNA), an NSF-funded open-source project developed at Washington University, is a high-performance, programmable platform for network monitoring and security. The PNA can be used to make sense of complex or malicious traffic in large campus and corporate networks. The PNA allows programmers to develop custom monitors to measure, model, or detect events of interest in network traffic. Summer research assistants will join the PNA team and develop monitors that provide new and useful capabilities, such as denial-of-service detection, spam detection, HTTP/web profiling, and video profiling.

For more information:

Skills required: programming experience in C or C++, basic knowledge of Linux/Unix, networking experience helpful but not required.

Project #6: Exploring the future Internet with Named Data Networking

Faculty: Patrick Crowley

Named Data Networking (NDN) is a new Internet architecture that capitalizes on strengths -- and addresses weaknesses -- of the Internet's current host-based, point-to-point communication architecture in order to naturally accommodate emerging patterns of communication. By naming data instead of device addresses, NDN transforms data into a first-class entity. The current Internet secures the connections between machines through which data is passed. NDN secures the data directly, a design choice that decouples trust in data from trust in hosts, enabling several radically scalable communication mechanisms such as automatic caching to optimize bandwidth. Another major consequence of the NDN architecture is reduced application complexity: by communicating with named data, applications can express their communication patterns more directly than in the current IP architecture. Existing applications include file distribution, text chat, video streaming, and voice/video chat. Summer research assistants will study the NDN architecture and develop original small- and medium-scale applications for PCs or Android that illustrate the relationship between application complexity and the NDN design choices. Applications may include a distributed Twitter-like service, peer-to-peer file sharing, and location-aware messaging.

For more information:

Skills required: programming experience in Python, Java or C/C++, basic knowledge of Linux/Unix or Android, networking experience helpful but not required.

Project #10: Parallel Platform for Real-Time Systems

Faculty: Kunal Agrawal & Chris Gill

In recent years, multi-core processor technology has improved dramatically, and more and more computers now contain parallel processors. For example, Intel has recently put 80 cores in a Teraflops Research Chip, and ClearSpeed has developed a 96-core processor. As multi-core processors continue to scale, they provide an opportunity for performing more complex and computation-intensive tasks in real time. However, to take full advantage of multi-core processing, these systems must exploit intra-task parallelism, where parallelizable real-time tasks can utilize multiple cores at the same time. We are building the first platform for parallel real-time tasks.

Students working on this project may work on either platform development or testbed development for the platform.  In addition, we are also setting up a shake table to do civil engineering experiments using our platform.  Students may also work on this shake table experiment setup.

Skills required: C/C++ programming experience. Civil engineering background or linear algebra knowledge a plus

Project #11: Growing Rice and Soybeans on a Computer

Faculty: Weixiong Zhang and Sharlee Climer

Complex traits of plants, such as tolerance of environmental stress and high yield, are controlled by complex gene regulation. Therefore, understanding gene regulatory mechanisms is of paramount economic importance. We are developing novel and efficient computational approaches to elucidating gene expression regulation. In particular, we are developing machine learning and data mining methods and software tools to identify "patterns" in the huge amount of gene profiling data produced by high-throughput platforms (next-generation sequencing and microarrays). Such "patterns" can be diverse, ranging from individual candidate genes to networks of interacting or associated genes that are characteristic of the biological processes determining the complex traits of interest.

Skills required: Proficiency with Java, C or Python

Project #12: Understanding Complex Diseases (e.g., Alzheimer's and cancer) through Systems Biology

Faculty: Weixiong Zhang and Sharlee Climer

Many complex diseases, such as Alzheimer's and many forms of cancer, are devastating and impose an enormous economic and societal burden. Understanding the genetic basis of disease mechanisms is the key to developing effective therapies for such diseases. We are developing machine learning and data mining approaches for analyzing and integrating biological data to understand the causal relationships among genetic variation (e.g., mutations in DNA sequences), gene expression variation, and disease phenotype.

Skills required: Proficiency with Java, C or Python

Project #13: Drones for Networking Communication

Faculty: Raj Jain

We are designing a network that can be quickly deployed, remain operationally stable despite frequent replacement of nodes, be fault tolerant in extreme situations, provide for easy maintenance, and be cost effective for users. We propose using a system based on mini-drones (also known as unmanned aerial vehicles) to create virtual cell towers that can be deployed quickly. Both software-oriented and hardware-oriented research components are available for students.

Skills required: must have taken a first course in networking

Project #14: Creating an Example-Rich Programming Environment

Faculty: Caitlin Kelleher

Looking Glass is a 3D programming environment with an online community, designed for kids. With Looking Glass, kids can program their own 3D animated stories, remix other programs, and then share their creations with the community. In this project, we want to use the shared programs as examples to teach kids programming within Looking Glass. You will work with the Looking Glass lab to make Looking Glass an example-rich programming environment. To make sure we pick exciting examples for each user, you will need to filter and download examples from the online community that are customized to each kid's preferences and expertise. Then, you'll design and implement several ways to incorporate the examples into Looking Glass and user test these changes to verify their effectiveness at helping kids learn new programming concepts.

Skills required: Basic programming skills

Project #15: Protein Modeling from Density Maps

Faculty: Tao Ju

Gorgon is a graphical environment co-developed by our lab and Baylor College of Medicine for modeling protein structures from density maps collected by imaging techniques such as x-ray crystallography and cryo-electron microscopy. The core of Gorgon is a suite of computational algorithms and interactive modules, allowing a biologist to obtain accurate models from an image in an efficient manner. For this project, you will work with the faculty and a graduate student to improve the current software in one or multiple aspects, including but not limited to: improving the modularity and scalability of the software, building programming interfaces with 3rd party software and databases, adding new interactive features for structure building and validation, improving the graphical visualization of proteins and 3D volumes, and enhancing the underlying modeling algorithms. The faculty and you will together identify the type of work that best suits you.

Skills required: Experience in C++ and Python programming is required. Familiarity with 3D graphical programming and biology background are recommended, but not required.

Project #16: Active Drug Discovery

Faculty: Roman Garnett

The goal of this project is to develop intelligent algorithms for accelerating the drug-discovery process. A crucial step in the drug design process is "virtual screening." Here we have an identified biological target and a small number of compounds (perhaps one) known to show activity against that target. Our goal is to efficiently search a large database of compounds for further examples of active compounds to investigate as potential drugs. This is a difficult problem: purchasable compounds number in the tens of millions, and only a handful of these will show activity against a given target. Current virtual screening procedures are highly myopic and do not carefully consider the inherent decision-making process involved in selecting compounds for analysis. We plan to adapt and extend recently developed tools from machine learning (under the name "active search") to intelligently prioritize the screening process and identify potential drugs more quickly.

Skills required: Familiarity with MATLAB and machine learning and mathematical maturity.  Familiarity with chemistry/biology a plus but not required.

Project #17: Photo-Geometry -- Measuring the World With Images

Faculty: Robert Pless

Our project builds tools to answer questions such as: "How does the changing climate affect growth patterns of trees", or "Which public health programs are most effective at increasing how much time people spend being active outdoors", or "When was this picture that I found on Twitter actually taken?". We answer these questions by downloading and analyzing publicly available image data from webcams and Flickr and Twitter.  Students working on this project could help write mobile apps that ask people to help fill in missing data, web visualization tools that help people ask specific questions, or image analysis tools that extract and reason about image features.

Skills required: No specific skill is mandatory, but prior experience working with images, databases, web design, or mobile app programming would give you a richer summer experience.

Project #18: Prioritizing Work in Biosequence Similarity Search

Faculty: Jeremy Buhler

Modern molecular biology experiments produce huge volumes of DNA and protein sequence.  Assigning meaning to these sequences frequently involves comparing them to massive databases of known sequences, or to probabilistic models such as protein and RNA families, or clustering them to create new families that can be compared to each other and to known families.  While many clever heuristics are known to make these computational tasks faster, today's data sets may contain terabytes of raw sequence, pushing the limits of available computing power. This project will explore algorithmic techniques to prioritize among many possible comparisons in the course of similarity search.  Prior knowledge of relationships among sequences or families in the database may be useful to reduce redundant searching and to maximize the number of distinct results returned for a given amount of search time. Different heuristics may be useful to rapidly identify some but not all matches in the database, which raises the problem of how to pick a small set of heuristics that will collectively discover almost all matches. Students will use these and similar ideas to help develop new and better high-throughput search algorithms for biosequence data, and will implement and test their ideas on real biosequence databases.
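The problem of picking a small set of heuristics that together find almost all matches can be viewed as a set cover problem. Below is a minimal sketch of the classic greedy covering algorithm, offered as one possible starting point rather than the project's method. The function name, the `coverage_target` parameter, and the input format (a dictionary mapping each heuristic's name to the set of match identifiers it finds) are assumptions for this example.

```python
import math


def greedy_heuristic_cover(matches_found, coverage_target=1.0):
    """Greedily choose heuristics until a fraction of all matches is covered.

    matches_found: dict mapping heuristic name -> set of match ids it finds.
    coverage_target: fraction (0..1) of the union of matches to cover.
    Returns (chosen heuristic names in pick order, set of covered matches).
    """
    all_matches = set().union(*matches_found.values())
    needed = math.ceil(coverage_target * len(all_matches))
    remaining = dict(matches_found)
    chosen, covered = [], set()
    while len(covered) < needed and remaining:
        # Pick the heuristic that adds the most not-yet-covered matches.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining heuristic adds anything new
        chosen.append(best)
        covered |= gain
    return chosen, covered
```

For instance, with heuristics `A = {1, 2, 3, 4}`, `B = {3, 4, 5}`, `C = {5, 6}`, and `D = {1, 6}`, the greedy rule picks `A` and then `C`, covering all six matches with two heuristics. A practical version would also weigh each heuristic's running time against the matches it contributes.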

Skills required: C++ programming and facility with basic algorithms and data structures (sorting, hashing, and graph traversal). Familiarity with a scripting language such as Python or Perl is a plus, as is facility with optimization techniques such as dynamic programming and covering algorithms. Prior biology background is NOT required.