SEMERU lab beefs up bug detection for software
Written by Carly Martin
April 3, 2015
Your computer crashes as you bookmark a page, your iPad freezes when you select “Mayfair” on Instagram, and the GPS on your phone loses its coordinates every time you cross Jamestown Road. These irritating technological hiccups are the result of “bugs,” or faults, in the source code of applications. Source code is the set of human-readable instructions, written in a programming language, that your phone, computer or any other device follows to run software. To create apps like the ghost icon on Snapchat or the sliding feature that unlocks a smartphone, software developers use specific “languages” that are compatible with certain operating systems.
To communicate the position of every pixel in said Snapchat ghost, along with all of the other features of the program, universally and definitively, the necessary instructions can be extensive. The internet browser Mozilla Firefox, for example, is the product of about four million lines of computer code. If a piece of this code is incomplete or contradicts other instructions, you have a bug: Your Instagram filter of choice is off limits, and you’re hopelessly lost on Jamestown Road.
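To make the idea concrete, here’s a toy sketch of my own (not code from any real app): one flawed comparison, buried among otherwise correct instructions, quietly breaks a passcode check.

```python
# A toy "bug" of my own invention: the intended rule is that a valid
# passcode has exactly 4 digits, but the comparison below was written
# as ">= 3", so passcodes that are one digit too short slip through.

def passcode_is_valid(passcode: str) -> bool:
    # Intended: passcode.isdigit() and len(passcode) == 4
    # The bug: ">= 3" accepts 3-digit codes as well.
    return passcode.isdigit() and len(passcode) >= 3

print(passcode_is_valid("123"))    # True -- but it should be False
print(passcode_is_valid("1234"))   # True -- correct behavior
```

One wrong character among millions of correct ones is all it takes.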
Bugs like these spell doom for software developers; flawed applications go straight to last place in sales in the cutthroat app marketplace. The ability to identify where bugs are hidden in source code is extremely valuable. Users describe the problem they have with a device to a developer, and it’s the developer’s job to comb through endless source code to find the affected region and fix the problem. Right now, software companies spend about 80 percent of their resources on maintenance rather than development, and Dr. Denys Poshyvanyk’s computer science lab is working to address this disproportion.
Operating from McGlothlin-Street Hall, Poshyvanyk heads SEMERU, an acronym inspired by Mount Semeru, an active volcano on the island of Java in Indonesia. The acronym stands for Software Engineering Maintenance and Evolution Research Unit. The team focuses on improving the productivity of computer programmers by developing new, more efficient methods of maintaining software in a cost-effective manner.
“What we’re trying to do is automate the tasks that software developers face in daily life,” Poshyvanyk said.
Currently, the task central to this lab’s research is figuring out a way to take natural language (or, how people who are not experts in computer programming describe a technical problem with their device) and synthesize that information into something a computer can understand and identify in source code. In other words, this lab hopes to develop a system that can take a user’s description of a bug and automatically find it in source code without a developer as a go-between. If this lab is successful, bugs could be flagged for repair as soon as a user runs into them.
To approach the “huge abstraction,” as Poshyvanyk describes it, that exists between our natural language and computer code, he combines techniques from different areas of computer science, integrating information retrieval with both static and dynamic analysis.
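As a rough illustration of the information-retrieval side, here is a simplified sketch of my own, not SEMERU’s actual system: score each source file by how similar its words are to the words of a user’s bug report, then rank the files. The file names and contents below are hypothetical.

```python
# A minimal sketch (my own simplification, not the lab's code) of
# matching a natural-language bug report to source files by word
# overlap, using cosine similarity over bags of words.

from collections import Counter
import math

def tokenize(text):
    return [w.lower() for w in text.replace("_", " ").split()]

def cosine(a, b):
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

# Hypothetical source files, named only for illustration.
source_files = {
    "lock_screen.py": "def check passcode lock screen unlock keypad",
    "gps_tracker.py": "def update coordinates gps satellite position",
}

bug_report = "phone will not unlock when I type my passcode on the lock screen"
ranked = sorted(
    source_files,
    key=lambda f: cosine(tokenize(bug_report), tokenize(source_files[f])),
    reverse=True,
)
print(ranked[0])  # the lock-screen file ranks first
```

Real systems are far more sophisticated, but the core idea is the same: let the user’s own words point toward the relevant code.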
In static analysis, the source code itself is examined directly, without running the program, whereas in dynamic analysis the program is run, data is collected, and the source code is analyzed with that runtime information in mind. Poshyvanyk is also using machine learning to tackle the “abstraction.” Machine learning is an area of computer science in which systems learn to combine and categorize new information.
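A toy contrast of my own devising makes the distinction concrete: the static half inspects the code as text without running it, while the dynamic half runs the program and records which lines actually execute.

```python
# A toy contrast (my own example, not the lab's tooling) between
# static and dynamic analysis of the same tiny program.

import ast
import sys

source = """
def divide(a, b):
    return a / b
"""

# --- Static analysis: parse the code into a syntax tree and flag any
# division operator, since it *could* divide by zero for some input.
tree = ast.parse(source)
warnings = [node.lineno for node in ast.walk(tree)
            if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Div)]
print("static: possible division on lines", warnings)

# --- Dynamic analysis: actually run the code and trace each line
# executed during a real call.
namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)

executed = []
def tracer(frame, event, arg):
    if event == "line":
        executed.append(frame.f_lineno)
    return tracer

sys.settrace(tracer)
namespace["divide"](10, 2)
sys.settrace(None)
print("dynamic: lines executed", executed)
```

The static pass warns about what might happen; the dynamic pass reports what did happen on one particular run.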
Kevin Moran, a master’s student in computer science working in Poshyvanyk’s lab, demonstrated two of the lab’s main projects to me, titled Fusion and Monkeylab. Both specialize in producing clearer, more useful user summaries of software errors. I was amazed by how intuitive operating these programs was for someone like me, who is computer literate enough to type articles for The Flat Hat but is still grappling with the concept of “machine learning.” I learned that these two projects deal with essentially the same problem: Right now, if a user experiences a problem with an application, they can report it online to the developer, who can then begin looking for the bug in all of the application’s source code. Often these reports do not have enough information to allow for an efficient search. Consequently, almost 80 percent of a software developer’s productivity goes toward looking for these bugs.
“The goal of a bug reporting system is to give information to a developer to one, reproduce the bug, to make sure that it’s not something that the user happened into once that is kind of an anomaly and two, give them detailed information related to the source code for them to fix it,” Moran said.
Fusion helps users create these error reports. It’s extremely promising because it has the potential to produce reports that describe the bug-caused malfunction in its entirety. For programmers looking to fix a bug, it is crucial that they account for everything that happened prior to the malfunction; it’s those details that help pinpoint the location of the flaw in the source code. Imagine you were having trouble locking your smartphone every time you tried to set up a specific passcode, and you wished to send the developer an error report so that the problem could be fixed in the phone’s next software update. You would access Fusion, where you’d be guided through a strategic set of questions that culminate in an error report that serves as a map to the problem once it’s in the hands of the developer. These questions begin by inquiring about the make and model of your device, and from there the program reproduces images of, say, your phone’s lock screen or keyboard, so you can easily identify which parts of the device you interacted with, in the particular order that always leads to the malfunction you wish to report. This sends more helpful information to the developer than the description you might have provided if you had, for example, called their customer service department and filed a verbal error report.
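The flow might be sketched like this; it is a rough approximation of my own rather than Fusion’s real interface, and every question and answer below is hypothetical.

```python
# A rough sketch (my own, not Fusion's actual implementation) of
# guiding a user through fixed questions and assembling the answers
# into a structured error report for the developer.

QUESTIONS = [
    ("device", "What make and model is your device?"),
    ("screen", "Which screen were you on when the problem occurred?"),
    ("action", "What did you tap or type right before the malfunction?"),
    ("result", "What happened afterward?"),
]

def build_report(answers: dict) -> str:
    # answers maps each question key to the user's response.
    lines = [f"{key}: {answers.get(key, '(no answer)')}" for key, _ in QUESTIONS]
    return "\n".join(lines)

# Hypothetical answers a user might give.
report = build_report({
    "device": "Phone X (hypothetical)",
    "screen": "passcode setup",
    "action": "entered a 4-digit passcode and tapped Save",
    "result": "the lock button stopped responding",
})
print(report)
```

Even this crude version yields a report with every field a developer needs filled in, rather than a free-form complaint.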
Monkeylab has a very similar aim: minimizing the amount of time software developers spend picking apart code to find the origin of a bug. Instead of relying on a user-produced error report, this program collects information based on how the user physically interacts with the device before the malfunction. Here, your device with the faulty program is connected to a computer running Monkeylab, and the user replicates all of the steps needed to evoke the bug-caused malfunction. Each swipe on your phone’s screen or each key pressed is recorded in a trace that automatically produces an error report with the exact, relevant information a developer needs to find a bug in the code, again in much less time than is usually spent now.
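The recording idea can be sketched roughly like this; the event names and the `Trace` structure are my own invention, not Monkeylab’s actual code.

```python
# A minimal sketch (my own, not Monkeylab's real implementation) of
# recording user-interface events into a trace that can be turned
# into a step-by-step error report.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Event:
    kind: str      # e.g. "tap", "swipe", "key"
    target: str    # e.g. "keypad.3" -- hypothetical UI element names

@dataclass
class Trace:
    events: List[Event] = field(default_factory=list)

    def record(self, kind: str, target: str) -> None:
        self.events.append(Event(kind, target))

    def to_report(self) -> str:
        # A developer-readable summary of everything that led up to
        # the malfunction, in order.
        return "\n".join(f"{i + 1}. {e.kind} on {e.target}"
                         for i, e in enumerate(self.events))

trace = Trace()
trace.record("tap", "settings.passcode")
trace.record("key", "keypad.1")
trace.record("key", "keypad.2")
trace.record("tap", "lock_button")   # the step where the freeze occurs
print(trace.to_report())
```

Because every step is captured in order, the developer can replay the exact path to the bug instead of guessing at it.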
“We are flipping it on its head: We’re modeling each user event on the phone as a word in a paragraph … from the time the user starts and stops using the phone, that’s a paragraph full of all these different words,” Moran said.
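That “events as words” analogy can be sketched as a simple bigram model; this is a simplification of my own rather than the lab’s method. The idea is to count which event tends to follow which, so that sequences never seen before, the likely bug triggers, stand out as improbable.

```python
# A bigram "language model" over UI events (my own simplification of
# the events-as-words analogy, not the lab's actual model): each
# session is a "paragraph" and each event is a "word".

from collections import Counter, defaultdict

# Hypothetical traces of UI events, one session per list.
sessions = [
    ["unlock", "open_app", "scroll", "tap_photo", "close_app"],
    ["unlock", "open_app", "scroll", "scroll", "close_app"],
    ["unlock", "open_app", "tap_photo", "close_app"],
]

# Count how often each event follows each other event.
follows = defaultdict(Counter)
for session in sessions:
    for prev, nxt in zip(session, session[1:]):
        follows[prev][nxt] += 1

def probability(prev: str, nxt: str) -> float:
    total = sum(follows[prev].values())
    return follows[prev][nxt] / total if total else 0.0

print(probability("unlock", "open_app"))   # common transition -> 1.0
print(probability("scroll", "unlock"))     # never observed -> 0.0
```

A transition with near-zero probability is one the model has rarely or never seen, which is exactly the kind of unusual sequence worth flagging.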
I thoroughly enjoyed my time with Poshyvanyk’s lab. It was incredibly exciting to observe programmers innovating in a way that could make an immediate, noticeable impact on our lives. It was also insanely gratifying to be able to actually grasp what they were doing and its implications. I attribute this to Poshyvanyk’s and Moran’s clear, patient descriptions and demonstrations, and the applicability of their work to my life, relative to many other projects I’ve gotten the chance to explore.
“That’s the cool thing about software engineering in general, is that a lot of the stuff we do can be immediately impactful to corporations … that’s one of the major reasons that I got into it,” Moran said.
There’s so much more to their lab, and I highly recommend checking it out here: http://www.cs.wm.edu/semeru/.