Program Details


Students will take part in formal training through targeted courses, intensive leadership and professional skills training, industry internships, and an annual industry-focused research symposium. The training program structure was developed in consultation with our industry partners, who are in urgent need of high-quality personnel in the area of software analytics. The program will focus on software-analytics techniques for a variety of sectors including software development, water security, food security, and mining.

Research Objectives and Project Themes

In collaboration with our industry partners, our SOAR team has identified the following five general project themes that SOAR trainees will explore in their research and projects.

Software Systems Quality

Software bugs cost the global economy millions of dollars. Existing automatic bug-finding tools rely on static code analysis, a technique that does not scale well, or dynamic-behavior analysis, a technique that cannot reveal rare bugs. These methods tend to require a significant amount of manual work and have a steep learning curve, limiting real-world adoption.

Machine-learning approaches applied to the plethora of bug examples described in source-code repositories offer a promising alternative. Our team has an impressive record of creating methods and tools in this area, including dealing with clone-related bugs, supporting query reformulation for locating bugs, and conducting large-scale studies to understand the nature and presence of bugs. SOAR trainees will develop methods to support automated program repair, automated code reviews, programmer-assistant bots, cloned-bug detection, and context-sensitive query reformulation for concept and bug localization.

Evolutionary Software Design

Software maintenance accounts for up to 80% of total development cost, and clones, i.e., duplicated code fragments, are one of the major contributing factors. Modern Integrated Development Environments (IDEs) assist with clone detection, but there are no methods to help developers manage clones, and little is understood about how to practice safe cloning. More work is needed on reasoning about clones, their implications for software design and quality, and their management throughout the software lifecycle. In this theme, trainees will aim to develop methods and usable tools that help developers safely clone code within IDEs during development and manage existing clones during maintenance and evolution.

Technical Debt

To keep up in a competitive world with financial and schedule pressures, developers often take shortcuts when building software, resulting in long-term negative impacts such as higher software maintenance costs, unpredictable software performance, and poor overall productivity. Relying on the technical-debt expertise of our team members and on prior work with clones (one form of technical debt), SOAR trainees will further explore this important challenge with our industrial partners. With support and mentorship from team members, trainees will interview developers and managers while mining evidence of technical debt from software repositories. The objective is to cross-validate the findings in order to develop, in the long run, automated technical-debt metrics based on a variety of software-development artifacts.

Social Software Engineering

A wealth of information can be extracted by analyzing the social interactions among developers, as captured in the co-developed artifacts of software repositories. SOAR trainees will deploy social-network analysis and machine-learning methods to extract interaction patterns, their evolution over time, and their impact on software quality. This work will build on our team’s recent work on software-team analysis and visualization and on recent studies of gender diversity and communication patterns, which have found that diversity improves communication and subsequently reduces the likelihood of undesirable “code smells” in source code.

Trustworthy, Explainable, and Visual Analytics for Software Teams

SOAR trainees will investigate the general problem of developing usable and actionable analytics tools for software development: information extracted through analytics is unlikely to drive action if users do not understand or trust it. The focus will be on investigating trustworthiness, explainability, visualization, and the user experience.