\IEEEraisesectionheading{\section{Introduction}\label{sec:introduction}}
\IEEEPARstart{R}{efactoring} is a well-known technique to improve the design of a system and enable its evolution~\cite{Fowler:1999}.
In fact, existing studies~\cite{MurphyHill2012, tsantalis_empiricalstudy, Kim:2012:FSE, kim-tse-2014, fse2016-why-we-refactor} show that refactoring is frequently applied by development teams, and it is an important aspect of their software maintenance workflow.
Therefore, detecting refactoring activity in software projects provides valuable information that helps researchers understand software evolution.
For example, past studies used such information to shed light on important aspects of refactoring practice, such as the usage of refactoring tools~\cite{negara2013, MurphyHill2012}, the motivations driving refactoring~\cite{Kim:2012:FSE, kim-tse-2014, fse2016-why-we-refactor}, the risks of refactoring~\cite{Kim:2012:FSE, kim-tse-2014, Kim:2011, weissgerber2006refactorings, bavota2012does}, and the impact of refactoring on code quality metrics~\cite{Kim:2012:FSE, kim-tse-2014}.
Moreover, it is often important to keep track of refactorings when performing source code evolution analysis because files, classes, or functions may have their histories split by refactorings such as \emph{Move} or \emph{Rename}~\cite{icse2018}.
Additionally, knowing which refactoring operations were applied in the version history of a system may help in several practical tasks.
For example, in a study by Kim~et~al.~\cite{Kim:2012:FSE}, many developers mentioned the difficulties they face when reviewing or integrating code changes after large refactoring operations, which impact several code elements. Thus, developers might feel discouraged from refactoring their code. If a tool is able to identify such refactoring operations, it may be able to resolve merge conflicts automatically.
Moreover, diff visualization tools can also benefit from such information, presenting refactored code elements side-by-side with their corresponding version before the change.
Another application for such information is adapting client code to a refactored version of an API it uses~\cite{henkel2005catchup, Xing:2008:JDevAn}. If we are able to detect the refactorings that were applied to an API, we might be able to replay them on the client code automatically.
Given the importance of studying refactoring activity, we proposed RefDiff in previous work~\cite{msr2017}. RefDiff is an automated approach that identifies refactoring operations performed in the version history of Java systems.
At that time, our main goal was to provide a reliable tool to mine refactoring activity in a fully automated fashion, with better precision and recall than existing approaches. Since then, other approaches have emerged, such as RMiner~\cite{tsantalis2018rminer}, which raised precision even further.
Today, the availability of such tools enables large-scale and in-depth empirical studies on refactoring practice~\cite{fse2016-why-we-refactor, icse2018}.
Nevertheless, despite the advancements in the field of refactoring detection, existing tools are centered on the Java language.
Thus, we are still not able to mine refactoring activity in the vast number of software repositories written in other programming languages.
By restricting refactoring research to a single language, we may get a biased understanding of reality.
Interestingly, in the most recent edition of his refactoring book, Fowler changed all examples to use JavaScript~\cite{fowler2018refactoring}, which corroborates the idea that refactoring practice in other languages should be discussed on equal footing with Java.
Moreover, the practical applications of refactoring detection tools are hindered by the lack of support for other popular programming languages.
For all these reasons, in this paper we propose a multi-language refactoring detection approach, named RefDiff~2.0, a redesign of the first version that introduces an extensible architecture.
In RefDiff~2.0, the refactoring detection heuristics are fully implemented in a common core, and support for programming languages is provided by plug-in modules.
As a way to validate this architecture, we implemented and evaluated extension modules for three mainstream programming languages with distinct characteristics: Java, JavaScript (a widely popular dynamic programming language, used mostly to build web applications) and C (a procedural programming language, used mostly to implement system software).
Additionally, we reworked the refactoring detection heuristics of RefDiff to significantly improve its precision when compared to our previous work.
Now, RefDiff achieves a precision of 96.4\% and a recall of 80.4\% when evaluated on the Java dataset proposed by Tsantalis~et~al.~\cite{tsantalis2018rminer}, against a precision of 79.3\% and a recall of 80.2\% for its prior version.
Moreover, RefDiff's precision is on par with that of RMiner, the current state of the art in Java refactoring detection (96.4\% vs. 98.8\%).
This is a key achievement because our approach is not specialized for a single language.
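For reference, we assume here the standard definitions of these two metrics, where $\mathit{TP}$ denotes the number of correctly reported refactorings (true positives), $\mathit{FP}$ the number of incorrectly reported ones (false positives), and $\mathit{FN}$ the number of actual refactorings that were not reported (false negatives):
\[
\mathit{precision} = \frac{\mathit{TP}}{\mathit{TP} + \mathit{FP}}, \qquad
\mathit{recall} = \frac{\mathit{TP}}{\mathit{TP} + \mathit{FN}}.
\]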
In summary, we deliver the following contributions in this work:
\begin{itemize}
\item A major extension of our refactoring detection approach proposed in previous work~\cite{msr2017}, which includes a redesign of its core to work with a language-independent model and improved detection heuristics.
\item A publicly available implementation\footnote{RefDiff and our evaluation data are publicly available at:\\
\url{https://github.com/aserg-ufmg/RefDiff}}
of our approach, with out-of-the-box support for Java, C, and JavaScript.
\item An evaluation of the precision and recall of RefDiff using a large-scale dataset of refactorings performed in real-world Java open-source projects, comparing it with RMiner, a state-of-the-art tool for detecting refactorings in Java. As a byproduct of this evaluation, we also extend the dataset with new refactoring instances discovered by our tool.
\item An evaluation of the precision and recall of RefDiff on real-world C and JavaScript open-source projects.
\end{itemize}
The remainder of this paper is structured as follows.
Section~\ref{SecBackground} describes related work, discussing existing refactoring detection approaches.
Section~\ref{SecApproach} presents the proposed approach in detail.
Section~\ref{sec:eval:java} describes the design and results of a large-scale evaluation of RefDiff on Java projects.
Section~\ref{sec:eval:js:c} describes the design and results of an evaluation of RefDiff on C and JavaScript projects.
Section~\ref{sec:challenges} discusses challenges and limitations.
Finally, Section~\ref{SecConclusion} presents final remarks and concludes the paper.