Analysing and Comparing the Effectiveness of Mutation Testing Tools: A Manual Study

Best paper award: SCAM 2016



Mutation testing is considered one of the most powerful testing methods. It operates by asking testers to design tests that reveal a set of mutants, which are purposely injected defects. Evidently, the strength of the method depends strongly on the mutants used. This dependence raises concerns about how mutation testing is implemented by existing tools, since implementation inadequacies can plausibly lead to poor results. In this paper, we cross-evaluate three popular mutation testing tools for Java, namely MUJAVA, MAJOR and PIT, with respect to their effectiveness. We perform an empirical study of 3,324 manually analysed mutants from real-world projects and find large differences between the tools’ effectiveness, ranging from 76% to 88%, with MUJAVA achieving the best results. We also demonstrate that no tool subsumes the others and provide practical recommendations on how to strengthen each of the studied tools. Finally, our analysis shows that 11%, 12% and 7% of the mutants generated by MUJAVA, MAJOR and PIT, respectively, are equivalent.

Experimental Analysis

Test Subjects

Details about the test subjects utilised in this study are given below:

Test Subjects’ Details: "LoC" shows the source code lines of the projects; "Class" presents the name of the considered class, along with the enclosing package; "Method" refers to the names of the considered methods.
Test Subject | LoC    | Class                                     | Method
Commons-Math | 16,489 | org.apache.commons.math.util.MathUtils    | gcd
             |        | org.apache.commons.math.geometry.Vector3D | orthogonal
Commons      | 17,294 | xorg.apache.commons.lang.ArrayUtils       | toMap
             |        |                                           | lastIndexOf(Object[], Object, int)
             |        | xorg.apache.commons.lang.WordUtils        | capitalize
Pamvotis     |  5,505 | pamvotis.core.Simulator                   | addNode
Triangle     |     47 | Triangle                                  | classify
XStream      | 15,048 |                                           | decodeName
Bisect       |     37 | Bisect                                    | sqrt
Total        | 54,420 | -                                         | -

Manual Analysis Results

Details regarding the results of the performed manual analysis are presented below:

Manual Analysis Results: Columns "#Mutants", "#Equivs", "#Tests" present the number of generated mutants, the number of manually detected equivalent mutants and the number of the test cases that were manually created to cover the generated mutants per tool and method.
                                   | MAJOR                   | PIT                     | MUJAVA
Method                             | #Mutants #Equivs #Tests | #Mutants #Equivs #Tests | #Mutants #Equivs #Tests
gcd                                |      133      17      6 |       79       9      7 |      237      23      7
orthogonal                         |      120       3      8 |       65       0      8 |      155       5      9
toMap                              |       23       5      7 |       50       2      5 |       32       7      5
subarray([],int,int)               |       25       5      6 |       27       3      4 |       64       8      6
lastIndexOf(Object[], Object, int) |       29       2      8 |       43       1      7 |       81       4     12
capitalize                         |       37       6      5 |       42       1      6 |       69      14      9
wrap                               |       71       8     10 |       70       4      6 |      198      19      7
addNode                            |       89      11      8 |       53       3      8 |      318      33     34
removeNode                         |       18       2      5 |       29       0      3 |       55       7      6
classify                           |      139       7     25 |       94       1     16 |      354      38     27
decodeName                         |       73      24      5 |       81      16      6 |      156      28     10
sqrt                               |       51       4      4 |       29       3      4 |      135      17      6
Total                              |      808      94     97 |      662      43     80 |    1,854     203    138
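
As a quick sanity check, the share of equivalent mutants per tool can be recomputed from the Total row of the table above. The Python sketch below uses only those totals; the function and variable names are illustrative and are not part of the study's artefacts.

```python
# Totals copied from the "Total" row of the Manual Analysis Results table.
totals = {
    "MAJOR": {"mutants": 808, "equivalents": 94},
    "PIT": {"mutants": 662, "equivalents": 43},
    "MUJAVA": {"mutants": 1854, "equivalents": 203},
}


def equivalent_ratio(mutants: int, equivalents: int) -> float:
    """Percentage of generated mutants that were manually judged equivalent."""
    return 100.0 * equivalents / mutants


# Sum of the mutants generated by the three tools.
all_mutants = sum(t["mutants"] for t in totals.values())
print(f"All analysed mutants: {all_mutants}")

for tool, t in totals.items():
    print(f"{tool}: {equivalent_ratio(**t):.1f}% equivalent")
```

The unrounded ratios come out at about 11.6% for MAJOR, 6.5% for PIT and 10.9% for MUJAVA, and the three mutant totals sum to the 3,324 manually analysed mutants of the study.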


All the data of this study can be downloaded from the following links: Bisect, Triangle, Commons, Commons-Math, Pamvotis, XStream.


Effort has been put into making this study as replicable as possible. Thus, for each project, we supply:

  1. All its dependencies (lib/ directory).
  2. Necessary libraries and settings for PIT and MAJOR.
  3. Generated mutation adequate test suites for all mutation tools.
  4. Manually detected equivalent mutants for each mutation tool ( file – can be opened with any text editor; best viewed with an editor that supports Org-mode).


For each project, we supply scripts that automate the project's compilation, test execution, mutant generation and execution, and the cross-evaluation experiment. These scripts have been tested on GNU/Linux- and UNIX-based machines and can be found in the scripts/ directory of each project. Note that the scripts must be run from within their containing directory. The most important script files are listed below:

Description of various script files: These files can be found at the scripts/ directory of each project.
Filename | Description
         | Run PIT against its mutation adequate test suite (ALL available mutation operators are used)
         | Same as above, but this time for MAJOR
         | Execute MAJOR's mutants with PIT's mutation adequate test suite
         | Execute MAJOR's mutants with MUJAVA's mutation adequate test suite
         | Execute PIT's mutants with MAJOR's mutation adequate test suite
         | Execute PIT's mutants with MUJAVA's mutation adequate test suite
         | Execute MAJOR's mutants with the mutation adequate test suites of all tools
         | Same as above, but this time for PIT's mutants
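
The cross-evaluation scripts described above run one tool's mutants against another tool's mutation adequate test suite. A minimal Python sketch of how such a run could be scored follows, assuming the usual definition of mutation score (killed mutants divided by non-equivalent mutants). The mutant and test identifiers, the kill data and the function names are all hypothetical; they are not taken from the study's scripts or data files.

```python
from __future__ import annotations


def mutation_score(killed: set[str], mutants: set[str], equivalent: set[str]) -> float:
    """Fraction of non-equivalent mutants that are killed."""
    killable = mutants - equivalent
    if not killable:
        return 1.0  # nothing to kill, so the suite is trivially adequate
    return len(killed & killable) / len(killable)


def cross_evaluate(
    kills_by_test: dict[str, set[str]],
    suite: list[str],
    mutants: set[str],
    equivalent: set[str],
) -> float:
    """Score the test suite of one tool against the mutants of another tool."""
    killed: set[str] = set()
    for test in suite:
        # A mutant counts as killed if at least one test in the suite kills it.
        killed |= kills_by_test.get(test, set())
    return mutation_score(killed, mutants, equivalent)


# Hypothetical data: five mutants from tool B, one of them equivalent.
tool_b_mutants = {"m1", "m2", "m3", "m4", "m5"}
tool_b_equivalent = {"m5"}

# Hypothetical kill results of tool A's tests when run on tool B's mutants.
kills = {"t1": {"m1", "m2"}, "t2": {"m2", "m3"}}
tool_a_suite = ["t1", "t2"]

score = cross_evaluate(kills, tool_a_suite, tool_b_mutants, tool_b_equivalent)
print(f"Tool A's suite kills {score:.0%} of tool B's non-equivalent mutants")
```

In this toy example, tool A's suite kills three of tool B's four non-equivalent mutants. A score below 100% in both directions for a pair of tools would indicate that neither tool subsumes the other.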