I had always wondered what the huge reports churned out by PMD and Checkstyle were useful for. The problem was the sheer amount of data the default settings spit out. I even had conversations about removing the reports from the CI build, since no one read them and keeping them made no sense.
An idea popped up: why not fail the CI build on a violation? Then devs are forced to read the report. It seemed simple, but the coding standards in Checkstyle's default settings were not something to use as is. I went through everything Checkstyle can provide and landed on one interesting option called ‘simian’ for code duplication checks. Simian is proprietary, but it is free for open source projects and also available for trial use. I also had an issue that had long been on the back burner: test data copy-pasted across similar tests. I configured Simian to report an error for any 5-line duplication and ran it on my codebase (it runs very fast, mostly in single-digit seconds even for large codebases).
I monitored the results continuously and fixed tests one by one every day until no errors were reported. After a clean run I added “failonduplication=true” to the simian task. That did the trick. Later, while adding one more test case in a hurry to get home, I inadvertently copy-pasted the previous test method and changed only the parameters and assertions. I ran the dev build before checking in, thinking I would be out the door in another minute, and the copy-paste detector halted the build. It took me only two more minutes to fix the duplication, but there was a sense of security that something had been nipped in the bud instead of a small refactoring being pushed onto the technical debt backlog. Those two minutes spent removing the duplication at the moment of detection prevent tech debt and longer-term maintenance issues.
This is just a start; I am looking at other places where tools can serve as strict reminders. Other advantages of having a tool do this job are:
- New devs joining the team get immediate feedback about what the team's priorities in the code are.
- I also observed that we act immediately to fix something when the build fails, whereas a report churned out on every build goes unread.
The snippet I used for the duplication check is below.
    <simian language="java" threshold="5" failonduplication="true" reportduplicatetext="true">
        <fileset dir="${basedir}/src" includes="**/*.java"/>
        <fileset dir="${basedir}/test" includes="**/*.java"/>
        <formatter type="plain"/>
    </simian>
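For the snippet above to run, Ant has to know about the `simian` task, and the check has to sit on the path of the build that devs actually run before checking in. Here is a minimal sketch of that wiring, not my actual build file. The jar location `lib/simian.jar` and the target names `duplication-check` and `test` are hypothetical. The `simiantask.properties` resource name is the one I recall from the Simian documentation for its Ant task, so verify it against the version you use.

```xml
<project name="example" default="test" basedir=".">

  <!-- Assumption: the Simian jar is stored at lib/simian.jar; adjust the path. -->
  <taskdef resource="simiantask.properties"
           classpath="${basedir}/lib/simian.jar"/>

  <!-- Hypothetical target name. With failonduplication="true",
       any duplication of 5 or more lines fails the build here. -->
  <target name="duplication-check">
    <simian language="java" threshold="5"
            failonduplication="true" reportduplicatetext="true">
      <fileset dir="${basedir}/src" includes="**/*.java"/>
      <fileset dir="${basedir}/test" includes="**/*.java"/>
      <formatter type="plain"/>
    </simian>
  </target>

  <!-- Hypothetical dev-build entry point. Because it depends on the check,
       a violation halts the build before anything else runs. -->
  <target name="test" depends="duplication-check">
    <echo message="compile and run the tests here"/>
  </target>

</project>
```

Making the dev build depend on the check, rather than running the check as a separate reporting step, is what turns the report into something nobody can skip.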