When I first looked at the Boolean wrapper type in Java, a funny thought came to mind about the tri-state nature of a boolean: why should null be a possible value? Later I applied the thought to every other nullable data type and got some funny analogies. Since object-oriented programming is meant to model the real world, I came up with the following analogy.

Every morning your newspaper is delivered to your doorstep. The paper boy rings the bell and leaves the paper. Let us assume that the doorbell is the function call and the newspaper is the object you receive. What happens if the newspaper is not delivered? Quite obviously, the paper boy either won’t arrive or will come to your doorstep and explain why the paper was not delivered.

What would happen if, instead, the paper boy arrived at your doorstep and delivered a null newspaper? You would have to accept the newspaper, check that it is not null, and otherwise decide to do something else. If you skipped the check, your day would end abruptly with a null pointer exception thrown the moment you tried to read it. Now let us not limit this to newspapers. What if you had to worry about nulls everywhere? Wouldn't it drive you crazy to worry about null mail, null phone calls, and null water bottles?

This is what happens to many Java programmers: at some point in their lives they have dealt with a sticky, messy null pointer exception caused by someone who used null in a Boolean as a third logic-flow state alongside true and false. Even worse is the SQL boolean data type, where unknown and null are possible values too.
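
To make the tri-state problem concrete, here is a minimal sketch. The class name `TriStateBoolean`, the flag names, and the helper methods are all made up for illustration. It shows how reading a missing flag from a map yields a null Boolean, how unboxing that null throws, and one null-safe way to read it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class; not taken from any real codebase.
final class TriStateBoolean {

    // Null-safe read: a missing (null) flag is treated as false.
    static boolean isEnabled(Map<String, Boolean> flags, String name) {
        return Boolean.TRUE.equals(flags.get(name));
    }

    // Returns true if auto-unboxing the given Boolean throws.
    static boolean unboxingThrows(Boolean flag) {
        try {
            boolean primitive = flag; // auto-unboxing calls flag.booleanValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> flags = new HashMap<>();
        flags.put("delivered", Boolean.TRUE);

        System.out.println(isEnabled(flags, "delivered"));        // true
        System.out.println(isEnabled(flags, "missing"));          // false
        System.out.println(unboxingThrows(flags.get("missing"))); // true
    }
}
```

Treating null as false is only one possible policy; the point is that the caller has to pick a policy at all, which a plain primitive boolean never asks for.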

So much code has been written with guard clauses to keep these nasty null pointer exceptions out, and they end up costing about 3 extra lines of code in almost every critical business logic method. Lately I have seen a positive trend among peers in the way we treat nulls. I have seen many consciously avoiding nulls and replacing them with empty objects, the Null Object pattern, or special instances to handle ambiguities. As we move to newer (I would say newly adopted) languages, I believe null will cease to exist.
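
As a rough illustration of the Null Object pattern applied to the newspaper analogy, consider the sketch below. Every name in it (`Newspaper`, `PrintedNewspaper`, `NoNewspaper`, `PaperBoy`) is invented for this example. The paper boy never hands over null; on a day with no edition he hands over a harmless stand-in instead.

```java
import java.util.HashMap;
import java.util.Map;

// All types here are hypothetical, invented for the analogy.
interface Newspaper {
    String headline();
}

final class PrintedNewspaper implements Newspaper {
    private final String headline;

    PrintedNewspaper(String headline) {
        this.headline = headline;
    }

    @Override
    public String headline() {
        return headline;
    }
}

// The null object: a real Newspaper that is safe to read.
final class NoNewspaper implements Newspaper {
    static final Newspaper INSTANCE = new NoNewspaper();

    private NoNewspaper() {
    }

    @Override
    public String headline() {
        return "No paper today";
    }
}

final class PaperBoy {
    private final Map<String, String> headlinesByDay = new HashMap<>();

    void stock(String day, String headline) {
        headlinesByDay.put(day, headline);
    }

    // Never returns null: a missing edition becomes NoNewspaper.INSTANCE.
    Newspaper deliver(String day) {
        String headline = headlinesByDay.get(day);
        return headline == null ? NoNewspaper.INSTANCE : new PrintedNewspaper(headline);
    }
}
```

With this shape, a caller can write `paperBoy.deliver("Sunday").headline()` without a guard clause; the null check lives in exactly one place, inside `deliver`.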

I always wondered what the huge reports churned out by PMD and Checkstyle were useful for. The reason was the amount of data the default settings spit out. I even had conversations about removing the reports from the CI build, since no one read them and they served no purpose.

An idea popped up: why not fail the CI build on a violation? Then devs are forced to read the report. It seemed simple, but the coding standards in Checkstyle's default settings were not usable as is. I went through everything Checkstyle can provide and landed on one interesting option called ‘simian’ for code duplication checks. Simian is proprietary, but it is free for open source projects and also available for trial use. I also had one issue sitting on the back burner for a long time: test data being copy-pasted between similar tests. I configured Simian to report an error for 5 duplicated lines and ran it on my codebase (it runs very fast, mostly in single-digit seconds even for large codebases).

I kept monitoring the results and fixed the tests one by one every day until no errors were reported. After a clean run I added “failonduplication=true” to the simian task. That did the trick. Some time later I was adding one more test case and, in a hurry to get home, I copy-pasted the previous test method and changed only the parameters and assertions. I ran the dev build before checking in, thinking I would be out the door in another minute, and the copy-paste detector halted the build. It took me only two more minutes to fix the duplication, but there was a sense of security that something had been nipped in the bud instead of one more small refactoring being pushed to the technical debt backlog. Those two minutes spent removing the duplication as soon as it was detected prevent tech debt and longer-term maintenance issues.
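
The kind of fix involved looks roughly like the sketch below. It is not the actual test code from my project: `DiscountCalculator`, its 10% rule, and the check class are hypothetical, and plain Java is used instead of a test framework to keep the example self-contained. Instead of one copy-pasted method per input, the varying parts (input and expected value) go into a table and a single shared check runs over them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical code under test: 10% off for orders of 100 or more.
final class DiscountCalculator {
    static int priceAfterDiscount(int amount) {
        return amount >= 100 ? amount - amount / 10 : amount;
    }
}

final class DiscountCalculatorCheck {

    // One shared check replaces several copy-pasted test methods
    // that differed only in their input and expected value.
    static void assertPrice(int amount, int expected) {
        int actual = DiscountCalculator.priceAfterDiscount(amount);
        if (actual != expected) {
            throw new AssertionError(
                "amount " + amount + ": expected " + expected + " but got " + actual);
        }
    }

    // Runs every case in the table and returns how many were checked.
    static int runAll() {
        Map<Integer, Integer> expectedByAmount = new LinkedHashMap<>();
        expectedByAmount.put(50, 50);    // below the threshold, no discount
        expectedByAmount.put(100, 90);   // exactly at the threshold
        expectedByAmount.put(250, 225);  // above the threshold

        for (Map.Entry<Integer, Integer> c : expectedByAmount.entrySet()) {
            assertPrice(c.getKey(), c.getValue());
        }
        return expectedByAmount.size();
    }

    public static void main(String[] args) {
        System.out.println(runAll() + " cases passed");
    }
}
```

Adding a new case is then a one-line change to the table, which stays well under a 5-line duplication threshold.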

This is just a start; I am looking at other places where tools can serve as strict reminders. Other advantages of having a tool do this job are:

  • New devs coming into the team get immediate feedback about what the team's priorities in the code are.
  • I also observed that we act immediately to fix something when the build fails, instead of reading a report churned out on every build.

The code snippet which was used for the duplicate check is below.


<simian language="java" threshold="5" failonduplication="true" reportduplicatetext="true">
    <fileset dir="${basedir}/src" includes="**/*.java"/>
    <fileset dir="${basedir}/test" includes="**/*.java"/>
    <formatter type="plain"/>
</simian>