Continuous Integration: Tooling, Processes & Implicit Communication

Continuous integration (CI) is a process and set of tools that development teams can use to continually manage the software they develop, test and deploy. In this post I will highlight the various tooling I have used on various .NET projects that harnesses CI, I will discuss some key processes around using CI that make it most effective and I will present some of the more subtle ways CI allows for implicit communication between team members that isn't so obvious when you're starting out with the practice.

Hudson is a really good open-source, Java, web-based application that can be used as a CI build system on Windows or Unix/Linux. What I find particularly compelling about Hudson is how easy it is to install and get started with a new project. First download the Hudson WAR from the homepage. Once you have started it from the command-line using java -jar hudson.war almost everything else is done from its GUI based menus; there is no need to configure any XML or to have an intimate knowledge of its API (CruiseControl is like this).

:: Tooling

Redsolo has a pretty complete post on setting up Hudson for a .NET project and getting the unit-tests and some static analysis (FxCop) to run. I initially configure my own Hudson instances very similarly to his post but over time I have altered it in the following ways:

The use of a source control management (SCM) system is crucial when building a CI infrastructure. I use Subversion (SVN) but you could just as easily use Git if you're more the distributed version control type. It is critical for CI to have a central place where all developers can regularly commit their code and configuration changes for a software project. Applications may work in isolation (when worked on independently) but time and time again I have found that the integration of a team's code changes and configuration can be very very challenging. CI promotes integrating early and often and therefore tries, and I believe succeeds, at avoiding the headaches that are associated with integrating at the end of a project (which by the way is very difficult to estimate against as you have no idea what kinds of bugs are lurking when it comes time to fully integrate).
I use NAnt to easily specify and execute a range of tasks from project compilation, database management, unit and acceptance test execution, static analysis, documentation generation and deployment. Hudson has a NAnt plugin which allows it to use NAnt as a build step. Thus, when Hudson notices that a developer has checked-in code, it does a full checkout of the subversion tree and executes a specific NAnt target to begin compilation or any other task required for that particular project.
I generally use MSTest and not NUnit for unit-testing. Both integrate nicely with Hudson but Hudson only knows how to interpret JUnit and NUnit test results. Because I still want a graph of the test results to show up on the project's main page, I install the MSTest plugin that converts the test report (a TRX file) into a format that Hudson understands natively (which happens to be JUnit XML).
I use FxCop, StyleCop and NCover for static analysis of my projects. FxCop and StyleCop are used more for helping the development team adhere to some design and implementation best practises as well as coding standings throughout development whereas NCover is used more for "after the fact" analysis of the code base (e.g. it highlights potential production code our team has not tested adequately). I use the Violations plugin to visually capture the number of FxCop and StyleCop violations within the code base after each new build. I also use a combination of a NAnt target (to generate the NCover report) and the HTML publisher plugin to publish the NCover report to Hudson so it is accessible via a link on the main page.
I automatically generate Doxygen documentation from the XML comments associated with the classes and methods within my code base. I use a NAnt target along with the HTML publisher plugin to publish the HTML to Hudson so it is accessible via a link on the main page. This way the development team always has a reference to the most up-to-date API documentation for the code base.
I find the Task Scanner plugin particularly helpful to keep track of various keywords in the code base that developers may leave in to remind themselves or others of what to fix or attend to later. Any custom tags such as FIXME, TODO, @deprecated, ??? can all be tracked and are shown visually on the main project page.
I use the Claim plugin which allows a specific developer to claim a build break and comment on what the problem/solution might be. The "claim" mechanism is sticky which means that if a developer continually breaks the build they will be continually responsible for fixing it. This plugin is great for helping a distributed team communicate implicitly about who is responsible for build breaks (instead of email or instant message communication between the whole team - less communication spam is always a good thing).
I also archive specific build artifacts from each CI build that I may want to analyze in the future. That way if I ever want to go back in time and discover where a particular bug entered the code base (and by whom) I can do so. It is much easier to simply download the executables from the CI build server and run them than to checkout a specific revision from source control, compile it and then run it to verify it does or does not have the bug I'm looking for.

Hudson's website also has some best practices to follow for Build Management, Release Management, Deployment Automation, and Test Orchestration

:: Key Processes

Here are some key processes that I use on teams to harness the benefits of CI:

Only one Commit per Build - Finding and solving software bugs that cause a build to break can be like searching for a needle in a haystack. Finding that needle can be made much easier if your software uses logging (such as log4net), a team member has deep insight into certain parts of the application that they know cause certain problems, specific sets of code and configuration changes can be isolated and labelled as the changes that may have caused the bug, or your software application has a testing suite that can pin-point failure areas. Each of these can be harnessed more effectively and precisely if a CI build is executed after each and every developer's commit (i.e. a single CI build does not encompass multiple commits). The reason for this is, if a build failure does occur, it is far easier to analyze a single code change set from one developer and begin the process of finding the bug than to begin analyzing multiple change sets from multiple developers. Whomever's code change actually broke the build at this point is very hard to determine without all developer's working together instead of just the single developer who's code change broke the build. Caveat: The only additional note on this is that it becomes difficult to keep to this requirement as a team grows in size and the build takes longer and longer to execute. For example: If a single build takes 1hr to execute, in a given working day a development team will only be able to run 8-10 CI builds. The more developers that are on a team, the more commits your team will perform each day. At some point your team won't be able to keep to this "one commit per build" policy but it is definitely something to strive for if you can.
Start a Build from Scratch - When setting up your CI build make sure to re-checkout the FULL source controlled tree (do not update the local copy of the tree) and fully rebuild (compile) all executables and libraries from their raw source code. Simply doing an update of your local copy of the tree or rebuilding only the executable/libraries that are needed, allows for the possibility of artifacts, from the previous build, to be used in the current build. The most effective way to eliminate unintended or corrupted build breaks and, more importantly, to truly test how everything integrates from your raw source code, the CI build should start from scratch. Delete everything, start from a freshly created directory and then checkout your source code from source control. Compilation, testing and deployments can then be performed on the raw source code and not any other artifacts still lying around in the directory from a previous build.
Multiple Builds - On my projects I usually need multiple builds. I need one build, called CI Build, that automatically builds after a single source control commit by a developer. I then have another build called Nightly Build, which is run each and every night and therefore builds against all the source control commits from the previous day. Then I may have a QA Build that has a different deployment procedure to some QA server for the testers. The QA Build is usually setup to build after a manual step that one of the testers will execute. Lastly, I have a Production Build that has a deployment procedure specifically geared towards deploying the application into a production environment and doing all the appropriate configuration and tests required to validate that it is stable and functional in production. Of course the Production Build is very important to get right and should be protected against unauthorized execution (i.e. password protect its usage or have only a select few individuals with the permissions to execute it).
Build Both the Release and Debug Configurations - When compiling .NET projects in Visual Studio you have the ability to compile in Release or Debug (or others if you have defined your own). I would highly recommended having both a CI Release Build and a CI Debug Build. I have found very subtle differences between the two when it comes to compilation, timing for unit-tests, configuration & static analysis results. I've had situations where the debug build passes but the release build does not and it was usually the result of some mis-configuration but occasionally I catch a real nasty timing issue in a unit or acceptance test. Note that I care about both build configurations because developer's routinely develop and test an application which was compiled in debug mode whereas the software actually being released to production is compiled in release mode.
Keep the Main Source Control Tree Pristine - Each and every developer works from and modifies the main source control tree. Each modification to the tree affects all other developers so it is imperative that it be kept clean and at the highest quality. The software that a development team produces is what is in the tree, therefore whatever quality the tree has the software has also. This is a slightly more concrete way of thinking about the intangible aspects of software. It is hard to visualize what software really is and what quality it has but it is much easier to visualize the files checked-in to a source control tree and the quality of each of those files. Therefore, keep the files that are checked-in to the source control tree pristine at at the highest quality.
Priority Number 1: Fix the Broken Build - When a build breaks during CI it effects the whole team. Developer's should be immediately informed that the source control tree is unstable and they then know NOT to commit any changes or to update their local code versions to the broken/unstable one. Developer's can of course continue to develop their feature(s) or resolve their bug(s) but eventually they're going to want to commit code changes or update their local versions with everyone else's code changes. With a broken build, developer's are forced to work in isolation which is the very thing CI is trying to avoid. A broken build delays integration and should therefore be avoided at all costs.

:: Implicit Communication

Some projects can be routinely bogged down by communication overhead between team members which makes it difficult to be productive (less communication overhead means more time building and improving software). Question such as: why does this code not compile, who's responsible for this piece of code, what effect will this code change have, what exactly are we testing, what are the exact files I need in order to test this on another machine, are all valid questions developers can ask but if there is an easier way to communicate the answers other than direct developer-to-developer interaction, productivity can be greatly improved. This is particularly useful for development teams that are distributed by both location and time but even small teams located in the same office can see marked improvements by reducing communication overhead by using CI practices.

Having a single, repeatable process which is what CI fosters, allows entire development teams to be on the same page. At any point in time you can verify the state of the code base: is it stable, is it broken or is it in transition (i.e. currently building, testing and deploying). If things are stable then I can concentrate on developing features for my software application. If it's broken then I know something is wrong and the CI build system will most likely help me out by informing me why it's broken or at least try to point me in the right direction (i.e. what commit by what developer broke the build, did a test fail, did deployment fail, is it a configuration issue or a code change... etc). If it's in transition I know that it's potentially unstable and should wait until I know more. Following these basic principles, development teams keep in-sync by knowing at any point in time whether the life blood of their software application (it's source code and configuration) are stable or whether there is a problem that needs fixing.

Extending this CI process with the tools mentioned above: static analysis reports, the ability for specific developers to claim broken builds and comment on them, auto-generated API documentation and code base task scanners, a development team has relevant information at its finger tips that helps them communicate without direct face-to-face or email communication. I call this implicit communication. If, as a developer, I can quickly determine who broke the build, if they have taken responsibility for it and what they have decided to do to fix it, I can quickly go back to the feature I was previously building or I can get in touch with the person responsible for breaking the build and help them by providing a deeper insight about the reason for the failure. Apart from build breaks, I can access the latest API documentation and verify that I understand a method's contract that I am about to use. I can also quickly verify the cyclomatic complexity of a method that I am refactoring and decide whether it has enough test coverage. All of this can be done from my own desk with no direct communication with another developer. I have found that on some projects without this CI infrastructure there is always a "guru" developer who seems to know the most about a project (usually because he's been there the longest) and everyone needs their insight about each and every software related issue. You can imagine what kind of bottle neck that causes and what might happen if the "guru" leaves the project... chaos.

At the end of the day this implicit communication helps me build better software as I have the information I need to design and implement software features and fix bugs. By reducing the communication overhead that CI tools and processes allow, I am more productive at developing, improving and delivering the software customers pay for.

-------------------------------

:: UPDATE

Recently the team that was building Hudson left the project and subsequently forked it into what is now called Jenkins (This was due to some political issues with Oracle over the name Hudson and who had control over it). If you already have a Hudson instance and wish to switch to Jenkins (as the team is mostly over their now) this Jenkins wiki article will help you make the switch.

Andrew Thompson

Saturday, 12 March 2011

Continuous Integration: Tooling, Processes & Implicit Communication

No comments: