Sunday, February 20, 2011

Problems Unit-Testing Quartz.NET's IJob.Execute Method

I've been using an open source library at work called Quartz.NET which is an Enterprise Job Scheduler for the .NET platform. Quartz.NET is modeled after the popular Quartz Scheduler for the Java platform and shares many similar elements. I needed a scheduler for a variety of reasons but it is mostly used to fire "jobs" at specified times that perform some work like downloading a file via FTP and storing it to disk.

I've been using Quartz.NET version 1.0.3, which exposes the following IJob interface. You implement it to create your own custom "jobs" that perform some task:

namespace Quartz
{
     public interface IJob 
     {
          void Execute(JobExecutionContext context);
     }
}

The Execute method looks fairly simple and non-problematic until you realize that it is the single entry point that Quartz.NET has into your own application code (i.e. the scheduler calls the implemented Execute method on your custom jobs when a trigger fires). Problems start to arise if you ever want to thoroughly unit-test your custom jobs as the JobExecutionContext parameter passed in is a concrete object and not an interface. A previous post of mine has already mentioned my theory of software development with bugs so in that light let me explain further.

In order to effectively unit-test you need to isolate the method under test and test ONLY the logic contained within that single method (not any other class's methods that may be called from the method under test). Misko Hevery of Google has a good presentation on what makes code hard to test. The Quartz.NET interface above makes it hard to pass in a completely clean mock object as the parameter, since the JobExecutionContext parameter is concrete and cannot be completely mocked (unless all its methods are virtual, which they are not).

I've been using RhinoMocks to easily mock up my own interfaced objects, set expectations on what methods should be called on those mocked up objects and then assert that the methods were in fact called (see this post for details). When RhinoMocks is used to mock up an interface or class it essentially wraps a proxy object around the interface, abstract class or concrete class you give it. It works great for interfaces but it can only override interface methods, delegates or virtual methods. Therefore if you try to mock up any class with non-virtual methods and then need to set expectations on those non-virtual methods you're out of luck (see these RhinoMock limitations). Most of the methods on the JobExecutionContext are virtual but there are some key ones related to the next, previous and current fire times which are not virtual and therefore cannot be set by RhinoMocks.
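
The virtual-method limitation can be seen without any mocking framework. The sketch below hand-writes the kind of subclass that a proxy-based mocking library generates for you. FakeContext and its two members are hypothetical stand-ins invented for this example; they are not the real JobExecutionContext API.

```csharp
using System;

// Hypothetical stand-in for a concrete context class (not the real Quartz.NET type).
public class FakeContext
{
    // Virtual: a generated proxy subclass can override this.
    public virtual string JobName { get { return "real-job"; } }

    // Non-virtual: calls made through a FakeContext reference always run this code.
    public DateTime? NextFireTimeUtc { get { return null; } }
}

// Roughly what a proxy-based mock does: subclass the type and override members.
public class FakeContextProxy : FakeContext
{
    public override string JobName { get { return "mocked-job"; } }

    // 'new' only hides the base member; it does not override it.
    public new DateTime? NextFireTimeUtc { get { return new DateTime(2011, 2, 20); } }
}

public static class VirtualDemo
{
    public static void Main()
    {
        // The job under test only ever sees the base type.
        FakeContext context = new FakeContextProxy();

        Console.WriteLine(context.JobName);                  // mocked-job
        Console.WriteLine(context.NextFireTimeUtc.HasValue); // False
    }
}
```

The overridden virtual property returns the mocked value, while the non-virtual property still runs the original implementation, which is why expectations on such members cannot take effect.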

After understanding this problem I posted a question to the Quartz.NET Google Group asking why the JobExecutionContext parameter is not an interface. As it turns out the next version of Quartz.NET will use an interface for this parameter instead of a concrete type.

So there are 3 options that I can think of when using the Quartz.NET IJob interface for your own custom jobs (Some are more geared towards fully unit-testing the logic contained within your custom jobs):
  1. Continue using the Quartz.NET 1.0.3 library and either avoid testing your custom job logic completely by ignoring the next, previous and current fire time properties OR create a concrete JobExecutionContext object within your test code and pass that into your custom job's Execute method. I would avoid the first approach as you're then missing test coverage. The second approach is not ideal either, as you're not isolating your method logic, so a single change to one method may end up breaking multiple disparate unit-tests, and it then takes a while to determine what's wrong and fix it.
  2. Wait for the next version of the Quartz.NET library which has an interfaced parameter for the IJob.Execute() method. This should be released sometime in 2011.
  3. Modify the Quartz.NET 1.0.3 open source library yourself so that the JobExecutionContext parameter on the IJob.Execute() method is an interface instead of a concrete type. This isn't an ideal solution as you now have to manage your own 3rd party library but if you can't wait for the next Quartz.NET release but still need the convenience of an interfaced JobExecutionContext parameter this is your only real option.
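
To illustrate why options 2 and 3 are attractive, here is a minimal sketch of a job whose context parameter is an interface. IFireTimeContext, StubContext and ReportJob are hypothetical names made up for this example; they are not the interface that the next Quartz.NET release will ship.

```csharp
using System;

// Hypothetical interface exposing only what the job needs.
public interface IFireTimeContext
{
    DateTime? PreviousFireTimeUtc { get; }
    DateTime FireTimeUtc { get; }
}

// Hand-rolled test double: every value is under the test's control.
public class StubContext : IFireTimeContext
{
    public DateTime? PreviousFireTimeUtc { get; set; }
    public DateTime FireTimeUtc { get; set; }
}

// Hypothetical job whose logic depends on the fire times.
public class ReportJob
{
    public string LastMessage { get; private set; }

    public void Execute(IFireTimeContext context)
    {
        if (context == null) throw new ArgumentNullException("context");

        if (context.PreviousFireTimeUtc.HasValue)
        {
            TimeSpan elapsed = context.FireTimeUtc - context.PreviousFireTimeUtc.Value;
            LastMessage = "elapsed: " + (int)elapsed.TotalMinutes + " min";
        }
        else
        {
            LastMessage = "first run";
        }
    }
}

public static class ReportJobDemo
{
    public static void Main()
    {
        var job = new ReportJob();
        job.Execute(new StubContext
        {
            PreviousFireTimeUtc = new DateTime(2011, 2, 20, 12, 0, 0),
            FireTimeUtc = new DateTime(2011, 2, 20, 12, 30, 0)
        });
        Console.WriteLine(job.LastMessage); // elapsed: 30 min
    }
}
```

Because the parameter is an interface, the test decides every fire time the job sees, with no scheduler or trigger involved.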

Friday, February 18, 2011

A Theory of Software Development: Bugs

Software development is a difficult task. It requires in-depth knowledge of the technical problem being solved, an acute awareness of how the technical problem fits into the framework of the business it functions within, an adherence to a set of established or evolving processes and finally, the ability to regularly adapt to new requirements, new technologies, new people and many forms of technical shortcomings throughout the lifecycle of a project. Essentially software development is the process of solving many technical problems while simultaneously managing complexity and adapting to change.

Throughout a project many challenges are overcome and the ones that are not usually manifest themselves one way or another as bugs within the software application. One thing is clear in modern software development: there will be many software bugs that need to be managed. Managing these bugs will become a routine task that every development team will face over and over again throughout the development lifecycle. So where do these bugs come from? Bugs find their way into software applications for 2 main reasons:
  1. Insufficient Requirements: Bugs that surface because the software is exposed to scenarios that it was never designed to handle. For example, a method is designed to do X when given data Y; after it is built, however, the method is given unexpected data Z, which causes it to return an unexpected result or throw an exception. This can also be thought of as violating a precondition.
  2. Insufficient Code:  Bugs that surface because developers design or build the software incorrectly. For example, a method should do X (the requirement specifies that it should do X) but it is built by a developer in such a way that it does Y instead of X. This can also be thought of as violating a postcondition.
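
The two categories can be made concrete with a small sketch. Average.Of is a hypothetical method written for this illustration: the guard clause enforces its precondition, and the comment marks where a typical postcondition bug would creep in.

```csharp
using System;

public static class Average
{
    // Requirement: return the arithmetic mean of a non-empty array.
    public static double Of(int[] values)
    {
        // Precondition check: reject input the method was never designed for,
        // instead of letting it produce an unexpected result further down.
        if (values == null || values.Length == 0)
        {
            throw new ArgumentException("values must contain at least one element", "values");
        }

        long sum = 0;
        foreach (int value in values)
        {
            sum += value;
        }

        // An "Insufficient Code" bug would be writing sum / values.Length here:
        // integer division silently truncates, so the method would no longer
        // return the mean and the postcondition would be violated.
        return (double)sum / values.Length;
    }
}

public static class AverageDemo
{
    public static void Main()
    {
        Console.WriteLine(Average.Of(new[] { 1, 2 }) == 1.5); // True
    }
}
```
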
I find it helps to think about software bugs in this manner because over time you can start to see which bugs are continually cropping up and therefore which ones your team should focus on avoiding. If it's of the Insufficient Requirements variety, I find that something is going wrong in the requirements gathering and dissemination process (either the requirements that are being gathered are incorrect or incomplete, or the way they are understood and used by developers is incorrect). If it's of the Insufficient Code variety, I find that there is generally not enough (or any) unit-testing and/or acceptance testing being done. Finding ways to identify these issues and correct them is paramount to continually improving the software your team builds. The rest of this post will focus on the Insufficient Code variety and how a specific type of testing can be used to anticipate and mitigate these bugs.

Once the understanding of where bugs come from has been established, the next step is to understand how bugs are classified within the Insufficient Code variety. Misko Hevery at Google has a unified theory of bugs which I find particularly compelling. He classifies bugs into 3 types: logical, wiring and rendering. He goes on to argue that logical bugs are on average the most common type of bug, are notoriously hard to find and are also the hardest to fix. Thus if developers only have so much time and energy to spend on testing, they should focus their testing efforts on uncovering logic bugs more so than wiring or rendering bugs. Of all the types of testing, unit-testing is the best mechanism for uncovering logic bugs. Misko Hevery goes on to say the following about unit-testing:
Unit-tests give you greatest bang for the buck. A unit-test focuses on the most common bugs, hardest to track down and hardest to fix. And a unit-test forces you to write testable code which indirectly helps with wiring bugs. As a result when writing automated tests for your application we want to overwhelmingly focus on unit test. Unit-tests are tests which focus on the logic and focus on one class/method at a time.
A key question that can be asked after reading the above paragraphs is how then do I write testable code that makes it easier to unit-test and therefore uncover logic bugs more easily? This can be accomplished by doing the following:
  1. Methods should have parameters that are interfaces not concrete types (unless the types are primitives such as ints, strings, doubles... etc)
  2. Constructors should not do any significant work besides checking and setting their parameters to internal member variables
  3. Avoid global state at all costs (this includes using singletons and static methods)
  4. Do not use the new operator within classes whose primary focus is logical in nature (i.e. classes whose methods contain ifs, loops and calculations)
    1. Note 1: Factory classes can and should have the new operator as these classes are specifically designed and used for wiring up other classes together
    2. Note 2: Certain data structures like lists and arrays are usually fine to "new" up inside logical classes
  5. Avoid Law of Demeter violations
So here is what the above rules actually look like in source code format for a class called Process which has a Start and Stop method and performs logic functions on some scheduler and data store objects:

public class Process
{
     private IDataStore _dataStore;
     private IScheduler _scheduler;

     public Process(IScheduler scheduler, IDataStore dataStore)
     {
          if(scheduler == null)
          {
               throw new NullParamException("scheduler");
          }
          if(dataStore == null)
          {
               throw new NullParamException("dataStore");
          }
      
          _scheduler = scheduler;
          _dataStore = dataStore;
     }

     public void Start(int delay)
     {
          _scheduler.JobComplete += _dataStore.Save;

          _scheduler.Start(delay);
     }

     public void Stop(bool forceShutdown)
     {
          _scheduler.JobComplete -= _dataStore.Save;

          _scheduler.ShutDown(forceShutdown);
     }
}

This class can now be effectively unit-tested by creating mock IDataStore and IScheduler objects that are passed into the constructor upon creation (see how I do this using RhinoMocks: a dynamic mock object framework). By using this approach, only the method under test is actually being tested and not any other classes. I have found this approach for designing classes and methods to be of particular importance as a project grows larger and matures. When a healthy regression suite (set of unit-tests) has been created for an application without taking this approach into account (i.e. a single unit-test tests multiple classes at the same time) and then a change request occurs, I find that many disparate unit-tests begin to fail after making code changes. It is therefore essential to isolate each unit-test so that it tests only the logic contained within a single class's method and no other methods. Adhering to this principle from the beginning of development will save countless hours of refactoring seemingly unconnected unit-tests that fail after a single code change.
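
The same isolation can also be sketched with hand-written fakes instead of a mocking framework. The post does not show IScheduler or IDataStore, so the event's delegate type (Action<string>) and the Save signature below are assumptions made for this example, and the Process class is a condensed copy with the null checks left out.

```csharp
using System;

// Assumed shapes: the delegate type and Save signature are guesses for this sketch.
public interface IScheduler
{
    event Action<string> JobComplete;
    void Start(int delay);
    void ShutDown(bool forceShutdown);
}

public interface IDataStore
{
    void Save(string result);
}

// Condensed copy of the Process class above (null checks omitted for brevity).
public class Process
{
    private readonly IScheduler _scheduler;
    private readonly IDataStore _dataStore;

    public Process(IScheduler scheduler, IDataStore dataStore)
    {
        _scheduler = scheduler;
        _dataStore = dataStore;
    }

    public void Start(int delay)
    {
        _scheduler.JobComplete += _dataStore.Save;
        _scheduler.Start(delay);
    }

    public void Stop(bool forceShutdown)
    {
        _scheduler.JobComplete -= _dataStore.Save;
        _scheduler.ShutDown(forceShutdown);
    }
}

// Hand-written fakes that simply record how they were used.
public class FakeScheduler : IScheduler
{
    public event Action<string> JobComplete;
    public int? StartedWithDelay;
    public bool? ShutDownForced;

    public void Start(int delay) { StartedWithDelay = delay; }
    public void ShutDown(bool forceShutdown) { ShutDownForced = forceShutdown; }

    // Lets a test simulate the scheduler finishing a job.
    public void RaiseJobComplete(string result)
    {
        Action<string> handler = JobComplete;
        if (handler != null) handler(result);
    }
}

public class FakeDataStore : IDataStore
{
    public string LastSaved;
    public void Save(string result) { LastSaved = result; }
}

public static class ProcessDemo
{
    public static void Main()
    {
        var scheduler = new FakeScheduler();
        var store = new FakeDataStore();
        var process = new Process(scheduler, store);

        process.Start(5);
        scheduler.RaiseJobComplete("first");   // saved: the handler is attached
        process.Stop(true);
        scheduler.RaiseJobComplete("second");  // ignored: the handler was detached

        Console.WriteLine(scheduler.StartedWithDelay); // 5
        Console.WriteLine(store.LastSaved);            // first
    }
}
```

Only Process logic is exercised here: if a real scheduler or data store changes, this test is unaffected.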

A final note about wiring logical classes together: By using the above approach, one or more application builder classes (factory classes) will need to be built that are responsible for wiring together the application. These builder classes create ("new" up) all logic classes and use inversion of control (IoC) to pass other logic classes to each other through their constructors. This essentially wires up the entire object graph of the application. These builder classes can then be tested, but the resultant tests are more like integration/system tests as they verify that the application’s object graph has been set up correctly and are therefore an attempt at uncovering any possible wiring bugs that may exist.
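
A minimal sketch of such a builder might look like the following. All of the type names (IClock, SystemClock, FixedClock, TimeAwareGreeter, ApplicationBuilder) are hypothetical and exist only to show the split between wiring code and logic code.

```csharp
using System;

public interface IClock
{
    DateTime UtcNow { get; }
}

// Production implementation that touches the real system clock.
public class SystemClock : IClock
{
    public DateTime UtcNow { get { return DateTime.UtcNow; } }
}

// Test implementation with a value the test controls.
public class FixedClock : IClock
{
    private readonly DateTime _now;
    public FixedClock(DateTime now) { _now = now; }
    public DateTime UtcNow { get { return _now; } }
}

// Logic class: receives its collaborator and never news one up.
public class TimeAwareGreeter
{
    private readonly IClock _clock;

    public TimeAwareGreeter(IClock clock)
    {
        if (clock == null) throw new ArgumentNullException("clock");
        _clock = clock;
    }

    public string Greet(string name)
    {
        return (_clock.UtcNow.Hour < 12 ? "Good morning, " : "Hello, ") + name;
    }
}

// Builder class: the only place where 'new' appears, and it contains no ifs or loops.
public static class ApplicationBuilder
{
    public static TimeAwareGreeter Build()
    {
        IClock clock = new SystemClock();
        return new TimeAwareGreeter(clock);
    }
}

public static class BuilderDemo
{
    public static void Main()
    {
        // Unit-test style: the logic class is exercised with a fixed clock.
        var greeter = new TimeAwareGreeter(new FixedClock(new DateTime(2011, 2, 18, 9, 0, 0)));
        Console.WriteLine(greeter.Greet("Ada")); // Good morning, Ada

        // Wiring-test style: only checks that the object graph can be built.
        Console.WriteLine(ApplicationBuilder.Build() != null); // True
    }
}
```
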

Thursday, February 17, 2011

State Machines and Regular Expressions

I came across a great article by Mark Shead about state machines, computer science basics and regular expressions. It really helped me understand the theory behind regular expressions and helped me realize why I wouldn't be able to use them to match strings embedded within double quotes that were data fields within a CSV file.
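
One alternative for quoted CSV fields is to write the state machine by hand. The sketch below is my own minimal illustration, not code from the article: CsvLine and its three states are hypothetical names, and it handles a single line with double-quoted fields and doubled quotes ("") as the escape.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    private enum State { Unquoted, Quoted, QuoteInQuoted }

    // Splits one CSV line into fields, honouring double-quoted fields and "" escapes.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        State state = State.Unquoted;

        foreach (char c in line)
        {
            switch (state)
            {
                case State.Unquoted:
                    if (c == ',') { fields.Add(current.ToString()); current.Length = 0; }
                    else if (c == '"') { state = State.Quoted; }
                    else { current.Append(c); }
                    break;

                case State.Quoted:
                    // Commas inside quotes are ordinary data.
                    if (c == '"') { state = State.QuoteInQuoted; }
                    else { current.Append(c); }
                    break;

                case State.QuoteInQuoted:
                    // A second quote is an escaped quote; anything else means the quoted part ended.
                    if (c == '"') { current.Append('"'); state = State.Quoted; }
                    else if (c == ',') { fields.Add(current.ToString()); current.Length = 0; state = State.Unquoted; }
                    else { current.Append(c); state = State.Unquoted; }
                    break;
            }
        }

        fields.Add(current.ToString());
        return fields;
    }
}

public static class CsvDemo
{
    public static void Main()
    {
        List<string> fields = CsvLine.Split("a,\"b,c\",\"say \"\"hi\"\"\"");
        Console.WriteLine(fields.Count); // 3
        Console.WriteLine(fields[1]);    // b,c
        Console.WriteLine(fields[2]);    // say "hi"
    }
}
```
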

Wednesday, February 16, 2011

Using RhinoMocks Effectively for Unit-Testing

I'm a big proponent of unit-testing the code I design and build. It allows me to verify the behavior of the production code I write and provides a robust regression test suite that stops other developers and myself from breaking functionality we didn't think we would when making code changes. No matter what language you develop applications in, unit-tests coupled with source control (e.g. Subversion), a continuous integration build system (e.g. Hudson), and some process best practices are indispensable tools for developing quality applications in an efficient manner.

This is how I have been using RhinoMocks effectively within my C# .NET MSTest unit-tests:

private MockRepository mocks;
private IScheduler schedulerMock;
private IDataStore dataStoreMock;

[TestInitialize] 
public void Setup() 
{ 
     mocks = new MockRepository(); 
     schedulerMock = mocks.DynamicMock<IScheduler>();
     dataStoreMock = mocks.DynamicMock<IDataStore>();
}

[TestCleanup] 
public void CleanUp() 
{ 
     // Verifies that all the expectations on all mock objects in the repository are met. 
     mocks.VerifyAll(); 
}

[TestMethod]
public void StartAndStopTheProcess()
{
     // set up expectations for the methods that should be called
     Expect.Call(schedulerMock.Start);
     Expect.Call(() => schedulerMock.Shutdown(false));
     // the argument passed to Save does not matter here, so any value is accepted
     Expect.Call(() => dataStoreMock.Save(null)).IgnoreArguments();

     // switch the mock repository from record mode into replay mode
     mocks.ReplayAll();
     
     // test the creation of the process and its starting and stopping
     Process process = new Process(schedulerMock, dataStoreMock);
     process.Start();
     process.Shutdown();
}


Here is a checklist when creating a new unit-test class that uses RhinoMocks:
  1. Create private member variables for the RhinoMocks MockRepository object and any other mocks you'll need to mock up and pass into your methods under test
  2. In your TestInitialize method, new up a MockRepository object and create dynamic mocks for each of your mock objects
  3. In your TestCleanup method, call VerifyAll() on the MockRepository object so that when the unit-test completes, every expectation that you have set will be verified. Note that if an expectation is not met an exception will be thrown at this point by the RhinoMocks framework.
  4. In your unit-test method:
    1. Set up each expectation accordingly, deciding whether to ignore arguments, return certain types when called, or anything else RhinoMocks allows you to do (see this PDF document for a quick reference about the RhinoMocks 3.3 API)
    2. Switch the MockRepository object from record mode to replay mode
    3. Call the particular method you want to unit-test passing in the mocked-up objects you've set expectations on.

Tuesday, February 15, 2011

Byte Order Mark found using .NET BinaryReader class

Bug: Using .NET's BinaryReader class to read in a file's contents in byte format may result in reading in the Byte Order Mark (BOM) Unicode character if the file(s) were encoded in UTF-8 or Unicode.
byte[] data = new byte[size];
using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
{ 
     using (BinaryReader br = new BinaryReader(fs)) 
     { 
          data = br.ReadBytes(size); 
     }
}

When reading in one of our UTF-8 encoded files using the above code, the first 3 bytes of the data byte array were the BOM character.

Solution:  Read in the file's contents as text using the .NET StreamReader class which automatically compensates for the BOM character.

Detailed Explanation:
A tester on my team discovered a bug after writing some integration tests that ingested multiple CSV files. The code ingests the data within these files as bytes using the .NET BinaryReader class (see above code).

Upon investigation I noticed that the byte array contained 3 additional bytes at the beginning which had the integer values of 239, 187 & 191 respectively. After reading Wikipedia I discovered that these 3 bytes are the UTF-8 encoding of the BOM Unicode character (\uFEFF char or U+FEFF code point). The purpose of this Unicode character, according to Wikipedia, is to signal the endianness (byte order) of a text file or stream. Given that UTF-16 and UTF-32 data is encoded as 16-bit and 32-bit code units, the machine reading the encoded data needs to know its byte order so that it can read in the data correctly. UTF-8 is a sequence of single bytes and has no byte order ambiguity, so in a UTF-8 file the BOM acts only as a signature marking the data as UTF-8.
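
The byte values are easy to confirm from .NET itself, since each Encoding exposes its BOM through GetPreamble(). The BomBytes helper class below is just a name chosen for this sketch.

```csharp
using System;
using System.Text;

public static class BomBytes
{
    // Formats an encoding's preamble (its BOM) as space-separated decimal byte values.
    public static string Describe(Encoding encoding)
    {
        byte[] preamble = encoding.GetPreamble();
        return string.Join(" ", Array.ConvertAll(preamble, b => b.ToString()));
    }

    public static void Main()
    {
        Console.WriteLine(Describe(Encoding.UTF8));             // 239 187 191
        Console.WriteLine(Describe(Encoding.Unicode));          // 255 254 (UTF-16 little-endian)
        Console.WriteLine(Describe(Encoding.BigEndianUnicode)); // 254 255 (UTF-16 big-endian)
    }
}
```

The two UTF-16 preambles are the same two bytes in opposite order, which is exactly how a reader tells the byte orders apart.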

While thinking about and looking for a solution, I came across this stackoverflow discussion and this one which talked about simply stripping the BOM from the data if it's present. This didn't seem like the best approach so I continued thinking and came across a pretty simple solution.

Instead of reading in the raw bytes from the file, I'd read in the actual text using the .NET StreamReader class. The added benefit of using the StreamReader class is that it has a number of constructor overloads, some of which take a bool parameter called detectEncodingFromByteOrderMarks. This highlighted the fact that the class handles the BOM character when reading in text from a stream, and if I know the encoding of the file I am reading in, I can use the code below:
string data;
using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
{ 
     using (StreamReader sr = new StreamReader(fs, encoding, true, size)) 
     { 
          data = sr.ReadToEnd(); 
     }
}

The data string now contains only the text data from the file that was read in and not the BOM. Therefore, even if a file contains the BOM character because its contents were encoded as UTF-8, the StreamReader class (set with the constructor parameters above) compensates for the BOM character and omits it from the string read in.
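
The behaviour can be checked without touching the disk. The sketch below builds an in-memory "file" that starts with the UTF-8 BOM and compares decoding the raw bytes with reading them through a StreamReader. BomReadDemo and its helper methods are names invented for this example.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomReadDemo
{
    // Builds an in-memory "file": the UTF-8 BOM followed by the encoded text.
    public static byte[] WithBom(string text)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes(text);
        byte[] all = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, all, 0, bom.Length);
        Buffer.BlockCopy(body, 0, all, bom.Length, body.Length);
        return all;
    }

    // Reads the bytes as text; StreamReader skips a leading BOM.
    public static string ReadAsText(byte[] fileBytes)
    {
        using (var stream = new MemoryStream(fileBytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] fileBytes = WithBom("id,name");

        Console.WriteLine(fileBytes.Length);                           // 10: 3 BOM bytes + 7 text bytes
        Console.WriteLine(Encoding.UTF8.GetString(fileBytes).Length);  // 8: U+FEFF is kept as a character
        Console.WriteLine(ReadAsText(fileBytes).Length);               // 7: the BOM is not part of the string
    }
}
```

Decoding the byte array directly keeps the BOM as a leading U+FEFF character, which is the case the stripping approaches work around; the StreamReader path never exposes it.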

Tuesday, February 1, 2011

Launch a Career in Canada Event

I'll be talking on a panel at the University of British Columbia's Launch a Career in Canada Event on February 9th, 2011 for international students.

It's a networking event and opportunity to meet with employers and international alumni and get tips on finding a great job.

Here is the Alumni Profile they wrote up about me on UBC's Career Services Blog.