Introduction to test-doubles - 03/16/09

As soon as you start unit-testing or test-driving your development, you’ll sooner or later learn about test-doubles and how they can make your tests lightning-fast. And if you set up a continuous integration process (which you should) and you have more than 5 unit tests, you’ll probably have to know about test-doubles sooner instead of later :-) .

What is a test-double?

Gerard Meszaros defines a test-double as follows:

“Sometimes it is hard to test the system under test (SUT) because it depends on other components that cannot be used in the test environment. This could be because they aren’t available, they will not return the results needed for the test or because executing them would have undesirable side effects. In other cases, our test strategy requires us to have more control or visibility of the internal behavior of the SUT. When we are writing a test in which we cannot (or choose not to) use a real depended-on component (DOC), we can replace it with a Test Double. The Test Double doesn’t have to behave exactly like the real DOC; it merely has to provide the same API as the real one so that the SUT thinks it is the real one!”

The concept is very easy to understand, but if you’ve never heard of test-doubles before, I assume what one actually looks like is still a bit blurry. First I have to say that you should use them with caution and only when appropriate. Apart from that, there are several types of test-doubles. You can find a list of all types in Meszaros’ book and an enumeration of them here.

Why use test-doubles?

I think I can summarize the need for a test-double in one line: use a test-double to keep your tests focused and fast. If you’re doing CI and TDD, you’ll have a very big test suite after a while, and it’s critical to keep it running in a few minutes. If you don’t, you’ll end up giving up CI, and you’ll lose the continuous feedback it offers you.

If your SUT depends on a component that needs a lot of setup code or expensive resources, you don’t want to be doing all that in a simple test. Your SUT shouldn’t care how the component it depends on needs to be configured. If you are doing that, you’re writing integration or even acceptance tests that go through the whole system… That’s why replacing a DOC with a fake can come in very handy sometimes. Test your SUT in isolation, that’s the goal. The DOC-components will have tests of their own. And you’ll have integration tests on top of it all.

Expectations, verifications, and stuff like that

Before I get to mocks and stubs, you need to understand the expectation-verification thing.

First of all, a mock or a stub is just an object that looks like the real DOC, but is actually a fake which you can use to get the test passing, or to record the calls your SUT makes to it. When using such a mock/stub, you can set expectations on it. An expectation is a statement in which you explicitly expect a call to a particular method or property, with particular parameters, and even a particular return value. After you’ve set the expectations you consider important, you can verify that these calls actually took place, and thus verify that your SUT is doing what you expected.

What is a stub?

A stub is an object that you use just to get your code passing. When you don’t really care about how the interaction with the DOC happens, you can use a stub to replace the real dependency. A stub can be an empty implementation or a so-called “dumb” implementation: instead of performing a calculation, it could just return a fixed value.
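To make this concrete, here is a minimal hand-coded stub. The ITaxCalculator dependency and the InvoiceService SUT are invented names for the sake of this sketch; instead of doing a real calculation, the stub just hands back whatever value the test gives it:

// Hypothetical dependency of the SUT (invented for this example).
public interface ITaxCalculator
{
    decimal CalculateTax(decimal amount);
}

// A "dumb" stub: no real logic, just a canned return value,
// so the SUT can be tested in isolation.
public class TaxCalculatorStub : ITaxCalculator
{
    private readonly decimal _fixedTax;

    public TaxCalculatorStub(decimal fixedTax)
    {
        _fixedTax = fixedTax;
    }

    public decimal CalculateTax(decimal amount)
    {
        return _fixedTax;
    }
}

// In a test, you'd simply pass the stub to the (hypothetical) SUT:
// var sut = new InvoiceService(new TaxCalculatorStub(21m));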

Creating stubs with a mocking framework is way easier than hand-coding an extra class to act as a stub for your test, like the one above. How you do this exactly is for the upcoming post, but the good news is that you don’t need to manually code the stub.

What is a mock?

You’ll use mocks when you really want to test the behavior of the system. You can set expectations on a mock for methods/properties to be called with specified parameters and/or return values. The final part of the test is then always the verification of the expectations that were set. If they were not satisfied, your test fails. This is especially interesting when you need to be completely sure that these calls actually happened. Just imagine an overly simplified profit calculator. You can never calculate your profits (or losses) if you haven’t calculated your revenues and expenses first, can you? Well, you can expect that these are calculated first. (This is of course an overly simplified example for the sake of simplicity…)
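To illustrate, here is a hand-rolled mock for that profit-calculator scenario. The interface and class names are invented, and a mocking framework would normally generate this plumbing for you; the point is that the mock records the calls the SUT makes, and the test fails if the expected calls never happened:

using System;

// Hypothetical collaborator of the profit calculator (invented for this example).
public interface IFinancialService
{
    decimal CalculateRevenues();
    decimal CalculateExpenses();
}

// A hand-rolled mock: it records the calls the SUT makes,
// so the test can verify that the expected interaction took place.
public class FinancialServiceMock : IFinancialService
{
    public bool RevenuesCalculated { get; private set; }
    public bool ExpensesCalculated { get; private set; }

    public decimal CalculateRevenues()
    {
        RevenuesCalculated = true;
        return 1000m;
    }

    public decimal CalculateExpenses()
    {
        ExpensesCalculated = true;
        return 400m;
    }

    // Verification: fail the test if the expected calls didn't happen.
    // (In a real test you'd use your test framework's assertions.)
    public void VerifyAllExpectations()
    {
        if (!RevenuesCalculated || !ExpensesCalculated)
            throw new Exception("Expected CalculateRevenues and CalculateExpenses to be called.");
    }
}

// Usage in a test (ProfitCalculator is the hypothetical SUT):
// var mock = new FinancialServiceMock();
// var sut = new ProfitCalculator(mock);
// sut.CalculateProfit();
// mock.VerifyAllExpectations();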

What is a fake?

A fake is a class or method that’s implemented just for the sake of testing. It has the same goal as the other variations of test-doubles: replace the depended-on component to avoid slow or unreliable tests. The classic example is replacing a Repository that accesses the database with an in-memory repository, which just returns objects from a collection it holds. That way you’ve got data you can use to test the SUT against, without the overhead of communicating with expensive or external components.

The database is just an example; you can perfectly well use a fake object to hide complex processing of some kind and just return the data you need to continue (data that would be produced by the real processing in production code). It will make your tests focus better on the SUT, and they will be a lot faster.
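As a minimal sketch of that classic example (the Customer and repository types are invented here), an in-memory fake repository could look like this:

using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical repository abstraction the SUT depends on.
public interface ICustomerRepository
{
    Customer GetById(int id);
    void Add(Customer customer);
}

// Fake: a real, working implementation, but backed by an
// in-memory collection instead of a database.
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly List<Customer> _customers = new List<Customer>();

    public Customer GetById(int id)
    {
        return _customers.FirstOrDefault(c => c.Id == id);
    }

    public void Add(Customer customer)
    {
        _customers.Add(customer);
    }
}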

Wrapup

It’s almost impossible to unit test without using mocking techniques. Without them, your tests can become extremely slow when you have a lot of them, and the continuous feedback loop is lost.
Mocking is a very powerful technique, but beware of misusing it. I actually try to avoid mocks. Just think: do I need to verify how my SUT interacts with the DOC? If not, don’t use a mock, use a stub. When using lots and lots of mocks, your tests can become brittle. Just imagine refactoring your DOC and breaking 20 tests. After looking into the problem, you notice that 17 of the 20 tests broke because the expectations set on this DOC-as-a-mock aren’t fully correct anymore. That’s something you really should avoid. Keep your tests focused ;-) .

Recommended readings

Mocks aren’t stubs by Martin Fowler
Test doubles by Martin Fowler
xUnit Test Patterns by Gerard Meszaros (also check out the website)
Test Doubles: When (not) to use them by Davy Brion

I’ll continue this post with how you can use these types of test-doubles using a mocking framework like Rhino.Mocks as soon as I get the chance ;-) .

Things I liked most out of TFS2008 - 03/2/09

Last week I attended a TFS2008 course, since we’re planning to upgrade soon (finally!). I thought I’d list the most important improvements, or better said, the ones I liked most.

Out of the box Continuous integration

Needless to say, this is the feature I love most in the new TFS… Previously we did our integration builds manually on commit, which isn’t the same as having it automated: it requires more discipline (I’m still seeing check-ins that don’t build, or with failing unit tests…). But now those days are over. You can create a new build definition and configure it to run after each check-in.

TFS also offers the following option:

Accumulate check-ins until the prior build finishes (Build no more often than every xx minutes). I have very mixed feelings about this option.
1) If you accumulate check-ins and build them together, it will be harder to detect what broke the build if it contains several changesets => you lose feedback and isolation of errors, while that’s one of the nicest features of continuous integration.
2) If you choose to put a periodicity on your builds, that means there’s just something wrong with your build.

For both options, the conclusion is the same: your commit build should run very fast, and if it’s not fast enough to keep up with your check-ins, take a look at it instead of using this option. Look at the origin of your problem instead of just patching it with fewer builds… You’ve now got the possibility to create several builds and make them run at different times and based on different criteria, so use it! In my opinion, I’d create an extremely fast commit build, a daily secondary build (containing integration tests and performance tests), and another daily build that deploys your application to a production-like environment. You can also opt to use just one build instead of two for the last two mentioned above; it depends on the situation and the size of your application.

New check-in policies

There’s a new check-in policy available that states you cannot check in unless the previous continuous integration build succeeded, and I personally think this is a valuable option. Imagine the following sequence of events:
- Developer1 checks in and a commit build is triggered
- Developer1’s commit build fails
- Developer2 checks in changes (that will break the build for another reason)
- Developer2’s build fails for several reasons

I hope you can see the problem here. If Developer2 were able to commit his changes to the repository, the build would break for several reasons, so error-detection gets harder. And that’s where this policy becomes interesting: only commit your changes if the previous build succeeded.
The only thing I ask myself is whether TFS also prevents queued builds from running after the broken one… and resumes them when the broken build is fixed… That would be a great addition to the policy!

Running unit tests without test lists

Finally we can drop the annoying VSMDI file and tell the build to run all tests in the specified assemblies. If you have several test projects, you can work with wildcards to include, for example, all assemblies whose names end with Tests. This is much better than test lists: it doesn’t require any maintenance (adjusting the vsmdi file every time you add tests) and ensures that all tests are always run. You can’t cheat anymore by excluding your failing tests from the test list to keep the build running :-) , and that’s a good thing.

Other nice additions

Build queuing
You can now queue your builds, so a build won’t be rejected if a co-worker checks in a second before you do. It will just be queued and will run when the previous build finishes. What’s also nice is that you can prioritize your builds. I can imagine that in some cases this is a valuable option.

Build retention policies
Finally you can automatically delete build outputs that are xx days/weeks old, and you can even apply a different retention policy for each of the following build results:
- Succeeded builds
- Partially succeeded builds
- Failed builds
- Stopped builds

About partially succeeded builds, by the way… I don’t think this option is very valuable. Why would you want builds that can partially succeed? A build succeeds or it fails, period. This is a black-or-white situation; grey is not a possibility. I’ve heard people defending this option by stating that a build can be partially succeeded if the solution builds but the tests don’t run, or when the solution builds and the tests run, but deployment fails…
Pfffff, all excuses. If your tests don’t run, your build fails, period. Otherwise you’re not doing continuous integration. And if you want to separate deployment from the rest of your build, create a separate build for it instead of using this option.

Version control improvements
I like the Get-latest on check-out option. I always have to remind myself to do a get-latest every time I want to check out something, so this option is very welcome. Some developers dislike it because it forces you to integrate with changes your teammates made even if you don’t want to. I don’t agree. Why wouldn’t you want to integrate with other changes? You’ve got a whole test suite that’s backing you up ;-) .
There are flaws to this option though.
1) It only performs a get-latest of the file you’re checking out. If this file uses new types or methods, you’ll be forced to do a complete get-latest at project, or even solution, level. If this could be automated, it would be nicer.
2) The one and only case in which I don’t want to integrate with the changes made is when the commit build fails.

Apart from that, there have been nice UI improvements that make dead-simple actions a lot easier:
- Save attachment from workitem to disk
- Drag and drop features in the source control explorer and in workitem attachments
- Go to Windows Explorer from source control explorer
- Improved help in command line (tf.exe)

TFS Power tools

The TFS power tools are a set of tools that you can download separately and use on top of TFS, and they always include very cool features. Here are the ones I appreciate most:

Shell extensions
It’s been available for ages with tools such as Subversion and TortoiseSVN, but now we can finally perform source control operations on our files directly from Windows Explorer with TFS.

Search
Improved search capability using wildcards and paths, but my favourite certainly is searching by status:
- Files that are checked-out
- Files checked out to user x

Build notification application
A little monitoring application that polls the build server for builds that are queued, started or completed. It notifies you even if the build was started by another team member, and displays a nice Outlook-like popup containing the build status and a direct link to it (or even to the drop location).

Alerts editor
With this nice addition you can subscribe to alerts at 3 different levels: work item, check-in or build. My favourites: only getting the build e-mail when a build fails, and getting an alert when a work item is assigned directly to me.

Code-review with NDepend - 02/9/09

A friend of mine asked me for some tips ‘n tricks to apply when performing a code review on an application. It got me thinking and I had a few things to say, especially about NDepend, so why not sum them up in a post?

The context of this post:
You’ve got an application that is very important to you, but it has some issues: you can’t get a stable version, you’re having performance problems, you’re getting lots of complaints, …

Warnings:
I’m not going to get into details about how to get started with NDepend. My goal is to show you how I would use it to perform a code review. If you need some help getting started, there are some great resources to look at.
I’m not saying the order of things-to-do below is the right order; I just wrote things down as they came to mind. Be sure to let me know what you would do differently, or what you would add, and don’t forget to specify why :D !

Tools that can help during code-reviews

There are several ways to do a code review, and there are also several tools that aid the analysis of existing code. NDepend is one of the tools that will help you the most during a code review. As I’ve said before, NDepend does static code analysis on .NET applications.

Another great tool for static code analysis is FxCop. It comes with a default set of rules that every developer should apply. You can even define your own rules, filter the results you get, and so on… If you take a quick look at both, you’ll think they do the same thing, but you’d be wrong. They’ve got different philosophies. FxCop is all about rules applied to code, while NDepend is, first and foremost, about metrics (86 of them!) on which you can run queries to get a better picture of what you’re analyzing. NDepend also offers the possibility to generate warnings whenever a metric goes over a limit you don’t want to cross.

It’s not my intention to do an NDepend vs FxCop comparison-battle, so I’m just going to stop here. For a code review, I’d prefer NDepend because you’ve got more options to really understand the whole code base, period.

First things first

Define and understand what the product is for. What kind of application are you reviewing? Is it an application with complex calculations or complex business rules? Does it need to store lots of data? Is performance an issue?

If you need to do a code review on an existing application, that’s usually because there are some problems with it. No one invests in a full review of a working application, which can take days, just to know how good the code is.

What are the problems with it now? What’s the goal of this code review? Is it just to pinpoint what should be refactored to improve the code base, or are we talking about possible rewrites? Are there bugs that, when fixed, just create other bugs? Is the code hard to change? Is it hard to maintain? Is it hard to understand?

Overall approach of the project

Are we working with an object-oriented domain model here? Or does the project use a data-centric or data-driven approach? Depending on the type of the project, this can be either positive or negative, so it’s important to note.

One of the most common scenarios in a data-centric approach (especially if the app is a few years old) is the use of DataSets. Using CQL, you can write a query that checks whether typed DataSets exist in the project:

-- Look for typed datasets
SELECT TYPES WHERE DepthOfDeriveFrom "System.Data.DataSet" == 1
-- Look for the use of datasets
SELECT TYPES WHERE IsUsing "System.Data.DataSet"

Even if you’re using objects instead of datasets, you still can’t be sure you’re dealing with an OO domain model. You’ll need to investigate the app some more.

What about layering?

How’s the application’s architecture? The first and most obvious thing to do is to take a look at the layering within the application.
If you just run the analysis, the report will show you the number of assemblies, which gives you a first idea of the physical layering.
Maybe the fact that you don’t have physical layering is a good thing :) .

An important part of a code review is checking how physical or logical layers use each other. In other words, you should look for dependency cycles. Visual Studio won’t even let you create a dependency cycle between assemblies, but you can seriously mess things up between namespaces.

Some basic CQL examples for physical layering:

SELECT TYPES WHERE IsDirectlyUsing "MyAssembly.Entities.Customer"

SELECT NAMESPACES
WHERE IsUsing "MyAssembly.Entities.Customer"
ORDER BY DepthOfIsUsing

-- Don't use DataLayer from Application layer
SELECT NAMESPACES
WHERE IsDirectlyUsing "DataLayer"
AND NameIs "Application"

Suppose you don’t have physical layering. You still don’t want your “Application.UserInterface” namespace to be using “DataLayer.CustomerDAL”, right? You can check it with this simple query:

SELECT TYPES FROM NAMESPACES "Application.UserInterface"
WHERE IsDirectlyUsing "DataLayer.CustomerDAL"

You can explore the dependencies between your namespaces and types using the dependency matrix and the dependency graph within NDepend. This needs some getting used to; at least, for me it did.

You can use the dependency matrix to track down dependency cycles. But you can also use a query to get started:

SELECT NAMESPACES WHERE ContainsTypeDependencyCycle

When I’m looking for dependency cycles, I start off with the dependency matrix, pinpoint the dependency (follow the red rectangles), and finish off with CQL queries and dependency graphs. If you just write some “who is directly using me” queries, you’ll soon see which types are causing the dependencies. If you want to see it visually, you can also generate a dependency graph of the dependency cycle to see immediately where it’s going wrong. That’s just a great feature!

Code quality and code health

After running the analysis, you’ll get an HTML report that contains tons of information. When you’re done digesting that (which gives you an overall view of the application), you can start to narrow your view, look at assemblies in isolation and see how they work internally. Narrow your scope with CQL queries, or within the class browser (by right-clicking on types).

NDepend has a lot of queries that it runs during each analysis; here are my favorites for code health:

SELECT METHODS WHERE CyclomaticComplexity > 20
ORDER BY CyclomaticComplexity DESC

This query looks for methods that have complex if-while-for-foreach-case structures. They are usually in need of some refactoring love, and in some cases even a design pattern can help (Strategy, Specification, …); see the sketch below for the Strategy case.
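As a rough illustration of the Strategy option (the shipping domain here is invented for the example), a big switch statement can be replaced by small strategy classes and a lookup, which brings the cyclomatic complexity of the calling method way down:

using System.Collections.Generic;

// Each former switch-case becomes its own small strategy class.
public interface IShippingCostStrategy
{
    decimal Calculate(decimal orderTotal);
}

public class StandardShipping : IShippingCostStrategy
{
    public decimal Calculate(decimal orderTotal) { return 5m; }
}

public class ExpressShipping : IShippingCostStrategy
{
    public decimal Calculate(decimal orderTotal) { return orderTotal > 100m ? 10m : 15m; }
}

// The complex switch collapses into a dictionary lookup.
public class ShippingCostCalculator
{
    private readonly IDictionary<string, IShippingCostStrategy> _strategies =
        new Dictionary<string, IShippingCostStrategy>
        {
            { "standard", new StandardShipping() },
            { "express", new ExpressShipping() }
        };

    public decimal Calculate(string method, decimal orderTotal)
    {
        return _strategies[method].Calculate(orderTotal);
    }
}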

SELECT METHODS WHERE NbParameters > 5
ORDER BY NbParameters DESC

You can review the results of this query and maybe even apply one of Kent Beck’s implementation patterns: Parameter Object. It states that if a group of parameters is passed together to many methods, you should consider making an object whose fields are those parameters and passing that object instead.
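A quick sketch of that refactoring, with invented names: the parameters that always travel together become one object.

// Before: decimal CalculatePrice(string country, string currency, decimal discount, decimal basePrice)

// After: introduce a Parameter Object that groups the related parameters.
public class PricingContext
{
    public string Country { get; set; }
    public string Currency { get; set; }
    public decimal Discount { get; set; }
}

public class PriceCalculator
{
    public decimal CalculatePrice(PricingContext context, decimal basePrice)
    {
        // The calculation now reads from one cohesive object
        // instead of a long parameter list. The conversions and
        // rates below are hard-coded purely for the sake of the example.
        decimal discounted = basePrice - context.Discount;
        decimal converted = context.Currency == "EUR" ? discounted : discounted * 1.1m;
        return context.Country == "BE" ? converted * 1.21m : converted;
    }
}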

Here are some more queries to pinpoint refactoring-needed-code:

SELECT METHODS WHERE CouldBePrivate 

SELECT TOP 10 METHODS WHERE NbVariables > 15
ORDER BY NbVariables DESC 

SELECT TOP 10 METHODS WHERE NbLinesOfCode > 30
ORDER BY NbLinesOfCode DESC

There are some queries that can help you check if the developers respected the most important design principles:

Single Responsibility principle

SELECT TYPES WHERE TypeCe > 50 ORDER BY TypeCe DESC

Interface Segregation Principle

SELECT TYPES WHERE IsInterface AND NbMethods > 15

YAGNI

SELECT METHODS WHERE
   MethodCa == 0
   AND !IsPublic
   AND !IsEntryPoint
   AND !IsExplicitInterfaceImpl
   AND !IsClassConstructor
   AND !IsFinalizer

I can’t really find a way to query for LSP, OCP or DIP violations; you can’t have it all, folks!

Testing

NDepend makes it very easy to check code coverage (both NCover and VSTS coverage files are supported). As I’ve mentioned before, code coverage doesn’t always give a correct view of the quality of your tests. Still, you could choose to cover the types and methods that are complex. This is a default CQL query that helps you identify uncovered complex methods:

SELECT METHODS WHERE
(
   NbILInstructions > 200
   OR ILCyclomaticComplexity > 50
   OR ILNestingDepth > 4
   OR NbParameters > 5
   OR NbVariables > 8
   OR NbOverloads > 6
)
AND !PercentageCoverage < 100

In older applications, you’ll be lucky if you even find tests. Personally, I think starting to write tests for existing code that hasn’t been tested at all is a bit useless; in most cases, the code won’t even be testable. What I do encourage is to write tests for the code you refactor.

Performance

If you need to do some profiling to find out why you’re having performance issues, I would encourage the use of a profiler such as JetBrains’ dotTrace. If you just look at NDepend’s predefined CQL queries for performance, you’ll get queries for boxing, unboxing, big instance sizes, and so on, but they are not going to give you enough insight.

For example, if you’re using typed datasets, be sure to expect some warnings from these queries. Typed datasets do nothing but boxing and unboxing whenever you do a get or set operation. On top of that, in most cases they have a very big instance size, so instantiating them is very expensive. But if you’re using them responsibly, this will not be the cause of your perf-issues. That’s why you should use a profiler instead.

Roundup

NDepend is a great tool; I think it’s the best on the market right now. What I’ve covered here are just the basics, and many of the queries I showed you are predefined and executed with each analysis. Once you start experimenting more with the CQL language, the metrics at our disposal, and the dependency matrix, you’ll get more and more insight into the codebase you’re analyzing.

I think it’s also a great way to check your own development skills! I’ve been in the situation several times where I wished a developer would review my work and tell me what’s good, but especially what’s not good (what’s wrong with it and what I can do about it). If that’s just not possible, analyze your code with NDepend, and I’m sure you’ll find some mistakes that you won’t ever repeat in the future.
