Posts Tagged ‘unit testing’

Introduction to test-doubles - 03/16/09

As soon as you start unit-testing or test-driving your development, you'll learn sooner or later about test-doubles and how they can make your tests lightning-fast. And if you set up a continuous integration process (which you should) and you have more than 5 unit tests, you'll probably have to know about test-doubles sooner instead of later :-) .

What is a test-double?

Gerard Meszaros defines a test-double as follows:

“Sometimes it is hard to test the system under test (SUT) because it depends on other components that cannot be used in the test environment. This could be because they aren’t available, they will not return the results needed for the test or because executing them would have undesirable side effects. In other cases, our test strategy requires us to have more control or visibility of the internal behavior of the SUT. When we are writing a test in which we cannot (or choose not to) use a real depended-on component (DOC), we can replace it with a Test Double. The Test Double doesn’t have to behave exactly like the real DOC; it merely has to provide the same API as the real one so that the SUT thinks it is the real one! “

The concept is very easy to understand, but if you've never heard of them, I assume that what a test-double looks like is still a bit blurry. First I have to say that you have to use them with caution and only when appropriate. Apart from that, there are several types of test-doubles. You can find a list of all types in Meszaros' book and an enumeration of them here.

Why use test-doubles?

I think I can summarize the need for a test-double in one line: use a test-double to keep your tests focused and fast. If you're doing CI and TDD, you'll have a very big test suite after a while, and it's critical to keep it running in a few minutes. If you don't, you'll end up giving up CI, and you'll lose the continuous feedback it offers you.

If your SUT depends on a component that needs a lot of setup code or expensive resources, you don't want to be doing all that in a simple test. Your SUT shouldn't care how the component it depends on needs to be configured. If you do care, you're writing integration or even acceptance tests that go through the whole system… That's why replacing a DOC with a fake can come in very handy. Test your SUT in isolation, that's the goal. The DOC-components will have tests of their own. And you'll have integration tests on top of it all.

Expectations, verifications, and stuff like that

Before I get to mocks and stubs, you need to understand the expectation-verification thing.

First of all, a mock or a stub is just an object that looks like the real DOC, but is actually a fake which you can use to get the test passing, or to record the calls your SUT makes to it. When using such a mock/stub, you can set expectations on it. An expectation is a statement in which you explicitly expect a call to a particular method or property, with particular parameters, and even a particular return value. After you've set the expectations you consider important, you can verify that these calls actually took place, and thus verify that your SUT is doing what you expected.

What is a stub?

A stub is an object that you use just to get your code passing. When you don't really care about how the interaction with the DOC happens, you can use a stub to replace the real dependency. A stub can be an empty implementation or a so-called "dumb" implementation: instead of performing a calculation, it could just return a fixed value.
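
To make that concrete, here's a minimal hand-rolled stub (all the names are invented for the example): an exchange-rate provider that just returns a fixed value, so the converter can be tested without hitting any external service.

    using NUnit.Framework;

    public interface IExchangeRateProvider
    {
        decimal GetRate(string fromCurrency, string toCurrency);
    }

    // The SUT: depends on a rate provider, but doesn't care where the rate comes from.
    public class CurrencyConverter
    {
        private readonly IExchangeRateProvider _rateProvider;

        public CurrencyConverter(IExchangeRateProvider rateProvider)
        {
            _rateProvider = rateProvider;
        }

        public decimal Convert(decimal amount, string from, string to)
        {
            return amount * _rateProvider.GetRate(from, to);
        }
    }

    // A "dumb" stub: no web service call, no configuration, just a canned answer.
    public class ExchangeRateProviderStub : IExchangeRateProvider
    {
        public decimal GetRate(string from, string to)
        {
            return 1.5m;
        }
    }

    [TestFixture]
    public class CurrencyConverterTests
    {
        [Test]
        public void Convert_UsesTheRateFromTheProvider()
        {
            var converter = new CurrencyConverter(new ExchangeRateProviderStub());
            Assert.AreEqual(150m, converter.Convert(100m, "EUR", "USD"));
        }
    }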

Creating stubs with a mocking framework is way easier than coding an extra class to act as a stub for your test. How you do that exactly is for an upcoming post, but the good news is that you don't need to hand-code every stub.

What is a mock?

You'll use mocks when you really want to test the behavior of the system. You can set expectations on a mock: methods/properties to be called, with specified parameters and/or return values. The final part of the test is then always verification of the expectations that were set; if they were not satisfied, your test fails. This is especially interesting when you need to be completely sure those calls actually took place. Just imagine an overly simplified profit calculator: you can never calculate your profits (or losses) if you haven't calculated your revenues and expenses first, can you? Well, you could set expectations that these are calculated first. (This is of course an overly simplified example…)
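
Here's a small hand-rolled sketch of that idea (the profit calculator and its dependency are invented for the example): the mock records the calls the SUT makes, and the test verifies afterwards that the expected calls took place. A framework like Rhino.Mocks would generate this recording object for you and give you a nicer expect/verify syntax.

    using System.Collections.Generic;
    using NUnit.Framework;

    public interface IRevenueService
    {
        decimal GetRevenues();
        decimal GetExpenses();
    }

    // The SUT: profit can only be calculated from revenues and expenses.
    public class ProfitCalculator
    {
        private readonly IRevenueService _service;

        public ProfitCalculator(IRevenueService service)
        {
            _service = service;
        }

        public decimal CalculateProfit()
        {
            return _service.GetRevenues() - _service.GetExpenses();
        }
    }

    // A hand-rolled mock: it records the calls the SUT makes, so the test can verify them.
    public class RevenueServiceMock : IRevenueService
    {
        public List<string> Calls = new List<string>();

        public decimal GetRevenues() { Calls.Add("GetRevenues"); return 1000m; }
        public decimal GetExpenses() { Calls.Add("GetExpenses"); return 400m; }
    }

    [TestFixture]
    public class ProfitCalculatorTests
    {
        [Test]
        public void CalculateProfit_AsksForRevenuesAndExpenses()
        {
            var mock = new RevenueServiceMock();
            var calculator = new ProfitCalculator(mock);

            decimal profit = calculator.CalculateProfit();

            // Behavior verification: the expected calls actually took place.
            CollectionAssert.Contains(mock.Calls, "GetRevenues");
            CollectionAssert.Contains(mock.Calls, "GetExpenses");
            Assert.AreEqual(600m, profit);
        }
    }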

What is a fake?

A fake is a class or method that's implemented just for the sake of testing. It has the same goal as the other variations of test-doubles: replace the depended-on component to avoid slow or unreliable tests. The classic example is replacing a Repository that accesses the database with an in-memory repository that simply returns objects from a collection it holds. That way you've got data you can use to test the SUT, without the overhead of communicating with expensive or external components.

The database is just an example; you can just as well use a fake object to hide complex processing of some kind and simply return the data you need to continue (data that, in production code, would come out of that complex processing). It will make your tests focus better on the SUT, and they will be a lot faster.
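
A minimal sketch of such an in-memory fake (the repository interface and the Product class are made up for the example):

    using System.Collections.Generic;
    using System.Linq;

    public class Product
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public interface IProductRepository
    {
        Product GetById(int id);
        void Save(Product product);
    }

    // A fake: a working implementation, but backed by an in-memory list
    // instead of a database, so tests stay fast and self-contained.
    public class InMemoryProductRepository : IProductRepository
    {
        private readonly List<Product> _products = new List<Product>();

        public Product GetById(int id)
        {
            return _products.FirstOrDefault(p => p.Id == id);
        }

        public void Save(Product product)
        {
            _products.Add(product);
        }
    }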

Wrapup

It's almost impossible to unit test properly without using mocking techniques. Without them your tests become extremely slow once you have a lot of them, and the continuous feedback loop is lost.
Mocking is a very powerful technique, but beware of misusing it. I actually try to avoid mocks. Just think: do I need to verify how my SUT interacts with the DOC? If not, don't use a mock, use a stub. When using lots and lots of mocks, your tests can become brittle. Just imagine refactoring your DOC and breaking 20 tests. After looking into the problem, you notice that 17 of the 20 tests broke only because the expectations set on this DOC-as-a-mock aren't correct anymore. That's something you really should avoid. Keep your tests focused ;-) .

Recommended readings

Mocks aren’t stubs by Martin Fowler
Test doubles by Martin Fowler
xUnit Test Patterns by Gerard Meszaros (also check out the website)
Test Doubles: When (not) to use them by Davy Brion

I’ll continue this post with how you can use these types of test-doubles using a mocking framework like Rhino.Mocks as soon as I get the chance ;-) .

Test types and Continuous Integration - 02/23/09

Martin Fowler defines continuous integration as follows:

“Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily – leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.”

I'm writing this post as a follow-up to my previous one, Types of testing. I'll talk about how each type of test fits into a continuous integration process.

Introduction

You can read all about continuous integration in Martin Fowler's paper here. A nice addition (and one that's lying on my bookshelf, like many others) is the book Continuous Integration: Improving Software Quality and Reducing Risk, by Paul Duvall, Steve Matyas and Andrew Glover.

I'll be talking about two types of builds; I'll refer to them as the commit build and the secondary build. The primary-stage build (aka the commit build) automatically runs whenever someone commits changes to the repository (see Every build should build the mainline on an integration machine). When this build has tests that fail, the build fails. This is a show-stopper: the broken tests must be repaired as soon as possible to fix the build. The secondary-stage build is a build that runs whenever possible; in my opinion, at least once a day. It can be triggered manually, or it can run as a nightly build from a script that grabs the latest executables and runs that specific test suite. If this build fails, developers can carry on working. Don't get me wrong, this build has to be fixed too, but it doesn't have the same priority as a broken commit build.

Unit tests

Unit tests are the most important part of your continuous integration process (in the sense that these tests are run the most). After each commit to the repository, the build executes all unit tests to validate the commit. Your unit tests should run within the commit build and make the build fail if any test fails.

It's very important to keep these tests focused, and especially fast. You must realize that every commit will execute them, and it's important to get immediate feedback. You can't be waiting half an hour just to commit some changes, right?! That's why unit tests use test-double patterns (use a test double for each of the SUT's expensive dependencies). I've only read a few pages of Meszaros' book, but I know it contains a chapter that covers these patterns (can't wait to get there!).

Integration tests

Integration tests run within the secondary build. These tests are normally slower than unit tests since they test the integration of several components, and thus use (and set up) actual dependencies. That makes them slower, but we should still try to keep them relatively fast. Running these tests is also very important, but since it's an expensive operation, we do it far less often than running the unit tests. In my opinion, they should run at least once a day. These tests typically include access to your database, so I try to run them after each database change, for example. If they fail, you've probably broken an NHibernate mapping, a typed DataSet, or some code using an ugly magic string somewhere. My rule is: run them at least once a day, and every time you've made a change that directly affects the integration of your code with an external component.
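
One simple way to make this split work in practice is to tag the slow tests, for example with NUnit's Category attribute, and have the commit build exclude that category while the secondary build runs everything. A rough sketch (the category name and the console switch are just one possible convention; check your runner's documentation for the exact options):

    using NUnit.Framework;

    [TestFixture]
    public class OrderRepositoryTests
    {
        // Tagged as integration: talks to a real database, so only the
        // secondary (daily) build runs it.
        [Test, Category("Integration")]
        public void Save_PersistsTheOrderToTheDatabase()
        {
            // ... set up a real repository against a test database,
            // save an order and read it back ...
        }

        // No category: a plain, fast unit test that runs in the commit build.
        [Test]
        public void Total_IsTheSumOfTheOrderLines()
        {
            // ... pure in-memory logic, no external resources ...
        }
    }

    // Commit build (fast):      nunit-console MyProject.Tests.dll /exclude=Integration
    // Secondary build (daily):  nunit-console MyProject.Tests.dll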

Acceptance testing

If you're using automated acceptance tests, these can also be executed automatically within your integration process. I think it's a good habit to run these tests daily, though it can be very annoying while you're developing your user interface. Whenever you need to add some textbox somewhere, you'll have some failing tests (hopefully - remember TDD). In that case, I tend to keep the general rule of having them all pass at the end of the iteration; that's the final deadline. If you choose to do so, it might be a good idea to set up a third build, or to just run them manually as part of your iteration (a bit like regression tests in this sense). If you run them at the same level as your integration tests, you'll have your secondary build failing during the whole iteration, which is not a good thing.

If you’re doing user acceptance testing, you should have your CI process deploy your application to the UAT-environment automatically (we do this after each iteration).

Performance testing

I've heard of projects where the secondary build also includes performance tests. Usually I don't think this is necessary, except in applications where performance is absolutely critical. If a certain level of performance is a requirement, including these tests in your continuous integration process gives you the advantage of constant feedback, making it easy to identify which part of your code might contain a memory leak and needs investigation or rolling back.

I’d use these rules to make up my mind:
1) Do I really need performance tests?
2) Do I really need constant feedback on my application’s performance?
3) Can I have these tests executed by an independent build (not in the commit build, nor in the secondary build)?

Smoke testing and regression testing

I left these two types of tests out of my initial list in my previous post, because in the end they are just unit tests, integration tests, acceptance tests or performance tests. The difference in naming basically comes from when they are executed. And in a continuous integration process, that would be during the commit build or during the secondary build (or any other build), depending on the type of test :D .

Wrapup

I think this post gives a nice overview of what tests to put in what build within a continuous integration process. Maybe this approach isn’t the best one, so if you’ve got any other ideas, be sure to leave them in the comments :) .

Types of testing - 02/18/09

I notice there's a lot of confusion among people who are just starting to explore automated testing, whether they're using TDD or not. There are many kinds of tests, the terms get used interchangeably, and I've seen this confuse many developers.
So here’s an overview of the most common types of testing and what their goals are.

Unit testing

As the name already states, a unit test tests a single unit. What's a unit? The smallest thing you can test; in object-oriented code, that's a method.
It's common to create a fixture per class, and one or more tests per method. You'll probably create one test for the valid case, and several for the invalid ones.

You should always start out with unit tests. If they're failing, there's no use in starting with other types of tests. It's simple: build up the dependencies you need in the wanted state (valid or invalid, depending on the goal of the test), perform the operation you're testing, and verify that the result is correct.
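
In code, that recipe looks something like this minimal NUnit sketch (the ShoppingCart class is invented for the example):

    using System.Collections.Generic;
    using System.Linq;
    using NUnit.Framework;

    public class ShoppingCart
    {
        private readonly List<decimal> _itemPrices = new List<decimal>();

        public void AddItem(decimal price)
        {
            _itemPrices.Add(price);
        }

        public decimal Total
        {
            get { return _itemPrices.Sum(); }
        }
    }

    [TestFixture]
    public class ShoppingCartTests
    {
        [Test]
        public void Total_IsTheSumOfAllItemPrices()
        {
            // Build up the unit in the wanted state
            var cart = new ShoppingCart();
            cart.AddItem(10m);
            cart.AddItem(5m);

            // Perform the operation you're testing, and verify the result
            Assert.AreEqual(15m, cart.Total);
        }

        [Test]
        public void Total_IsZeroForAnEmptyCart()
        {
            Assert.AreEqual(0m, new ShoppingCart().Total);
        }
    }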

Unit testing also introduces a whole new world of curious little objects such as mocks and stubs. I've been using Rhino.Mocks as my mocking framework for a few weeks now. Mocking is a subject that deserves a post of its own (or maybe even more than one). My xUnit Test Patterns book arrived yesterday, so you can expect some test-related posts in the future :) .

Integration testing

I think integration testing can be summarized in the following line:
Testing of (sub)systems that interact (or integrate) with expensive or external resources

Thus, integration testing aims to test the combination of different software modules. Some examples:
- Test CRUD operations on your datastore
- Test import and export of data (system talks to the file system)
- Test systems that connect to webservices
- Test the integration of two modules with valid dependencies
- …

The typical example for demonstrating an integration test is the repository example.

These tests are a lot more expensive than unit tests (since they connect to external resources or build expensive objects), but you should still try to keep them fast. In my opinion, your integration tests should run at least once a day.
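
To illustrate the file-system case from the list above, here's a small sketch of an integration test (the CsvExporter is made up for the example). It uses the real file system, which is exactly why it belongs in the slower, at-least-once-a-day run rather than in every commit.

    using System.IO;
    using NUnit.Framework;

    public class CsvExporter
    {
        public void Export(string[] lines, string path)
        {
            File.WriteAllLines(path, lines);
        }
    }

    [TestFixture]
    public class CsvExporterIntegrationTests
    {
        private string _path;

        [SetUp]
        public void CreateTempFile()
        {
            _path = Path.GetTempFileName();
        }

        [TearDown]
        public void DeleteTempFile()
        {
            File.Delete(_path);
        }

        [Test]
        public void Export_WritesEveryLineToDisk()
        {
            new CsvExporter().Export(new[] { "id;name", "1;Widget" }, _path);

            // Touches the real file system: slower, but proves the integration works.
            Assert.AreEqual(2, File.ReadAllLines(_path).Length);
        }
    }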

Acceptance testing

Acceptance tests come in two flavours: user acceptance tests and automated acceptance tests.

If you've got dedicated users testing the system after each iteration, you'd be doing user acceptance testing, or better said, your users would be doing the acceptance tests.
It's a common best practice to deploy the application to a user acceptance environment several times during the development process. This has several advantages:
- Bugs are discovered sooner, and thus are easier to fix
- Misconceptions in analysis are discovered sooner (since you’ve got user feedback)
- Usability is tested sooner
- User adaptiveness can be estimated better
- Deployment problems are discovered and solved before going to a production environment

Automated acceptance tests are just like user acceptance tests, only automated. They also perform user interface actions, like clicking buttons and filling in data in forms. These tests go through the whole cycle, just like a normal user would: they create a new product (for example) by clicking the "New product" menu item, they fill in the data, and finally they click the save button. You should also create acceptance tests that fail when required information is missing, and assert that you're showing an error message. Admittedly it feels a bit weird the first time you do this, but it has great advantages.
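
To give a feel for what such a test can look like, here's a rough sketch using a browser-automation library (Selenium WebDriver in this case; tools like WatiN follow the same idea, and the URL and element ids below are made up for the example):

    using NUnit.Framework;
    using OpenQA.Selenium;
    using OpenQA.Selenium.Firefox;

    [TestFixture]
    public class NewProductAcceptanceTests
    {
        [Test]
        public void SavingAProductWithoutANameShowsAnErrorMessage()
        {
            IWebDriver driver = new FirefoxDriver();
            try
            {
                // Drive the UI exactly like a user would.
                driver.Navigate().GoToUrl("http://localhost/products/new");
                driver.FindElement(By.Id("Price")).SendKeys("9.99");
                driver.FindElement(By.Id("Save")).Click();

                // Required information (the name) is missing, so we expect an error.
                string error = driver.FindElement(By.Id("ErrorMessage")).Text;
                StringAssert.Contains("name is required", error);
            }
            finally
            {
                driver.Quit();
            }
        }
    }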

You miss a few of the advantages you get when your users test, but there are ways to minimize that loss. For starters, let your users write your acceptance tests (not the code, but their intent), and try to make them cover each scenario. Have them review the tests whenever they want to change functionality. Demo the application after each iteration, so they can give feedback about usability.

Smoke testing

Smoke testing can be defined as a set of tests that need to pass in order to include the new or modified functionality into the entire system.
To give you a specific example, let’s consider the ordering story. Assume we’ve just added functionality to cancel an order.
The corresponding smoke tests to commit this functionality could be:
- Can I still add a new order?
- Can I still update an order?
- Can I confirm and update a canceled order (these should fail)?
- Can I still delete an order, unless canceled?
- Can I still search for orders?

You're not testing the whole application; you're only testing the functionality connected to what was just built or repaired.
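
Expressed as automated tests, a couple of the smoke tests above could look like this sketch (the Order class and its rules are invented for the example):

    using System;
    using NUnit.Framework;

    // A hypothetical Order with the cancellation rule from the story above.
    public class Order
    {
        public bool IsCanceled { get; private set; }
        public bool IsConfirmed { get; private set; }

        public void Cancel()
        {
            IsCanceled = true;
        }

        public void Confirm()
        {
            if (IsCanceled)
                throw new InvalidOperationException("A canceled order cannot be confirmed.");
            IsConfirmed = true;
        }
    }

    [TestFixture]
    public class OrderCancellationSmokeTests
    {
        [Test]
        public void CanStillConfirmANormalOrder()
        {
            var order = new Order();
            order.Confirm();
            Assert.IsTrue(order.IsConfirmed);
        }

        [Test]
        public void ConfirmingACanceledOrderFails()
        {
            var order = new Order();
            order.Cancel();
            Assert.Throws<InvalidOperationException>(() => order.Confirm());
        }
    }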

If you're doing continuous integration (and you should), this is actually covered by the build that starts after each commit, so smoke tests aren't specifically set up (at least not in my experience).

I've come across the use of smoke tests in legacy applications that don't even have automated tests. No automated tests means no quality assurance. That's why developers had to run a set of manual smoke tests before releasing the application. They kept the test cases in an Excel file containing a few steps to verify whenever a subsystem was changed. These were high-level tests; they didn't test complex functionality, but they had to be run, or you weren't allowed to release the application for user testing.

Regression testing

Regression testing is the term used for rerunning old tests to check whether any functionality was broken by added or modified functionality. The ideal way of doing this is rerunning the entire test suite (unit tests as well as integration tests) after each change to the system.
Sometimes changes come in too fast to keep this up, but it's important to at least run all your unit tests after each change (and if even that seems to take a lot of time, you should try to make your tests faster).
We don't really talk about regression testing; it's something that happens invisibly when you're doing automated testing, and especially when doing continuous integration :) . If you've got users that are testing, it's common that they re-execute all the tests they did in the previous release, and finally test the new/added functionality to cover the whole system. In that case, it's more common to explicitly talk about regression tests.

Performance testing

There are several types of performance testing. I’m covering the most common ones with a brief description of their goals.

Load testing
How will my application behave under a load of 50 users performing 5 transactions per minute?
Load testing is applied to systems that have a specified performance requirement. If performance is an important requirement, the tests should run at least once a day, so the impact of changes on performance can be noticed immediately.
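
A very bare-bones load test along those lines could be sketched like this (the numbers and the ProcessTransaction call are placeholders; real load tests usually rely on a dedicated tool):

    using System;
    using System.Diagnostics;
    using System.Threading;

    public static class SimpleLoadTest
    {
        // Placeholder for whatever operation the performance requirement is about.
        private static void ProcessTransaction()
        {
            Thread.Sleep(10); // simulate some work
        }

        public static void Main()
        {
            const int users = 50;
            const int transactionsPerUser = 5;

            var stopwatch = Stopwatch.StartNew();

            // One thread per simulated user, each performing its transactions.
            var threads = new Thread[users];
            for (int i = 0; i < users; i++)
            {
                threads[i] = new Thread(() =>
                {
                    for (int t = 0; t < transactionsPerUser; t++)
                        ProcessTransaction();
                });
                threads[i].Start();
            }
            foreach (var thread in threads)
                thread.Join();

            stopwatch.Stop();
            Console.WriteLine("{0} users x {1} transactions took {2} ms",
                users, transactionsPerUser, stopwatch.ElapsedMilliseconds);

            // A real load test would assert the measured time or throughput
            // against the stated performance requirement.
        }
    }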

Stress testing
How much load can my application carry?
This type of testing is usually applied to check at what load the application will crash. It's good to know the maximum load your app can carry. If it's very close to the performance requirement set by the customer, it's time to do some serious profiling :) .

Endurance testing
Will my application be able to carry 50 users even after running x hours-days-…?
Endurance tests are used to evaluate if your application is able to run under a normal work load (users and transactions) during a prolonged time.

Wrapup

This post turned out to be longer than I thought, but I think it gives you a nice overview of what each test type intends to do. If I forgot anything, feel free to add!

How do you know your tests are good? - 01/26/09

I keep asking myself, how do you know your test suite is healthy? I mean:
- How do you know your tests are testing what they should be testing?
- How do you know your tests aren’t testing what they shouldn’t be testing?

You could use code coverage as a metric, but that's not going to get you all the way. Code coverage is especially handy for people who are doing testing, but not the TDD way: just run TestDriven.Net with NCover, and you'll immediately see what you forgot to test.
But still, code coverage isn't smart enough to know whether what you're testing is actually meaningful. Let me illustrate this with an example.

We have a very simple method that calculates the age of a person based on his/her birthday and today's date (I'm not going to bother with correct calculation or validation, since that's not the point of this example):

    public class AgeCalculator
    {
        /// <summary>
        /// Calculates the age based on a given birthday and the current date
        /// </summary>
        /// <param name="birthday">Birthday to base age-calculation on</param>
        /// <returns></returns>
        public int CalculateAge(DateTime birthday)
        {
            return DateTime.Now.Year - birthday.Year;
        }
    }

Then we have useless test number one:

    [Test]
    public void TestThatCoversAgeCalculationButIsNotTestingIt()
    {
        AgeCalculator calculator = new AgeCalculator();
        int age = calculator.CalculateAge(DateTime.Now.AddYears(-5));
        CanDriveSpecification canDriveSpecification = new CanDriveSpecification();
        Assert.IsFalse(canDriveSpecification.IsSatisfiedBy(age));
    }

Here is its coverage:

[NCover coverage screenshot: 100% coverage reported for CalculateAge]

NCover is telling me that this test (I used TestDriven.Net to run it with coverage) has 100% coverage. But take a second look at the test and you'll see (if you haven't already) that it's really testing something else: it's testing whether someone aged 5 may drive or not, which doesn't ensure that our age calculation is correct. If that's the only test covering your age calculation, your tests haven't functionally covered it.

The second useless test:

    [Test]
    public void TestThatCoversAgeCalculationsButTestsItWrong()
    {
        AgeCalculator calculator = new AgeCalculator();
        DateTime birthDay = new DateTime(1984, 10, 31);
        Assert.AreEqual(24, calculator.CalculateAge(birthDay));
    }

This is a test dedicated only to the age calculation, so we're getting closer. Still, this test is just wrong! It will run today, tomorrow, even next month, but forget about it in November! And I'm still getting my 100% coverage.

The two tests above will run (and pass). They'll even give you 100% code coverage of the CalculateAge method, but they are not at all representative. So code coverage is not even close to measuring the quality of your test suite.

How can you ensure the tests are good?

You will always need to keep a human eye on the effectiveness of your tests. It's the developer's responsibility to be testing something useful.

That brings me to the following conclusion: we can distinguish two different kinds of developers who write tests:
1) developers that write tests as part of their duty (because management says so)
2) developers that write tests to ensure quality of their work, even without management asking

You'll be thinking that the code I showed you above is just plain stupidity and couldn't be written by any developer. Well, I'm sorry to disappoint you. Maybe I exaggerated a bit in this example, but this type of test-code is actually very common. Developers that don't care about testing write this type of code, not because they're too dumb to see what's wrong with it, but simply because they don't care about the tests.

That’s what leads me to test-driven development.

Test-driven development is a technique that requires developers to first write their tests, and then write code to make the tests pass. This is all done in a cycle known as Red-Green-Refactor.

1) Think => how should you write your test?
2) Red => write your test, and see it fail (since there is no implementation code yet)
3) Green => write production code to make your test pass
4) Refactor => refactor both test and production code
5) Repeat => do it all again

James Shore does a great job covering this topic in more depth, so don’t forget to take a look!

Revisiting the AgeCalculator example

How could I best write my test?

I need to calculate the age of a person based on his/her birthday and the current date. So it would be nice to call a CalculateAge method on a calculation class named AgeCalculator.

Red

[Screenshot: the test doesn't even compile, because AgeCalculator doesn't exist yet]

I took a screenshot just to show you that this won't even build (since the class and method don't exist yet)!
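
A test along those lines could look like this (a sketch; the assertion relies on the same naive year subtraction the rest of the post uses):

    using System;
    using NUnit.Framework;

    [TestFixture]
    public class AgeCalculatorTests
    {
        [Test]
        public void CalculateAge_ReturnsTheNumberOfYearsSinceTheBirthday()
        {
            // Red: AgeCalculator doesn't exist yet, so this doesn't even compile.
            var calculator = new AgeCalculator();

            int age = calculator.CalculateAge(DateTime.Now.AddYears(-5));

            Assert.AreEqual(5, age);
        }
    }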

Green

Create an interface IAgeCalculator with a CalculateAge method. This method should accept a DateTime parameter and return an int. Create a class AgeCalculator that implements the interface.

    public interface IAgeCalculator
    {
        /// <summary>
        /// Calculates the age based on a given birthday and the current date
        /// </summary>
        /// <param name="birthday">Birthday to base age-calculation on</param>
        /// <returns></returns>
        int CalculateAge(DateTime birthday);
    }

    public class AgeCalculator : IAgeCalculator
    {
        /// <summary>
        /// Calculates the age based on a given birthday and the current date
        /// </summary>
        /// <param name="birthday">Birthday to base age-calculation on</param>
        /// <returns></returns>
        public int CalculateAge(DateTime birthday)
        {
            return DateTime.Now.Year - birthday.Year;
        }
    }

Refactor

Maybe in this simple case you wouldn't want to refactor your code. But imagine your AgeCalculator should be able to block invalid birthdays; an invalid birthday could be a date in the future. First you'd write a test that expects an exception when you pass a future date, then adjust the AgeCalculator class. We're now instantiating the AgeCalculator twice, so you could choose to instantiate it in your test setup. Then we're asked to add a new feature, in which you should be able to pass in a date to calculate the age against (instead of DateTime.Now).
We implement it the RGR-way and it looks as follows:

    /// <summary>
    /// Calculates the age given a birthday and a date to calculate the actual age against
    /// </summary>
    /// <param name="birthDay">Birthday to base age-calculation on</param>
    /// <param name="compareDate">Date to calculate age against</param>
    /// <returns></returns>
    public int CalculateAge(DateTime birthDay, DateTime compareDate)
    {
        return compareDate.Year - birthDay.Year;
    }

Then you'll notice you can adjust the CalculateAge method that only accepts the birthday: have it call CalculateAge(DateTime birthDay, DateTime compareDate) and pass it DateTime.Now, to avoid code duplication (remember the DRY principle?). So, as you can see, the refactoring step never ends. We've already added 2 new features, and we're still refactoring feature number one :-) .
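
That last refactoring could end up looking roughly like this (a sketch of the idea, building on the code above):

    using System;

    public class AgeCalculator : IAgeCalculator
    {
        /// <summary>
        /// Calculates the age based on a given birthday and the current date
        /// </summary>
        public int CalculateAge(DateTime birthday)
        {
            // Delegate to the overload so the calculation lives in one place (DRY)
            return CalculateAge(birthday, DateTime.Now);
        }

        /// <summary>
        /// Calculates the age given a birthday and a date to calculate the actual age against
        /// </summary>
        public int CalculateAge(DateTime birthday, DateTime compareDate)
        {
            return compareDate.Year - birthday.Year;
        }
    }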

Roundup

I'm not at all a TDD expert (I wish!); this post only reflects how I got interested in TDD. Right now I try to practice it whenever I can, but I have a very long way to go. I'm currently also reading the TDD starter book: Test-Driven Development: By Example.

I think that in this little example, I showed that TDD helps to keep your tests focused, clear and meaningful. It also follows the YAGNI principle, since you'll never be writing code you haven't written a test for, and thus you won't be writing code you don't need, unless you're writing tests you don't need, and then you're doing TDD all wrong :) .

TDD can only be done by developers that actually have an interest in building quality software (if you don't care, I don't think you can find the discipline to do it). Just imagine the first useless test I wrote above failing because the calculation was changed and the developer introduced a bug. That will keep the team looking for bugs in the specification, while it's the calculation that went wrong. And what about the second useless test? That one just won't pass in a few months, and you can start looking for the bug again. In such a case, it's better not to have the tests at all.

Last but not least

James Shore already linked to the rules to follow in TDD in his Red-Green-Refactor post, but if you haven't read it yet, do it now. I'm just repeating the rules here because I think they are very valuable.

Quoting Michael Feathers:
A test is not a unit test if:
1) It talks to the database
2) It communicates across the network
3) It touches the file system
4) It can’t run correctly at the same time as any of your other unit tests
5) You have to do special things to your environment (such as editing config files) to run it.
