Test Driven Traps, part 1

Have you ever been in a situation where a simple change of code broke a few hundred tests? Have you ever had the idea that tests slow you down, inhibit your creativity, make you afraid to change the code? If you have, it means you’ve entered the Dungeon-of-very-bad-tests, the world of things that should not be.

I’ve been there. I’ve built one myself. And it collapsed, killing me in the process. I’ve learned my lesson. So here is the story of a dead man. Learn from my mistakes or be doomed to repeat them.

The story

Test Driven Development, like all good games in the world, is simple to learn and hard to master. I started in 2005, when a brilliant guy named Piotr Szarwas gave me the book “Test Driven Development: By Example” (Kent Beck), and one task: creating a framework.

These were the old times, when the technology we were using had no frameworks at all, and we wanted a cool one, like Spring, with Inversion-of-Control, Object-Relational Mapping, Model-View-Controller and all the good things we knew about. And so we created a framework. Then we built a Content Management System on top of it. Then we created a bunch of dedicated applications for different clients, Internet shops and what-not, on top of those two. We were doing well. We had 3000+ tests for the framework, 3000+ tests for the CMS, and another few thousand for every dedicated application. We were looking at our work, and we were happy, safe, secure. These were good times.

And then, as our code base grew, we came to the point where the simple anemic model we had was not good enough anymore. I had not read the other important book of that time, “Domain Driven Design”, you see. I didn’t know yet that you can only get so far with an anemic model.

But we were safe. We had tons of tests. We could change anything.

Or so I thought.

I spent a week trying to introduce some changes in the architecture. Simple things really: moving methods around, switching collaborators, such things. Only to be overwhelmed by the number of tests I had to fix. This was TDD: I started each change by writing a test, and when I was finally done with the code under test, I’d find another few hundred tests completely broken by my change. And when I got them fixed, introducing some more changes in the process, I’d find another few thousand broken. It was a butterfly effect, a chain reaction caused by a very small change.

It took me a week to figure out that I wasn’t even half done. The refactoring had no visible end. And at no point was my code base stable or deployment-ready. I had my branch in the repository, one I renamed “Lasciate ogne speranza, voi ch’intrate”.

We had tons and tons of tests. Of very bad tests. Tests that would pour concrete over our code, so that we could do nothing.

The only real options were: either to leave it be, or delete all tests, and write everything from scratch again. I didn’t want to work with the code if we were to go for the first option, and the management would not find financial rationale for the second. So I quit.

That was the Dungeon I built, only to find myself defeated by its monsters.

I went back to the book, and found everything I did wrong in there. Outlined. Marked out. How could I skip that? How could I not notice? Turns out, sometimes, you need to be of age and experience, to truly understand the things you learn.

Even the best of tools, when used poorly, can turn against you. And the easier the tool, the easier it seems to use it, the easier it is to fall into the trap of I-know-how-it-works thinking. And then BAM! You’re gone.

The truth

Test Driven Development and tests are two completely different things. Tests are only a byproduct of TDD, nothing more. What is the point of TDD? What does TDD bring? Why do we do TDD?

For three reasons, and only those three.

1. To find the best design, by putting ourselves into the user’s shoes.

By starting with “how do I want to use it” thinking, we discover the most useful and friendly design. Always good, quite often that’s the best design out there. Otherwise, what we get is this:

And you don’t want that.

2. To manage our fear.

It takes balls to make a fundamental change in a large code base without tests, and say “it’s done” without introducing bugs in the process, doesn’t it? Well, the truth is, if you say “it’s done”, most of the time you are either ignorant, reckless, or just plain stupid. It’s like with concurrency: everybody knows it, nobody can do it well.

Smart people are scared of such changes. Unless they have good tests, with high code coverage.

TDD allows us to manage our fears by giving us proof that things work as they should. TDD gives us safety.

3. To have fast feedback.

How long can you code, without running the app? How long can you code without knowing whether your code works as you think it should?

Feedback in tests is important. Less so for frontend programming, where you can just run the shit up, and see for yourselves. More for coding in the backend. Even more, if your technology stack requires compilation, deployment, and starting up.

Time is money, and I’d rather earn it, than wait for the deployment and click through my changes each time I make them.

And that’s it. There are no more reasons for TDD whatsoever. We want Good Design, Safety, and Feedback. Good tests are those, which give us that.

Bad tests? All the other tests are bad.

The bad practice

So what does a typical bad test look like? The one I see over and over, in close to every project, created by somebody who has yet to learn how NOT to build an ugly dungeon, how not to pour concrete over your code base. The one I’d have written myself in 2005.

This will be a Spock sample, written in Groovy, testing a Grails controller. But don’t worry if you don’t know those technologies. I bet you’ll understand what’s going on in there without problems. Yes, it’s that simple. I’ll explain all the not-so-obvious parts.

def "should show outlet"() {
  given:
    def outlet = OutletFactory.createAndSaveOutlet(merchant: merchant)
    injectParamsToController(id: outlet.id)
  when:
    controller.show()
  then:
    response.redirectUrl == null
}

So we have a controller. It’s an outlet controller. And we have a test. What’s wrong with this test?

The name of the test is “should show outlet”. What should a test with such a name check? Whether we show the outlet, right? And what does it check? Whether we are redirected. Brilliant? Useless.

It’s simple, but I see it all around. People forget, that we need to:

VERIFY THE RIGHT THING

I bet that test was written after the code. Not in test-first fashion.

But verifying the right thing is not enough. Let’s have another example. Same controller, different expectation. The name is: “should create outlet insert command with valid params with new account”

Quite complex, isn’t it? If you need an explanation, the name is wrong. But you don’t know the domain, so let me put some light on it: when we give the controller good parameters, we want it to create a new OutletInsertCommand, and the account of that one, should be new.

The name doesn’t say what ‘new’ is, but we should be able to see it in the code.

Have a look at the test:

def "should create outlet insert command with valid params with new account"() {
  given:
    def defaultParams = OutletFactory.validOutletParams
    defaultParams.remove('mobileMoneyAccountNumber')
    defaultParams.remove('accountType')
    defaultParams.put('merchant.id', merchant.id)
    controller.params.putAll(defaultParams)
  when:
    controller.save()
  then:
    1 * securityServiceMock.getCurrentlyLoggedUser() >> user
    1 * commandNotificationServiceMock.notifyAccepters(_)
    0 * _._
    Outlet.count() == 0
    OutletInsertCommand.count() == 1
    def savedCommand = OutletInsertCommand.get(1)
    savedCommand.mobileMoneyAccountNumber == '1000000000000'
    savedCommand.accountType == CyclosAccountType.NOT_AGENT
    controller.flash.message != null
    response.redirectedUrl == '/outlet/list'
}

If you are new to Spock: n * mock.whatever() means that the method “whatever” of the mock object should be called exactly n times. No more, no less. The underscore “_” means “everything” or “anything”. And the >> sign instructs the test framework to return the right-hand side argument when the method is called.

So what’s wrong with this test? Pretty much everything. Let’s go from the start of “then” part, mercifully skipping the oververbose set-up in the “given”.

1 * securityServiceMock.getCurrentlyLoggedUser() >> user

The first line verifies whether some security service was asked for a logged user, and returns the user. And it was asked EXACTLY one time. No more, no less.

Wait, what? How come we have a security service in here? The name of the test doesn’t say anything about security or users, why do we check it?

Well, that’s the first mistake. This part is not what we want to verify. It is probably required by the controller, but that only means it should be in the “given”. And it should not verify that it’s called “exactly once”. It’s a stub, for God’s sake. The user is either logged in or not. There is no sense in making him “logged in, but you can ask only once”.
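
To make that concrete, here is a minimal sketch of a stub living in the “given” part. It is not the controller from the project: SecurityService, OutletCommandCreator and the 'alice' user are hypothetical names invented for this illustration, and the sketch assumes Spock is on the classpath.

```groovy
import spock.lang.Specification

// Hypothetical collaborator, invented for this sketch only.
interface SecurityService {
    String getCurrentlyLoggedUser()
}

// Hypothetical class under test; a Map stands in for the real command object.
class OutletCommandCreator {
    SecurityService securityService

    Map createCommand(Map params) {
        [createdBy: securityService.getCurrentlyLoggedUser(), name: params.name]
    }
}

class OutletCommandCreatorSpec extends Specification {

    def "should record the logged-in user as the command creator"() {
        given: "a logged-in user, provided by a stub with no call-count constraint"
            def securityService = Stub(SecurityService) {
                getCurrentlyLoggedUser() >> 'alice'
            }
            def creator = new OutletCommandCreator(securityService: securityService)
        when:
            def command = creator.createCommand(name: 'Corner shop')
        then: "only the outcome is checked, not how often the stub was asked"
            command.createdBy == 'alice'
    }
}
```

The code under test may now ask for the user once, twice or ten times, and this test will not care.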

Then, there is the second line.

1 * commandNotificationServiceMock.notifyAccepters(_)

It verifies that some notification service is called exactly once. And it may be ok, the business logic may require that, but then… why is it not stated clearly in the name of the test? Ah, I know, the name would be too long. Well, that’s also a hint. You need to make another test, something like “should notify about newly created outlet insert command”.
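
Such a separate test could look roughly like the sketch below. Again, CommandNotificationService and OutletCommandSaver are hypothetical stand-ins, not the real classes from the project, and Spock is assumed to be on the classpath.

```groovy
import spock.lang.Specification

// Hypothetical collaborator, invented for this sketch only.
interface CommandNotificationService {
    void notifyAccepters(Object command)
}

// Hypothetical class under test: stores the command, then notifies accepters.
class OutletCommandSaver {
    CommandNotificationService notificationService
    List savedCommands = []

    void save(Map command) {
        savedCommands << command
        notificationService.notifyAccepters(command)
    }
}

class OutletCommandSaverSpec extends Specification {

    def notificationService = Mock(CommandNotificationService)
    def saver = new OutletCommandSaver(notificationService: notificationService)

    def "should notify accepters about newly created outlet insert command"() {
        when:
            saver.save(name: 'Corner shop')
        then: "the interaction is the one thing this test is about"
            1 * notificationService.notifyAccepters(_)
    }
}
```

Here the interaction check is fine, because the notification is exactly what the name of the test promises.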

And then, it’s the third line.

0 * _._

My favorite one. If the code is Han Solo, this line is Jabba the Hutt. It wants Han Solo frozen in solid concrete. Or dead. Or both.

This line, if you haven’t deduced it yet, is “You shall not make any other interactions with any mock, or stubs, or anything, Amen!”.

That’s the most stupid thing I’ve seen in a while. Why would a sane programmer ever put it here? That’s beyond my imagination.

No it isn’t. Been there, done that. The reason why a programmer would use such a thing is to make sure, that he covered all the interactions. That he didn’t forget about anything. Tests are good, what’s wrong in having more good?

He forgot about sanity. That line is stupid, and it will have its vengeance. It will bite you in the ass some day. And while one bite may be small, there are hundreds of lines like this, so some day you’re gonna get bitten pretty well. You may as well not survive.

And then, another line.

Outlet.count() == 0

This verifies that we don’t have any outlets in the database. Do you know why? You don’t. I do. I do, because I know the business logic of this domain. You don’t, because this test sucks at informing you of what it should.

Then there is the part, that actually makes sense.

OutletInsertCommand.count() == 1
def savedCommand = OutletInsertCommand.get(1)
savedCommand.mobileMoneyAccountNumber == '1000000000000'
savedCommand.accountType == CyclosAccountType.NOT_AGENT

We expect the object we’ve created to be in the database, and then we verify whether its account is “new”. And now we know that “new” means a specific account number and type. Though it screams to be extracted into another method.
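
A sketch of such an extraction, using a Spock helper method with explicit asserts. AccountType, InsertCommand and InsertCommandFactory are simplified, hypothetical stand-ins for the real domain classes; only the account number and the NOT_AGENT type are taken from the test above. Spock is assumed to be on the classpath.

```groovy
import spock.lang.Specification

// Simplified, hypothetical stand-ins for the real domain classes.
enum AccountType { AGENT, NOT_AGENT }

class InsertCommand {
    String mobileMoneyAccountNumber
    AccountType accountType
}

class InsertCommandFactory {
    static InsertCommand withNewAccount() {
        new InsertCommand(mobileMoneyAccountNumber: '1000000000000',
                          accountType: AccountType.NOT_AGENT)
    }
}

class InsertCommandSpec extends Specification {

    def "should create insert command with new account"() {
        when:
            def command = InsertCommandFactory.withNewAccount()
        then:
            hasNewAccount(command)
    }

    // The meaning of "new" now lives in one named place.
    // A void helper needs explicit asserts, since it is outside the "then" block.
    void hasNewAccount(InsertCommand command) {
        assert command.mobileMoneyAccountNumber == '1000000000000'
        assert command.accountType == AccountType.NOT_AGENT
    }
}
```

Now the “then” part reads like the name of the test, and the details of what “new” means can change in a single method.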

And then…

controller.flash.message != null
response.redirectedUrl == '/outlet/list'

Then we check that some flash message is set. And a redirection. And I ask God, why the hell are we testing this? Not because the name of the test says so, that’s for sure. The truth is that, looking at the test, I can recreate the method under test, line by line.

Isn’t it brilliant? This test represents every single line of a not so simple method. But try to change the method, try to change a single line, and you have a good chance of blowing this thing up. And when those kinds of tests are in the hundreds, you have concrete all over your code. You’ll be able to refactor nothing.

So here’s another lesson. It’s not enough to verify the right thing. You need to

VERIFY ONLY THE RIGHT THING.

Never ever verify the algorithm of the method step by step. Verify the outcomes of the algorithm. You should be free to change the method, as long as the outcome, the real thing you expect, is not changed.

Imagine a sorting problem. Would you verify its internal algorithm? What for? It’s got to work and it’s got to work well. Remember, you want good design and safety. Apart from this, it should be free to change. Your tests should not stand in the way.
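
As a sketch of what verifying only the outcome means for sorting: the hypothetical Sorter below happens to use insertion sort, but the test knows nothing about that. Swap in any other algorithm and the test should still pass untouched. Spock is assumed to be on the classpath.

```groovy
import spock.lang.Specification

// Hypothetical class under test. Insertion sort is an arbitrary choice here,
// an implementation detail that is free to change.
class Sorter {
    static List<Integer> sort(List<Integer> input) {
        def result = []
        input.each { value ->
            // Find the first element greater than the value and insert before it.
            int index = result.findIndexOf { it > value }
            result.add(index < 0 ? result.size() : index, value)
        }
        result
    }
}

class SorterSpec extends Specification {

    def "should return the elements in ascending order"() {
        expect: "only the result is checked, never the steps taken to get there"
            Sorter.sort(input) == expected
        where:
            input      | expected
            []         | []
            [3, 1, 2]  | [1, 2, 3]
            [2, 2, 1]  | [1, 2, 2]
            [-1, 5, 0] | [-1, 0, 5]
    }
}
```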

Now for another horrible example.

@Unroll("test merchant constraints field #field for #error")
def "test merchant all constraints"() {
  when:
    def obj = new Merchant((field): val)

  then:
    validateConstraints(obj, field, error)

  where:
    field                     | val                                    | error
    'name'                    | null                                   | 'nullable'
    'name'                    | ''                                     | 'blank'
    'name'                    | 'ABC'                                  | 'valid'
    'contactInfo'             | null                                   | 'nullable'
    'contactInfo'             | new ContactInfo()                      | 'validator'
    'contactInfo'             | ContactInfoFactory.createContactInfo() | 'valid'
    'businessSegment'         | null                                   | 'nullable'
    'businessSegment'         | new MerchantBusinessSegment()          | 'valid'
    'finacleAccountNumber'    | null                                   | 'nullable'
    'finacleAccountNumber'    | ''                                     | 'blank'
    'finacleAccountNumber'    | 'ABC'                                  | 'valid'
    'principalContactPerson'  | null                                   | 'nullable'
    'principalContactPerson'  | ''                                     | 'blank'
    'principalContactPerson'  | 'ABC'                                  | 'valid'
    'principalContactInfo'    | null                                   | 'nullable'
    'principalContactInfo'    | new ContactInfo()                      | 'validator'
    'principalContactInfo'    | ContactInfoFactory.createContactInfo() | 'valid'
    'feeCalculator'           | null                                   | 'nullable'
    'feeCalculator'           | new FixedFeeCalculator(value: 0)       | 'valid'
    'chain'                   | null                                   | 'nullable'
    'chain'                   | new Chain()                            | 'valid'
    'customerWhiteListEnable' | null                                   | 'nullable'
    'customerWhiteListEnable' | true                                   | 'valid'
    'enabled'                 | null                                   | 'nullable'
    'enabled'                 | true                                   | 'valid'
}

Do you understand what’s going on? If you haven’t seen it before, you may very well not. The “where” part is a beautiful Spock solution for parametrized tests. The headers of those columns are the names of variables, used BEFORE, in the first line. It’s sort of a declaration after the usage. The test is going to be fired many times, once for each line in the “where” part. And it’s all possible thanks to Groovy’s Abstract Syntax Tree Transformations. We are talking about interpreting and changing the code during compilation. Cool stuff.

So what is this test doing?

Nothing.

Let me show you the code under test.

static constraints = {
  name(blank: false)
  contactInfo(nullable: false, validator: { it?.validate() })
  businessSegment(nullable: false)
  finacleAccountNumber(blank: false)
  principalContactPerson(blank: false)
  principalContactInfo(nullable: false, validator: { it?.validate() })
  feeCalculator(nullable: false)
  customerWhiteListEnable(nullable: false)
}

This static closure is telling Grails what kind of validation we expect on the object and database level. In Java, these would most probably be annotations.

And you do not test annotations. You also do not test static fields. Or closures without any sensible code, without any behavior. And you don’t test whether the framework below (Grails/GORM in here) works the way it works.

Oh, you may test that for the first time you are using it. Just because you want to know how and if it works. You want to be safe, after all. But then, you should probably delete this test, and for sure, not repeat it for every single domain class out there.

This test doesn’t even verify that, by the way. Because it’s a unit test, working on a mock of a database. It’s not testing the real GORM (Grails Object-Relational Mapping, an adapter on top of Hibernate). It’s testing the mock of the real GORM.

Yeah, it’s that stupid.

So if TDD gives us safety, design and feedback, what does this test provide? Absolutely nothing. So why did the programmer put it here? Because his brain says: tests are good. More tests are better.

Well, I’ve got news for you. Every single test which does not provide us safety and good design is bad. Period. Those which provide only feedback, should be thrown away the moment you stop refactoring your code under the test.

So here’s my lesson number three:

PROVIDE SAFETY AND GOOD DESIGN, OR BE GONE.

That was the example of things gone wrong. What should we do about it?

The answer: delete it.

But I have yet to see a programmer who removes his own tests. Even ones as shitty as this one. We take our code very personally, I guess. So in case you are hesitating, let me remind you what Kent Beck wrote in his book about TDD:

The first criterion for your tests is confidence. Never delete a test if it reduces your confidence in the behavior of the system.
The second criterion is communication. If you have two tests that exercise the same path through the code, but they speak to different scenarios for a reader, leave them alone.
[Kent Beck, Test Driven Development: by Example]

Now you know, it’s safe to delete it.

So much for today. I have some good examples to show, some more stories to tell, so stay tuned for part 2.
