This article is the first in a series devoted to a style of TDD that I have gradually developed over my 16 years of TDD practice in very different contexts. Each article will look at a particular theme around this style: what concrete problems does it tackle? What are its main characteristics? What are the most frequent trade-offs and objections? This first article covers the reasons that motivated the emergence of this model over the years: to be compatible with the psychology of dev people, regardless of their level or maturity.
Note: For those who would like to know more about it, you can refer to the talk I recently gave to DDD Africa, available here: https://youtu.be/djdMp9i04Sc
Sad Panda
Less than 2 years ago, I started to work for a large international hotel group. One of my missions was to transform 2 back-end developers (working on all web sites and mobile apps) into a real "API team" (there are 6 of us now). Our goal: to support the different business lines on every possible topic (marketing, distribution, finance, etc.).
When I joined this client, I found the people, the platform and the domain fascinating, but their testing strategy was definitely not one of their strengths. Despite competent and motivated people, there were a whole bunch of sub-optimal practices and lots of hesitation, ultimately all because there were not enough tests. Indeed, writing tests was far from being a regular practice, and the only person actually writing tests was applying a test-after strategy.
As a result, we had an infernal PR-based system of branches and quality-gate code reviews lasting several days (despite the fact that there were only 3 people working on the same code base). As you can guess, we were very far from continuous delivery… People also felt a little bit infantilized by the numerous remarks and round trips with the tech lead during the gated code reviews. Due to the lack of pair programming and automated tests, these reviews were mandatory for everyone before any code could be pushed to production.
And if you wanted to add tests, you might be intimidated. Indeed, the overall design of the API was quite complex, and our test suite was incredibly painful and complex too. We had never-ending test suite setups (more than 700 lines of test initialization in one test suite class, for instance), side effects everywhere (with initialization made through private fields/members of the test suite) and strange beasts all over the place (e.g. half concrete implementation / half stub) thanks to some dark magic with our mock framework (partial stubs… trust me, you don’t want to know ;-). For a reason that I still don't get (because he's so smart otherwise), the tech lead had a very strong personal principle he insisted on with everyone: “one should never change our implementation design for testing reasons” (do not add a constructor for testing purposes, for instance…)
In short: when they existed, the tests were considered second-class code. They were complex, rarely refactored, just cranked out... Enough to tie our brains in knots on a daily basis, and not attractive at all for those who did not write tests yet… Every time we touched the code base, we needed both a code review and a huge end-to-end QA test campaign. Some team members were a little bit discouraged and disengaged by this bureaucratic burden.
This is typically the kind of situation I have seen a dozen times here and there in my career. Because unfortunately, this is not an isolated case. Throughout all these years of practicing Test Driven Development, I have had to face the facts:
There are still very few people who practice TDD (unfortunately)
I've added "unfortunately" because when I see what we are able to do now with this API team on other components, when I think back to what this practice and way of thinking/coding has brought to me personally (in terms of serenity, efficiency and pleasure), I tell myself that it's a shame that TDD is still not shared more widely.
I already said it multiple times in articles or conferences in French: from a personal perspective, TDD has acted for me as a bulwark against many of my former biases (procrastination, blank page anxiety / analysis paralysis, doubts). Err… actually I still procrastinate sometimes but never while coding anymore 😉
So yes, I definitely think that it’s a shame that most people have not tried it, or have not experienced it in pleasant conditions. I talk about pleasant conditions because there are indeed many pitfalls related to this great practice (and I think I’ve fallen into every one of them over the years). One can even turn a nice situation into a quagmire if these pitfalls are left unchecked.
Which leads me to my second observation: those who have tried it have also often stopped along the way, instead of persevering in the face of certain implementation difficulties. TDD is one of the most trolled straw men in IT (and not only on Hacker News ;-). People argue about something they think TDD is, but that it actually isn't. Sad Panda.
But to be honest, some of these difficulties with TDD are ubiquitous. I have found them so many times, in so many different teams with various levels of skill and maturity. I have found them in so many different types of companies and in so many different domains too (finance, health, hospitality, energy, transport) during all these years.
Always the same again and again. These 3 recurrent difficulties are:
Disclaimer: Some people may point out here that testing is just a side effect of TDD (which serves a lot more than that). Of course, that is true. But I think that testing is nonetheless an important chapter too.
That is why we will go into each of these commonly encountered difficulties in detail in the next article. In the meantime, I would like to zoom in today on what I find problematic for people's overall understanding.
It is a negative influence that I have observed on people over and over again: the mental model of the pyramid of tests (and the confusing term "unit test").
The pyramid of confusion
Okay. This is not really breaking news. Many people have already tackled this theme over the last decade or more, warning everyone against the fuzziness of this mental model and the misunderstandings this metaphor can lead to (see Seb Rose in particular: https://cucumber.io/blog/bdd/eviscerating-the-test-automation-pyramid/ ).
But I will be rougher with this "monument" that the Pyramid of Tests has become. Indeed, all these years of noticing the harmful effects of this mental model on teams and on people's code bases have finally made me think that this pyramid is really harmful for the vast majority. It’s not “just” a “good opportunity to realize that one may write different kinds of tests”. In most cases it’s a cargo-cult factory.
The same applies to the term “unit test”.
Lost in translation
The fact that we are still using this “unit test” terminology, even though we all know that it is confusing for everyone, still pisses me off. You want to start an argument on Twitter? You just have to say something about “unit tests” and you will see how many different definitions and mental models people will suggest or impose on you (all implicit for them).
The definition I like is not the one that the majority of people I’ve met have retained. I like the “unit test” definition from Kent Beck:
“tests that “run in isolation” from other tests”
And don’t miss that specific point: it’s about the isolation of the test, not about the isolation of the topic/system under test (as brilliantly underlined by Ian Cooper in his great talk “TDD Where Did It All Go Wrong”: https://www.youtube.com/watch?v=EZ05e7EMOLM).
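To make this distinction concrete, here is a minimal Python sketch. The `Room` and `Hotel` classes and the test names are hypothetical, invented for illustration only. The system under test spans two collaborating classes, yet each test is still isolated in Beck's sense because it builds its own state and shares nothing with the other test.

```python
import unittest


class Room:
    """Hypothetical domain class: a hotel room with a nightly price."""

    def __init__(self, number, nightly_price):
        self.number = number
        self.nightly_price = nightly_price


class Hotel:
    """Hypothetical domain class that collaborates with Room."""

    def __init__(self):
        self._rooms = []

    def add_room(self, room):
        self._rooms.append(room)

    def cheapest_price(self):
        return min(room.nightly_price for room in self._rooms)


class HotelShould(unittest.TestCase):
    # Each test creates its own Hotel: no state is shared between tests,
    # so they do not depend on each other or on execution order.
    # Hotel is NOT isolated from Room: both real classes are exercised.

    def test_report_the_cheapest_price_among_its_rooms(self):
        hotel = Hotel()
        hotel.add_room(Room("101", 120))
        hotel.add_room(Room("102", 90))
        self.assertEqual(hotel.cheapest_price(), 90)

    def test_report_the_price_of_its_only_room(self):
        hotel = Hotel()
        hotel.add_room(Room("201", 150))
        self.assertEqual(hotel.cheapest_price(), 150)
```

Had the two tests shared a single `Hotel` instance through a class-level field, the result of the second test would depend on whether the first one had already run, which is exactly the kind of coupling this definition rules out.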
But there is one thing for sure: when you ask people what a “unit test” is… 90% of the answers will be “a test of a tiny module, a type, a class, a method, a function…”
Hence, we can try whatever we want… it still hasn't caught on with the mainstream over the years, despite the books, the conferences, the blog posts, the coaching tips, etc. I personally tried to foster Beck’s definition all around me over the last decade and more, but I have to admit: this is a lost cause. Definitely.
We aren’t talking about bullshit here, but it somehow reminds me of Alberto Brandolini’s law:
“The amount of energy needed to refute bullshit is an order of magnitude larger than to produce it”
That is why I stopped trying to be precise about this concept of “unit test” over the years. I ended up avoiding the term “unit test” as much as possible, along with its bundled pyramid (unless I am having a discussion with an expert on the topic).
And I’m not the only one. Other people have also avoided the “unit test” terminology over the years (like GeePaw Hill and his concept of “Microtests”, for instance). My strategy was to simply focus on Acceptance tests instead (which are Component/API tests). The very same coarse grained “Acceptance tests” Nat Pryce and Steve Freeman talk about in their great GOOS book (http://www.growing-object-oriented-software.com/).
It’s just a pragmatic decision in order to avoid confusion and misunderstanding all around me again and again.
Anyway, since the benefit-cost ratio of the Pyramid is as poor as that of the term “unit test”, I was looking for a better way to have discussions about testing strategies with teams and beginners.
Testing Gem
While I was looking for an alternative to this bloody pyramid a few years ago, I finally came up with the Diamond shape (see https://twitter.com/tpierrain/status/964018082434945024?s=20 ).
As a visual incentive, this fits perfectly with my invitation to write more acceptance tests (coarse grained) than fine grained ones (the latter being those that people - not experts - continue to call "unit tests").
The "diamond" shape largely clarifies things for beginners. Some friends and I have seen this confirmed many times at work.
Make the implicit, explicit
In my experience, the
“do not test implementations, test behaviors instead”
recommendation that we used to bundle as a reading guide with the pyramid was not enough. It still left lots of beginners stranded (because it is too vague and too complicated to understand and translate into action).
On the other hand, I found the following hint much more readable and actionable for everyone:
“write more (coarse grained) acceptance tests than fine grained tests (what the majority of people - but not experts - call “unit tests”)”
At least that's what I've experienced around me for the last 10 years now.
BDD or not BDD? (that is a question)
Even if the diamond is more precise and more readable for the vast majority of people that I have met, there is still something that bothers some of them. This usually comes with one of these 2 questions:
- “Are your acceptance tests written in Gherkin (a.k.a. Given-When-Then, a.k.a. “BDD” tests)?”
- “Are your acceptance tests expressed in classic code, by and for dev people?”
My answer is rather yes to the latter. As far as I'm concerned, I've decided to pay the cost of adopting a BDD technical framework only if I have business stakeholders who are interested in it.
Which, in the end, is very, very rare.
I do BDD very often, but mostly the “discovery” part, rarely the “formulation” part (Gherkin land) and almost never the “automation” part anymore (Cucumber, SpecFlow, etc.). But whatever the context in which I work, these 2 types of tests ("Acceptance" or "BDD" to put it simply) are coarse grained acceptance tests targeting the behavior of a component, of a service or of an API.
They mainly differ by their expressiveness (because of their target audience) and the additional cost of implementation with the "BDD-Gherkin" support.
For the record, acceptance tests are what most experts consider to be true unit tests (whereas mainstream dev people continue to think of "unit tests" as fine-grained tests at the type or class level).
How my TDD practice evolved: the origins of Outside-in Diamond 🔷 TDD
Beyond the diamond model that illustrates the overall testing strategy, Outside-in Diamond 🔷 TDD is both a style and a workflow: some important characteristics in the way tests are expressed, and a way of designing and writing software.
We will see all of this in the future articles of this series. But before that, I'd like to end this article by emphasizing how I converged on this style (and its compromises).
The diamond is just a marketing or mnemonic aspect of Outside-in Diamond 🔷 TDD.
Fueled with Empathy
All the trade-offs that I have adopted over the last decade have the same starting point: the psychology and the reactions of the dev people I have worked or talked with (in meetup events, user groups, conferences).
Faced with a recurring problem, I tried to bend my practice to fix the situation (instead of thinking that one can change people). Here are some examples:
-----
PROBLEM: The testing pyramid misleads people despite our warnings and disclaimers year after year
TRADE-OFF: I searched for another model that speaks more to the majority (and not only to the experts) => the Diamond
-----
PROBLEM: Mainstream people don’t share the same definition of “unit tests” and really struggle to understand what kind of tests to write.
TRADE-OFF: I interpret “unit test” like the vast majority (i.e., fine grained tests against classes etc.) and I encourage people to talk about and write (coarse grained) Acceptance tests instead.
-----
PROBLEM: We have tiny bugs in technical code because as devs we love to write "business" tests BUT we struggle to put as much energy and care into slower and more technical contract / integration tests (i.e., “happy path” kingdom ;-)
TRADE-OFF: I stub only the last mile, just before the I/Os, so that all the code before it can also be covered by our blazing fast Acceptance tests
-----
PROBLEM: (variant) When using Hexagonal Architecture, our Adapters on the infrastructure side often have blind spots & silly bugs because it is said everywhere that we need to cover the adapters only in Contract tests (slow, painful to write and thus often treated as second-class citizens)
TRADE-OFF: I suggested another out-of-the-box but successful testing strategy that includes the adaptation code of our adapters in our acceptance tests. These tests remain blazing fast and free of I/Os, while staying concise and Domain-driven.
-----
PROBLEM: Our test code and setup are too long and too complex, and they introduce mental burden and cognitive overload.
TRADE-OFF: I shortened and simplified the Acceptance tests to no more than 8-12 lines thanks to domain-driven helpers, fuzzers and builders. Everything is directly driven from the test (and not elsewhere).
Etc. etc.
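To illustrate the "stub only the last mile" and "short tests thanks to builders" trade-offs listed above, here is a minimal Python sketch. Every name in it (`Offer`, `PricingAdapter`, `BookingService`, `PricesBuilder`, the `/hotels/.../prices` path) is a hypothetical example and not code from the project described in this article. The only thing replaced in the test is the `fetch` callable that would perform the real I/O, so the adapter's translation code runs for real inside a fast acceptance test.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Offer:
    """Domain object (hypothetical)."""
    room_type: str
    price: int


class PricingAdapter:
    """Infrastructure-side adapter (hypothetical). Its translation code is
    real; only the injected `fetch` callable would touch the network."""

    def __init__(self, fetch: Callable[[str], List[Dict]]) -> None:
        self._fetch = fetch

    def offers_for(self, hotel_id: str) -> List[Offer]:
        raw = self._fetch(f"/hotels/{hotel_id}/prices")
        # Adaptation code: covered by the acceptance test below.
        return [Offer(r["type"], int(r["amount"])) for r in raw if r["available"]]


class BookingService:
    """Domain-side entry point (hypothetical)."""

    def __init__(self, pricing: PricingAdapter) -> None:
        self._pricing = pricing

    def best_offer(self, hotel_id: str) -> Offer:
        return min(self._pricing.offers_for(hotel_id), key=lambda o: o.price)


class PricesBuilder:
    """Test helper: builds the raw payload returned by the last-mile stub."""

    def __init__(self) -> None:
        self._rows: List[Dict] = []

    def with_room(self, room_type: str, amount: str, available: bool = True):
        self._rows.append({"type": room_type, "amount": amount, "available": available})
        return self

    def build_fetch(self) -> Callable[[str], List[Dict]]:
        rows = list(self._rows)
        return lambda path: rows  # stands in for the real HTTP call


def test_best_offer_ignores_unavailable_rooms() -> None:
    fetch = (PricesBuilder()
             .with_room("suite", "300")
             .with_room("double", "120")
             .with_room("single", "80", available=False)
             .build_fetch())
    service = BookingService(PricingAdapter(fetch))

    assert service.best_offer("paris-01") == Offer("double", 120)
```

The test body stays under a dozen lines and reads in domain terms, while a bug in the adapter (for example, forgetting to filter unavailable rooms or to convert the amount) would still make it fail.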
A long Journey
The result of this process, which took years, is what my partner (Bruno Boucard) and I now call Outside-in Diamond 🔷 TDD.
The basis of this style has been there for 10 years, and it was only recently that I was able to solve some remaining problems and get it into a presentable form (with the help of fuzzers in addition to the Builders, for acceptance tests that remain short, antifragile & Domain-driven).
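The article does not define "fuzzers" at this point. Assuming the term means a seeded generator of random test values whose seed is reported so that a failing run can be replayed, a minimal Python sketch could look like the one below. The `Fuzzer` class, its methods and `total_price` are hypothetical and are not the API of any real library.

```python
import random


class Fuzzer:
    """Hypothetical seeded generator of test values (not a real library API)."""

    def __init__(self, seed=None):
        # Keep the seed so a failing run can be replayed with the same values.
        self.seed = seed if seed is not None else random.randrange(2**32)
        self._random = random.Random(self.seed)

    def positive_amount(self, maximum=1000):
        return self._random.randint(1, maximum)

    def room_type(self):
        return self._random.choice(["single", "double", "suite"])


def total_price(nightly_price, nights):
    """Hypothetical function under test."""
    return nightly_price * nights


def test_total_price_grows_with_the_number_of_nights():
    fuzzer = Fuzzer()
    price = fuzzer.positive_amount()
    # The message exposes the seed, so a failure can be reproduced exactly.
    assert total_price(price, 3) > total_price(price, 2), f"seed={fuzzer.seed}"
```

Passing the reported seed back in, as in `Fuzzer(seed=1234)`, yields the same sequence of values, which is what keeps randomized test data debuggable.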
Now that we have covered the why, the next articles of this series will talk about the how and the associated trade-offs.
Stay tuned!
Note: for those that would like to dive more into this style of TDD, you can refer to the recent talk I've made on the subject for DDD Africa: https://youtu.be/djdMp9i04Sc
Nice content Thomas. Will be looking forward to the follow ups. Btw do you have a link explaining fuzzers?
Hello Thomas, it's really interesting what you just wrote. I use an almost identical approach to testing (I test only at the application service level because my controllers are usually really simple). I struggle a little with how you would test system state transitions when you use CQRS with async between the write model and the read model. What are your thoughts about it?