Friday, 26 March 2021

Outside-in Diamond 🔷 TDD #2 (anatomy of a style)

After talking about the WHY and the reasons that have motivated the emergence of this style of TDD over the years (basically, being compatible with people's psychology and helping them with their recurring misunderstandings about TDD), this second article will be about the HOW. To do this, we will see some examples and list the main characteristics of tests written with this style.                                                                                

Note: For those who would like to know more about it, you can refer to the talk I recently gave at DDD Africa, available here: https://youtu.be/djdMp9i04Sc or to the first article of this series (exploring the WHY).


[Image: Outside-In Diamond TDD]


First, it's a workflow

Before we see the various kinds of tests Outside-In Diamond 🔷 TDD cares about, I would like to focus on the dynamics of writing them.


As its name suggests, Outside-In Diamond 🔷 is a style where we draw the shape of our System-Service-API-Application from the beginning based on our use cases and business needs (which each turn into acceptance tests). The ability to drive everything from the outside (i.e. from the consumption of our System) allows us not to get lost along the way and to avoid coding useless things that would not be directly necessary for one of our use cases. This is what makes this TDD-style devilishly efficient. But nothing new here (or related to Outside-In Diamond 🔷). It's just classic old Outside-In bringing its intrinsic benefits.

[Image: Outside-In vs. Classicist TDD]


A frugal style

Indeed, this outside-in dynamic (through exclusive external uses) combined with triangulation allows us to stick to the YAGNI principle (You Ain't Gonna Need It).

One may notice here that triangulation is more often associated with classicist TDD (i.e. Inside-Out workflow) than with Outside-In TDD. And that's one of the reasons why it is often complicated for some TDD practitioners to realize that these are orthogonal topics.

As with the traditional double-loop presented by Nat PRYCE and Steve FREEMAN, we start by writing a first acceptance test against a black box (i.e., our System-Service-API-Application) which does not yet exist and whose outlines will be sketched from our interaction with it. 


For instance, if I'm coding a web API, my subject under test (the entry point of my black box) will usually be a web controller on which I'm going to call a public method (i.e. our very first Operation for this System-Service...).

Once our very first test is red (RED), I will generally turn it green as quickly as possible, by hard-coding in my web controller the response expected by my test (GREEN). The refactoring phase will undoubtedly be an opportunity to do some design by bringing out a hexagon type (if I'm using a Hexagonal Architecture) or any facade for my domain (REFACTOR).


I will then continue by writing a second acceptance test (notice that we are still in the big loop), which will usually suggest another case for the same operation (RED). To turn green as quickly as possible, I will usually add another hard-coded value in my implementation code, while preserving my first case-test using an if statement (GREEN). 

The refactoring-design phase is then my opportunity to perhaps introduce a right-side port (in the hexagonal architecture sense), in order to start replacing the hard-coded value in my domain with a value found or computed from another hard-coded value returned by my right-side port. Usually, this slightly impacts the initialization code of the test, which now injects the right-side port (interface) into our SUT (Subject Under Test). In our case: our web controller (REFACTOR).
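
The first two turns of the big loop could look like the sketch below. It is only an illustration: it uses Python for brevity (not the C# mentioned further down), and the hotel-quote example and every name in it (QuoteControllerV1, QuoteControllerV2, RoomRates, HardCodedRoomRates) are assumptions, not code from a real project. V1 is the state after the second GREEN; V2 is the state after the REFACTOR that brings out a right-side port.

```python
from typing import Protocol


class QuoteControllerV1:
    """After tests 1 and 2 are GREEN: hard-coded answers, one if statement."""

    def get_quote(self, hotel_id: str, nights: int) -> dict:
        if hotel_id == "lyon-02" and nights == 3:  # added for the 2nd test
            return {"hotel": "lyon-02", "total": 285}
        return {"hotel": "paris-01", "total": 240}  # answer for the 1st test


class RoomRates(Protocol):
    """Right-side port (hypothetical): how the domain asks for a nightly rate."""

    def nightly_rate(self, hotel_id: str) -> int: ...


class HardCodedRoomRates:
    """The hard-coded values have moved behind the port."""

    def nightly_rate(self, hotel_id: str) -> int:
        return {"paris-01": 120, "lyon-02": 95}[hotel_id]


class QuoteControllerV2:
    """After REFACTOR: the total is computed from what the port returns."""

    def __init__(self, room_rates: RoomRates) -> None:
        self._room_rates = room_rates

    def get_quote(self, hotel_id: str, nights: int) -> dict:
        total = self._room_rates.nightly_rate(hotel_id) * nights
        return {"hotel": hotel_id, "total": total}


def test_quote_for_two_nights_in_paris() -> None:
    # The only impact on the test: the port is injected into the SUT.
    controller = QuoteControllerV2(HardCodedRoomRates())
    assert controller.get_quote("paris-01", 2) == {"hotel": "paris-01", "total": 240}


def test_quote_for_three_nights_in_lyon() -> None:
    controller = QuoteControllerV2(HardCodedRoomRates())
    assert controller.get_quote("lyon-02", 3) == {"hotel": "lyon-02", "total": 285}
```

Both acceptance tests still talk only to the controller. The port shows up in their setup and nowhere in their assertions.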


One and a half loop... and not a mockist style

At this point, I still haven't written a fine-grained test (the one belonging to the small/inner loop), and I'm about to write my 3rd acceptance test (still in the big loop). This will add a new case to be handled by my current API operation (RED). And this is where triangulation comes in. 


In the context of TDD, triangulation means generalizing an implementation only once a second or third (hard-coded) case calls for it.


Let's see that in action.

Here we can make our 3rd test pass as quickly as possible by adding a second if statement and a third hard-coded value (GREEN). Then we can dedicate our design step to refactoring our implementation in baby-steps mode, through what looks like a strangler pattern strategy (as once pointed out to me by my friend and talented eXtreme Programmer: Philippe BOURGAU).


This can be achieved by positioning our new domain material in the implementation code just before the existing if statements. Once our old implementation is eclipsed by the new one, we can remove all these hard-coded values from our code (REFACTOR).
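
A minimal sketch of that strangler-like refactoring, under the same caveats: Python for brevity, and a made-up hotel-quote example where NIGHTLY_RATES, generalized_total and get_quote are hypothetical names. The new, general code sits in front of the old if statements and answers whenever it can. The old branches stay as a safety net until nothing reaches them anymore.

```python
from typing import Optional

# Hypothetical rate table standing in for the domain knowledge being generalized.
NIGHTLY_RATES = {"paris-01": 120, "lyon-02": 95}


def generalized_total(hotel_id: str, nights: int) -> Optional[int]:
    """New domain material: returns None for hotels it does not cover yet."""
    rate = NIGHTLY_RATES.get(hotel_id)
    return None if rate is None else rate * nights


def get_quote(hotel_id: str, nights: int) -> dict:
    # 1. The new implementation goes first and eclipses the old one
    #    for every case it already knows how to handle.
    total = generalized_total(hotel_id, nights)
    if total is not None:
        return {"hotel": hotel_id, "total": total}

    # 2. Old hard-coded branches, untouched for now so the tests stay green.
    if hotel_id == "lyon-02" and nights == 3:
        return {"hotel": "lyon-02", "total": 285}
    if hotel_id == "nice-03" and nights == 1:
        return {"hotel": "nice-03", "total": 150}
    return {"hotel": "paris-01", "total": 240}
```

Here the Paris and Lyon cases are already served by the new code, while the third case (nice-03) still falls through to its hard-coded branch. Adding its rate to the table is one more baby step, after which all the old branches are dead code and can be deleted.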

The idea is to move with baby steps and without our tests being broken (note: in C#, I'm working with NCrunch, a live test runner that automatically builds and runs all my tests in the background as soon as I change my code. This is really helpful and acts as gamification in order to be always GREEN during the REFACTORING step through baby steps).


Having mostly Acceptance tests does not mean that we aren't taking baby steps! (far from it)


This is the appropriate moment to create intermediate fine-grained (unit) tests in what we call the small or inner loop of the double loop. My personal heuristic over time is to only write those tests if I feel the need for it (i.e. when facing a difficulty or when stuck in a "tunnel effect" for more than 10 minutes).


In the end, the Outside-in Diamond 🔷 style doesn't put too much pressure on writing lots of little fine-grained (unit) tests. It's à la carte (depending on the maturity of the people I'm mobbing or pairing with). 


[Image: One and a half loop - Outside-In Diamond TDD]


Moreover, these intermediate tests are very often removed once the implementation is complete. A bit like removing the wedges and wooden battens that helped us assemble a concrete wall, once it is dry.

I will probably come back to the Outside-in Diamond 🔷 workflow later (and how it fits with design decisions) in an upcoming live coding session. That should be easier to illustrate all this.

Now let's see what the Outside-In Diamond 🔷 TDD acceptance tests look like.


Focus on our Acceptance tests

Yes, let's zoom-in on our favorite tests (the Acceptance ones). These are:

  • Short: no more than 7-15 lines of code per test. To achieve this we use builders to initialize the test context, fuzzers to quickly and randomly generate values, and helper methods for intention-driven assertions (expressed in 1 line). In addition to relieving our mental load when reading our tests, the advantage of having short tests lies in having as little "implementation detail" as possible and as "intention" oriented as possible. Intentions are generally less fragile than implementations.
    [Image: Sample of Outside-In Diamond TDD]


  • Domain-Driven: Indeed, our tests should express Domain concerns with words belonging to our considered Context (see “ubiquitous language” and "Bounded Contexts" from DDD). We will particularly use builders to declare business intentions without getting lost in implementation details. Good test builders publicly expose domain intentions and behaviors, and fully encapsulate implementation details and stub configuration privately.
    [Image: Ubiquitous Language DDD Outside-In Diamond TDD]


  • Blazing Fast: between sub-millisecond and 400 milliseconds max per test. To achieve this, we will use stubs to be able to avoid any I/O (always more expensive in terms of latency budget). Stubs will only be used for systems external to our System-Service-API-Application. Warning: Outside-In Diamond TDD is an outside-in style, but it is not a "mockist" style.

  • Isolated and autonomous: no use of member variables or mutable private fields belonging to the test suite | fixture. No initialization via [Setup] methods for the test suite | fixture either. Even if we use builders and fuzzers, any creation (with its intentions) must be declared from the test and stored in local variables (inaccessible by other tests). This helps to avoid the cognitive overload that occurs when people have to go elsewhere to check what is already prepared or initialized before each test. And please, don't get me started with the painful TestFixture|Suite inheritance and setup made in a TestSuite base class ;-)

  • Deterministic: even if we intensively use Fuzzers which propose random values, a means must be provided (in general by the fuzzing library) to be able to replay the same Test under exactly the same conditions (in general by reusing the same seed). This is essential in order to be able to reproduce and understand any failure that occurred once in a test execution (on the software factory or on a dev workstation).

  • Behavior-driven: we try to hide everything that is technical. These tests are always doing the same thing: we ask our black box (to do) something and we check that its answers suit our expectations. The checks (or assertions made) are generally encapsulated in test helper methods that allow us to be concise and business-oriented (regardless of the assertion library you use and the number of checks).
    [Image: Outside-In Diamond TDD Acceptance test sample]


  • Similar: We rely heavily on the power of sameness in our test code (for instance, reusing the same variable names for the same domain concepts across tests) in order to smooth our future test refactoring (e.g. to ease possible search and replace). We consider our test code as production code. We improve it and refactor it regularly too.

  • Antifragile: by default, thanks to all the features mentioned, one can easily change and refactor our implementation code without breaking contracts exercised by our tests. To put it another way, our tests age well and don't break when we change the internal structure of our code. They should only break if we introduce a bug or a regression. 

  • Broad spectrum: even if they hide it well (thanks to builders and helpers), our Acceptance tests cover a broad spectrum of our code base and include the real adapters in case of hexagonal architecture (instead of stubbing them). This is the opposite of what people usually recommend, but this is the most effective testing strategy that I have ended up with over the past 8 years of putting hexagonal architectures in production (in different contexts and for different customers). Since this is more than an Argument from authority ;-) I will dedicate the next article to illustrate and explain all these trade-offs.
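
To give an idea of how these characteristics can combine, here is a hedged sketch of one such acceptance test with its supporting cast. Everything in it is an assumption made for illustration: it is written in Python, the hotel-quote domain is invented, and the Fuzzer class is a tiny hypothetical stand-in for a fuzzing library (it is not the API of Diverse). Only the external web service is stubbed. The adapter and the controller are the real ones.

```python
import random
from typing import Optional


class Fuzzer:
    """Hypothetical stand-in for a fuzzing library (this is not Diverse's API)."""

    def __init__(self, seed: Optional[int] = None) -> None:
        # Keep the seed so that a failing run can be replayed identically.
        self.seed = random.randrange(2**32) if seed is None else seed
        self._random = random.Random(self.seed)

    def nightly_rate(self) -> int:
        return self._random.randint(50, 500)

    def nights(self) -> int:
        return self._random.randint(1, 14)


class StubPricingWebService:
    """Stub of the external system only: the last mile before real I/O."""

    def __init__(self) -> None:
        self._payloads: dict = {}

    def will_answer(self, hotel_id: str, nightly_rate: int) -> None:
        self._payloads[hotel_id] = {"amount": str(nightly_rate), "currency": "EUR"}

    def fetch(self, hotel_id: str) -> dict:
        return self._payloads[hotel_id]


class PricingAdapter:
    """Real right-side adapter: its mapping code is covered by the test."""

    def __init__(self, web_service: StubPricingWebService) -> None:
        self._web_service = web_service

    def nightly_rate(self, hotel_id: str) -> int:
        return int(self._web_service.fetch(hotel_id)["amount"])


class QuoteController:
    """Entry point of the black box."""

    def __init__(self, room_rates: PricingAdapter) -> None:
        self._room_rates = room_rates

    def get_quote(self, hotel_id: str, nights: int) -> dict:
        total = self._room_rates.nightly_rate(hotel_id) * nights
        return {"hotel": hotel_id, "total": total}


class QuoteApiBuilder:
    """Exposes business intentions; hides the stub wiring."""

    def __init__(self) -> None:
        self._web_service = StubPricingWebService()

    def with_hotel_priced_at(self, hotel_id: str, nightly_rate: int) -> "QuoteApiBuilder":
        self._web_service.will_answer(hotel_id, nightly_rate)
        return self

    def build(self) -> QuoteController:
        return QuoteController(PricingAdapter(self._web_service))


def check_that_total_is(quote: dict, expected_total: int, seed: int) -> None:
    assert quote["total"] == expected_total, f"replay with seed {seed}: {quote}"


def test_quote_total_is_nightly_rate_times_nights() -> None:
    fuzzer = Fuzzer()
    nightly_rate, nights = fuzzer.nightly_rate(), fuzzer.nights()
    api = QuoteApiBuilder().with_hotel_priced_at("paris-01", nightly_rate).build()

    quote = api.get_quote("paris-01", nights)

    check_that_total_is(quote, nightly_rate * nights, fuzzer.seed)
```

The test itself is a handful of lines, keeps everything in local variables, and never mentions the stub. If it ever fails, the reported seed can be passed back to Fuzzer(seed) to replay exactly the same values.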


Next episode: integration tests & hexagonal architecture tradeoffs

In order to write articles that are a little shorter than usual, I will not describe here the characteristics of our contract tests (i.e. our integration tests). These will be presented in detail in our next article dedicated to our test strategy in case of a hexagonal architecture.

Talk to you soon & Happy testing



PS: Since it has been asked many times, the fuzzer we use at work is Diverse.

Friday, 12 March 2021

Outside-in Diamond 🔷 TDD #1 - a style made from (& for) ordinary people

This article is the first of a series devoted to a style of TDD that I have gradually elaborated throughout my 16 years of TDD practice in very different contexts. Each article will look at a particular theme around this style (what concrete problems does it tackle? What are the main characteristics of this style? What are the most frequent trade-offs and objections one could make? etc.). This first article takes up the reasons that have motivated the emergence of this model over the years: to be compatible with the psychology of dev people, regardless of their level or maturity.


Note: For those who would like to know more about it, you can refer to the talk I recently gave to DDD Africa, available here: https://youtu.be/djdMp9i04Sc





Sad Panda


Less than 2 years ago, I started to work for a large international hotel group. One of my missions was to transform 2 back-end developers (working on all web sites and mobile apps) into a real "API team" (there are 6 of us now). Our goal: to support the different business lines on every possible topic (marketing, distribution, finance, etc.).


When I joined this client, I found the people, the platform and the domain very fascinating, but their testing strategy was definitely not one of their strengths. Despite competent and motivated people, there was a whole bunch of sub-best-practices present and lots of hesitations, ultimately all because there were not enough tests. Indeed, the writing of tests was far from being a regular practice and the only one actually writing tests was applying a test-after strategy.


As a result, we had an infernal PR-based system of branches and quality gate code reviews over several days (despite the fact that there were only 3 people working on the same code base). As you can guess, we were very far from continuous delivery… People also felt a little bit infantilized by the numerous remarks and roundtrips made with the tech lead during the gated code reviews. Due to lack of pair programming and automated tests, this was mandatory for everyone before being able to push some code into production.


And if you wanted to add tests, you might be intimidated. Indeed, the overall design of the API was quite complex and our test suite was incredibly painful and complex too. We had never-ending test suite setups (more than 700 lines of test initialization in one test suite class for instance), side effects everywhere (with initialization made through test suite private fields/members) and strange beasts all over the place (i.e.: half concrete implementation / half stub) thanks to some dark magic with our mock framework (partial stubs… trust me, you don’t want to know ;-). For a reason that I still don’t get (because he's so smart otherwise), the tech lead had a very strong personal principle he was insisting on with everyone: “one should never change our implementation design for testing reasons” (do not add a constructor for testing purposes for instance…)


In short: when they existed, the tests were considered second-class code. They were complex and rarely refactored, cranked-out code... Enough to tie knots in our brains on a daily basis, and not attractive at all for those who did not yet write tests… Every time we touched the code base, we needed both a code review and a huge QA end-to-end test campaign. Some people were a little bit discouraged and disengaged by this bureaucratic burden.


Typically, the kind of situation I have seen a dozen times here and there in my experience. Because unfortunately, this is not an isolated case. Throughout all these years of practicing Test Driven Development I had to face the fact:



There are still very few people who practice TDD (unfortunately)


I've added "unfortunately" because when I see what we are able to do now with this API team on other components, when I think back to what this practice and way of thinking/coding has brought to me personally (in terms of serenity, efficiency and pleasure), I tell myself that it's a shame that TDD is still not shared more widely.


I already said it multiple times in articles or conferences in French: from a personal perspective, TDD has acted for me as a bulwark against many of my former biases (procrastination, blank page anxiety / analysis paralysis, doubts). Err… actually I still procrastinate sometimes but never while coding anymore 😉


So yes, I definitely think that it’s a shame that most people have not tried it, nor experienced it in pleasant conditions. I talk about pleasant conditions because there are indeed many pitfalls related to this great practice (and I think I’ve fallen into every one of them over the years). One can even turn a nice situation into a quagmire if these pitfalls are left unchecked.


Which leads me to my second observation: those who have tried it have also often stopped along the way, instead of persevering in the face of certain implementation difficulties. TDD is one of the most trolled Straw Men in IT (and not only on Hacker News ;-) People talk about something they think TDD is, but actually isn't. Sad Panda.


But to be honest, some of these difficulties with TDD are ubiquitous. I found them so many times in so many different teams with various levels & maturity. I found them in so many different types of companies, in so many different domains too (finance, health, hospitality, energy, transport, etc.) during all these years.


Always the same again and again. These 3 recurrent difficulties are:



[Image: the 3 recurrent difficulties]


Disclaimer: Some people may notice here that testing is just a side effect of TDD (which serves a lot more than that). Of course, that is true. But I think that testing is nonetheless an important chapter too.


This is why we will detail each of these commonly encountered difficulties in the next article. In the meantime, I would like to zoom in today on what I find problematic for the overall understanding of people.


A negative influence that I have observed over and over again on people: the mental model of the pyramid of tests (and the confusing term of "unit test").



The pyramid of confusion



[Image: Pyramid of Tests (of confusion)]


Okay. This is not really breaking news. Many people have already approached this theme over the last decade or more, warning everyone against the fuzziness of this mental model and the pitfalls of understanding one should not fall into with this metaphor (see Seb Rose in particular: https://cucumber.io/blog/bdd/eviscerating-the-test-automation-pyramid/ ).


But I will be rougher with this "monument" that the Pyramid of Tests has become. Indeed, all these years of noticing the harmful effects of this mental model on teams and people's code bases have finally made me think that this pyramid is really harmful for the vast majority. It’s not “just” a “good opportunity to realize that one may write different kinds of tests”. In most cases it’s a cargo-cult factory.


Same applies to the term “unit test”.



Lost in translation


[Image: Unit test is confusing]

The fact that we are still using this “unit test” terminology despite the fact that we all know that this is confusing for everyone still pisses me off. You want to start an argument on twitter? You just have to say something about “unit tests” and you will see how many different definitions and mental models people will suggest or impose on you (implicit for them).


The definition I liked is not the definition the majority of people I’ve met retains. I liked the “unit test” definition from Kent Beck:


“tests that “run in isolation” from other tests”


And don’t miss that specific point: It’s about the isolation of the test, not about the isolation of the topic/system under test (as brilliantly underlined by Ian Cooper in his great talk: “TDD Where Did It All Go Wrong”: https://www.youtube.com/watch?v=EZ05e7EMOLM).


But there is one thing for sure: when you ask people what a “unit test” is… 90% of the answers will be “a test of a tiny module, a type, a class, a method, a function…”


Hence, we can try whatever we want… it hasn’t caught on in the mainstream over the years despite the books, the conferences, the blog posts, the coaching tips, etc. I personally tried to foster Beck’s definition all around me over the last decade and more, but I have to admit: this is a game lost in advance. Definitely.


We aren’t talking about bullshit here, but it somehow reminds me of Alberto Brandolini’s law:

 

“The amount of energy needed to refute bullshit is an order of magnitude larger than to produce it”

 

This is why I stopped being precise with this concept of “unit test” over the years. I ended up avoiding the “unit test” term as much as possible, along with its bundled pyramid (unless I am having a discussion with an expert on the topic).



And I’m not the only one. Other people have also avoided the “unit test” terminology over the years (like GeePaw Hill and his concept of “Microtests” for instance). My strategy was to simply focus on Acceptance tests instead (which are Component/API tests). The very same coarse-grained “Acceptance tests” Nat Pryce and Steve Freeman are talking about in their great GOOS book (http://www.growing-object-oriented-software.com/).


It’s just a pragmatic decision in order to avoid confusion and misunderstanding all around me again and again.



Anyway, as with the term “unit test”, the benefit-cost ratio of the Pyramid being really unfavorable, I was looking for a better way to have discussions about testing strategies with teams and beginners.



Testing Gem


While I was looking for an alternative to this bloody pyramid a few years ago, I finally came up with the Diamond shape (see https://twitter.com/tpierrain/status/964018082434945024?s=20 ).


As a visual incentive, this fits perfectly with my invitation to write more acceptance tests (coarse grained) than fine grained ones (the latter being those that people - not experts - continue to call "unit tests").


[Image: Outside-in Diamond TDD]


The "diamond" shape largely clarifies things for beginners. Some friends and I have had this confirmed many times in our experiences at work.



Make the implicit, explicit


In my experience, the 

“do not test implementations, test behaviors instead”

recommendation that we used to bundle as a reading guide with the pyramid was not enough. It still left lots of beginners stranded (because it is too vague and too complicated to understand and translate into action).


On the other hand, I found the following hint much more readable and actionable for everyone:

“write more (coarse-grained) acceptance tests than fine-grained tests (what the majority of people - but not experts - call “unit tests”)”


At least that's what I've experienced around me for the last 10 years now.



BDD or not BDD? (that is a question)


Even if the diamond is more precise and more readable for the vast majority of people that I have met, there is still something that bothers some of them. This usually comes with one of those 2 questions:


  • “Are your Acceptance tests written in Gherkin (a.k.a. Given-When-Then, a.k.a. “BDD” tests)?”
  • “Are your acceptance tests expressed in classic code, by and for dev people?”


My answer is rather yes to the latter. As far as I'm concerned, I've decided to only pay the cost of adaptation with a BDD technical Framework if I have business stakeholders who are interested in it. 


Which, in the end, is very, very rare.


I do BDD very often, but mostly the “discovery” part, rarely the “formulation” part (Gherkin land) and almost no longer the “automation” part (Cucumber, Specflow, etc.). But whatever the context in which I work, these 2 types of tests ("Acceptance" or "BDD" to put it simply) are coarse grained acceptance tests targeting the Behavior of a component, of a service or of an API.


They mainly differ by their expressiveness (because of their target audience) and the additional cost of implementation with the "BDD-Gherkin" support. 


For the record, acceptance tests are what most experts consider to be true unit tests (whereas mainstream dev people continue to think of "unit tests" as type or class level fine-grained tests).



How my TDD practice evolved: the origins of Outside-in Diamond 🔷 TDD


Beyond the diamond model that illustrates the overall testing strategy, Outside-in Diamond 🔷 TDD is both a style and a workflow: some important characteristics of expression, and a way of designing and writing software.


We will see all of this in the future articles of this series. But before that, I'd like to end this article by emphasizing how I converged to this style (and those compromises).

The diamond is just a marketing or mnemonic aspect of Outside-in Diamond 🔷 TDD.



Fueled with Empathy


All the trade-offs that I have been adopting over the last decade have the same starting point: the psychology and the reactions of the dev people I have worked or talked with (in meetup events, user groups, conferences).


In front of a recurrent problem, I tried to bend my practice so that it fixes the situation (instead of thinking that one can change people). Here are some examples:


-----

PROBLEM: The testing pyramid misleads people despite our warnings and disclaimers year after year


TRADE-OFF: I looked for another model that speaks more to the majority (and not only to the experts) => the Diamond


-----

PROBLEM: Mainstream people don’t share the same definition for “unit tests” and really struggle to understand what kind of tests to write.


TRADE-OFF: I interpret “unit test” like the vast majority (i.e., fine-grained tests against classes etc.) and I encourage people to talk about and to write (coarse-grained) Acceptance tests instead.

 

-----

PROBLEM: We have tiny bugs in technical code because as dev we love to write "business" tests BUT we struggle to put as much energy and carefulness into slower and more technical contract / integration tests (i.e., “happy path” kingdom ;-)


TRADE-OFF: I stub only the last miles just before the I/Os so that all the code before may be also included in our blazing fast Acceptance tests

 

-----

PROBLEM: (variant) When using Hexagonal Architecture, our Adapters on the infrastructure side often have blind spots & silly bugs because it is said everywhere that we need to cover the adapter only in Contract tests (slow, painful to write and -thus- often considered as second-class citizens for people)


TRADE-OFF: I suggested another out-of-the-box but successful testing strategy that includes the adaptation code of our adapters in our acceptance tests. These tests remain blazing fast and free of I/O, and nevertheless stay concise and Domain-driven.

 

-----

PROBLEM: Our test code and setup are too long and complex, and they introduce mental burden and cognitive overload.


TRADE-OFF: I shortened and simplified Acceptance tests expressivity to nothing more than 8-12 lines thanks to domain driven helpers, fuzzers and builders. Everything is directly driven from the test (and not elsewhere).


Etc. etc.



A long Journey


The result of this process, which took years, is what my partner (Bruno Boucard) and I now call Outside-in Diamond 🔷 TDD.


The basis of this style has been there for 10 years, and it was only recently that I was able to solve some problems and get a presentable form (with the help of fuzzers in addition to the Builders for my acceptance tests which remain short, antifragile & Domain-driven).


Now that we have covered the why, the next articles of this series will talk about the how and the associated trade-offs.


Stay tuned!


Note: for those that would like to dive more into this style of TDD, you can refer to the recent talk I've made on the subject for DDD Africa: https://youtu.be/djdMp9i04Sc


[Image: TDD is evolving]