Integration Tests: The Spaghetti of Software Development
by David Snook
TLDWTRBTPMMH (Too Long, Don’t Want to Read but the Picture Makes Me Hungry)
Integration tests are necessary but almost certainly overused in the typical end-to-end fashion:
- Necessary to show that everything actually works
- But only available at the end of implementation when integrating real components together
- Difficult, or maybe even impossible, to debug
- And generally horribly slow, eating up enormous amounts of time and resources
Yes, Another Food Analogy
In my previous post I compared unit tests to french fries and lamented that while I love them, they are insufficient for proving that software actually works. The individual parts might work fine, according to some theoretical specifications, but when all the parts are put together there is still a real risk that the software won’t actually work for the user. Lots of calories, but maybe a lot of empty calories, and all that waste could go to our collective waist, so to speak.
But if testing components in isolation is not the (complete) answer, then perhaps we need something more substantial, like integration tests? They are pretty much the opposite of tiny little unit tests, so if we are going to swing the pendulum wildly, as we often do, then why don’t we dive into a heartier repast that is more of a full meal, like spaghetti? Mmm, spaghetti!
It will be a mistake, of course, like most wild swings of the pendulum, but reasoning through the limitations of integration tests will hopefully set the table for a healthier, more effective approach to testing.
Yeah, I know, I am really straining the limits of these food analogies.
More Substance
But if we are looking for something more substantial than little light-weight unit tests, then integration tests certainly should be on the menu. For one thing, they are able to do the one thing that matters most to users – show that the software actually does something valuable.
Whereas unit tests show that individual units work, integration tests show that those units can actually work together. If you want to know whether your software can open a file and transform it from one format to another, for instance, then there is nothing more convincing than putting your software bits together, opening a file, and transforming it from one format to another. If it works, then it works, in a tautological sense.
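As a sketch of that kind of end-to-end check, here is a tiny hypothetical example in Python. The `csv_to_json` function is an invented stand-in for the real system under test; the point is that the test exercises the whole path from input format to output format:

```python
import csv
import io
import json

# Hypothetical system under test: reads CSV text, emits JSON.
# The name `csv_to_json` is an assumption for illustration only.
def csv_to_json(csv_text):
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)

# An end-to-end check: feed in the real input format, inspect the real output.
def test_csv_to_json():
    csv_text = "name,qty\nspaghetti,2\nfries,1\n"
    result = json.loads(csv_to_json(csv_text))
    assert result == [
        {"name": "spaghetti", "qty": "2"},
        {"name": "fries", "qty": "1"},
    ]

test_csv_to_json()
```

If that passes, the software demonstrably did the valuable thing, at least once, for at least one input.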
Late to the Table
One problem with integration tests, however, is that the parts generally don’t all come together until very late in the development process. Like spaghetti, many software parts are cooked separately and then combined together only at the last moment, like on the plate.
Even if some parts come together early and can be tested together before the final integration, those tests have limited usefulness. If the sauce can be cooked ahead and tastes fine, does that mean that the full spaghetti dish will be good? No guarantees. The pasta could be mushy – mamma mia!
This means that full integration tests can’t be used to provide shaping feedback during the development process. If they fail at the end, the only help they can provide is to tell us what definitely doesn’t work, and to suggest that it might not be time to ship yet. Oops.
So Many Ways They Can Fail
And when integration tests fail, as they are wont to do, there are so many ways that they can fail. The failure can be in just a single component, used differently than in its unit tests, or, more likely, in the interaction between multiple components.
Whereas good unit tests fail for only a single reason and point directly to the code that needs to be changed, integration tests could be pointing their fingers in many directions at once. Where do you look? In theory, since the whole system is integrated together, it could be anywhere in the system, including the environment (OS version, low disk space, time changes, etc.).
What’s more, integration test failures are not always deterministic. Sometimes they fail, which tells you that something is definitely wrong, but then sometimes they pass, and you know something is still wrong.
And good luck trying to debug a test failure across components, interactions with the OS, etc., especially if the failure is timing-dependent. It could be next to impossible! In terms of our savory analogy, imagine trying to trace along the length of a single strand of spaghetti, as it weaves around in the pile, keeping track of everything that it touches, but without moving anything else.
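To make the timing problem concrete, here is a hedged Python sketch. `AsyncWorker` is an invented stand-in for any component that finishes its work on a background thread; a fixed sleep makes the check nondeterministic, while polling against a deadline is sturdier:

```python
import threading
import time

# Hypothetical component that completes its work on a background thread.
class AsyncWorker:
    def __init__(self):
        self.done = False

    def start(self, delay):
        threading.Thread(target=self._run, args=(delay,)).start()

    def _run(self, delay):
        time.sleep(delay)
        self.done = True

# Flaky pattern: sleep a fixed amount and hope the worker beat us there.
# Whether this returns True depends entirely on scheduling.
def flaky_check(worker):
    time.sleep(0.05)
    return worker.done

# Sturdier pattern: poll with a deadline instead of guessing a sleep.
def wait_until_done(worker, timeout=2.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if worker.done:
            return True
        time.sleep(0.01)
    return False

worker = AsyncWorker()
worker.start(delay=0.1)
assert wait_until_done(worker)
```

The flaky version is the spaghetti strand: whether it passes depends on everything else on the plate at that moment.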
So Many Ways They Should Pass
When integration tests pass, on the other hand, they usually pass in very constrained ways. Like maybe just one way.
How many integration tests start from a clean slate, with a newly constructed solution and pristine, minimal configuration? Almost all of them? But then how often does that same setup occur for users? Maybe only once, at the very beginning?
In our spaghetti analogy, that might be like laying out the spaghetti noodles in straight lines, pouring on the sauce just in the middle, and – che scemo! – leaving off the parmesan cheese. And then only testing that configuration.
But users need the value of a feature under many different conditions. The test needs to pass when it is freshly plated, so to speak, but it also needs to pass when they have stirred up their spaghetti and when they have added cheese and when they twirl a forkful in their spoon in that cool way and…you get the idea. Lots of different configurations.
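One way to cover more than the freshly plated configuration is to generate tests across combinations rather than hand-writing one happy path. A minimal Python sketch, where `plate_price` and its pricing rules are invented stand-ins for the feature under test:

```python
import itertools

# Hypothetical feature under test; the function and its rules are
# assumptions for illustration only.
def plate_price(sauce, cheese, size):
    base = {"small": 8, "large": 12}[size]
    return base + (2 if sauce == "meat" else 0) + (1 if cheese else 0)

# One check per configuration, generated from the combinations.
configs = itertools.product(["marinara", "meat"], [True, False], ["small", "large"])
for sauce, cheese, size in configs:
    assert plate_price(sauce, cheese, size) > 0
    # For the same toppings, a large plate should always cost more.
    assert plate_price(sauce, cheese, "large") > plate_price(sauce, cheese, "small")
```

Even this toy example has eight configurations; real systems multiply far faster, which is exactly the cost problem the next section chews on.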
So Many Calories
Following the realization about the multitude of possible configurations to its illogical conclusion, we might try to have a separate integration test for every different configuration. At first, that might feel good, psychologically, as we down plate after plate of spaghetti, but full integration tests are likely the most time consuming and expensive tests that we have on the menu. And in any sufficiently complex system, which is the kind that delivers value, there will always be more combinations of conditions and configurations.
I don’t have to tell you what happens when we scarf down plate after plate of slow, expensive tests any more than I have to tell you what happens with plate after plate of food. A certain scene from the movie “Monty Python’s The Meaning of Life” comes to mind, though, in which a gluttonous restaurant patron eats so much that he explodes. I’ve said too much already.
But if we eschew these additional test configurations and try to ignore the fact that the conditional coverage of our testing is weak, we still manage to waste calories by running our few integration tests excessively. Why? Because we run them every time we do an integration, for every additional feature we add, maybe even for every code change committed to the repository. Why? Because we can’t tell whether an individual change is isolated or not, so to be safe we run them just in case. Why? Because systems really are complex, we don’t have an easy way to relate code changes to the subset of tests that should be affected, and we don’t have anything else that proves we can do at least some valuable things. And we need that psychological boost.
That’s like having spaghetti for breakfast, lunch and dinner. Every day.
Smorgasbord or Hybrid? Yes, Please
So, if unit tests are necessary for initial development, but not sufficient, and if integration tests are also necessary for final verification, but exorbitantly expensive if we want good coverage (yet still insufficient), what do we do?
Do we have a mixture of them both, a few french fries and then some spaghetti? Or do we need to come up with some hybrid that has elements of both?
Yes! I am going to argue that we need them all – unit tests (but maybe more automatic), full integration tests (but perhaps more sparingly), and a hybrid sort of test that combines elements of both.
Ideally, we can use a mixture of these basic test extremes and then combine their best elements to come up with a testing approach that provides design feedback throughout the development process, allows for early integration, and provides much more reasonable assurance of software quality.
Hungry for more? I promise that I will give up on these food-based analogies at some point, but my next post will be at least one more: “Property-based Tests: The Yam Fries of Software Development”.
Until next time, buon appetito!
Revision History
| Date | Comment |
|---|---|
| 2024-03-11 | Initial version |