Preventing bugs in spring applications with ArchUnit

2021-03-08 tech java

Some context

Our application is using the spring framework¹ with all its usual useful tools and abstractions, one of which being its wonderful support for caching.
In addition to Spring, we are using the VAVR package for its collection of monadic data types (Try, Option and Either).

The problem

After deploying a new release, we started getting reports about occasional errors. Which was a bit weird for multiple reasons, chiefly:

Our QA tends to be very thorough, and the release didn’t show any problems in the tests.
The errors occurred rarely: Not just only for requests with specific properties, but even then only for a few percent of them. Our system is usually completely deterministic, so that’s really suspicious.

After some analysis, we managed to isolate the issue. As it turns out, the failing requests were all being handled by the same pod. Restarting the pod solved the issue and it never re-appeared. So what exactly happened?

A method annotated with @Cacheable managed to cache the result of “I tried to contact an external server, and it wasn’t reachable”. Now normally, springs cache abstraction is pretty good at not caching errors. If a method throws an exception, that result is not cached and the call can be re-tried later on.
But in this method we were catching the exception and returned a Try object indicating an error instead. Some of us would like to do more functional programming, and the use of classes like VAVRs Try allows us to do our error handling in a railway-oriented-programming approach, which admittedly does look nice and clean and pretty on first glance.

Sadly, spring isn’t aware that a Try that contains an exception is an error value that shouldn’t be cached. So that specific pod, after attempting to contact another service one time, decided “well, it’s not reachable. Now i’ll never try again.”.

Our solution

Solving this specific issue was of course easy. Don’t return a Try, just let the exception unroll our stack the way it’s meant to.
That was a quick fix. Takes 10 minutes to do, 10 minutes to test, another 10 to deploy.

But that whole issue made us feel a bit anxious about our code: Was that the only instance of this type of bug? And how do we make sure it never happens again?

Initially, we did the obvious thing: Have the team manually comb through the code, carefully looking at each method annotated as @Cacheable, ensuring that technical errors are never encoded in return values that might be cached by spring.
Of course that works and leads to valuable insight. One of which being “we can’t do this for every release. Takes much too long”.

So that rule - simplified to “@Cacheable methods may never return a Try” - has to be automated in some way. Thankfully, other people have already done the groundwork by developing ArchUnit, a library for writing unit tests that make assertions about software architecture. Common examples for ArchUnit rules are “no controller may ever directly access the database without going though our service layer” or “the package structure must be free of dependency cycles.”

But ArchUnit can also be used to write rules for finding code that isn’t “only” architecturally unpleasant. It’s a perfect fit for rules like the @Cacheable thing described above. So we’ve written an ArchUnit test doing exactly that.
It looks similar to this:

 1private val notVavrControl = 
 2    object: DescribedPredicate<JavaClass>("[any value outside of io.vavr.control package]") 
 3{
 4    override fun apply(input: JavaClass): Boolean {
 5        return !input.packageName.startsWith("io.vavr.control")
 6    }
 7}
 8
 9private val allClasses = ClassFileImporter().importPackages("dev.krafczyk")
10
11@Test
12fun methodsReturningVavrClassesMayNotBeCached() {
13    val rule = ArchRuleDefinition.methods()
14        .that().areAnnotatedWith(Cacheable::class.java)
15        .should().haveRawReturnType(notVavrControl)
16    rule.check(allClasses)
17}

The first block (lines 1-7) defines an ArchUnit assertion that can be applied to types, e.g. the return types of methods. It checks that the passed type does not reside in the io.vavr.control package, where the monadic types like Try are.

After that, in line 9, we let ArchUnit collect all classes defined within our codebase, in this example all classes inside the dev.krafczyk package or its sub-packages.

In the actual test, from line 11 to 17… Well the code kind of speaks for itself. We define an ArchUnit rule requesting that any methods with a @Cacheable annotation must have a return type that is not within the io.vavr.control package, and then verify that this rule applies to the collection of classes we’ve previously found.

The original code was a bit more complex since we also had a whitelist of methods that have been verified to be okay even though their return value was Try or Option. We now have a policy that every such exception needs to be individually documented, including a reason why that would be okay in this case.
We could of course have re-written all such code, but that would have been prohibitively annoying.

More fun ArchUnit tests

Now that this issue can never happen again, let’s see what else we can do with similar rules…

Since we don’t use in-memory caches in all of our projects, we sometimes stumble on the requirement that all return values from @Cacheable methods need to implement Serializable (or something similar). The test for that looks very similar to the test above, so I’m just going to show it without any further explanation:

 1private val implementingSerializable = 
 2    object: DescribedPredicate<JavaClass>("[any value implementing Serializable]") 
 3{
 4    override fun apply(input: JavaClass): Boolean {
 5        return input
 6            .allInterfaces
 7            .filter { it.name == Serializable::class.java.name }
 8            .isNotEmpty()
 9    }
10}
11
12private val allClasses = ClassFileImporter().importPackages("dev.krafczyk")
13
14@Test
15fun cacheableMethodsMustReturnSerializables() {
16    val rule = ArchRuleDefinition.methods()
17        .that().areAnnotatedWith(Cacheable::class.java)
18        .should().haveRawReturnType(implementingSerializable)
19    rule.check(allClasses)
20}

Great. Now the non-serializable results don’t explode at runtime - instead the unit tests fail, like they should.

There’s one more trick up my sleeve (or error that i’ve made, depending on your perspective)…
This requires some more background about how spring works. Normally, the caching behavior is not added directly into the method, but resides in a dynamic proxy object instead². The dependency injector will then inject the dynamic proxy into anyone who asks, instead directly using the implementation class.
Which sounds great (and also works quite nicely)… Until you notice that a method calling another method directly on this will of course not go through the proxy, because this is always the actual implementation.

That means that even a method annotated with @Cacheable or @Transactional will completely bypass the caching or transaction management if it’s called from another method in the same object. A good way to get really unpleasant bugs (even though these at least tend to fail early during testing).

But that mistake can also be prevented by ArchUnit:

 1val notBeCalledFromThis = 
 2    object: ArchCondition<JavaMethod>("not be called via the this object") 
 3{
 4    override fun check(method: JavaMethod, events: ConditionEvents) {
 5        method.callsOfSelf
 6            .filter { it.originOwner == it.targetOwner }
 7            .forEach { events.add(SimpleConditionEvent(
 8                it.originOwner, 
 9                false, 
10                "Method <${method.fullName}> illegally called directly on this by <${it.origin.fullName}>"
11            )) }
12    }
13}
14
15private val allClasses = ClassFileImporter().importPackages("dev.krafczyk")
16
17@Test
18fun cacheableMethodsShouldNotBeCalledOnThis() {
19    val rule = ArchRuleDefinition.methods()
20        .that().areAnnotatedWith(Cacheable::class.java)
21        .should(notBeCalledFromThis)
22    rule.check(allClasses)
23}

Now this is a tiny bit more complex: We don’t define a custom assertion for types.
Instead, from line 1 to 13, we define an ArchCondition. This applies to a method and for all calls invoking that method, checks if the call happens from within the same object.
If such calls exist, a rule violation is recorded for each of these calls.

Then, we do the usual: Collect all our classes, find methods annotated with @Cacheable, and apply our rule to all of these.

Maybe this could be done even better. There’s probably other ways to bypass the AOP proxy. A safer test might verify that @Cacheable methods are only called on objects that were acquired via dependency injection. But that’s a bit more involved, and the solution we have is good enough for almost all relevant cases.

Additionally, the test might also want to check methods with a few other annotations, both from the caching abstraction (like @CacheEvict), and from other parts of spring (@Transactional).
It would probably safest to just demand that any method with a spring- or Java EE annotation is never called on this.

Conclusion

I hope that I could demonstrate not only that ArchUnit is really easy to use, but also that it can be used to find real-life bugs. The commonly advertised use case of keeping software architecture within specification is all good and well, but many programmers only really start to pay attention when the software is actually, demonstrably broken.

The above code examples, along with a class containing one violation for each of these rules, can be found on github.com/jkrafczyk/archunit-examples

Spring is a popular framework for developing java (or kotlin, or groovy) applications, especially backends for web applications. It provides a very convenient, mostly Java EE-like, method for dependency injection, a large amount of tools for auto-configuration of said injection, and lots of surrounding infrastructure from database access layers, transaction management, and caching, to health checks and metrics reporting.
Truly an all-in-one solution for your development needs. ↩︎
I think it’s also possible to use compile-time AOP weaving with fancy bytecode manipulation, but we don’t do that. ↩︎