Sunday, September 14, 2008

Temporal Coupling in Java and Some Solutions

Temporal Coupling.

I first came across the term in Thomas and Hunt's The Pragmatic Programmer, where they encourage readers to consider "...the role of time as a design element of the software itself." Hmph. Most of the chapter is about designing processes for parallelization, not APIs, but the end of the chapter shows some code comparing C's strtok and Java's StringTokenizer. Reviewing this made me thankful that C++ is well behind me except for the odd bit of JNI code...

StringTokenizer has hasMoreTokens and nextToken methods (its version of "hasNext" and "next"), so callers can easily iterate over the bits of a string. In the strtok days, there was only one function: strtok. You passed it a string the first time you called it and NULL every other time to get the next token.

char input[] = "Hello World";
char* hello = strtok(input, " ");
char* world = strtok(NULL, " ");
I invite you to contemplate how the one, global strtok function works when accessed from multiple threads. So to the Prag guys, avoiding temporal coupling meant designing for concurrency. Strtok could never, ever work in a multi-threaded world because of the global state hidden behind its invocations, whereas objects like StringTokenizer and Iterators work because their state is local to the object.
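To make the "state is local to the object" point concrete, here is a minimal sketch in which two StringTokenizer instances are advanced in alternation without disturbing each other. The class name TokenizerDemo and the second input string are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not from the book.
final class TokenizerDemo {

    // Alternates between two tokenizers; each one keeps its own cursor.
    static List<String> interleave(String first, String second) {
        StringTokenizer a = new StringTokenizer(first, " ");
        StringTokenizer b = new StringTokenizer(second, " ");
        List<String> out = new ArrayList<String>();
        while (a.hasMoreTokens() || b.hasMoreTokens()) {
            if (a.hasMoreTokens()) out.add(a.nextToken());
            if (b.hasMoreTokens()) out.add(b.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Hello, Goodbye, World, Moon]
        System.out.println(interleave("Hello World", "Goodbye Moon"));
    }
}
```

Trying the same alternation with strtok would not work, because the second call with a new string overwrites the single hidden cursor.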

Bob Martin's Clean Code carries on the torch of avoiding Temporal Coupling, but with a new suggested fix. Consider an object with an execute() and a getErrors() method (example from this comment). Which order should these methods be called in? Well, the naming makes that fairly clear: first you execute and then you get the error messages. What happens if you call getErrors before execute? Should it perform the calculation or throw an exception? There is no single answer. Bob's solution is the API equivalent of the bucket brigade:
final class Computation {
    public Future<Computation> execute() {
        // perform some computation, collecting errors, and returning a future
    }

    public static Collection<String> getErrors(Future<Computation> future) {
        // get the error list out of the computation
    }
}
execute() returns a Future of the result... this is a handle to the result of the computation, which may or may not be calculated yet. The getErrors method now requires one of these handles in its parameter list. This example encodes the temporal coupling of the two methods in the static types: the getErrors method requires as input the output of the execute method. With this API it's very hard to invoke the methods out of order.
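Here is one way the method bodies might be filled in so the class compiles and runs. This is only a sketch: the error message is a placeholder, the computation runs synchronously through a FutureTask rather than on an executor, and wrapping the checked exceptions in IllegalStateException is my own choice, not something the original example specifies.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

final class Computation {
    private final List<String> errors = new ArrayList<String>();

    // Runs a placeholder computation and hands back a handle to the result.
    public Future<Computation> execute() {
        FutureTask<Computation> task = new FutureTask<Computation>(() -> {
            errors.add("hypothetical error: input was empty");
            return this;
        });
        task.run(); // synchronous for the sketch; a real version might submit to an executor
        return task;
    }

    // Can only be called with a handle, and a handle only comes from execute().
    public static Collection<String> getErrors(Future<Computation> future) {
        try {
            return Collections.unmodifiableList(future.get().errors);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("computation failed", e);
        }
    }

    public static void main(String[] args) {
        Future<Computation> handle = new Computation().execute();
        System.out.println(Computation.getErrors(handle));
    }
}
```

The caller has nothing to pass to getErrors until execute has been invoked, which is the whole trick.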

So what about Iterator?

It certainly wasn't temporal coupling to the Prag Guys in 1999, seeing as they used an iterator-like API as a motivating and correct example. But clearly, calling iterator.next() repeatedly will eventually result in an exception. There is state hidden behind it in the form of a cursor into the underlying collection. If an iterator is accessed by multiple threads then all callers of hasNext/next must follow the same synchronization policy or risk having the iterator advanced after the hasNext check was performed but before next is called. This looks like temporal coupling to me, and it looks like a violation of the Design for Concurrency tip from The Pragmatic Programmer. So in 2008 we should probably start looking for ways to avoid this API.
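The race is easy to replay deterministically on a single thread. The sketch below imitates the interleaving that two unsynchronized callers could produce on a shared one-element iterator; the class and variable names are invented for the illustration.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical demo: simulates two callers sharing one iterator.
final class CheckThenActDemo {

    // Returns true if the second caller's next() fails despite its earlier check.
    static boolean secondCallerFails() {
        Iterator<String> shared = Collections.singletonList("only").iterator();
        boolean callerASawNext = shared.hasNext(); // caller A checks
        boolean callerBSawNext = shared.hasNext(); // caller B checks before A advances
        if (callerASawNext) {
            shared.next();                         // A takes the only element
        }
        try {
            if (callerBSawNext) {
                shared.next();                     // B acts on a stale check
            }
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondCallerFails()); // prints true
    }
}
```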

Except that Effective Java's 2nd Edition (pub: 2008) recommends this API in some cases; disagree with Mssr. Bloch at your own peril. He spins the API as a way to avoid unnecessary use of checked exceptions. Checked exceptions just produce nasty code:
try {
    obj.action(args);
} catch (SomeException e) {
    // handle exception
}
The checking API of providing a hasNext or isValid allows you to change a checked exception into a runtime exception because your callers have presumably validated that the correct preconditions hold before invoking the method:
if (obj.isActionAllowed(args)) {
    obj.action(args);
}
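Seen from the implementing side, the guard pattern might look like the sketch below. Turnstile, canPass and pass are hypothetical names chosen for the example; the point is only that the state-dependent method throws an unchecked exception because callers are expected to ask first.

```java
// Hypothetical class illustrating a state-testing method paired with a
// state-dependent method.
final class Turnstile {
    private int tokens;

    Turnstile(int tokens) {
        this.tokens = tokens;
    }

    // State-testing method: would pass() succeed right now?
    public boolean canPass() {
        return tokens > 0;
    }

    // State-dependent method: unchecked exception if the precondition is violated.
    public void pass() {
        if (tokens <= 0) {
            throw new IllegalStateException("no tokens left");
        }
        tokens--;
    }

    public static void main(String[] args) {
        Turnstile turnstile = new Turnstile(1);
        if (turnstile.canPass()) {
            turnstile.pass();
        }
        System.out.println(turnstile.canPass()); // prints false
    }
}
```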
And Bloch is careful to point out that this API is not appropriate in multi-threaded scenarios where the state of the object may change between invocations of the guard check. So either you're single threaded or your object is immutable. Yet another reason to prefer immutability. Now, how would you implement an immutable iterator? You're probably better off using an immutable list. Anyway...

Assuming we need iterators and their ilk, what can we do instead?

Sentinel values are the common approach: return null, or "", or NaN. But in the end, undermining your type system just isn't smart. If you broadcast to the world that you return a String, then reserving one particular String as an error condition is just mean. Someone, somewhere is going to forget to perform the check. And it will be me and I'll do it about once a month. NullPointerExceptions are just another variation of this, but instead of using one type to represent two different conditions ("" is error, while all other strings are not), you're using two types without declaring it to your compiler (String method returns null or String). Ricky Clarkson's Optional Values in Java contains a good discussion of this (including the comments). Returning an empty list is a great way to signal that no results were found. Returning an empty list to indicate an error occurred is a bad use of sentinels.

And now is the time when we come to the Option type.

F# and OCaml don't have null* (simplification alert); they have an Option type. An Option can either have a value or not. So a function returning a String or null in Java would instead return Option of String, and you have to ask the Option object whether or not it has a value before using it. The compiler doesn't allow null values to masquerade as Strings, and it won't allow you to reference an optional String until you check that it is available. The result is an end to NullPointerExceptions. That I can live with. Does the API become more complicated?
Option<String> option = iterator.next();
if (option.hasValue()) {
    // do something with value
}
Iterator gets simpler because it loses the hasNext function. Sharing iterators between threads also stops being the same kind of concurrency issue: as long as each invocation of next() is implemented atomically, there is no time window between hasNext and next where the internal state can change. The calling code is really no different... you either make an unsafe check before grabbing an item or you grab an item and then make a safe check. The latter seems better.
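A minimal sketch of such an iterator follows. It uses java.util.Optional (added in Java 8) as a stand-in for the Option type discussed here, wraps an ordinary Iterator, and makes next() atomic with a synchronized method. OptionIterator is a hypothetical name, and the sketch assumes the underlying elements are never null.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Optional;

// Hypothetical wrapper: a single next() replaces the hasNext()/next() pair.
final class OptionIterator<T> {
    private final Iterator<T> source;

    OptionIterator(Iterable<T> items) {
        this.source = items.iterator();
    }

    // The check and the advance happen under one lock, so no other caller
    // can slip in between them. Assumes non-null elements.
    public synchronized Optional<T> next() {
        return source.hasNext() ? Optional.of(source.next()) : Optional.empty();
    }

    public static void main(String[] args) {
        OptionIterator<String> it = new OptionIterator<String>(Arrays.asList("a", "b"));
        for (Optional<String> item = it.next(); item.isPresent(); item = it.next()) {
            System.out.println(item.get());
        }
    }
}
```

Once the source is exhausted, every further call simply returns an empty Optional instead of throwing.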

The calling code could be cleaned up, of course. But that would require adding pattern matching to Java, which I suspect is unlikely. But the F# code to perform the checks and handle None vs. Some is concise and not alien once you get used to it:
let printName name =
    match name with
    | (fname, Some(lname)) -> printfn "Hello %A %A" fname lname
    | (fname, None) -> printfn "Hello %A" fname

printName ("Sonny", Some("Bono"))
printName ("Cher", None)
But even with an odd calling syntax from Java, concurrency issues facing the industry make the Option type a welcome addition to codebases. The implementation can be quite simple:
public final class Option<T> {

    private final T value;

    public Option(T value) {
        this.value = value;
    }

    public boolean hasValue() {
        return value != null;
    }

    public T get() {
        if (value == null) throw new RuntimeException("Option not valued");
        return value;
    }

    public static <T> Option<T> Some(T value) {
        return new Option<T>(value);
    }

    public static <T> Option<T> None() {
        return new Option<T>(null);
    }
}
And here is a usage example:
final class StringUtils {
    public static Option<String> stringToOption(String input) {
        if (input == null) return Option.None();
        else return Option.Some(input);
    }
}
Functional Java has an Option type, documentation here. The same idea is called Maybe in Haskell. And I should credit Reinier Zwitserloot (best... name... ever...) with getting the ball rolling with his initial comment.

Sorry for the length!

1 comment:

Hamlet D'Arcy said...

Alternate implementation here: http://tinyurl.com/4cd6lp