CodeSOD: Two Conversions

The father of the "billion dollar mistake" left us last month. His pointer is finally null. Speaking of null handling, Randy says he was "spelunking" through his codebase and found this pair of functions, which handles null.

public String getDataString() {
    if (dataString == null) {
        return Constants.NOT_AVAILABLE;
    }
    return asUnicode(dataString);
}

I assume Constants.NOT_AVAILABLE is an empty string, or something similar. It's reasonable to convert a null into something like that. I don't know where this fits in the overall stack; I'm of the mind that you should retain the null until you absolutely can't anymore; like it or not, a null means something different than an empty string. Or, if we're going that far, we should be talking about using Optional or nullable types.

But that call to asUnicode seems curious. What's happening in there?

private String asUnicode(String rawValue) {
    if (rawValue != null) {
        return HtmlUtils.htmlUnescape(rawValue);
    }
    else {
        return rawValue;
    }
}

This function, which is only called from getDataString, checks for a null. Which we know it won't get, but it checks anyway. If it isn't null, we unescape it. If it is null, we return that null.

Well, I suppose that fits my rule of "retaining the null", but like, in the worst way you could do it. It honestly feels like, if the "swap the null for an empty string" happens anywhere, it should happen here. If I ask for the unescaped version of a null string, an empty string is a reasonable return. That makes more sense that doing it in a property getter.

This code isn't a trainwreck, but it makes things confusing. Maybe it's because I've been doing a lot of refactoring lately, but confusing code with unclear boundaries between functions is a raw nerve for me right now, and this particular example is stepping on that nerve.

While we're talking about unclear boundaries, I object to the idea that this class is storing dataString as an HTML escaped string that we unescape any time we want to look at it. It implies that there's some confusion about which representation is the canonical one: unescaped or escaped. We should store the canonical one, which I think is unescaped. We should only escape it at the point where we're sending it into an HTML document (or similar). Convert at the module boundary, not just any time you want to look at a string.

[Advertisement] Utilize BuildMaster to release your software with confidence, at the pace your business demands. Download today!

This post originally appeared on The Daily WTF.

Leave a Reply

Your email address will not be published. Required fields are marked *