CodeSOD: By Any Other Name

One of the biggest challenges in working with financial data is sanitizing the data into a canonical form. Oh, all the numeric bits are almost always going to be accurate, but when pulling data from multiple systems, is the name "John Doe", "John Q. Doe", or "J. Doe"? What do we do when one system uses their mailing address and another uses their actual street address, which might use different municipality names? And what about all the other ways that important identifying information might have different representations?

This is Andres' job: pull in data from multiple sources, massage it into some meaningful and consistent representation, and then hand it off to analysts who do some capitalism with it. Sometimes, though, it's not the data that needs to be cleaned up; it's the code. A previous developer provided this Visual Basic for Applications method for extracting first names:

Function getFirstnames(Name)
    Dim temp As String
    Dim parts
    Dim i As Long
    parts = Split(Trim(Name), " ", , vbTextCompare)
    'For i = LBound(parts) To UBound(parts)
    For i = UBound(parts) To UBound(parts)
        temp = parts(i)
        temp = Replace(Trim(Name), temp, "")
        Exit For
    Next i
    getFirstnames = Trim(temp)
End Function

Setting aside the falsehoods programmers believe about names, this is… uh… one way to accomplish the goal.

We start by splitting the string on spaces. Then we want to loop across it… sort of.

Commented out is a line that would be a more conventional loop. Note the use of LBound because VBA and older versions of Visual Basic allow you to use any starting index, so you can't assume the lower-bound is zero. This line would effectively loop across the array, if it were active.
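For reference, here is a minimal sketch of what that conventional loop looks like when it's allowed to run. The procedure name printNameParts and the sample input are made up for illustration; nothing like this appears in the original code.

```vba
' Illustrative only: printNameParts and the sample name are hypothetical.
Sub printNameParts()
    Dim parts() As String
    Dim i As Long

    ' Split the name into space-separated tokens.
    parts = Split("John Q. Doe", " ")

    ' Use LBound/UBound rather than assuming the array starts at zero.
    For i = LBound(parts) To UBound(parts)
        ' Write each index and token to the Immediate window.
        Debug.Print i, parts(i)
    Next i
End Sub
```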

Instead, we loop from UBound to UBound. That guarantees a one iteration loop, which opens a thorny philosophical question: if your loop body will only ever execute once, did you really write a loop?

Regardless, we'll take parts(i), the last element in the array, and chuck it into a temp variable. And then, we'll replace that value in the original string with an empty string. Then, just to be sure our loop which never loops never loops, we Exit For.
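With the loop that never loops removed, the function boils down to roughly the following. This is a sketch of the behavior for string input, not the original author's code; the name getFirstnamesUnrolled and the fullName parameter are assumptions made for illustration.

```vba
' Hypothetical equivalent of getFirstnames with the one-pass loop removed.
' The function name and the fullName parameter are illustrative only.
Function getFirstnamesUnrolled(fullName As String) As String
    Dim parts() As String
    Dim lastPart As String

    ' Split the trimmed input on single spaces.
    parts = Split(Trim(fullName), " ")

    ' Take the final element, which is what the UBound-to-UBound loop selected.
    ' As in the original, an empty input fails here with a subscript error.
    lastPart = parts(UBound(parts))

    ' Remove every occurrence of that element from the trimmed input.
    getFirstnamesUnrolled = Trim(Replace(Trim(fullName), lastPart, ""))
End Function
```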

So, instead of getting the "first names", this might be better described as "stripping the surname". Except, and I know I said we were going to set aside the falsehoods programmers believe about names, the last name in someone's name isn't always their surname. Some cultures reverse the order. Spanish tradition gives everyone two surnames, from both parents, so "José Martinez Arbó" should be shortened to just "José", if our goal is the first name.

But there's still a more subtle bug in this, because it uses Replace, which removes every occurrence of the last name and not just the final one. So, if the last name happens to be a substring of the other names, "Liam Alistair Li" would get turned into "am Alistair" (or "am Astair" under a case-insensitive comparison), which is a potentially funny nickname, but I don't think a financial company should be on a nickname basis with their clients.
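One way to avoid the substring problem is to cut the last token off by position instead of by value. The sketch below does that with InStrRev; the name stripLastToken is hypothetical, and it still assumes that "everything but the last token" is what you want, which the naming caveats above say is not always true.

```vba
' Hypothetical alternative: drop the final space-separated token by position,
' so earlier parts of the name are never touched.
Function stripLastToken(fullName As String) As String
    Dim trimmed As String
    Dim lastSpace As Long

    trimmed = Trim(fullName)

    ' Position of the last space, or 0 if there is none.
    lastSpace = InStrRev(trimmed, " ")

    If lastSpace = 0 Then
        ' A single token (or empty input) leaves nothing before the last token.
        stripLastToken = ""
    Else
        ' Keep everything before the last space.
        stripLastToken = Trim(Left(trimmed, lastSpace - 1))
    End If
End Function
```

With this version, stripLastToken("Liam Alistair Li") returns "Liam Alistair". Unlike the original, it returns an empty string for empty input rather than raising an error, which may or may not be what the downstream analysts expect.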


This post originally appeared on The Daily WTF.
