CodeSOD: An XML Parser

Since we were discussing XML earlier this week, it's a good time to take a peek at this submission from Mark.

Before we get into it, though, we need to talk briefly about some different XML parsing philosophies. XML documents are frequently large and deeply nested trees, and once loaded into memory, consume a lot of it. So there are two methods we might use to parse XML. We might parse and load the entire DOM- having all of that expensive processing and memory usage. The advantage is that we have the whole structure in memory, and can quickly and efficiently navigate through it. The other option is to read it as a stream, and process it one node at a time. This is both faster and a lot gentler on memory, but it means you have forward-only access to the contents.

Or, you can just do what Mark's co-worker did:

public void ReadXml(XmlReader reader)
{
	string xml = reader.ReadOuterXml();
	XElement element = XElement.Parse(xml);
	…
}

This method accepts an XmlReader, which is C#'s implementation of that forward-only, stream-oriented version of an XML parser. They then extract the OuterXml as a string- the contents of the file- and hand it off to XElement.Parse, which is the DOM-oriented, extremely expensive method of parsing.

For bonus points, they've greatly increased the memory footprint, since this has to have the whole file in memory as a string, and then the whole file in memory as an XML DOM tree. This is extra offensive as XElement has a Load method, which can take an XmlReader and parse the file without loading the whole string into memory. And, more than that, since the XElement type represents an XML element, it's great for holding portions of an XML document, but since we're loading (presumably) a whole document, the XmlDocument class (which also has a Load method) is probably the correct one to use.

The final note is where this XML data is coming from- it's coming from another C# application, using the same underlying model library, and the objects in question were tagged with .NET's serialization attributes. That's a long way of saying ".NET autogenerated code for serializing/deserializing this data, and this method doesn't need to exist at all."

In short, given the challenge of "loading XML from a document", they've managed to make the wrong choice at every opportunity.

[Advertisement] Keep the plebs out of prod. Restrict NuGet feed privileges with ProGet. Learn more.

This post originally appeared on The Daily WTF.

Leave a Reply

Your email address will not be published. Required fields are marked *