I wrote this today after looking for something to read XML data with. I didn’t want to use C#’s 1MB XML library just to load some preferences, and I was a bit shocked that no one seems to have written a lightweight C# parser. I saw there was one on the wiki, but it didn’t have attribute support, which IMO is needed if you want to store more complex types of data.
So I dragged out my Perl archives (floppies FTW!) and ported my old XML token parser to C#.
The result is a tiny 20K script (10K without comments, so it ought to be a speck when compiled) that reads an XML-formatted string into a simple object-list hierarchy.
Of course, this needs to be battle tested, but so far so good. It parsed the entire script of The Tragedy of Richard the Third, an 8,600+ line (270K) XML document, in around 0.1 seconds within Unity, so speed shouldn’t be an issue.
Download the Script here: XMLParser.cs (right click and save as)
Usage Example:
Loads XML text into an object hierarchy, then loops through the objects and rewrites the XML as a string to the console.
using UnityEngine;
using System.Collections;

public class XMLTest : MonoBehaviour {
    // Use this for initialization
    void Start () {
        // Load a text asset XML file from the assets/resources folder
        TextAsset xmlAsset = (TextAsset)Resources.Load("test", typeof(TextAsset));
        // get xml formatted string from text asset
        string xmlString = xmlAsset.text;
        // create XMLParser instance
        XMLParser xmlParser = new XMLParser(xmlString);
        // call the parser to build the IXMLNode objects
        XMLElement xmlElement = xmlParser.Parse();
        // test string to re-build XML from XMLNode objects
        string xmlOutputString = "";
        // recursively re-construct XML string
        WriteXMl(xmlElement, ref xmlOutputString, 0);
        // log re-constructed xml string to the console.
        Debug.Log(xmlOutputString);
    }

    // rebuilds xml string in output, ugly little method but it works
    public void WriteXMl(IXMLNode element, ref string output, int depth) {
        // tab strings for nicer formatting...
        int i = 0;
        string tabs = "";
        while(i < depth) {
            tabs += "\t";
            i++;
        }
        // if textnode add content to output return early...
        if(element.type == XMLNodeType.Text) {
            output += tabs + element.value + "\n";
            return;
        }
        // add opening tag to output
        output += tabs + "<" + element.value;
        // add attributes to opening tag
        i = 0;
        int attributeCount = element.Attributes.Count;
        while(i < attributeCount) {
            output += " " + element.Attributes[i].name + " = \"" + element.Attributes[i].value + "\" ";
            i++;
        }
        // close opening tag
        output += ">\n";
        // recurse through all child elements
        i = 0;
        int childCount = element.Children.Count;
        while(i < childCount) {
            WriteXMl(element.Children[i], ref output, depth+1);
            i++;
        }
        // add closing tag to output string
        output += tabs + "</" + element.value + ">\n";
    }
}
Bit of a pointless example, but it demonstrates both reading XML strings and using the XML class objects to get at the data.
The XML Object hierarchy is composed of 2 classes that share an interface and 1 simple Struct:
IXMLNode is the main XML hierarchy interface. XMLText and XMLElement implement it and nothing more.
Accessible properties:
string value - either the tag name or the text content
enum type - an enum to tell you if it’s a Text node or an Element node that could have child nodes.
IXMLNode Parent - the parent node in the hierarchy (read only)
List<IXMLNode> Children - a list of child nodes. Text nodes will always return an empty list here.
List<XMLAttribute> Attributes - a list of attributes. Text nodes will always return an empty list here.
XMLAttribute is a simple struct with two public fields:
string name - name of the attribute.
string value - value of the attribute.
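To make the hierarchy a bit more concrete, here is a rough sketch of how those types might be declared, plus two small lookup helpers for the "load some preferences" case. The declarations are my reconstruction from the description, not the actual source of XMLParser.cs: the enum member `Element`, the exact list types, and everything named `XMLNodeHelpers` or `SketchNode` are assumptions made up for illustration. If the real XMLParser.cs is in your project, drop the stand-in declarations and keep only the helpers.

```csharp
using System.Collections.Generic;

// Stand-in declarations reconstructed from the description (assumed shapes),
// included only so this sketch compiles without XMLParser.cs.
public enum XMLNodeType { Text, Element }

public struct XMLAttribute {
    public string name;
    public string value;
}

public interface IXMLNode {
    string value { get; }
    XMLNodeType type { get; }
    IXMLNode Parent { get; }
    List<IXMLNode> Children { get; }
    List<XMLAttribute> Attributes { get; }
}

// Hypothetical helpers, not part of XMLParser.cs.
public static class XMLNodeHelpers {
    // Returns the value of the named attribute, or fallback if it is absent.
    public static string GetAttribute(IXMLNode node, string name, string fallback) {
        foreach (XMLAttribute attribute in node.Attributes) {
            if (attribute.name == name) return attribute.value;
        }
        return fallback;
    }

    // Returns the first direct child element with the given tag name, or null.
    public static IXMLNode FindChild(IXMLNode node, string tagName) {
        foreach (IXMLNode child in node.Children) {
            if (child.type == XMLNodeType.Element && child.value == tagName) return child;
        }
        return null;
    }
}

// Minimal in-memory node (hypothetical), only here so the helpers can be
// exercised without running the parser.
public class SketchNode : IXMLNode {
    private readonly List<IXMLNode> children = new List<IXMLNode>();
    private readonly List<XMLAttribute> attributes = new List<XMLAttribute>();

    public string value { get; private set; }
    public XMLNodeType type { get; private set; }
    public IXMLNode Parent { get; private set; }
    public List<IXMLNode> Children { get { return children; } }
    public List<XMLAttribute> Attributes { get { return attributes; } }

    public SketchNode(string nodeValue, XMLNodeType nodeType) {
        value = nodeValue;
        type = nodeType;
    }

    // Appends a child, sets its parent, and returns this node for chaining.
    public SketchNode Add(SketchNode child) {
        child.Parent = this;
        children.Add(child);
        return this;
    }

    // Appends an attribute and returns this node for chaining.
    public SketchNode Set(string attributeName, string attributeValue) {
        attributes.Add(new XMLAttribute { name = attributeName, value = attributeValue });
        return this;
    }
}
```

With the real parser you would pass the node returned by Parse() instead of a SketchNode, e.g. XMLNodeHelpers.FindChild(root, "audio") followed by XMLNodeHelpers.GetAttribute(audio, "volume", "1.0"), where "audio" and "volume" are made-up names for a preferences file.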
I also implemented the XMLParser class to be used as an object instead of a static class so that it’s easier to extend and modify. There are heaps of comments all the way through it, so it shouldn’t be too difficult to follow and modify if you need extra features.
XMLParser will break (and cry) if you feed it malformed XML documents!
This shouldn’t be an issue for games though. It also won’t strip extraneous whitespace; again, this shouldn’t be an issue for games, as you have tighter control over your resources than in web/feed-based XML tasks.
HTML reserved characters (< > ' ") must be written as entity references if you want to use them in the content (non-markup) or attribute values of your documents. XMLParser has some static functions for converting these back and forth. The parser automatically translates named entity references in XML documents for the characters listed above (for example, it converts &quot; to "). It does not support numeric entity references (such as &#34;). Any other entities will be stripped and replaced with a null char unless you modify the parser to handle them.
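For anyone who wants to see what that conversion looks like, here is a minimal standalone sketch. It is not the code from XMLParser.cs: the class name `XMLEntitySketch` and the method names `Escape` and `Unescape` are made up for this example, and I’ve added the ampersand to the handled set on my own, since escaped text can’t round-trip without it. Unknown entities, including numeric references, come back as a null char to mirror the behaviour described above.

```csharp
using System.Text;

// Hypothetical helper class, written for illustration only.
public static class XMLEntitySketch {
    // Replaces reserved characters with named entity references.
    public static string Escape(string text) {
        StringBuilder sb = new StringBuilder(text.Length);
        foreach (char c in text) {
            switch (c) {
                case '<':  sb.Append("&lt;");   break;
                case '>':  sb.Append("&gt;");   break;
                case '\'': sb.Append("&apos;"); break;
                case '"':  sb.Append("&quot;"); break;
                case '&':  sb.Append("&amp;");  break; // assumption: not in the list above
                default:   sb.Append(c);        break;
            }
        }
        return sb.ToString();
    }

    // Translates named entity references back to characters in a single pass.
    // Unknown entities (including numeric ones) become a null char.
    public static string Unescape(string text) {
        StringBuilder sb = new StringBuilder(text.Length);
        int i = 0;
        while (i < text.Length) {
            char c = text[i];
            int end = (c == '&') ? text.IndexOf(';', i + 1) : -1;
            if (end < 0) {
                // Ordinary character, or an ampersand with no closing semicolon.
                sb.Append(c);
                i++;
                continue;
            }
            string name = text.Substring(i + 1, end - i - 1);
            switch (name) {
                case "lt":   sb.Append('<');  break;
                case "gt":   sb.Append('>');  break;
                case "apos": sb.Append('\''); break;
                case "quot": sb.Append('"');  break;
                case "amp":  sb.Append('&');  break;
                default:     sb.Append('\0'); break;
            }
            i = end + 1;
        }
        return sb.ToString();
    }
}
```

Doing the unescape in one pass (rather than chained Replace calls) avoids double-decoding text like &amp;lt;, which should come out as the literal string &lt; and not as a < character.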
Please report any bugs here and I’ll try my best to fix them up as soon as possible.