Quoderat

Archive for January, 2007

Thinking about structure

Sunday, January 28th, 2007

Douglas Crockford left an excellent comment on my recent posting All markup ends up looking like XML, which he later made into its own blog posting, For the trees. I agree with his reworking of the structure: given the data that I provided, the JSON, LISP, and XML markup all could have been simpler.

If he’s right about the examples, though, he’s wrong about two things. First, my posting doesn’t represent any kind of softening to JSON among its opponents in the XML community, simply because I’ve never been one of those opponents. Second, I spend at least one order of magnitude more time working with SQL and programming languages (not processing XML) than I do with XML, so if anything, my perspective on XML would likely be tainted by them rather than the other way around. Instead, I think the examples were complicated because I built for tomorrow instead of today.

Tomorrow

So what might tomorrow look like for an application dealing with names? Consider, for example, this XML markup, moving gender out of the element/property name as Doug suggests, and eliminating the other attributes (since they don’t add much to the discussion):

<names>
  <name gender="male"><surname>Saddam</surname> Hussein</name>
  <name gender="female">Susan B. <surname>Anthony</surname></name>
  <name gender="male">Al <surname>Unser</surname> Jr.</name>
  <name gender="male">Don Alonso <surname>Quixote</surname>
    de la Mancha</name>
</names>

It’s surprisingly messy breaking each name down into a simple property list. If we tried the approach Doug used for my simpler examples, we’d end up with this (note that this is a list of names, not of people):

{"names": [
    {"gender": "male", "given-name": "Hussein", "surname": "Saddam"},
    {"gender": "female", "given-name": "Susan B.", "surname": "Anthony"},
    {"gender": "male", "given-name": "Al Jr.", "surname": "Unser"}
    {"gender": "male", "given-name": "Don Alonso Quixote de la",
      "surname": "Mancha"}
]}

This list needs a bit of patching. First, if we reconstruct the names as strings, we don’t want to end up with “Hussein Saddam” instead of “Saddam Hussein”, so we’ll have to add a property specifying whether the surname comes first or last:

{"gender": "male", "given-name": "Hussein", "surname": "Saddam",
  "surname-after-given-name": false}

Great — that’s all we need to fix that, and now we know to print “Saddam Hussein”. Now, let’s look at Susan — there’s no problem recreating the string “Susan B. Anthony” from these properties, but we probably should rename the property given-name to given-names, just to avoid confusion:

{"gender": "female", "given-names": "Susan B.", "surname": "Anthony",
  "surname-after-given-names": true}

Al Unser Jr. is a bit trickier, because there was no obvious place to put the “Jr.”. Strictly speaking, it’s neither a given name nor a surname, so for now, let’s just call it a postfix (although that assumes a physical position that might not apply to all languages):

{"gender": "male", "given-names": "Al", "surname": "Unser",
  "surname-after-given-names": true, "postfix": "Jr."}

Don Quixote, however, forces us to reconsider some of our assumptions, because “Don” is not a given name but an honorific. Assuming, however, that we don’t care whether it’s a name or an honorific, let’s just call it prefix for now, to go with postfix:

{"gender": "male", "prefix": "Don", given-name: "Alonso",
  "surname": "Quixote", "surname-after-given-names": true,
  "postfix": "de la Mancha"}

Finally, just to throw a wrench into things, let’s assume that our list might contain things other than names, so that we need to add a type property:

{"type": "name", "gender": "male", "prefix": "Don",
  "given-name": "Alonso", "surname": "Quixote",
  "surname-after-given-names": true, "postfix": "de la Mancha"}

Granted, that sort-of works, but it’s really not very nice, and it’s extremely brittle: there are names with extra words in the middle (such as “de”) that are properly not part of the given names or the surname, for example. Then again, why overtag it? Perhaps we don’t need to know what’s a given name or honorific, as long as we can distinguish the surname. One possibility is simply to break it down to four properties:


{"type": "name", "gender": "male", "presurname": "Don Alonso",
  "surname": "Quixote", "postsurname": "de la Mancha"}

While I’m a big fan of Agile development in principle, however, I’ve worked on enough broken legacy systems to leave a little wiggle room for future requirements, like, say, a need to isolate the primary given name for a mail merge or index, even if we’re not going to isolate it right now. Fortunately JSON, like XML, has a natural ability to represent ordered information much more elegantly — let’s make the name into an ordered array:

{"type": "name", "gender": "male",
  "value:" ["Don Alonso", {"type": "surname", "value": "Quixote}, "de la Mancha"]}

This approach provides us with almost limitless flexibility (for example, if we start isolating honorifics, we can deal with a language where the honorific comes at the end of the name with no extra trouble), and is just as simple and easy to read as the much less flexible presurname/postsurname approach. Building for today is great, but if you have a choice between two roughly equivalent approaches where one provides an easy future upgrade path and the other doesn’t, which is the best choice? JSON is new enough that the JSON community hasn’t yet had to deal much with the life cycle of information — once enough people have built apps relying on specific JSON formats, it will be very, very hard to make any changes: v.2 of any popular data format generally results in enormous costs (in money and goodwill), and v.3 rarely happens.
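
As a rough illustration of that upgrade path, the following Python sketch reads the ordered-array form and pulls out both the full string and any typed parts. The helper names (part_text, full_name, parts_of_type) are hypothetical, and joining the parts with single spaces is an assumption about how the name should be displayed.

```python
import json

record = json.loads('''
{"type": "name", "gender": "male",
 "value": ["Don Alonso", {"type": "surname", "value": "Quixote"}, "de la Mancha"]}
''')


def part_text(part):
    # A part is either a bare string or a typed object carrying a "value".
    return part if isinstance(part, str) else part["value"]


def full_name(rec):
    # Order is carried by the array itself, so no ordering flag is needed.
    return " ".join(part_text(p) for p in rec["value"])


def parts_of_type(rec, part_type):
    # Collect the values of all parts tagged with the given type.
    return [p["value"] for p in rec["value"]
            if isinstance(p, dict) and p.get("type") == part_type]


print(full_name(record))                 # Don Alonso Quixote de la Mancha
print(parts_of_type(record, "surname"))  # ['Quixote']
```

If a later version of the data starts tagging honorifics or given names, parts_of_type picks them up without any change to full_name or to existing records.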

Some people might prefer to shorten the above example a bit by following a simple convention: the first member of each array is a label, the second is a map with properties describing the rest of the array, and the remainder is the value, where order may be significant:

["name", {"gender": "male"},
  "Don Alonso", ["surname", {}, "Quixote"],  “de la Mancha”]

That is trickier to dump straight into a data structure or database table, but it’s a much more natural way to represent the information, and a lot easier to read on the screen. And just in case it doesn’t look familiar, compare:

<name gender="male">Don Alonso <surname>Quixote</surname>
  de la Mancha</name>
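
The correspondence is close enough to be mechanical. Here is a small Python sketch that turns the label/properties/children array convention into XML text. The function name to_xml is hypothetical, and the example data carries explicit spaces inside the strings, which is an assumption made here so that the output matches the XML character for character.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xml(node):
    """Serialize [label, {attributes}, child, ...] as XML text (a sketch)."""
    if isinstance(node, str):
        return escape(node)  # plain text child
    label, attrs, *children = node
    attr_text = "".join(f" {key}={quoteattr(value)}" for key, value in attrs.items())
    body = "".join(to_xml(child) for child in children)
    return f"<{label}{attr_text}>{body}</{label}>"


name = ["name", {"gender": "male"},
        "Don Alonso ", ["surname", {}, "Quixote"], " de la Mancha"]

print(to_xml(name))
# <name gender="male">Don Alonso <surname>Quixote</surname> de la Mancha</name>
```

The sketch does no validation of labels, so a label such as "names!" would produce text that is not well-formed XML.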

If your information isn’t this complicated, JSON, XML, or LISP can be simple, as Doug pointed out — the XML could just as easily be


<name gender="male" presurname="Don Alonso" surname="Quixote"
  postsurname="de la Mancha"/>

The reason you don’t see that much is not because XML people never thought of it — read the xml-dev archives from ten years ago to read megabytes of discussion — but because it kept breaking in production systems as soon as the customer (or users) thought of a new requirement. When the information gets complicated, as I pointed out, there’s a bit of a tendency for all markup to end up looking like XML; when the information is simple, of course, XML can just as easily look like JSON or LISP.

Tech botox

Thursday, January 25th, 2007

Elliotte Harold is absolutely right when he suggests that people should leave Java alone. New technologies compete on features; mature technologies compete on deployment.

Let’s value our mature, middle-aged technologies for what they are, rather than destroying their dignity by pumping them full of features botox and slashing them up with plastic-surgery keyword changes to try to trick people into thinking they’re young and immature.

With some minor lapses, the W3C has done well avoiding the temptation to improve XML to death. XML is still, in every way that matters, the same as it was when the initial recommendation came out nine years ago — warts and all — and that’s why it’s so widely used. Sun should pay very close attention, since Java’s around the same age, and is deployed in many of the same places. The people who actually decide to use Java and XML to run organizations and do real work (not bloggers, but architects, project managers and even sometimes CTOs) appreciate them for precisely that stability and dependability.

MSIE MIA?

Wednesday, January 24th, 2007

What happened to Windows Internet Explorer?

Browser stats

I just took a peek at my server stats for megginson.com (I’m pretty lazy about following them) and had a huge surprise. I’ve adjusted these to exclude “Unknown”, which I assume are mostly spiders and blog aggregators:

  • MS Internet Explorer: 42%
  • Firefox: 37%
  • Mozilla: 7%
  • NetNewsWire: 6%
  • Opera: 3%
  • Safari: 2%
  • Netscape: 1%

I cut off the list at 1%. MSIE is still in the lead, but it has suffered a huge drop from a few months ago — could it be that my ISP’s version of AWStats doesn’t recognize MSIE 7 and is lumping it with “Unknown”, or is there a chance that the movement to Firefox is becoming a stampede? The last time I remember changes this fast was when MSIE was crushing Netscape in the late 1990s.

Operating system stats

My site is not primarily a Linux or Open-Source software site, and it does not seem to attract a disproportionately high share of non-Windows users. Again, excluding “Unknown”, here is the OS distribution for visitors:

  • MS Windows: 74%
  • Linux: 15%
  • MacOS: 12%

Linux might be a touch high here, but megginson.com is no Slashdot. Something’s going on: either Firefox is doing to MSIE what MSIE did to Netscape (knocking off a stale browser sitting smugly on its assumed monopoly), or, as I mentioned, it’s just a reporting glitch.

XML 2006 pickled and preserved

Friday, January 19th, 2007

The XML 2006 site is now pickled and preserved for long-term storage. Almost all of the presenters got their papers or slides in for the proceedings, if not on time, at least in time. Unfortunately, if you want to see a paper or slides from one of the few who didn’t send us anything, you’ll now have to pester them directly.

Recipe for pickling a web site

The original site was a hand-rolled LAMP implementation, but it was designed from the start to be amenable to a static copy. To pickle it, I started by doing a recursive slurp of the live site using wget (with the -m option) — that generated permanent, static HTML copies of the dynamic, database-driven pages on the site. At that point, I had an almost, but not quite perfect static copy of the site, because there were two things that wget missed:

  1. Images referred to only in CSS stylesheets (such as the banner).
  2. CSS stylesheets referred to by other CSS stylesheets.

It took only a few minutes to add all of that by hand, and the site was ready to go.
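
For a larger site, the hand step could be partly automated. The following Python sketch lists the references inside a stylesheet that a crawler reading only HTML would not follow. The function name css_references and the sample stylesheet are hypothetical, and the regular expressions are approximations rather than a real CSS parser.

```python
import re

# Approximate patterns, not a full CSS parser: url(...) references and
# @import "..." rules (the @import url(...) form is caught by the first one).
CSS_URL = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""")
CSS_IMPORT = re.compile(r"""@import\s+['"]([^'"]+)['"]""")


def css_references(css_text):
    """Return the sorted, de-duplicated references found in a stylesheet."""
    refs = CSS_URL.findall(css_text) + CSS_IMPORT.findall(css_text)
    return sorted(set(refs))


sample = (
    '@import "base.css";\n'
    '#banner { background: url(images/banner.png); }\n'
    '@import url("print.css");\n'
)

print(css_references(sample))
# ['base.css', 'images/banner.png', 'print.css']
```

Each reference still has to be resolved against the stylesheet's own URL and fetched separately.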

Why it worked

This will be old news to a lot of people reading, but a few simple advance steps (during site design) made later static preservation easy. Here’s what I did:

  • Every page has its own URL, period, end of discussion. No AJAX, no POST.
  • Every page (or at least, every page that we want to archive) is reachable, directly or indirectly, from the home page.
  • Script names are not shown to the public, so there are no URLs ending in “php” (hint: exposed script extensions like “php”, “asp”, or “jsp” are signs of gross incompetence in web design).
  • No web pages rely on exposed GET request parameters: for example, the URLs looked like /programme/presentations/123.html, not /programme/presentation?code=123, or even worse, /show-presentation.php?code=123.

And that’s it. Of course, if the site had included live forms, I would have had to remove those as well (and any links to them), but that wouldn’t have been much extra work.

On a final note, while the live site was hosted on an Apache server (the “A” in “LAMP”), the pickled site is hosted on a Microsoft IIS server. It made no difference at all — that’s the way Web standards are supposed to work.

Jon Bosak’s XML 2006 keynote now online

Thursday, January 11th, 2007

I’m happy to announce that Jon Bosak’s closing keynote from the XML 2006 conference is now online. We don’t require keynote speakers to contribute text to the proceedings, but we received a large number of requests for Jon’s talk and he kindly obliged.

In case anyone reading this doesn’t know, Jon chaired the original W3C group that developed XML. In his closing, post-dinner keynote, Jon gives a playful account of the controversies, strange behaviour, and general atmosphere leading up to the first public XML draft released in 1996. He then goes on to contrast the pioneer attitude (my phrase) of the implementors at the time with the vendor-dependence of most XML users today. It’s well worth a read, if you weren’t able to be there to listen — just remember to picture Jon saying everything with a slight smile at the edges of his mouth.

By the way, most of the other conference presentations also have slides and/or text available now. See the programme for links to papers or slides in the proceedings. And if you’re one of the few delinquent authors who has not yet sent in your proceedings, please get them to me as soon as possible.

Who’s searching for “XML”?

Tuesday, January 9th, 2007

Here are the top ten locations as of January 9, 2007, according to Google Trends:

  1. Pune, India
  2. Bangalore, India
  3. Hyderabad, India
  4. Chennai, India
  5. Mumbai, India
  6. Singapore, Singapore
  7. Delhi, India
  8. Tokyo, Japan
  9. Chiyoda, Japan
  10. Hong Kong, Hong Kong

Note that the top cities are all Asian. A search for “J2EE” returns almost exactly the same list. Now, compare the list for a representative new, trendy technology, Ruby on Rails:

  1. San Francisco, CA, USA
  2. Austin, TX, USA
  3. Pleasanton, CA, USA
  4. Seattle, WA, USA
  5. Salt Lake City, UT, USA
  6. Portland, OR, USA
  7. Vancouver, Canada
  8. Denver, CO, USA
  9. Oslo, Norway
  10. Auckland, New Zealand

This time, it’s 80% North American and 0% Asian, and more interestingly, all of those cities are west of the Mississippi. The easiest interpretation of this very small sample is that the Asian companies concentrate on established technologies that they can be paid for using, while the North American west coast companies are disproportionately interested in new, unproven technologies. What about a new technology that’s designed to work with an older one? Could we expect a mix of Asian and North American west coast cities? Here are the top cities searching for “XQuery”:

  1. San Jose, CA, USA
  2. Bangalore, India
  3. Singapore, Singapore
  4. Chennai, India
  5. San Francisco, CA, USA
  6. Mumbai, India
  7. Pleasanton, CA, USA
  8. San Diego, CA, USA
  9. Washington, DC, USA
  10. Hong Kong, Hong Kong

The implication of this very unscientific survey is that you can determine the relative maturity of a technology by looking at the weighting of search origins between western North America and eastern Asia.

Sneak peek at XML 2007

Tuesday, January 9th, 2007

With XML 2006 barely over, we’re already deep into planning XML 2007. Here’s your first peek at what we have planned.

Time and place

XML 2007 is confirmed for Monday 3 December to Wednesday 5 December 2007. We’ll be meeting in Boston again, but at a different hotel, the Boston Marriott Copley Place (located at the opposite end of the Prudential Centre from the 2006 hotel).

A lot of people asked about moving the conference to early November. I think that’s an excellent idea, but unfortunately, we have to book the hotel over a year in advance, so we cannot make that change until 2008.

Program

There will be a few significant program changes for 2007. First, there will be no tutorial day before XML 2007 begins. Attendance for the tutorial day has been declining for several years, and with the obvious lack of interest from our attendees, it no longer makes sense for IDEAlliance to offer it. However, we will try to incorporate more beginner-level and tutorial-style presentations into the main program.

The vendor pecha-kucha went very well in 2006, but for 2007, we’re considering replacing it with a standards pecha-kucha, either in the evening or during one of the days. Each standards committee will have 20 slides (at 20 seconds each) to give us a quick update on what they’ve been doing over 2007 and what to expect in 2008 — that will make it possible for attendees to learn a bit about a lot of standards in a relatively short time.

The publishing and web tracks at XML 2006 were extremely well attended (often overflowing out of the space), and the enterprise track put up a more modest but still respectable showing. However, with only a couple of exceptions, the hands-on track did not attract the same number of people, and we’ve decided to discontinue it in 2007. While we haven’t made a final decision, we may replace it with a vendor track. I personally don’t object to a vendor track as long as it’s well labeled — slipping vendor presentations into the main program is analogous to letting advertisers buy search-engine placement, while having a separate vendor track is more analogous to Google text ads, since it’s clearly distinct. In any case, it turns out that there are lots of people who do want to hear product-specific information and even sales pitches.

We will end the formal program on Wednesday 5 December with a closing keynote around noon. The afternoon will be available for user-organized activities, such as BOFs, committee meetings, or even pub crawls and karaoke — we’ll provide an online forum to help you organize these activities well in advance, and we’ll publicize them on the conference web site. In the past, these activities have been confined to evenings, when people are already tired; moving them to the afternoon should make it possible for more people to participate.

Speakers

XML 2007 will not have a late-breaking call for papers; instead, we’ll open the regular call for papers early (probably at XTech 2007 in Paris), and will keep it open to the end of August or even into September. As with XML 2006, I’m hoping for a mix of veteran and rookie speakers at the conference — I especially like it when we can bring people in from other fields.

Also, by popular request, we’re looking at providing individual evaluation forms for each speaker, so that attendees can help us identify the best and most entertaining among you. We’ll also go back to asking for proceedings before the conference, since that was overwhelmingly what people wanted; however, we will continue to accept papers in PDF or XHTML format so that speakers do not have to try to set up their own XML mini-publishing systems.

Comments?

I was very happy with how XML 2006 turned out, and I’m looking forward to an even better conference in 2007. Please let me know what you think about these changes — and if you have any new suggestions — by leaving a comment here.

ReiserFS

Tuesday, January 9th, 2007

A number of years ago I was working on the scenery system for the open source FlightGear flight simulator. Due to the nature of geodata and the scenery building system, I ended up with tens of thousands of tiny files on my hard drive, many only a few bytes long, and I was constantly running out of disk space.

Then I read about an alternative filesystem for Linux called ReiserFS, part of a new generation of journaling filesystems. Unlike the others, however, ReiserFS had a special innovation: it allowed multiple very small files to share the same block, so that a 5-byte file would not automatically take up 512 bytes (or whatever your block size was). I switched over, and bingo! There was suddenly a huge amount of free space on my previously-full hard drive, and I noticed no performance problems (aside from the occasional tiny zombie file that I couldn’t delete).
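
The size of the effect is easy to estimate. The Python sketch below computes how much disk space a set of files occupies when every file is rounded up to whole blocks. The figures (50,000 files of 5 bytes each) are illustrative assumptions, not measurements from the FlightGear scenery, and the calculation ignores inodes and other filesystem metadata.

```python
import math


def allocated_bytes(file_sizes, block_size):
    """Space used when each file occupies a whole number of blocks."""
    return sum(math.ceil(size / block_size) * block_size for size in file_sizes)


tiny_files = [5] * 50_000  # hypothetical: 50,000 files of 5 bytes each

print(sum(tiny_files))                    # 250000 bytes of actual data
print(allocated_bytes(tiny_files, 512))   # 25600000 bytes with 512-byte blocks
print(allocated_bytes(tiny_files, 4096))  # 204800000 bytes with 4096-byte blocks
```

With 512-byte blocks, about 99% of the allocated space in this example is padding, and that padding is what packing several small files into one block recovers.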

I’ve been running Reiser ever since, but the filesystem has fallen on hard times. On 14 September 2006 (via Tony Coates), Jeff Mahoney announced that the SuSE Linux distribution would no longer use ReiserFS as its default. Mahoney is also one of the principal ReiserFS developers, and he wrote that ReiserFS3 does not scale, that it has a small and shrinking developer community inadequate to maintain it, and that ReiserFS4 is “an interesting research file system, but that’s about as far as it goes.” Then, on 10 October 2006 Hans Reiser, the principal maintainer, was arrested and charged with the murder of his estranged wife Nina.

SuSE was the only Linux distribution that used Reiser as its default filesystem. This c|net story links the SuSE decision with the murder charges, but it’s worth noting that Mahoney’s message predates the charges by almost a month. Whatever the cause, however, Novell (SuSE’s owner) had contributed significant resources towards the maintenance of ReiserFS. It no longer looks like ReiserFS has any future at all, and in its current state, it has performance and scalability problems that prevent its use in high-demand environments. ReiserFS was a big help to me when I needed it a few years back, but the next time I install Ubuntu, I’ll use the default ext3 filesystem instead. Hard disks — even for notebook computers — are a lot bigger and cheaper now, anyway.

In praise of architecture astronauts

Thursday, January 4th, 2007

Six years ago, Joel Spolsky wrote a piece on Architecture Astronauts, people who get so obsessed with the big picture that they miss the important little details that actually make things work. More recently, Dare Obasanjo pointed to Spolsky’s piece in his posting XML Has Too Many Architecture Astronauts.

I’d like to start by agreeing with Dare: XML does have too many architecture astronauts, and almost everything that’s bad, ugly, or simply scary about the huge number of standards built around XML (WS-* springs immediately to mind, but it’s not alone) comes from gross overgeneralization. That said, architecture astronauts do have their place, and we ignore them at our peril.

Case #1: Napster

Let’s start by turning Spolsky’s main example (which Dare cites) on its head. Here are two different perspectives on Napster circa 2001:

Architecture pedestrian: Napster lets people find and download songs.

Architecture astronaut: Peer-to-peer networks let people find and download songs. Napster is (was) a peer-to-peer network.

Spolsky writes about how the architecture astronaut perspective helped to fuel a mini-P2P bubble at the time, with investors pouring wasted money into P2P-everything, when Napster’s success was due not to the fact that it was P2P but to the fact that it let people get songs easily. However, consider what was happening at the same time in the music industry. Rightly or wrongly, they wanted to stop people from sharing songs. The architecture pedestrian perspective (my term, not Spolsky’s) told them that Napster lets people find and download songs, so the industry spent millions of dollars in legal fees, PR, etc. shutting down Napster. The result? People downloaded even more music. After all, as the astronauts said, it was P2P networks that let people share music, not Napster in particular. Since then, the music industry has been fighting the equivalent of an insurgency, putting down one uprising after another with no end in sight.

Case #2: The Netscape IPO

My second example took place over 11 years ago, kicking off the much larger dot.com bubble (the P2P mini-bubble was just a tiny part of its tail). It was around 1995 that most non-techies noticed the web, mostly through the lens of the Netscape browser. Again, the architecture pedestrian and the architecture astronaut looked at this differently:

Architecture pedestrian: Netscape lets people see text and pictures online.

Architecture astronaut: The web allows people to put text and pictures online. Netscape is a web browser.

This time, the investors listened to the architecture pedestrian rather than the architecture astronaut: Netscape was set to open at $14/share, doubled to $28/share, and climbed to $75/share on the first day, and eventually reached a peak market cap of $8 billion. The astronauts knew all along, however, that while people (at the time) thought of the web in terms of the Netscape browser, the web wasn’t Netscape. If Internet Explorer hadn’t knocked Netscape off its perch (resulting in layoffs as early as January 1998), some other browser soon would have.

Case #3: XML

So how does this all apply to XML? I think that there are two ways that architecture astronauts can approach XML, one good and one bad. The bad one is in line with Spolsky’s original piece, where people miss what made XML popular (relative simplicity, no need to create DTDs, etc.) and believe that if a bit of standardization is good, a lot must be even better. The good one is to step back and point out that most of the advantages that appear to come from XML actually come from generic tree markup, and that holy wars between XML, JSON, YAML, etc. are really beside the point. In various situations, one syntax may have an advantage due to software support (for example, web browsers have built-in support for parsing XML or styling it using CSS, and they can convert JSON directly to JavaScript data structures using the eval() function), but when you look at the whole world of generic markup, those are small blips on a very large screen, and all of the markup languages more-or-less look the same.

All markup ends up looking like XML

Wednesday, January 3rd, 2007

In the current JSON vs. XML debate (see Bray, Winer, Box, Obasanjo, and many others), there are three things that are important to understand:

  1. There is no information that can be represented in an XML document that cannot be represented in a JSON document.
  2. There is no information that can be represented in a JSON document that cannot be represented in an XML document.
  3. There is no information that can be represented in an XML or JSON document that cannot be represented by a LISP S-expression.

They are all capable of modeling recursive, hierarchical data structures with labeled nodes. Do we have a term for that, like Turing completeness for programming languages? It would certainly be convenient in discussions like this.
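
One informal way to see the equivalence is to hold a tree of labeled nodes in memory once and write it out three ways. The Python sketch below does that for a deliberately tiny node model, a (label, children) pair with no attributes. The model and the function names are assumptions made for illustration, and the S-expression writer borrows JSON string quoting for its string literals.

```python
import json
from xml.sax.saxutils import escape

# A node is (label, [children]); a child is either a string or another node.
tree = ("names", [("name", ["Anna Maria"]), ("name", ["Fitzwilliam"])])


def to_xml(node):
    if isinstance(node, str):
        return escape(node)
    label, children = node
    return f"<{label}>" + "".join(to_xml(c) for c in children) + f"</{label}>"


def to_json_value(node):
    if isinstance(node, str):
        return node
    label, children = node
    return {label: [to_json_value(c) for c in children]}


def to_sexp(node):
    if isinstance(node, str):
        return json.dumps(node)  # double-quoted string literal
    label, children = node
    return "(" + " ".join([label] + [to_sexp(c) for c in children]) + ")"


print(to_xml(tree))
# <names><name>Anna Maria</name><name>Fitzwilliam</name></names>
print(json.dumps(to_json_value(tree)))
# {"names": [{"name": ["Anna Maria"]}, {"name": ["Fitzwilliam"]}]}
print(to_sexp(tree))
# (names (name "Anna Maria") (name "Fitzwilliam"))
```

The three writers have the same shape: a base case for strings and a recursive case for labeled nodes. Only the punctuation differs.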

Syntactic sugar

The only important differences among the three are the size of the user base (and opportunity for network effects), software support, and syntactic convenience or inconvenience. The first two are fickle — where are the Pascal programmers of yesteryear? — so let’s concentrate on syntax. Here’s a simple list of three names in each of the three representations:

<!-- XML -->
<names>
  <name>Anna Maria</name>
  <name>Fitzwilliam</name>
  <name>Maurice</name>
</names>
/* JSON */
{"names": ["Anna Maria", "Fitzwilliam", "Maurice"]}
;; LISP
'(names "Anna Maria" "Fitzwilliam" "Maurice")

Nearly all comparisons between XML and JSON look something like this, and I have to admit, it’s a slam dunk — in an example like this, XML seems to go out of its way to violate Larry Wall’s second slogan: “Easy things should be easy and hard things should be possible.” On the other hand, I rarely see any data structures that are really this simple, outside of toy examples in books or tutorials, so a comparison like this might not have a lot of value; after all, I could have written the XML like this:

<names>Anna Maria, Fitzwilliam, Maurice</names>

Let’s dig a bit deeper and see what we find.

Node labels

In the previous example, I made some important assumptions: I assumed that the node label for the individual names (“name”) didn’t matter and could be omitted from the JSON and LISP, and I assumed that the node label for the entire list (“names”) was a legal XML and LISP identifier. Let’s break both of those assumptions now, and make the label for the list “names!” and the labels for the items “male-name” or “female-name”. Here’s what we can do now to handle this in XML, JSON, and LISP:

<!-- XML -->
<list label="names!">
  <female-name>Anna Maria</female-name>
  <male-name>Fitzwilliam</male-name>
  <male-name>Maurice</male-name>
</list>
/* JSON */
{"names!": [
  {"female-name": "Anna Maria"},
  {"male-name: "Fitzwilliam"},
  {"male-name": "Maurice"}]}
;; LISP
'(names!
  (female-name "Anna Maria")
  (male-name "Fitzwilliam")
  (male-name "Maurice"))

XML is forced to use a secondary syntactic construction (an attribute value) to represent the top-level label, because it no longer matches XML’s syntactic rules for element names. LISP can still use names! as a token, and JSON doesn’t notice, because it has been using a string all along: XML syntax is convenient for trees of labeled nodes only when the labels are heavily restricted. That aside, however, note that as soon as we add any non-trivial complexity to the information (as soon as we assume that node labels matter), all three formats start to look a little more like XML.

Additional node attributes

Now, let’s add the next wrinkle, by allowing additional attributes (besides a label) for each node. In this case, we’re going to add a “lang” (language) attribute to each of the nodes:

<!-- XML -->
<list label="names!">
  <female-name xml:lang="it">Anna Maria</female-name>
  <male-name xml:lang="en">Fitzwilliam</male-name>
  <male-name xml:lang="fr">Maurice</male-name>
</list>
/* JSON */
{"names!": [
  {"female-name": [{"lang": "it"}, "Anna Maria"]},
  {"male-name": [{"lang": "en"}, "Fitzwilliam"]},
  {"male-name": [{"lang": "fr"}, "Maurice"]}]}
;; LISP
'(names!
  (female-name (((lang it)) "Anna Maria"))
  (male-name (((lang en)) "Fitzwilliam"))
  (male-name (((lang fr)) "Maurice")))

Now, while XML is still using an ad-hoc convention to represent the “names!” label, JSON and LISP are forced to use ad-hoc conventions to represent attribute lists (a dictionary list for JSON, and an a-list for LISP). It’s also worth noting that JSON and LISP now look so much like XML, both in length and complexity, that it’s hardly possible to distinguish them. Node attributes are not esoteric: they’re the basis of such simple things as hyperlinks.

Data typing

XML certainly looks better for the attributes, but now let’s jump to data typing. Let’s assume that there is a country where people use real numbers as names, and we need to find a way to distinguish names that are real numbers from names that just happen to look like real numbers (say, a person named “1.7” in a country where names are strings). JSON and LISP can make that distinction naturally using first-class syntax, while XML has to use a different standard that is not part of the core language:

<!-- XML -->
<list label="names!" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <female-name xml:lang="it">Anna Maria</female-name>
  <male-name xml:lang="en">Fitzwilliam</male-name>
  <male-name xml:lang="fr">Maurice</male-name>
  <female-name xsi:type="xsd:float" xml:lang="de">7.9</female-name>
</list>
/* JSON */
{"names!": [
  {"female-name": [{"lang": "it"}, "Anna Maria"]},
  {"male-name": [{"lang": "en"}, "Fitzwilliam"]},
  {"male-name": [{"lang": "fr"}, "Maurice"]},
  {"female-name": [{"lang": "de"}, 7.9]}]}
;; LISP
'(names!
  (female-name (((lang it)) "Anna Maria"))
  (male-name (((lang en)) "Fitzwilliam"))
  (male-name (((lang fr)) "Maurice"))
  (female-name (((lang de)) 7.9)))

XML loses badly on this particular example; however, if the extra data were (say) a date or currency, we would have to make up an ad-hoc way to label its type in JSON and LISP as well, since they have no special syntax to distinguish a date or monetary value from a regular number or string. For anything other than simple numeric data types, this one’s actually a draw.
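
Both halves of that point show up in a few lines of Python. The sketch below lets the JSON parser tell a number from a string that looks like one, and then has to fall back on a made-up convention for a date. The {"type": "date", "value": ...} shape and the decode_typed function are hypothetical, chosen only to illustrate the kind of ad-hoc labeling involved.

```python
import json
from datetime import date


def decode_typed(obj):
    # Hypothetical convention: {"type": "date", "value": "YYYY-MM-DD"}.
    if obj.get("type") == "date" and "value" in obj:
        return date.fromisoformat(obj["value"])
    return obj


text = '{"names": ["1.7", 1.7], "born": {"type": "date", "value": "2007-01-03"}}'
data = json.loads(text, object_hook=decode_typed)

print(type(data["names"][0]).__name__)  # str: the name that looks like a number
print(type(data["names"][1]).__name__)  # float: the name that is a number
print(data["born"])                     # 2007-01-03, as a datetime.date
```

The string/number distinction comes for free from the syntax, while the date only works because the writer and the reader agree on the convention in advance.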

Mixed content

And now, finally, for mixed content. I will add surnames to all of the (non-numeric) names in the list, and (here’s the kicker) will put those in their own labeled nodes:

<!-- XML -->
<list label="names!" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <female-name xml:lang="it">Anna Maria <surname>Mozart</surname></female-name>
  <male-name xml:lang="en">Fitzwilliam <surname>Darcy</surname></male-name>
  <male-name xml:lang="fr">Maurice <surname>Chevalier</surname></male-name>
  <female-name xsi:type="xsd:float" xml:lang="de">7.9</female-name>
</list>
/* JSON */
{"names!": [
  {"female-name": [{"lang": "it"}, "Anna Maria", {surname: "Mozart"}]},
  {”male-name: [{"lang": "en"}, "Fitzwilliam", {surname: "Darcy"}]},
  {”male-name”: [{"lang": "fr"}, "Maurice", {"surname": "Chevalier"}]},
  {”female-name”: [{"lang": "de"}, 7.9]}]}
;; LISP
'(names!
  (female-name (((lang it)) "Anna Maria" (surname "Mozart")))
  (male-name (((lang en)) "Fitzwilliam" (surname "Darcy")))
  (male-name (((lang fr)) "Maurice" (surname "Chevalier")))
  (female-name (((lang de)) 7.9)))

Character for character, the JSON and LISP are still shorter, but the difference is not nearly as dramatic as it was in the very first example. In fact, typing all of these examples by hand, I find myself appreciating the redundant end tags on the XML parts, because it’s getting very hard to keep track of all the closing “]”, “}” and “)” for JSON and LISP.

No silver bullet

There are a few morals here. First, with markup, as with coding, there’s no silver bullet. JSON (and LISP) have the important advantage that they make the most trivial cases easy to represent, but as soon as we introduce even the slightest complexity, all of the markup starts to look about equally verbose. That means that the real problems we have to solve with structured data are no longer syntactic, and anyone trying to find a syntactic solution to structured data is really missing the point: JSON, XML (and LISP) people would be best making common cause to start dealing with more important problems than whether we use braces, pointy brackets, or parentheses. That’s why I was excited to have JSON inventor Doug Crockford speak at XML 2006, and why I hope that we’ll get more submissions about JSON as well as XML for 2007.

Personally, I like XML because it’s familiar and has a lot of tool support, but I could easily (and happily) build an application based on any of the three — after all, once I stare long enough, they all look the same to me.