All markup ends up looking like XML
January 3rd, 2007
In the current JSON vs. XML debate (see Bray, Winer, Box, Obasanjo, and many others), there are three things that are important to understand:
- There is no information that can be represented in an XML document that cannot be represented in a JSON document.
- There is no information that can be represented in a JSON document that cannot be represented in an XML document.
- There is no information that can be represented in an XML or JSON document that cannot be represented by a LISP S-expression.
They are all capable of modeling recursive, hierarchical data structures with labeled nodes. Do we have a term for that, like Turing completeness for programming languages? It would certainly be convenient in discussions like this.
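To make "recursive, hierarchical data structures with labeled nodes" concrete, here is a minimal sketch of that shared model in Python. The `Node` class and the `depth` helper are hypothetical names chosen for illustration; none of the three syntaxes defines them.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Node:
    """A labeled node: a label, optional attributes, and ordered children."""
    label: str
    attributes: dict[str, str] = field(default_factory=dict)
    # Children are either nested nodes or leaf values (strings, numbers).
    children: list[Union["Node", str, float]] = field(default_factory=list)


def depth(node: Node) -> int:
    """Return the nesting depth of the tree rooted at node."""
    child_nodes = [c for c in node.children if isinstance(c, Node)]
    return 1 + max((depth(c) for c in child_nodes), default=0)


# The simple list of three names, expressed in the shared model.
names = Node("names", children=[
    Node("name", children=["Anna Maria"]),
    Node("name", children=["Fitzwilliam"]),
    Node("name", children=["Maurice"]),
])
```

Each of XML, JSON, and S-expressions can be read as one way of writing a tree of this shape down as text.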
Syntactic sugar
The only important differences among the three are the size of the user base (and opportunity for network effects), software support, and syntactic convenience or inconvenience. The first two are fickle — where are the Pascal programmers of yesteryear? — so let’s concentrate on syntax. Here’s a simple list of three names in each of the three representations:
<!-- XML -->
<names>
  <name>Anna Maria</name>
  <name>Fitzwilliam</name>
  <name>Maurice</name>
</names>
/* JSON */
{"names": ["Anna Maria", "Fitzwilliam", "Maurice"]}
;; LISP
'(names "Anna Maria" "Fitzwilliam" "Maurice")
Nearly all comparisons between XML and JSON look something like this, and I have to admit, it’s a slam dunk — in an example like this, XML seems to go out of its way to violate Larry Wall’s second slogan: “Easy things should be easy and hard things should be possible.” On the other hand, I rarely see any data structures that are really this simple, outside of toy examples in books or tutorials, so a comparison like this might not have a lot of value; after all, I could have written the XML like this:
<names>Anna Maria, Fitzwilliam, Maurice</names>
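As a rough illustration of how little separates these versions once they are parsed, the following sketch reads the XML and JSON forms with Python's standard library and ends up with the same list. The variable names are assumptions made for this example, and the comma-separated variant shows the extra splitting step it pushes onto the reader's code.

```python
import json
import xml.etree.ElementTree as ET

xml_text = (
    "<names><name>Anna Maria</name><name>Fitzwilliam</name>"
    "<name>Maurice</name></names>"
)
json_text = '{"names": ["Anna Maria", "Fitzwilliam", "Maurice"]}'
compact_xml_text = "<names>Anna Maria, Fitzwilliam, Maurice</names>"

# One child element per name: collect the text of each <name>.
names_from_xml = [el.text for el in ET.fromstring(xml_text).findall("name")]

# JSON: the array is already a list after parsing.
names_from_json = json.loads(json_text)["names"]

# Comma-separated text: the application has to split the string itself.
names_from_compact = ET.fromstring(compact_xml_text).text.split(", ")
```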
Let’s dig a bit deeper and see what we find.
Node labels
In the previous example, I made some important assumptions: I assumed that the node label for the individual names (“name”) didn’t matter and could be omitted from the JSON and LISP, and I assumed that the node label for the entire list (“names”) was a legal XML and LISP identifier. Let’s break both of those assumptions now, and make the label for the list “names!” and the labels for the items “male-name” or “female-name”. Here’s what we can do now to handle this in XML, JSON, and LISP:
<!-- XML -->
<list label="names!">
  <female-name>Anna Maria</female-name>
  <male-name>Fitzwilliam</male-name>
  <male-name>Maurice</male-name>
</list>
/* JSON */
{"names!": [
{"female-name": "Anna Maria"},
{"male-name": "Fitzwilliam"},
{"male-name": "Maurice"}]}
;; LISP
'(names!
  (female-name "Anna Maria")
  (male-name "Fitzwilliam")
  (male-name "Maurice"))
XML is forced to use a secondary syntactic construction (an attribute value) to represent the top-level label, because it no longer matches XML’s syntactic rules for element names. LISP can still use names! as a token, and JSON doesn’t notice, because it has been using a string all along. XML syntax is convenient for trees of labeled nodes only when the labels are heavily restricted. That aside, however, note that as soon as we add any non-trivial complexity to the information (as soon as we assume that node labels matter), all three formats start to look a little more like XML.
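The label restriction is easy to observe with a parser. This sketch uses Python's standard `xml.etree.ElementTree` and `json` modules; the helper name `is_wellformed` is made up for the example.

```python
import json
import xml.etree.ElementTree as ET


def is_wellformed(xml_text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# "!" is not allowed in an XML element name, so this is rejected.
label_as_element = is_wellformed("<names!></names!>")

# Moving the label into an attribute value makes it acceptable.
label_as_attribute = is_wellformed('<list label="names!"></list>')

# JSON keys are ordinary strings, so the same label needs no workaround.
label_as_json_key = json.loads('{"names!": []}')
```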
Additional node attributes
Now, let’s add the next wrinkle, by allowing additional attributes (beside a label) for each node. In this case, we’re going to add a “lang” (language) attribute to each of the nodes:
<!-- XML -->
<list label="names!">
  <female-name xml:lang="it">Anna Maria</female-name>
  <male-name xml:lang="en">Fitzwilliam</male-name>
  <male-name xml:lang="fr">Maurice</male-name>
</list>
/* JSON */
{"names!": [
{"female-name": [{"lang": "it"}, "Anna Maria"]},
{"male-name": [{"lang": "en"}, "Fitzwilliam"]},
{"male-name": [{"lang": "fr"}, "Maurice"]}]}
;; LISP
'(names!
  (female-name (((lang it)) "Anna Maria"))
  (male-name (((lang en)) "Fitzwilliam"))
  (male-name (((lang fr)) "Maurice")))
Now, while XML is still using an ad-hoc convention to represent the “names!” label, JSON and LISP are forced to use ad-hoc conventions to represent attribute lists (a dictionary list for JSON, and an a-list for LISP). It’s also worth noting that JSON and LISP now look so much like XML, both in length and complexity, that it’s hardly possible to distinguish them. Node attributes are not esoteric: they’re the basis of such simple things as hyperlinks.
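The ad-hoc JSON convention used here (a label mapped to a list whose first item is the attribute dictionary) can be produced mechanically from the XML. The sketch below is one possible mapping, not a standard one: `element_to_json` is a hypothetical helper, and it uses a plain `lang` attribute rather than `xml:lang` to keep namespace handling out of the example.

```python
import json
import xml.etree.ElementTree as ET


def element_to_json(el):
    """Encode a leaf element as {label: [attributes, text]}."""
    return {el.tag: [dict(el.attrib), el.text]}


xml_text = (
    '<list label="names!">'
    '<female-name lang="it">Anna Maria</female-name>'
    '<male-name lang="en">Fitzwilliam</male-name>'
    '<male-name lang="fr">Maurice</male-name>'
    '</list>'
)

root = ET.fromstring(xml_text)
# The list's own label comes from the attribute value, as in the XML example.
converted = {root.get("label"): [element_to_json(child) for child in root]}
print(json.dumps(converted, indent=1))
```

Any consumer of this JSON has to know that position 0 of each list holds the attributes, which is exactly the kind of convention XML builds into its syntax.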
Data typing
XML certainly looks better for the attributes, but now let’s jump to data typing. Let’s assume that there is a country where people use real numbers as names, and we need to find a way to distinguish names that are real numbers from names that just happen to look like real numbers (say, a person named “1.7” in a country where names are strings). JSON and LISP can make that distinction naturally using first-class syntax, while XML has to use a different standard that is not part of the core language:
<!-- XML -->
<list label="names!"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <female-name xml:lang="it">Anna Maria</female-name>
  <male-name xml:lang="en">Fitzwilliam</male-name>
  <male-name xml:lang="fr">Maurice</male-name>
  <female-name xsi:type="xsd:float" xml:lang="de">7.9</female-name>
</list>
/* JSON */
{"names!": [
{"female-name": [{"lang": "it"}, "Anna Maria"]},
{"male-name": [{"lang": "en"}, "Fitzwilliam"]},
{"male-name": [{"lang": "fr"}, "Maurice"]},
{"female-name": [{"lang": "de"}, 7.9]}]}
;; LISP
'(names!
  (female-name (((lang it)) "Anna Maria"))
  (male-name (((lang en)) "Fitzwilliam"))
  (male-name (((lang fr)) "Maurice"))
  (female-name (((lang de)) 7.9)))
XML loses badly on this particular example; however, if the extra data were (say) a date or currency, we would have to make up an ad-hoc way to label its type in JSON and LISP as well, since they have no special syntax to distinguish a date or monetary value from a regular number or string. For anything other than simple numeric data types, this one’s actually a draw.
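A short sketch of the typing difference, using only Python's standard library. In XML Schema instance documents the type attribute is conventionally written `xsi:type` with a value such as `xsd:float`. The `typed_value` helper is hypothetical, and checking the value with `endswith(":float")` is a simplification: a real processor would resolve the `xsd` prefix to its namespace.

```python
import json
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

xml_text = (
    '<list xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
    f'xmlns:xsi="{XSI}">'
    '<female-name xsi:type="xsd:float">7.9</female-name>'
    '<female-name>7.9</female-name>'
    '</list>'
)


def typed_value(el):
    """Return a float if the element declares a float type, else its text."""
    declared = el.get(f"{{{XSI}}}type")
    if declared is not None and declared.endswith(":float"):
        return float(el.text)
    return el.text


# XML: the parser hands back text; the type only appears if we go and look.
xml_values = [typed_value(el) for el in ET.fromstring(xml_text)]

# JSON: the number and the string are already distinct after parsing.
json_values = json.loads('[7.9, "7.9"]')
```

For a date or a currency amount, the JSON side would need a similar lookup convention, since `json.loads` would return a plain string or number.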
Mixed content
And now, finally, for mixed content. I will add surnames to all of the (non-numeric) names in the list, and (here’s the kicker) will put those in their own labeled nodes:
<!-- XML -->
<list label="names!"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <female-name xml:lang="it">Anna Maria <surname>Mozart</surname></female-name>
  <male-name xml:lang="en">Fitzwilliam <surname>Darcy</surname></male-name>
  <male-name xml:lang="fr">Maurice <surname>Chevalier</surname></male-name>
  <female-name xsi:type="xsd:float" xml:lang="de">7.9</female-name>
</list>
/* JSON */
{"names!": [
{"female-name": [{"lang": "it"}, "Anna Maria", {"surname": "Mozart"}]},
{"male-name": [{"lang": "en"}, "Fitzwilliam", {"surname": "Darcy"}]},
{"male-name": [{"lang": "fr"}, "Maurice", {"surname": "Chevalier"}]},
{"female-name": [{"lang": "de"}, 7.9]}]}
;; LISP
'(names!
  (female-name (((lang it)) "Anna Maria" (surname "Mozart")))
  (male-name (((lang en)) "Fitzwilliam" (surname "Darcy")))
  (male-name (((lang fr)) "Maurice" (surname "Chevalier")))
  (female-name (((lang de)) 7.9)))
Character for character, the JSON and LISP are still shorter, but the difference is not nearly as dramatic as it was in the very first example. In fact, typing all of these examples by hand, I find myself appreciating the redundant end tags on the XML parts, because it’s getting very hard to keep track of all the closing “]”, “}” and “)” for JSON and LISP.
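Mixed content also shapes the code that reads it. In Python's `xml.etree.ElementTree`, text that precedes the first child is stored on the parent's `.text`, and text that follows a child is stored on that child's `.tail`. The `mixed_content` helper below is a hypothetical sketch that flattens one level of mixed content into an ordered list.

```python
import xml.etree.ElementTree as ET


def mixed_content(el):
    """Return the element's text runs and child elements in document order."""
    parts = []
    if el.text:
        parts.append(el.text)
    for child in el:
        parts.append((child.tag, child.text))
        if child.tail:
            parts.append(child.tail)
    return parts


name = ET.fromstring(
    "<male-name>Fitzwilliam <surname>Darcy</surname></male-name>"
)
name_parts = mixed_content(name)

# Text after the child element shows up as that child's tail.
sentence = ET.fromstring("<p>before <b>bold</b> after</p>")
sentence_parts = mixed_content(sentence)
```

The resulting list has the same shape as the JSON and LISP encodings of mixed content: strings interleaved with labeled nodes.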
No silver bullet
There are a few morals here. First, with markup, as with coding, there’s no silver bullet. JSON (and LISP) have the important advantage that they make the most trivial cases easy to represent, but as soon as we introduce even the slightest complexity, all of the markup starts to look about equally verbose. That means that the real problems we have to solve with structured data are no longer syntactic, and anyone trying to find a syntactic solution to structured data is really missing the point: JSON, XML (and LISP) people would be best making common cause to start dealing with more important problems than whether we use braces, pointy brackets, or parentheses. That’s why I was excited to have JSON inventor Doug Crockford speak at XML 2006, and why I hope that we’ll get more submissions about JSON as well as XML for 2007.
Personally, I like XML because it’s familiar and has a lot of tool support, but I could easily (and happily) build an application based on any of the three — after all, once I stare long enough, they all look the same to me.
January 3rd, 2007 at 05:58:58
Nits: you left off the ! in the JSON examples, and few if any Lisp dialects have any trouble with ! in a symbol, so stringification is not necessary. (When there *is* a problem, as with an embedded space, you want to wrap the symbol in |…| instead of quotes anyhow, so that internally it’s still a symbol.)
There’s no doubt about your general conclusion, though JSON does lose really badly on mixed content, which is outside its intended domain of application. JSON is about sequences and maps whose (leaf) domains are Unicode strings, real numbers, booleans, and the null value. Using it to represent, say, HTML, is just silly.
I would also point you to SXML at http://okmij.org/ftp/Scheme/SXML.html , which is a complete representation of arbitrary DTDless XML in standard S-expressions that preserves the document infoset in full. It’s pretty well accepted in the Lisp community.
January 3rd, 2007 at 06:00:01
This is brilliant, David! I recently implemented a “JSON mode” for a RESTful style xml based service. I went in with the same preconceptions about the simplicity of JSON and was taken aback by how complex it became. I didn’t fully appreciate the point though until reading this.
I think it would also be instructive to compare the three formats in terms of how data is accessed from them; i.e. xml via parsers or some usage of XPath, JSON through javascript or a specific language library and s expressions via LISP. My initial thought is that LISP wins here because it is itself made up of s expressions, and javascript comes second because JSON uses the native array and list structures. But I think that the javascript/JSON advantage over xml may suffer similarly to the way you have illustrated here. I’ll have to think more on that.
January 3rd, 2007 at 06:00:32
You XML folks completely miss the point. JSON is important because it is better supported in the browser than XML. That’s why it has taken hold. Arguing that angle brackets, S-expressions and JSON syntax are all semantically equivalent is the height of architecture astronautics[0] and COMPLETELY misses the point.
[0] http://www.joelonsoftware.com/articles/fog0000000018.html
January 3rd, 2007 at 07:27:05
John: thanks, I’ve corrected the omission in the JSON examples, but have left the LISP symbols quoted for now, until I have time to confirm that “!” is legal in a symbol in most dialects.
Dare: my point is not that JSON isn’t important, but rather that the functional differences between JSON and XML are so insignificant that I don’t care much which (if either) wins — either one will accomplish what I need to do. As far as the specifics of tool support go, JSON is supported well in the browser only if you use eval(), which is inherently extremely dangerous, as you’ve pointed out elsewhere. If you’re not willing to do that (and as a semi-responsible web developer, I wouldn’t be), then you have to use a separate library — albeit a small one — to do the JSON parsing. On the other hand, all modern browsers have support built-in for parsing XML safely, though the interface they present to the programmer (DOM) is low-level and awkward, so in practice, you have to install some kind of separate library to simplify processing. Browser-side data support is a lose-lose situation right now, whether you’re using JSON or XML.
I remember reading that early PHP started out with some apparently friendly features like prepopulating top-level variables automatically from GET or POST parameters, so that if someone invoked http://example.org/foo.php?x=y, in the PHP script $x would be set to “y”. Guess how fast that was deprecated. I do think that JSON will probably continue to grow in browser-side use at XML’s expense, but ironically, as it gets more popular it will lose most of the features that make it seem simple right now. Don’t be surprised if JavaScript eval() disappears completely (or at least requires explicit user authorization after reading a warning dialog) in future browser versions if there are high-visibility exploits in the next couple of years. Perhaps they’ll add an equivalent of PHP’s unserialize function, which is just as convenient but much safer.
January 3rd, 2007 at 08:04:09
You can use eval with JSON safely; you just need to check the input string for evilness. In the official json.org code this is a one-line regexp. Not sure if this will come through in the comment, but you can find it yourself in http://www.json.org/json.js: /^("(\\.|[^"\\\n\r])*?"|[,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t])+?$/
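For readers who want to see the whitelist idea in isolation, here is a Python port of the regular expression quoted in the comment above. It is a sketch for illustration only: the names `SAFE_JSON`, `looks_like_json`, and `parse_checked` are made up, the pattern is a character whitelist rather than a full JSON grammar, and in Python `json.loads` never evaluates its input as code, so the check is not needed there.

```python
import json
import re

# Accept only double-quoted strings plus the punctuation, digits, and letters
# that can occur in JSON numbers and in the literals true, false, and null.
SAFE_JSON = re.compile(
    r'^("(\\.|[^"\\\n\r])*?"|[,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t])+?$'
)


def looks_like_json(text: str) -> bool:
    """Return True if every character outside strings is on the whitelist."""
    return SAFE_JSON.match(text) is not None


def parse_checked(text: str):
    """Reject suspicious input first, then parse with a real JSON parser."""
    if not looks_like_json(text):
        raise ValueError("input contains characters outside the JSON whitelist")
    return json.loads(text)
```

A function call such as `alert(1)` fails the check because parentheses are not on the whitelist, while ordinary JSON text passes and goes on to the parser.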
January 3rd, 2007 at 08:23:21
Don’s link in your first line of your post is not pointing to his blog entry. Here it is:
http://pluralsight.com/blogs/dbox/archive/2007/01/03/45560.aspx
January 3rd, 2007 at 08:51:21
David,
I could swear that Safari didn’t have support for XML parsing from javascript. Not sure if this is still true nor if it matters given firefox penetration on Mac OS.
DB
January 3rd, 2007 at 09:03:19
Thanks to Dilip — I’ve fixed the link.
January 3rd, 2007 at 09:15:48
Are we all seriously still comparing the syntax of a data format to the syntax of an object literal in a programming language? why?
I ignored this debate when I saw it start but I’m sure it’s gonna get me somehow. I bet the next company I walk into one of the developers will ask me what I think is best? JSON or XML? *sigh* At least it may replace my most hated apples/oranges question: Which do you think is best? PHP or Ruby on Rails?
January 3rd, 2007 at 10:07:14
It looks to me that out of the 3, only LISP stays clear with extra complexity.
I assume there’s no security problem with using LISP here; after all, you’re treating code as data, not data as code.
The magic about LISP is that writing a parser (for at least basic syntax) is so easy that the average coder could probably do it in JavaScript. It’s just nested CSV effectively. Writing a parser for JSON or XML is much more tricky.
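To give a sense of how small a basic S-expression reader can be, here is a sketch in Python rather than JavaScript. It handles only parentheses, double-quoted strings, numbers, and bare symbols; the leading quote character, comments, and string unescaping are deliberately left out, and `Symbol`, `tokenize`, `read`, and `parse_sexpr` are names invented for this example.

```python
import re

# One token: a parenthesis, a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\s*(?:([()])|"((?:[^"\\]|\\.)*)"|([^\s()"]+))')


class Symbol(str):
    """A bare Lisp symbol, kept distinct from a quoted string."""


def tokenize(text):
    """Yield (kind, value) pairs for each token in the text."""
    text = text.strip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        pos = m.end()
        paren, string, atom = m.groups()
        if paren:
            yield ("paren", paren)
        elif string is not None:
            yield ("str", string)
        else:
            yield ("atom", atom)


def read(tokens, i):
    """Read one expression starting at index i; return (value, next index)."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, value = tokens[i]
    if kind == "paren":
        if value == ")":
            raise SyntaxError("unexpected )")
        items = []
        i += 1
        while True:
            if i >= len(tokens):
                raise SyntaxError("missing )")
            if tokens[i] == ("paren", ")"):
                return items, i + 1
            item, i = read(tokens, i)
            items.append(item)
    if kind == "str":
        return value, i + 1
    # Atoms are numbers when float() accepts them, symbols otherwise.
    # (float() also accepts words like "inf"; a stricter reader would not.)
    try:
        return float(value), i + 1
    except ValueError:
        return Symbol(value), i + 1


def parse_sexpr(text):
    """Parse a single S-expression into nested Python lists."""
    tokens = list(tokenize(text))
    value, end = read(tokens, 0)
    if end != len(tokens):
        raise SyntaxError("trailing input after expression")
    return value


tree = parse_sexpr('(names! (female-name "Anna Maria") (female-name 7.9))')
```

The parse result is a nested list, with symbols and strings still distinguishable by type.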
January 4th, 2007 at 12:10:41
Nice comparison.
While the nominal complicatedness is similar, note the clarity differences — particularly in the last, most complicated example. The xml example is almost completely un-scannable and so has to be (slowly, carefully) read. The Lisp is easy/fast to scan. The JSON is somewhere in-between.
January 4th, 2007 at 03:54:05
I’m curious to know how YAML stacks up in this comparison.
January 4th, 2007 at 06:32:06
JSON works pretty well in practice, even if it’s something of a hack. It’s meant for fast delivery of simple structured data to a browser, and it works nicely for that purpose. If you use an encoder with proper escaping there shouldn’t be any security risk. Badly escaped XML can also lead to cross-site scripting attacks. You probably shouldn’t use JSON for cross-domain data transfer, which requires some ugly security policy abuse (e.g. JSONP) and isn’t easy to do for XML.
January 4th, 2007 at 08:15:47
John Mitchell: I think that’s pretty subjective. I have about 20 years experience with LISP and C, and 17 with perl, SGML and XML. I find the LISP and XML examples about equal for scanning (the LISP looks cleaner, but it’s a pain having to count closing parentheses to figure out nesting), while the JSON is a fair bit harder. I don’t claim that anyone else should have exactly the same experience as me, but it is worth noting that this is a pretty heavily subjective area.
Kevin: I decided to pick just three examples for this posting, but I imagine that if I added YAML (or even LaTeX) they’d end up looking about the same. The hard part of structured markup has never been the syntax, at least not since we got rid of SGML with its excessive number of variants and configuration options.
JML: Agreed — in 2000 or 2001 I gave a keynote talk on XML security where I outlined a whole series of exploits that were possible using XML on the Web, and the only reason we haven’t had a problem with them is that general-purpose XML on the web hasn’t become popular (it’s mainly messaging and data dumps). That said, I don’t think that simply using a proper encoder is enough, because that protects only on the server side — you also need to do something on the browser side, either using a dedicated JSON parsing library, or at least (as John Cowan suggests) running the JSON through a regex before passing it to eval().
January 4th, 2007 at 10:04:14
[...] All markup ends up looking like XML by David Megginson - argues that XML is just like JSON except with the former we use angle brackets and in the latter we use curly braces + square brackets. Thus they are “Turing” equivalent. Academically interesting but not terribly useful information if you are a Web developer trying to get things done. [...]
January 4th, 2007 at 10:17:40
[...] So how does this all apply to XML? I think that there are two ways that architecture astronauts can approach XML, one good and one bad. The bad one is in line with Spolsky’s original piece, where people miss what made XML popular (relative simplicity, no need to create DTDS, etc.) and believe that if a bit of standardization is good, a lot must be even better. The good one is to step back and point out that most of the advantages that appear to come from XML actually come from generic tree markup, and that holy wars between XML, JSON, YAML, etc. are really beside the point. In various situations, one syntax may have an advantage due to software support — for example, web browsers have built-in support for parsing XML or styling it using CSS, and they can convert JSON directly to JavaScript data structures using the eval() function — but when you look at the whole world of generic markup, those are small blips on a very large screen, and all of the markup languages more-or-less look the same. [...]
January 4th, 2007 at 11:05:47
Re: Subjectivity in Readability/Scannability
I think you’re helping to prove my point… You’ve been “reading” XML for so long that you think that it’s scannable.
Seriously, there’s a huge difference between the ability to carefully read something and the ability to glance at it and get it, a difference that is, IMHO, grossly underestimated by advocates of “dense” languages. This cost goes up as the length and complicatedness of the documents increase.
Re: Counting parens
Um, er, why are you having to do that? Ah, must not be using any good editors that do that matching for you.
I generally find the “counting parens” argument to be a bit disingenuous given how people list XML-aware editors as a “pro” for XML.
January 4th, 2007 at 11:25:26
Here are your Lisp references:
Common Lisp: http://www.lisp.org/HyperSpec/Body/sec_2-3-4.html (not explicit, because in Common Lisp symbols are defined by what they aren’t, and because it would be possible to define “!” to have special meaning to the reader)
R5RS Scheme: http://schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-10.html#%_sec_7.1
Emacs Lisp: http://www.gnu.org/software/emacs/elisp-manual/html_node/Symbol-Type.html#Symbol-Type
January 4th, 2007 at 11:46:09
I would first argue with the terminology in your first statement: It is not correct to say “JSON document”. JSON is not a document format.
One of the advantages of JSON is that its adoption invites you to reconsider the design of your structures. There are cases, such as here, where dependence on XML has caused an unnecessary injection of structural complexity. So I would want to render the final example differently:
{"names!": [
{"gender" : "female", "given-name": "Anna Maria", "lang": "it", "surname": "Mozart"},
{"gender" : "male", "given-name": "Fitzwilliam", "lang": "en", "surname": "Darcy"},
{"gender" : "male", "given-name": "Maurice", "lang": "fr", "surname": "Chevalier"},
{"gender" : "female", "given-name": 7.9, "lang": "de"}
]}
This comes from a different perspective, regarding the information as data rather than a document. Of course you could make these same improvements in XML.
January 4th, 2007 at 11:50:38
John (Mitchell): It might be fair to argue that both of us have been reading LISP too long if we find it easy to scan.
Seriously, I use emacs for both LISP and XML (and would do so for JSON as well), but if we’re talking about the ability to scan a syntax, we’re talking about the ability to interpret it without mechanical help. That means counting parens for LISP, or matching start/end tags for XML. Clever indentation can help in both cases, of course. I’ve never belonged to the faction that says syntactic complexity doesn’t matter because tools can hide it — I think that XML, JSON, and LISP all benefit from being relatively easy to scan and create without special tools.
January 4th, 2007 at 12:34:11
Great comparison!
I hadn’t thought about it much, but for my eyes the S-expression wins for simplicity and compactness.
Using a decent editor (e.g., if IntelliJ magically had EMACS’s LISP support), proper formatting and breaking up of the strings of parentheses make the S-expression very readable.
Alas, I do not have such magic tools at my fingertips.
January 4th, 2007 at 12:35:52
JSON as a General Purpose Alternative?
The discussion at Dare Obasanjo’s blog, and the references to some other recent posts motivate me to look at some of the broader arguments being made. It is clear that processing JSON data using JavaScript is trivial, and that is what makes JSON a pop…
January 5th, 2007 at 07:26:50
There are experiments in making Lisp indentation-based rather than parenthesis-based - or a combination of the two.
http://www.dwheeler.com/readable/readable-s-expressions.html
January 6th, 2007 at 02:14:47
Hi David - and every tree structure can be normalised to a relational structure (if we allow nulls) so I suggest we already have the equivalent of a Turing Machine in relational structures and therefore we can apply a complete computational model. I suspect this is the real reason that XML is working so well.
ie Turing for computation, relational for data.
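One common way to normalise a tree into rows is an adjacency list, where each node records the id of its parent. The sketch below assumes that representation; `to_rows` is a hypothetical helper, and `None` plays the role of the nulls mentioned in the comment above.

```python
import xml.etree.ElementTree as ET


def to_rows(root):
    """Flatten an element tree into (id, parent_id, label, text) rows."""
    rows = []

    def visit(el, parent_id):
        node_id = len(rows) + 1  # ids follow document order
        text = el.text.strip() if el.text and el.text.strip() else None
        rows.append((node_id, parent_id, el.tag, text))
        for child in el:
            visit(child, node_id)

    visit(root, None)
    return rows


rows = to_rows(ET.fromstring(
    "<names><name>Anna Maria</name><name>Maurice</name></names>"
))
```

Because ids are assigned in document order, sorting siblings by id recovers their original sequence, which matters once order carries meaning.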
January 7th, 2007 at 02:48:40
[...] Dare links to a post by David Megginson that compares the data exchange syntax of JSON, XML, and Lisp, but dismisses it as being irrelevant to why some people are choosing JSON over XML. I think he’s right, but I also think the syntax comparison is helpful. The key insight into the syntax is that JSON is indeed simpler to both write and visually parse for simple data structures, but that XML’s explicitness, particularly with close tags, is actually helpful when composing documents. Trying to write Lisp code without an editor that helps match up closing parentheses is an exercise in needless frustration, and I’ve seen countless C-style programs that include comments next to each closing brace to indicate the block that they are closing. Such helpers indicate a possible deficiency of clarity in the syntax of the language, and XML addresses this by naming close tags. The conclusion of the post is that all three languages are equally expressive in terms of functionality, and that syntax differences shouldn’t be a significant issue in deciding which to use for data exchange. [...]
January 8th, 2007 at 01:39:37
[...] Although I have been talking about this topic for some time, some XML bloggers have recently started to discuss the topic (Tim Bray, Don Box, David Megginson, Dare Obasanjo and all the comments). For the most part the discussion is pretty tame and nothing really new. [...]
January 8th, 2007 at 01:24:53
[...] LexiMédia2007: The LexiMédia2007 tool tracks news about the 2007 presidential election week by week (built by Didier Bourigault and Franck Sajous of the CLLE-ERSS laboratory, a joint research unit of the CNRS and the Université Toulouse-Le Mirail). LexiMédia2007 continuously analyzes articles from the newspapers Le Monde, Libération, and Le Figaro drawn from Jean Véronis’s Présidentielle 2007 feed. LexiMédia2007 shows how the frequency of certain phrases changes from week to week, overall and per newspaper. For each week, it lists the most-used phrases, the phrases rising sharply, the phrases falling sharply, and those with the largest variation (rises and falls combined). For each phrase, you can see its evolution in detail (a frequency curve across all weeks) and links to the articles in which it appears (details of the method). Google Patent Search: With Google Patent Search, you can now search the full text of the U.S. patent corpus and find patents that interest you. XML versus JSON: “All markup ends up looking like XML”. The Future Of Education: What is Project Based Learning? Classifying, sorting, and tagging for retrieval: the documentary stakes of the Web (2.0). How should electronic documents be read on computer screens? A solution proposed by Yves Guiard, research director at the “Movement and Perception” laboratory in Marseille, who is developing a “prototype that lets you tilt electronic documents and so view them in perspective. Most importantly, you can scroll through the pages in this perspective view…”. Access to the prototype. Two complementary and relevant readings to finish: Indexing and comprehension (on Les Petites Cases.net) and Data, Information, Knowledge… and mushrooms?. [...]
January 25th, 2007 at 04:26:48
Wow…
David, have you at no point considered refactoring your Lisp and JSON markups? They’re horrible, you’re piling crap on crap it’s idiotic.
I won’t talk about Lisp because i’m not a specialist in Lisp data structures, but
* Syntactic Sugar section
The final XML you suggest is definitely not equivalent to the JSON and Lisp markups: in JSON and Lisp the `names` element directly holds independent names as full fledged object of the markup, while in the final XML markup the `name` element only holds a single string that you have to parse and split manually, you can’t just iterate on it or manipulate it in a simple way.
* Additional node attributes
What the hell’s the point of creating lists of objects here?
How about mapping every key to an object/alist/… and removing the complexity of having thousands of layers? e.g.
{"names!": [
{"female-name": {"lang": "it", "value": "Anna Maria"}},
{"male-name": {"lang": "en", "value": "Fitzwilliam"}},
{"male-name": {"lang": "fr", "value": "Maurice"}}]}
And this holds true for the Mixed Content section as well: put the surname in its own node in XML if you wish, but in JSON you should just add another entry to the base *-name objects/maps. Creating yet-another-object in a useless redundant list is not only an annoyance for the user, but it makes the access code *much* more verbose and less readable than it needs to be.
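The difference in access code can be sketched in a few lines of Python. Both documents below are cut down to a single entry for brevity; the variable names are assumptions made for the example.

```python
import json

# XML-like encoding: attributes first, then text, then child nodes.
nested = json.loads(
    '{"names!": [{"male-name": '
    '[{"lang": "en"}, "Fitzwilliam", {"surname": "Darcy"}]}]}'
)

# Flat record encoding: one map per person.
flat = json.loads(
    '{"names!": [{"gender": "male", "given-name": "Fitzwilliam", '
    '"lang": "en", "surname": "Darcy"}]}'
)

# Nested form: search the content list for the object holding the surname.
entry = nested["names!"][0]["male-name"]
nested_surname = next(
    item["surname"]
    for item in entry
    if isinstance(item, dict) and "surname" in item
)

# Flat form: a single key lookup.
flat_surname = flat["names!"][0]["surname"]
```

The flat form is shorter to query, at the cost of no longer recording where the surname sits relative to the given name.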
January 25th, 2007 at 06:20:57
Try adding unescaped binary data to a JSON-document; possible with CDATA in XML, but of course a monkey like you only knows about 3% of the XML-spec, right? Thought so.
January 25th, 2007 at 06:58:45
Masklinn: you’re right that my examples were not very complicated and could have been represented more simply in all three markup syntaxes — see Crockford’s comment earlier, which made the same point as yours.
Just to throw a wrench into it, though, you cannot always assume that order doesn’t matter. For example, try this:
If you break this down to a map then try to reconstruct the name, you’ll end up with “Hussein Saddam” instead of “Saddam Hussein”. The solution then, of course, is to add *another* property specifying whether the surname comes first or last.
Next, you see a name like this:
A simple ordering property won’t work now. Instead, for your map, you’ll have to include a surname, pre-surname, and post-surname property. And then you have the problem of honorifics, which can come before (“Dr.”, “Sir”) or after (“PhD”, “OBE”) the name, or prepositions like “de” in French and Spanish, which go with the surname but drop off for collation, etc.
I think most people who work with real, non-trivial data have experienced these kinds of problems, even when XML has never entered the equation. Notice how relational database schemas in production systems usually end up resembling a ball of tangled string more than the neat diagrams in textbooks.
January 25th, 2007 at 07:00:29
To the commenter calling him/herself sumfag: I appreciate your desire to defend XML, but CDATA sections cannot hold arbitrary binary data.
January 25th, 2007 at 07:09:52
It’s not important to the point of the piece, but lisp alists are lists of pairs, so if you were using an alist to model the attribute bit it would be:
'(names!
(female-name (((lang . it)) "Anna Maria"))
(male-name (((lang . en)) "Fitzwilliam"))
(male-name (((lang . fr)) "Maurice")))
(unless lisp is different to scheme, in which case ignore this!)
January 25th, 2007 at 09:44:17
'( names!
( :name
:gender female
:lang it
:given "Anna Maria"
:sur "Mozart" )
( :name
:gender male
:lang en
:given "Fitzwilliam"
:sur "Darcy" )
( :name
:gender male
:lang fr
:given "Maurice"
:sur "Chevalier" )
( :name
:gender female
:lang de
:given 7.9
:sur NIL ) )
? The lisp can be made clearer, I think.
January 25th, 2007 at 02:36:41
the thing I like about JSON is that it maps to a object oriented data structure, and the tooling surrounding it is just about serial/deserialization. in comparison, the DOM data structure used by XML is immensely frustrating to deal with. multiple text node children hanging off elements, content validation before you can .InnerText, theres just a never ending rigamaroll to do what is ultimately a very very simple task, and i for one do not enjoy writing such verbose kludgtastic code. the data structures built by xml tooling just suck horrendous ass.
i’m still unsure as to the ultimate feasibility, but projects like jsonml to bridge json and xml seem like they could be vital in the future. there’s a lot of xml out there already, but at this moment, i can say i would like very much to be able to ignore it for the rest of my days, and consume it as json.
January 25th, 2007 at 04:35:20
rektide: I think you hit on a major problem with XML in the browser — it’s not XML itself, but the DOM, which is an extremely awkward interface to use. Part of that is because DOM has to be able to handle mixed content as well as fielded data, but a lot of it is simply the DOM’s design history. Personally, I always use some kind of a simple helper library with browser-based DOM.
January 25th, 2007 at 11:24:51
I used to hate XML. Then I discovered XPath.
The funny thing is that XPath conversely is such a nice API for querying XML that now I find myself wishing I had it available for native datastructures. Manually writing code to traverse them is tedious at best. The CPAN has several modules that implement (some subset of) XPath for document types it was not designed for, even one that adds an XPath interface to regular classes so that you can use XPath to walk object hierarchies.
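As a small taste of that style of querying, Python's `xml.etree.ElementTree` supports a limited subset of XPath in `find` and `findall`. The document below is a cut-down version of the post's final example, with a plain `lang` attribute instead of `xml:lang` so that no namespace handling is needed.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<list label="names!">'
    '<female-name lang="it">Anna Maria <surname>Mozart</surname></female-name>'
    '<male-name lang="en">Fitzwilliam <surname>Darcy</surname></male-name>'
    '<male-name lang="fr">Maurice <surname>Chevalier</surname></male-name>'
    '</list>'
)

# Every surname, at any depth.
surnames = [el.text for el in doc.findall(".//surname")]

# Surnames of male names whose lang attribute is "fr".
french_surnames = [
    el.find("surname").text
    for el in doc.findall("./male-name[@lang='fr']")
]
```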
January 28th, 2007 at 07:31:33
Very nice comparison.
So here’s another challenge (should you choose to accept it):
Given “There is no information that can be represented in an XML document that cannot be represented in a JSON document”. Ok, we all know RDF/XML syntax is really ugly. So how would you do RDF in JSON?
(There are a few different systems which use RDF in Lisp, so that’s already covered. There’s also Turtle RDF syntax which is JSON-like in terms of simplicity, and I’m pretty sure Javascript browsers are available, but an eval() would be so much easier…)
January 28th, 2007 at 07:50:31
Danny: The easiest approach would be simply to encode the triples in JSON. The only trouble is that, at least for RDF 1.0, the triples data model didn’t actually represent all of the information in the XML syntax. Maybe that’s been fixed in a later RDF revision.
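A minimal sketch of what "encode the triples in JSON" could look like, written in Python. This is one possible layout, not an established format: the `example.org` subject URI is invented, the `objects` helper is hypothetical, and the sketch does not distinguish URI objects from literals or carry language tags and datatypes.

```python
import json

# Each triple is a [subject, predicate, object] array.
triples = [
    ["http://example.org/people/1",
     "http://xmlns.com/foaf/0.1/givenName", "Maurice"],
    ["http://example.org/people/1",
     "http://xmlns.com/foaf/0.1/familyName", "Chevalier"],
]

# Round-trip through JSON text.
encoded = json.dumps({"triples": triples})
decoded = json.loads(encoded)["triples"]


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


family_names = objects(
    decoded,
    "http://example.org/people/1",
    "http://xmlns.com/foaf/0.1/familyName",
)
```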
January 28th, 2007 at 05:50:45
[...] Trees Are Better Than Others”, which implies you gotta read Megginson’s piece “All Markup ends up looking like XML” and Doug Crockfords comment on that last one “For the trees”, gets me thinking. JSON [...]
January 28th, 2007 at 10:47:47
[...] Crockford left an excellent comment on my recent posting All markup ends up looking like XML, which he later made into its own blog posting, For the trees. I agree with his reworking of the [...]
January 29th, 2007 at 12:22:29
[...] Megginson posted a follow up on his earlier take on JSON, which contains an excellent example/explanation on the advantages of markup compared to hard wired [...]
February 6th, 2007 at 04:56:28
Is there an equivalent to XPath for JSON?
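Short of a full path language, a few lines of code already cover the simplest case of following a fixed path through parsed JSON. The `walk` helper below is a hypothetical sketch in Python, not an existing library function, and it supports only exact keys and list indexes (no wildcards or predicates).

```python
import json


def walk(data, *steps):
    """Follow a sequence of dict keys and list indexes; None if any is missing."""
    for step in steps:
        try:
            data = data[step]
        except (KeyError, IndexError, TypeError):
            return None
    return data


doc = json.loads(
    '{"names!": ['
    '{"given-name": "Anna Maria", "surname": "Mozart"},'
    '{"given-name": "Fitzwilliam", "surname": "Darcy"}]}'
)

second_surname = walk(doc, "names!", 1, "surname")
missing = walk(doc, "names!", 5, "surname")
```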
February 26th, 2007 at 09:32:17
>> Manually writing code to traverse them is tedious at best.
hmm. that’s why Dr. Codd devised the relational model, as one other commenter pointed out. folks may continue to stuff data into hierarchies, whether it belongs there or not. eventually, someone will “rediscover” the magic of the relational algebra. sigh.
February 28th, 2007 at 01:53:42
Might be worth considering HL Mencken’s thoughts on XML vs JSON vs LISP et al.:
“We must accept the other fellow’s religion, but only in the sense and to the extent that we respect his theory that his wife is beautiful and his children smart.”
XML vs JSON vs LISP, that would be the “religion” part.