Quoderat

Archive for April, 2005

More on RSS as the HTML for data …

Wednesday, April 27th, 2005

A short while ago, I reluctantly acknowledged that RSS 2.0 will likely fill the same role for data that HTML fills for documents, providing a single, shared format across the web (the big missing piece of the puzzle for REST apps). Now, it appears that someone a lot smarter than I am, none other than Adam Bosworth, is suggesting exactly the same thing.

If I’m wrong about RSS, at least I’ll be wrong in excellent company.

Collateral Damage

Thursday, April 14th, 2005

I am a bystander in the war between spammers and virus writers on the one side, and Microsoft and the antivirus companies on the other. I have never in my life read or sent an e-mail message using Microsoft Outlook; I spend perhaps 4 hours/year using the Windows operating system (mostly helping other people with computer problems); and I never read e-mail or browse the web as root, so I should live in a fairly safe area, far from the battlefield. Nevertheless, I lost e-mail services for my whole domain this morning because of Outlook viruses on other people’s systems, and it will take at least a few hours before I can receive e-mail here again.

In fact, I’ve been hit by a lot of collateral damage over the years. I had to shut down my old e-mail account at this domain, david, when the volume of messages passed 1,000 per hour; even now, the megginson.com domain can receive over 30,000 messages a day. Keeping the domain working at all is a day-to-day challenge, involving frequent changes of ISP.

What happened? Because my old e-mail address was well known, it ended up in a lot of people’s Outlook address books; then, predictably, some of those systems got infected, so their Outlook installations started sending out virus messages with my return address forged, and those messages infected more systems, which started sending out more, and so on. Those didn’t affect me directly (aside from writing the occasional polite reply to an irate message asking why I was mailing viruses), but then the warnings from the antivirus software at other people’s sites started pouring in. The antivirus makers know perfectly well that the return addresses on virus attacks are nearly always forged, but still cannot resist a marketing opportunity by warning me that my non-existent Outlook installation is infected with a virus.

I don’t know how many more direct hits I’ll be able to withstand at megginson.com — I’ll never know, of course, how much business I’ve lost over the past couple of years because of these e-mail problems, and sometimes I’m tempted just to abandon the domain, or at least, any attempt at using it for e-mail.

If there’s a moral to this, it’s that sloppy design hurts more people than the immediate users — simply choosing not to use bad software does not protect you from its flaws. Security holes in Outlook hurt me, though I’ve never used the program; virus-warning spam from antivirus software makers repeatedly shut me down, though I’ve never bought their products. If we mess up too badly designing our next generation of XML-based systems (blogs, REST, Web Services, or what-have-you), it’s hard to predict how many people we’ll hurt beyond our immediate user base.

Gmail without AJAX, part 1

Monday, April 11th, 2005

I noticed today that Gmail is now offering an alternative, non-AJAX interface, selectable by choosing “basic HTML” below the message listing. This is a great opportunity to experiment and see whether AJAX (or any other kind of heavy DHTML-style interaction) actually makes enough of a difference to justify the extra implementation work.

I’ll do all my Gmail browsing using old-style HTML forms until next week and observe how much I miss the extra features, then will report back here.

(First note: Gmail does not allow you to change account settings using the non-AJAX interface.)

Self-classification on the web

Monday, April 11th, 2005

Coordinator: Crucifixion?
Prisoner: Er, no, freedom actually.
Coordinator: What?
Prisoner: Yeah, they said I hadn’t done anything and I could go and live on an island somewhere.
Coordinator: Oh I say, that’s very nice. Well, off you go then.
Prisoner: No, I’m just pulling your leg, it’s crucifixion really.
Coordinator: [laughing] Oh yes, very good. Well…
Prisoner: Yes I know, out of the door, one cross each, line on the left.

From Monty Python, Life of Brian (1979)

And now, for the pure joy of killing the joke by trying to explain it, this scene from Life of Brian is funny for two reasons:

  1. the Romans allow the prisoners to classify themselves as condemned-to-death-by-crucifixion or free-to-go, even though the prisoners have every incentive to lie and save their own lives and no incentive to tell the truth; but
  2. the prisoners all classify themselves correctly anyway.

A lesser wit (like me) would have stopped at the first part of the joke and let all of the prisoners run off; however improbable the first part, though, it’s always the second part that gets the laugh.

Tim Bray is still wondering about tags, but what he’s really wondering about, I think, is the whole idea of self-classification on the web. Should we be as trusting as the Roman coordinator? Will web content creators classify themselves honestly? So far, the record has not been good — for example, web search engines quickly learned to ignore Dublin-Core-style information in the HTML meta element because, unlike the prisoners in Life of Brian, doomed by their own honesty, people who create content for the web lie. In fact, they lie a lot.

At this point, folksonomy tags are a bit of a cottage industry, so the incentive for lying is low (people are happy to tell the truth when it doesn’t cost them much). Self-classification can work when the costs of lying are unacceptably high and the benefits of lying are low or non-existent: for example, a departmental web site inside a government or large company, a member of a supply chain, or a major vendor with a reputation to protect would lose much and gain nothing by using deceptive metadata to pull in more traffic. That does not apply to the web as a whole, though. Once you move beyond established relationships (enterprise or inter-enterprise), trust is much more difficult to manage.

What will happen when tags become more popular? Will the current model be sustainable? Is there any future for using any kind of metadata to self-classify on the web? The answer probably has something to do with reputation management, though people are doing a good job gaming even that with link farms and comment-/wiki-spam. The crucifixion line looks rather empty right now.

Post in REST: create, update, or action?

Sunday, April 3rd, 2005

(Personally, I think it would be healthier if we worked out the wrinkles in REST by writing lots of code rather than writing lots of blog entries, but blog entries are easier, and XML people have never been shy about pontificating, so here goes …)

Joe Gregorio has an excellent article on REST at XML.com, and I recommend that anyone interested in building an XML data app with REST rather than RPC (Web Services, etc.) take a look at it. However, one point in the article jumped out at me. Joe mapped CRUD to the standard HTTP verbs like this:

CRUD      HTTP
Create    POST
Retrieve  GET
Update    PUT
Delete    DELETE

Personally, I’ve always seen the mapping more like this:

CRUD      HTTP
Create    PUT
Retrieve  GET
Update    PUT
Delete    DELETE

In other words, PUT does double duty for both Create and Update. Of course, the sad point is that almost no actual REST applications work this way — most of them are read-only, so GET is the only verb that counts, and when they do allow updating information, they do not do it by sending an entire resource representation (i.e. XML file) via PUT.
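To make the second mapping concrete, here is a minimal sketch of a store in which PUT does double duty. Everything in it (the ResourceStore class, its method names, the choice of an in-memory dictionary) is my own assumption for illustration, not taken from Joe's article or from any real server. The only thing borrowed from HTTP itself is the status-code convention: 201 when PUT creates a resource, 204 when it replaces one.

```python
class ResourceStore:
    """Hypothetical in-memory resource store (illustrative names only)."""

    def __init__(self):
        self._resources = {}  # path -> representation (bytes)

    def get(self, path):
        # Retrieve: return the stored representation, or 404 if absent.
        if path not in self._resources:
            return 404, None
        return 200, self._resources[path]

    def put(self, path, body):
        # Create *or* Update: the client names the resource and sends
        # the whole representation; the server stores it as-is.
        created = path not in self._resources
        self._resources[path] = body
        return (201 if created else 204), None

    def delete(self, path):
        # Delete: remove the resource, or 404 if there is nothing there.
        if path not in self._resources:
            return 404, None
        del self._resources[path]
        return 204, None


store = ResourceStore()
first = store.put("/pets/lassie.xml", b"<pet-record/>")
second = store.put("/pets/lassie.xml",
                   b"<pet-record><name>Lassie</name></pet-record>")
```

The first put creates the resource and the second replaces it; the client code is identical in both cases, which is the whole point of the mapping.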

What about POST?

I think there are actually two roles that POST can play in a data application:

  1. Partial, in-place updates (i.e. send just pieces of changed information, rather than a whole resource).
  2. Actions (e.g. buy a book).

For example, consider this resource representation (fancy REST talk for “file”) using POX (plain old XML):

<pet-record xml:base="http://www.example.org/pets/lassie.xml" xmlns:xlink="http://www.w3.org/1999/xlink">
 <name>Lassie</name>
 <gender>f</gender>
 <dame xlink:href="spot.xml">Spot</dame>
 <sire xlink:href="jenny.xml">Jenny</sire>
 <offspring xlink:href="marmaduke.xml">Marmaduke</offspring>
 <offspring xlink:href="snoopy.xml">Snoopy</offspring>
</pet-record>

Now, let’s say that Lassie has another puppy. Using REST, there are two obvious ways to update this information. The first is to download the XML file (using HTTP GET), add an extra offspring element, then upload the modified file (using HTTP PUT):

<pet-record xml:base="http://www.example.org/pets/lassie.xml" xmlns:xlink="http://www.w3.org/1999/xlink">
 <name>Lassie</name>
 <gender>f</gender>
 <dame xlink:href="spot.xml">Spot</dame>
 <sire xlink:href="jenny.xml">Jenny</sire>
 <offspring xlink:href="marmaduke.xml">Marmaduke</offspring>
 <offspring xlink:href="snoopy.xml">Snoopy</offspring>
 <offspring xlink:href="clifford.xml">Clifford</offspring>
</pet-record>
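The GET-modify-PUT round trip might look roughly like this on the client side. This is only a sketch using Python's standard library: the function names add_offspring and build_put_request are hypothetical, the string below stands in for the body of a GET response, and the request is built but never sent, since www.example.org is just a placeholder.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)  # keep the xlink: prefix on output


def add_offspring(xml_text, href, name):
    """Return a new representation with one more offspring element."""
    root = ET.fromstring(xml_text)
    child = ET.SubElement(root, "offspring", {f"{{{XLINK_NS}}}href": href})
    child.text = name
    return ET.tostring(root, encoding="unicode")


def build_put_request(url, xml_text):
    """Build (but do not send) a PUT carrying the whole representation."""
    return Request(url, data=xml_text.encode("utf-8"), method="PUT",
                   headers={"Content-Type": "application/xml"})


# Stand-in for the body returned by GET on the pet's URL.
current = """<pet-record xml:base="http://www.example.org/pets/lassie.xml" xmlns:xlink="http://www.w3.org/1999/xlink">
 <name>Lassie</name>
 <gender>f</gender>
 <dame xlink:href="spot.xml">Spot</dame>
 <sire xlink:href="jenny.xml">Jenny</sire>
 <offspring xlink:href="marmaduke.xml">Marmaduke</offspring>
 <offspring xlink:href="snoopy.xml">Snoopy</offspring>
</pet-record>"""

updated = add_offspring(current, "clifford.xml", "Clifford")
request = build_put_request("http://www.example.org/pets/lassie.xml", updated)
```

Passing the request to urllib.request.urlopen would perform the actual upload. Note that the client has to ship the entire file back even though only one element changed.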

The second option is simply to send the REST server a message that it should add an extra offspring, by using HTTP POST to a URL like http://www.example.org/pets/updates/add-offspring and using parameters to identify sire or dame and offspring. I don’t think either of these approaches is better in general, though the POST approach obviously makes more sense for very large resources.
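Here is one way the server side of that POST could be sketched. The parameter names (parent, offspring, name), the handle_add_offspring function, and the in-memory resources dictionary are all assumptions of mine for the sake of the example; nothing above fixes what the parameters should be called or how the server stores its files.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)  # keep the xlink: prefix on output

LASSIE = "http://www.example.org/pets/lassie.xml"

# Hypothetical server-side state: resource URL -> XML representation.
resources = {
    LASSIE: '<pet-record xmlns:xlink="http://www.w3.org/1999/xlink">'
            '<name>Lassie</name>'
            '<offspring xlink:href="snoopy.xml">Snoopy</offspring>'
            '</pet-record>',
}


def handle_add_offspring(form_body):
    """Handle a POST to .../updates/add-offspring; return an HTTP status."""
    params = parse_qs(form_body)
    try:
        parent = params["parent"][0]    # URL of the sire or dame (assumed name)
        href = params["offspring"][0]   # URL of the new puppy (assumed name)
        name = params["name"][0]        # display name (assumed name)
    except KeyError:
        return 400  # a required parameter is missing
    if parent not in resources:
        return 404  # no such parent resource
    root = ET.fromstring(resources[parent])
    child = ET.SubElement(root, "offspring", {f"{{{XLINK_NS}}}href": href})
    child.text = name
    resources[parent] = ET.tostring(root, encoding="unicode")
    return 204


# What a client would send as the form-encoded POST body.
body = urlencode({"parent": LASSIE, "offspring": "clifford.xml",
                  "name": "Clifford"})
status = handle_add_offspring(body)
```

The client sends three short parameters instead of the whole file, so the size of the request no longer depends on the size of the resource.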

The other use of POST would be to execute actions that do not have an obvious correspondence with resource representations/files. A good example would be posting to a URL http://www.example.org/pets/actions/buy including parameters describing the pet you want to buy (i.e. the URL of the pet’s XML file; we’re RESTful, after all) and the price you are willing to pay.
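A client-side sketch of the buy action, again with the standard library only. The parameter names pet and price and the function build_buy_request are hypothetical, and the request is constructed without being sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BUY_URL = "http://www.example.org/pets/actions/buy"


def build_buy_request(pet_url, price):
    """Build (but do not send) a form-encoded POST for the buy action."""
    # "pet" and "price" are assumed parameter names, not a defined API.
    body = urlencode({"pet": pet_url, "price": price}).encode("ascii")
    return Request(BUY_URL, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


buy = build_buy_request("http://www.example.org/pets/lassie.xml", "250.00")
```

Unlike PUT, the body here is not a representation of anything stored at the target URL; it is a set of arguments to an operation, and the pet is identified by its own URL.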

The one use I don’t see for POST is uploading entire XML files, except as a way to work around firewalls that block PUT. Maybe we should fix the firewalls, or maybe we’ll just have to learn to live with this (ab)use of POST as a practical necessity for making REST work with the current web infrastructure.