

Quoderat

Archive for March, 2005

REST: is RSS the HTML for data?

Thursday, March 31st, 2005

As I’ve mentioned before, REST offloads complexity from the protocol (HTTP) to the content (XML). That makes REST look simple as long as you focus only on the protocol, but RESTafarians cannot get away forever with leaving the content format for data unspecified.

REST works with the existing document web because we have HTML to hold everything together — in other words, we have a standard protocol and a standard format. What’s the equivalent of HTML for the RESTful data web? RDF? XML Topic Maps? POX (Plain Old XML) with XLink? Nope — love it or hate it, I get the impression that it’s going to be RSS 2.0. People are starting to push the boundaries of RSS in serious ways, and so far, it’s not breaking. I have trouble imagining how we’re going to use RSS to encode information (say, a data record) rather than just pointing to information, but I’m ready to be surprised.

On the topic of RSS, I noticed that Open Search has introduced some RSS 2.0 extension properties (confusingly labelled OpenSearch RSS 1.0 Specification) to handle result paging, which was at the centre of another of my REST design questions. The spec is admirably minimalist, introducing only three new child elements of channel: openSearch:totalResults, openSearch:startIndex, and openSearch:itemsPerPage. That way, a RESTful web app can return (say) results 65-97 of 200,000 in a reasonably portable way:

<rss version="2.0" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">
 <channel>
  <title>Example.org search: REST</title>
  <link>http://www.example.org/search?q=REST&amp;start=65</link>
  <description>Search results for REST.</description>
  <openSearch:totalResults>200000</openSearch:totalResults>
  <openSearch:startIndex>65</openSearch:startIndex>
  <openSearch:itemsPerPage>33</openSearch:itemsPerPage>
   ...
 </channel>
</rss>
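A client can read the three extension elements with any namespace-aware XML parser. The following is a minimal Python sketch using the standard library's ElementTree; the feed text is a hypothetical, complete version of the feed above (with the items left out), and `paging_info` is an illustrative helper, not something defined by the OpenSearch spec. It assumes all three elements are present.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the OpenSearch RSS 1.0 extension, in ElementTree's {uri} form.
OS_NS = "http://a9.com/-/spec/opensearchrss/1.0/"

FEED = """<rss version="2.0" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">
 <channel>
  <title>Example.org search: REST</title>
  <link>http://www.example.org/search?q=REST&amp;start=65</link>
  <description>Search results for REST.</description>
  <openSearch:totalResults>200000</openSearch:totalResults>
  <openSearch:startIndex>65</openSearch:startIndex>
  <openSearch:itemsPerPage>33</openSearch:itemsPerPage>
 </channel>
</rss>"""

def paging_info(rss_text):
    """Return (first, last, total) for the page of results an OpenSearch RSS feed describes."""
    channel = ET.fromstring(rss_text).find("channel")

    def os_int(name):
        # Extension elements are looked up by their namespace-qualified name.
        return int(channel.findtext("{%s}%s" % (OS_NS, name)))

    total = os_int("totalResults")
    start = os_int("startIndex")
    per_page = os_int("itemsPerPage")
    last = min(start + per_page - 1, total)  # the last page may be short
    return start, last, total

first, last, total = paging_info(FEED)
print(f"results {first}-{last} of {total}")  # results 65-97 of 200000
```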

This is exactly the way people are supposed to use Namespaces (nicely done!), and I’m impressed that they require including the GET URL that can reproduce the search results. It would be even better, I think, if A9 added just two more elements to their RSS extensions:

  <openSearch:previousLink>http://www.example.org/search?q=REST&amp;start=32</openSearch:previousLink>
  <openSearch:nextLink>http://www.example.org/search?q=REST&amp;start=98</openSearch:nextLink>

That way, I would be able to page through the results without having to know how to construct query GET URLs for that particular site.
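To make the difference concrete, here is a minimal Python sketch of a paging client. It follows `openSearch:nextLink` when present (an element proposed here, not part of the published extension) and otherwise falls back to computing the next start index and building the URL by hand, which only works because it assumes example.org's search takes `q` and `start` parameters. `next_page_url` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OS_NS = "{http://a9.com/-/spec/opensearchrss/1.0/}"

def next_page_url(rss_text, query):
    """Return the URL of the next page of results, or None on the last page."""
    channel = ET.fromstring(rss_text).find("channel")
    # Preferred: the proposed nextLink element needs no site-specific knowledge.
    link = channel.findtext(OS_NS + "nextLink")
    if link:
        return link.strip()
    # Fallback: compute the next start index from the three standard elements...
    start = int(channel.findtext(OS_NS + "startIndex"))
    per_page = int(channel.findtext(OS_NS + "itemsPerPage"))
    total = int(channel.findtext(OS_NS + "totalResults"))
    next_start = start + per_page
    if next_start > total:
        return None
    # ...then build a URL from an assumed, site-specific query pattern.
    return "http://www.example.org/search?" + urlencode({"q": query, "start": next_start})

PAGE = """<rss version="2.0" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">
 <channel>
  <openSearch:totalResults>200000</openSearch:totalResults>
  <openSearch:startIndex>65</openSearch:startIndex>
  <openSearch:itemsPerPage>33</openSearch:itemsPerPage>
 </channel>
</rss>"""

print(next_page_url(PAGE, "REST"))  # http://www.example.org/search?q=REST&start=98
```

The fallback branch is exactly the site-specific knowledge that a nextLink element would make unnecessary.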

I like RSS for syndication, but it wasn’t exactly what I had in mind for general data handling (I would at least have liked a common attribute identifying URLs, like xlink:href); then again, HTML wasn’t exactly what I had in mind for Hypertext in 1990 either, and it took me two years to stop being sniffy and start working with it. I won’t wait that long this time.

Admin: Comment and Pingback Limits

Thursday, March 31st, 2005

I’ve been spending a lot of time deleting comment and pingback spam from my two blogs (most of it from the moderation queue). My first impulse was to ban comments and pingbacks completely; after all, some blogs seem to do fine without them, and most people technically oriented enough to read Quoderat already have their own blogs that they can use to comment on mine.

After some thought, however, I’ve decided on a compromise: I’m going to leave postings from the current and previous months open for comments, but close any older ones. That should eliminate a lot of the spam but still allow discussion of recent postings. I might tighten that up a bit more, but I’ll give it a chance first.

How is everyone else dealing with comment, pingback, and trackback spam? My blog isn’t all that popular; it must be much worse for blogs with high rankings.

Tech Fashions: What’s in a name?

Tuesday, March 22nd, 2005

Dare Obasanjo complains that new names like SOA, AJAX, and REST have more to do with fashion than software. He’s right, but his posting might be missing the point.

There are two reasons that a fuzzy, general approach to things (as opposed to a concrete standard or application) gets a name:

  1. the approach represents something a lot of people are actually doing; or
  2. the approach represents something someone wants a lot of people to do.

The second reason has no real value except hype, and Dare and I would probably agree in condemning it as vacuous marketing drivel. The first reason, though, has some real value: it represents the moment of self-recognition for a community of practice, a group of people who discover that they’re thinking and working in the same way. No doubt, many woodworkers made fine furniture when they were all still called carpenters; however, some of them must once have realized that they were spending most of their time doing fine, detailed work rather than laying studs and joists, and that the skills and even tools they were using were different. Calling those people joiners or cabinet makers gives us a way to recognize that community of practice and the distinctive approaches and skills that set it apart from other woodworkers. (Of course, the distinction almost certainly originated in a language other than modern English, but I don’t feel like researching the original terms right now.)

So it’s true, as I mentioned in an earlier posting, that AJAX is nothing new; in fact, leaving aside XMLHttpRequest altogether, AJAX is nothing more than old-fashioned client-server under a new guise, just as cabinet making is nothing more than fussy carpentry. It’s also true that people were doing precisely what is now called AJAX long before the name was invented. But both statements miss the point: the name AJAX is catching on because it represents a problem-solving approach that a lot of people now like and either use or want to start using; it represents the emergence of a community, not a standard or a chunk of code. Ditto for REST (though in this case Dare is complaining about the popular usage, plain XML over HTTP, rather than the original meaning, the principle that every resource must be directly addressable). People posted online diaries and columns before we called them blogs; people built hypertext systems on the Internet before we called one the Web; people were using event-based parsing APIs long before they were called SAX-like. In each case, though, the arrival of a widely accepted name helps us pinpoint the moment when a social change (a fashion, to use Dare’s own word) emerged in the technological community. These fashions are far more important than new standards or new code, at least when they represent genuine changes in people’s thinking.

So what about SOA? Does it represent another moment of self-realization among a group of people doing the same kind of thing, or is it an attempt to steer people the way someone wants them to go, a bit of vain marketing buzz? I’m still trying to decide.

AJAX as a privacy solution

Friday, March 18th, 2005

There’s a lot of noise about AJAX recently, ranging from positive to negative to what’s the big deal?

It’s true that, architecturally, AJAX is nothing new: basically, it’s just the old, pre-Web client-server model wrapped up in the browser using JavaScript and XML. It’s also true that people were doing this kind of thing with Java applets or DHTML back in the late 1990s, avoiding the need to install custom client software on every workstation. So what’s the big deal? Think back to the late 1990s: these applications were horribly unstable. First, they were rarely cross-platform, or even cross-version; you had to be running (say) exactly the right version of MSIE under Windows with the right DLLs, or exactly the right version of Netscape and Java, even to start up the apps. Second, they generally crashed before too long anyway. Web developers are excited about AJAX now because applications like GMail actually work on just about everyone’s computer (*nix/Windows/MacOS, MSIE/Firefox/Opera/Safari), and they almost never crash. New ideas aren’t worth much on the web; stable, running, cross-platform implementations are what count. We’ve never had good, stable, platform-independent client-server before, period.

Moving past the specific technologies, though, what are the advantages of abandoning our traditional thin-client web model and going back to client-server? One of the most interesting will be the ability to do information aggregation while preserving privacy. Imagine, for example, that I’d like to see a single, consolidated view of all my finances — my stocks, bonds, bank accounts, retirement savings, and credit cards. Using a thin-client approach, I have to give some web site, somewhere, the ability to access all of my private financial information for me; using a client-server approach, my browser itself could go out and retrieve the information separately from each institution and then aggregate it right on my screen. I have all the advantages of a single view, without giving up any personal information.
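The privacy property comes from where the aggregation happens. The following minimal Python sketch stands in for what a browser-based client would do in JavaScript: each hypothetical fetcher represents a request the client makes directly to one institution with the user’s own credentials, and the totals are computed locally, so no third-party site ever sees the combined data. The `Holding` type, the fetchers, and the sample figures are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Holding:
    kind: str          # e.g. "bank", "stocks", "credit card"
    amount_cents: int  # negative for debts such as a credit-card balance

# A fetcher stands in for one direct client-to-institution request; in a
# browser it would carry credentials that never leave the user's machine.
Fetcher = Callable[[], List[Holding]]

def consolidated_view(fetchers: List[Fetcher]) -> Tuple[Dict[str, int], int]:
    """Query each institution separately, then aggregate on the client."""
    by_kind: Dict[str, int] = {}
    for fetch in fetchers:
        for holding in fetch():
            by_kind[holding.kind] = by_kind.get(holding.kind, 0) + holding.amount_cents
    return by_kind, sum(by_kind.values())

# Hypothetical per-institution fetchers returning canned data.
def my_bank() -> List[Holding]:
    return [Holding("bank", 250_000)]

def my_broker() -> List[Holding]:
    return [Holding("stocks", 1_200_000), Holding("bonds", 300_000)]

def my_card() -> List[Holding]:
    return [Holding("credit card", -45_000)]

by_kind, net = consolidated_view([my_bank, my_broker, my_card])
print(by_kind, net)
```

In a thin-client design, the same loop would run on someone else’s server, which would then need every one of those credentials.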

Privacy is going to be a bigger and bigger deal on the web over the next decade: as technology gets even better at violating it, governments will come under pressure to pass more and more legislation, limiting what corporations are allowed to ask for and do. AJAX in particular, and the client-server model in general, gives us one way to respect privacy without giving up the advantages of information aggregation.

The RESTafarians should be happy as well, since this involves using the browser as an XML+REST client.

Canadian Flag in CSS

Saturday, March 12th, 2005

Canadian flag in CSS (screenshot).

Via Anne van Kesteren (again), I have found a site with a pure-CSS rendition of the Canadian flag (the image here in my blog is a screenshot, not the live CSS). It’s a little squished, granted, but at least it’s the right way up.

Now, let’s see the XSL-FO version of the Canadian flag: any volunteers?

Attributes and Namespaces

Saturday, March 12th, 2005

Anne van Kesteren complains that the relationship between XML Namespaces and XML attributes bugs him, and I think that his annoyance might be justified. It’s been many years since we did the 1.0 Namespaces spec in the old XML working group, but as far as I can remember, the thinking that won out (not unanimously) was that attributes were similar to variables in programming languages: unqualified attributes were like automatic variables, scoped to a single element instead of a single function, while namespace-qualified attributes were like global variables:

int foo;

void
adjustFoo ()
{
    int bar = 3;
    foo = foo + bar;
}

The foo variable has a scope outside the adjustFoo function, while the bar variable does not. Similarly, in

<n:info xmlns:n="http://www.example.org/ns/n">
  <n:record n:foo="3" bar="4"/>
</n:info>

The n:foo attribute has a meaning independent of the meaning of the n:record element, while the bar attribute does not.
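A namespace-aware parser makes the distinction visible. In this minimal Python sketch using the standard library’s ElementTree (which writes qualified names as `{namespace-URI}local-name`), the qualified attribute gets a universal name of its own, while the unqualified `bar` is reported with no namespace at all, so it means something only in the context of its `n:record` element. The `attribute_scopes` helper is illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """<n:info xmlns:n="http://www.example.org/ns/n">
  <n:record n:foo="3" bar="4"/>
</n:info>"""
N = "{http://www.example.org/ns/n}"

def attribute_scopes(elem):
    """Split attributes into qualified (globally named) and unqualified (element-local)."""
    qualified, local = {}, {}
    for name, value in elem.attrib.items():
        # ElementTree prefixes namespace-qualified names with "{uri}".
        (qualified if name.startswith("{") else local)[name] = value
    return qualified, local

record = ET.fromstring(DOC).find(N + "record")
print(record.tag)                # {http://www.example.org/ns/n}record
print(attribute_scopes(record))  # ({'{http://www.example.org/ns/n}foo': '3'}, {'bar': '4'})
```

Note that `bar` does not pick up the `n` namespace from its element, which is the extra complexity discussed below.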

Sense or Nonsense?

Does that make sense to you? If not, don’t feel bad: drawing analogies between markup and programming code is a dubious undertaking at the best of times. Unlike the foo variable, for example, the n:foo attribute doesn’t have a global value space, only a global meaning.

I do think that the idea of globally defined, namespace-qualified attributes (like rdf:about, xml:id, xlink:href, etc.) is a very useful one. We messed up, though, by adding an extra level of complexity with unqualified attributes (they should simply have inherited the parent element’s namespace); in other words, we didn’t choose the simplest thing that could possibly work. It’s too late to change Namespaces now, though, and any attempts to codify or clarify things since Namespaces 1.0 seem only to increase the confusion.

Big, public REST application: Seniors Canada Online

Wednesday, March 9th, 2005

[Update: partial contact information at bottom.] Yesterday I found out about a major government XML+HTTP (i.e. REST) web application that has been open to the general public since October 2004 but was never formally announced — I’m posting about it here with permission from the federal department that’s hosting it.

The Seniors Canada Online web site is designed to provide amalgamated information for senior citizens from all levels of government — currently it contains seniors’ information from the Canadian federal government, the provincial and territorial governments, and the city of Brockville, but more municipalities and NGOs will likely be joining in the future. Instead of simply providing an HTML interface for human readers, however, the site’s maintainers decided to make information available via XML as well so that other jurisdictions (such as provinces and cities) could include the same seniors’ information in their web sites. In fact, since it’s wide open, anyone can experiment with using the XML data.

According to the developer, the implementation was trivial — the REST application shares its database and application logic with the HTML web site, so the XML part is just a thin view written on top of all that, running in parallel with the HTML view.
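The architecture the developer describes, one shared data and logic layer with XML and HTML as two thin views, can be sketched in a few lines. This is a minimal Python illustration, not the site’s actual code: the in-memory `RECORDS` list stands in for the shared database, `find_records` for the shared application logic, and the `results` wrapper element name is an assumption.

```python
import xml.etree.ElementTree as ET
from html import escape

# Stand-in for the shared database: one record shaped like the listing shown below.
RECORDS = [
    {"url": "http://www.active2010.ca/index.cfm?fa=english.homepage",
     "dctitle": "ACTIVE2010",
     "dcdescription": "ACTIVE2010 is a comprehensive strategy to increase "
                      "participation in sport and physical activity throughout Ontario."},
]

def find_records(term):
    """Shared application logic (simplified): case-insensitive match on title or description."""
    term = term.lower()
    return [r for r in RECORDS
            if term in r["dctitle"].lower() or term in r["dcdescription"].lower()]

def xml_view(records):
    """Thin XML view for other sites and programs."""
    root = ET.Element("results")  # hypothetical wrapper element name
    for r in records:
        listing = ET.SubElement(root, "listing")
        for field in ("url", "dctitle", "dcdescription"):
            ET.SubElement(listing, field).text = r[field]
    return ET.tostring(root, encoding="unicode")

def html_view(records):
    """Thin HTML view for human readers, built from the same results."""
    items = "".join(
        f'<li><a href="{escape(r["url"])}">{escape(r["dctitle"])}</a></li>'
        for r in records)
    return f"<ul>{items}</ul>"

hits = find_records("sport")
print(xml_view(hits))
print(html_view(hits))
```

Adding the XML interface touches only the view layer; the query logic is written once.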

Simple Example

Currently, the REST interface is read-only, and all requests are HTTP GETs, so they are bookmarkable, cacheable, linkable, and all the other good stuff that comes with REST. Here’s a simple example that searches for the word “sports”:

http://www.seniors.gc.ca/servlet/SeniorsXMLSearch?search=sports

The result is an XML-encoded list of URLs and Dublin-Core-style metadata; here’s an example of one item in the result list:

<listing>
<realcount>1</realcount>
<offsetcount>1</offsetcount>
<referenceid>277103</referenceid>
<language>en</language>
<url>http://www.active2010.ca/index.cfm?fa=english.homepage</url>
<dctitle>ACTIVE2010</dctitle>
<priority></priority>
<dcdescription>ACTIVE2010 is a comprehensive strategy to increase participation in sport and physical activity throughout Ontario.</dcdescription>
<dcsource>ACTIVE2010</dcsource>
</listing>
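Because the result is plain XML, a consumer needs nothing more than a generic parser. This minimal Python sketch turns each `listing` element into a dictionary keyed by child-element name. The name of the element that wraps the listings isn’t shown here, so the code looks for `listing` elements anywhere in the document; the `results` wrapper in the sample is hypothetical, and the sample is a trimmed copy of the item above.

```python
import xml.etree.ElementTree as ET

def parse_listings(xml_text):
    """Return one dict per <listing>, mapping child-element names to their text."""
    root = ET.fromstring(xml_text)
    # iter() searches the whole tree (root included), so the wrapper name doesn't matter.
    return [
        {child.tag: (child.text or "").strip() for child in listing}
        for listing in root.iter("listing")
    ]

SAMPLE = """<results>
<listing>
<realcount>1</realcount>
<referenceid>277103</referenceid>
<language>en</language>
<url>http://www.active2010.ca/index.cfm?fa=english.homepage</url>
<dctitle>ACTIVE2010</dctitle>
<priority></priority>
</listing>
</results>"""

for item in parse_listings(SAMPLE):
    print(item["dctitle"], item["url"])
```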

Since Canada is a bilingual country, you might reasonably expect to be able to make the same query for French-language resources. Give it a try:

http://www.seniors.gc.ca/servlet/SeniorsXMLSearch?search=sports&lang=fr

Nuts and Bolts

I’m not going to describe the XML format here, since anyone who knows XML and Dublin Core will be able to puzzle it out in a few seconds.

Here are some request parameters that work with many of the REST URLs:

lang
“en” (the default) to request English-language results, or “fr” to request French-language results.
geo
An identifier from the coverage metadata table (see below) to restrict results to a specific area.
cat
An identifier from the category metadata table (see below) to restrict results to a specific hierarchical category.

Here are the GET URLs with any local request parameters:

http://www.seniors.gc.ca/servlet/SeniorsXMLDCCoverages
Get a listing of three-level coverage metadata (i.e. geographical locations). Use the request parameter dccoverageid instead of geo to restrict the results to a specific subset.
http://www.seniors.gc.ca/servlet/SeniorsXMLCategories
Get a listing of three-level categorization metadata.
http://www.seniors.gc.ca/servlet/SeniorsXMLKeywords
Get an alphabetical listing of search keywords available. Use the request parameter letter to restrict the results to keywords beginning with a specific letter.
http://www.seniors.gc.ca/servlet/SeniorsXMLSearch
Get XML-encoded search results. Use the request parameter search to specify the search string, searchop to specify the search type (“all”, “or”, or “exact”), and recfrom to specify the starting position in the results (defaults to 1).

For example, here is a list of French-language keywords beginning with “L”:

http://www.seniors.gc.ca/servlet/SeniorsXMLKeywords?lang=fr&letter=l
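Since every request is a plain GET, building queries is just URL construction. This minimal Python sketch assembles request URLs from the servlet names and parameters listed above; the `seniors_url` helper and its checks are illustrative and cover only the enumerated values documented here (lang and searchop).

```python
from urllib.parse import urlencode

BASE = "http://www.seniors.gc.ca/servlet/"
SERVLETS = {"SeniorsXMLDCCoverages", "SeniorsXMLCategories",
            "SeniorsXMLKeywords", "SeniorsXMLSearch"}

def seniors_url(servlet, **params):
    """Build a GET URL for one of the servlets, checking documented enumerated values."""
    if servlet not in SERVLETS:
        raise ValueError(f"unknown servlet: {servlet}")
    if params.get("lang", "en") not in ("en", "fr"):
        raise ValueError("lang must be 'en' or 'fr'")
    if params.get("searchop", "all") not in ("all", "or", "exact"):
        raise ValueError("searchop must be 'all', 'or' or 'exact'")
    query = urlencode(params)  # keeps keyword-argument order
    return BASE + servlet + ("?" + query if query else "")

print(seniors_url("SeniorsXMLKeywords", lang="fr", letter="l"))
print(seniors_url("SeniorsXMLSearch", search="sports", searchop="exact", recfrom=21))
```

The first call reproduces the keyword URL above; the second would ask for exact-match results starting at position 21.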

I’m not quite sure how the keywords relate to the search, but I’ll play around a bit and try to find out.

Update: Contact Information

Since my posting, the government department that maintains this REST application has already started receiving enquiries from others considering the same thing. The site is maintained by Veterans Affairs Canada (VAC) on behalf of the Canadian Seniors Partnership (which involves multiple departments and levels of government). The technical contact for this project at VAC is Ron Broughton.

Cascading RSS

Wednesday, March 2nd, 2005

The idea of Cascading RSS (or aggregation aggregation) is so obvious that it has probably already been blogged to death or even implemented by well-known web sites; unfortunately, my short attention span ran out before I could think up the right search words for Google, so I’ll pretend that it’s my own, original idea for now. We use RSS or Atom to tell people when a web resource has changed, but that can still involve frequently polling dozens or hundreds of RSS files. With only a few tiny tweaks, we could also use master RSS files to tell us when other RSS files have changed, cutting the polling by (potentially) orders of magnitude.

I can think of a couple of places where this approach could allow RSS into places where it hasn’t been able to go yet:

Information Management

A very enlightened company might realize that RSS gives it an excellent way to manage information from all of its divisions, branches, subsidiaries, partners, and so on. Everyone simply puts data (sales figures, inventory, projects, and so on) on the company intranet as XML data files (presumably with appropriate authentication and authorization requirements) and then uses RSS to announce when new information is available or old information has changed. If division X needs to monitor inventory data from division Y, it polls division Y’s inventory RSS file every 5 minutes to see if there’s anything new.

The problem is that the network will get messy if the company ends up with thousands of RSS files, and everyone is polling everyone else’s every 5 minutes, especially if some of them are on old, slow servers. To simplify things (and speed them up), the company could have one fast server that polls all the RSS files in the company and then produces its own RSS file with the most recent change dates for each one. Now, everyone can poll only that central server, but the divisions still own their own data. Of course, it would be possible to build this up in several cascading layers to avoid one RSS file with 1,000 entries.
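Here is a minimal Python sketch of one cascading layer, assuming a plain RSS 2.0 master feed with one item per monitored feed, where the item’s link is the feed URL and its pubDate is that feed’s last change. This item layout is an illustrative choice, not an established convention, and the intranet and division URLs are hypothetical. The central server would call `build_master_feed` after its own polling; each client polls only the master and fetches just the feeds that `feeds_to_fetch` reports as changed.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def build_master_feed(changes):
    """Build an RSS 2.0 master feed with one item per monitored feed.

    changes maps each feed URL to the time it last changed (aware datetime),
    as discovered by the central server's own polling.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Recent changes to company feeds"
    ET.SubElement(channel, "link").text = "http://intranet.example.com/master.rss"
    ET.SubElement(channel, "description").text = "Last-change time of each monitored RSS file"
    # Most recently changed feeds first.
    for url, changed in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = url
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "pubDate").text = format_datetime(changed)  # RFC 822-style date
    return ET.tostring(rss, encoding="unicode")

def feeds_to_fetch(master_text, last_seen):
    """Return URLs of feeds that changed after the times in last_seen (or were never seen)."""
    stale = []
    for item in ET.fromstring(master_text).iter("item"):
        url = item.findtext("link")
        changed = parsedate_to_datetime(item.findtext("pubDate"))
        if url not in last_seen or changed > last_seen[url]:
            stale.append(url)
    return stale

# Hypothetical division feeds and change times.
utc = timezone.utc
master = build_master_feed({
    "http://y.example.com/inventory.rss": datetime(2005, 3, 2, 12, 0, tzinfo=utc),
    "http://z.example.com/sales.rss": datetime(2005, 3, 1, 9, 0, tzinfo=utc),
})
seen = {
    "http://y.example.com/inventory.rss": datetime(2005, 3, 2, 8, 0, tzinfo=utc),
    "http://z.example.com/sales.rss": datetime(2005, 3, 1, 9, 0, tzinfo=utc),
}
print(feeds_to_fetch(master, seen))  # ['http://y.example.com/inventory.rss']
```

A higher cascading layer could treat several master feeds exactly like ordinary feeds and build a master of masters the same way.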

Personal RSS

Sooner or later, we’ll have personal RSS for reporting information like credit card and bank transactions, as Tim Bray predicted almost two years ago. One of the biggest problems here, though, is that people might be reluctant to give personal passwords to online aggregators like Bloglines. People might, however, allow online aggregators to request the last-modified time of their credit card or bank feeds, and they could use these to build cascading RSS files, allowing users to reduce the amount of polling they have to do from home or on the road.

In other words, both examples have to do with getting RSS into the business world (either B2B or B2C), not with improving the current blogosphere. I look forward to finding out who has thought about this idea in more detail, or even implemented it.