Quoderat

Archive for the 'web' Category

Strange web exploit attempt (?)

Monday, February 4th, 2008

In the search logs for OurAirports, I noticed a series of searches for URLs:

http://www.feliciano.de/Webgalerie/bilder/Italy/une/yiwul/
http://www.unduetretoccaate.it/codice/aseje/wocobo/
http://www.altaiseer-eg.com/ar/articles/jed/umut/

At first, I thought they might be a kind of link spam — some sites display recent searches — but when I checked one of the URLs, I found something totally unexpected:

<?php echo md5("just_a_test");?>

They’re all the same. This is almost certainly related to passwords: is there a known flaw in a PHP content-management system like Drupal, or in the PHP API for a search engine like Lucene, where this would do some damage, or is it just a test probing for weaknesses? Is the PHP code supposed to be served up literally like that, or should I be seeing the MD5 instead?
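One way to tell the two cases apart is to compare the response body against both the literal source and the digest that PHP would print if the snippet were executed. The sketch below is only an illustration: the function names are hypothetical, and the idea that the probe is checking for the digest in the response is an assumption, not something the logs confirm.

```python
import hashlib

# The literal snippet found at the probed URLs.
PROBE_SOURCE = '<?php echo md5("just_a_test");?>'


def expected_digest() -> str:
    """Hex MD5 of the probe string, i.e. what PHP's md5() would echo."""
    return hashlib.md5(b"just_a_test").hexdigest()


def classify_response(body: str) -> str:
    """Guess whether a server executed the snippet or served it as text."""
    if PROBE_SOURCE in body:
        return "literal"    # PHP source came back unexecuted
    if expected_digest() in body:
        return "executed"   # the server ran the code and printed the hash
    return "unknown"


print(classify_response(PROBE_SOURCE))  # prints "literal"
```

A response classified as "executed" would suggest that the host runs whatever PHP it is handed, which is presumably what such a probe would be looking for.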

Is the problem Wikipedia, or David Megginson?

Wednesday, January 23rd, 2008

The Wikipedia article about me was vandalized yesterday (vandalized version) by someone from the IP address 24.225.66.95, which seems to be in or near Raleigh, North Carolina.

What should I do?

  1. Edit the article myself to remove the vandalism? — OK, that’s a really bad idea
  2. Go in anonymously and edit the article? — also a bad idea
  3. Rejoice in the fact that my article is important enough to be vandalized?
  4. Despair in the fact that my article is not important enough for anyone else to have noticed and fixed it?
  5. Reconcile myself to the idea that the edits are not vandalism at all, and I am, in truth, “a freaking looser who knows nothing” and “a noob”

I’m leaning towards #5, though I’m disappointed that kids these days seem to have forgotten how to swear properly: “a freaking loser”???

Google analytics for XML 2007

Monday, January 21st, 2008

I forgot that I’d enabled Google analytics for the XML 2007 web site. Even though the conference is long over, I thought it would be interesting to look and see what some of the trends were from September 2007 to January 2008 (keeping in mind that these stats apply to the kind of web users interested in a tech conference, not to the web at large).

MacOS is still #3

Despite the halo effect from the iPod and the widespread use of Mac notebooks among speakers, MacOS still hasn’t managed to make much of a dent in the visitor logs:

  1. Windows: 80.70%
  2. Linux: 9.57%
  3. MacOS: 9.44%

If MacOS can’t beat Linux on the desktop, I don’t know if it has a bright future.

Internet Explorer below 50%

Firefox is still #2 behind MSIE, but for this crowd, the gap is small:

  1. MSIE: 49.61%
  2. Firefox: 41.14%
  3. Safari: 3.50%
  4. Mozilla: 3.22%
  5. Opera: 1.76%

If you’re designing or maintaining a web site with a tech audience, you’d better be testing on Firefox as well as MSIE.

Screen resolution and colour depth

I know that web designers like big layouts, but the sad fact remains that 1024×768 is still the most common resolution (and remember that the browser window may be much smaller than the screen):

  1. 1024×768: 28.32%
  2. 1280×1024: 25.84%
  3. 1280×800: 10.61%

A long tail of resolutions follows, but it’s worth noting that the classic 800×600 has only 1.96%. Better news comes from colour depth, where almost everyone has 16bpp or better:

  1. 32bpp: 80.29%
  2. 24bpp: 11.89%
  3. 16bpp: 7.37%

Traffic

Search engines, referrers, and direct access were all important traffic sources:

  1. Search engines: 36.77%
  2. Referring sites: 34.97%
  3. Direct traffic: 28.22%

Blogs did show up among the referring sites, but the biggest traffic producers were traditional links from partner organizations (other conferences, IDEAlliance itself, etc.) — these were also the stickiest, since most people coming from these links went on to read more than one page.

As far as search engines go, I was surprised to find that nothing really matters but Google (assuming that Google Analytics isn’t biasing the numbers):

  1. Google: 94.16%
  2. Yahoo!: 3.46%
  3. Live: 1.51%
  4. MSN: 0.45%

I knew that Yahoo! and MSN were behind in search, but I had no idea just how bad it was (at least in the tech crowd). More than half of the people who found the site via a search engine went on to read more than one page.

The top search phrases were rather dull and predictable:

  1. “xml 2007”: 28.50%
  2. “xml conference”: 8.22%
  3. “xml conference 2007”: 3.20%
  4. “xml conferences”: 3.04%

And so on through a very long tail. Individual speakers’ names start appearing soon, but none with more than 10 searches. I trolled through the low-frequency search phrases for something funny (and maybe risqué), but all I came up with was the number “736”, which resulted in three visits. I gave up trying to find the site in the Google results for that number. Does anyone really search for a single three-digit integer, and if so, how many pages of results will that person scroll through?

LAMP stack stability

Thursday, January 10th, 2008

I’m using a single dedicated server to host ourairports.com, megginson.com, and a couple of minor domains. OurAirports is a database-heavy application using (currently) a MySQL v.5 database hosted on the same server. I’ll offload the database to a separate server if traffic keeps increasing, but as long as I’m getting compliments from tech people for my fast response times (mainly thanks to MySQL’s built-in query caching), there’s no point paying for extra hardware.

Uptime

My ISP set up the server for me last summer with a bare-bones Ubuntu distro, then I installed the extra packages I needed using aptitude over ssh. Since then, I’ve done many Ubuntu in-place upgrades, rolled out hundreds of changes and upgrades to the web apps and dozens to the database schema (some very significant), and upgraded WordPress n-teen times. Check this out:

$ uptime
 13:08:31 up 175 days, 10:02,  1 user,  load average: 0.23, 0.06, 0.02

That’s right — since my ISP first set up the server with a basic Ubuntu system, I’ve never had to restart it. In fact, if Apache and mod_php (PHP5) had ‘uptime’ commands, they’d show almost the same amount of time, since I restarted them only to make configuration changes in the first few days of setting up the server (unless apt stopped them to install a newer version during one of my upgrades). I’ve restarted MySQL more recently, but again, only to experiment with configuration changes (especially for fulltext).

-1 for being cool, +10 for having a life

Using reliable old technologies like Linux, Apache, MySQL, and PHP doesn’t win any cool points, but it certainly makes maintaining a web server and its applications easy. I can go on vacation, for example, without worrying about being able to get online to fix or restart my server every couple of days. I don’t have to stay up until 3:00 am on Sunday night so that I can take the server offline to roll out new software versions or bug fixes (aptitude installs any security fixes in place). I spend lots of time with my family. I go to my kids’ school concerts. I learned banjo and mandolin (why not, since I have the free time?).

It’s the developer, not the language

And yes, my PHP web app is easy to maintain and extend, because I designed it to be that way (I can often implement, test and roll out new features in a matter of minutes, even when they require database schema changes) — it’s the developer, not the programming language, that determines the quality and maintainability of an app. A lot of newbies use PHP, so there’s a lot of bad PHP out there, but the same can be said for any language, even Ruby.

Social web sites: the new Proprietors?

Thursday, January 3rd, 2008

Image: Thomas Penn, second proprietor of Pennsylvania, not as nice as his dad William.

Almost a year ago, I wrote that Open data matters more than Open Source — it doesn’t matter (to you, the end user) whether a web site is using Open Source software or not, if they still keep your data locked up.

Here’s a nasty example: Robert Scoble has just had his Facebook account disabled for running a script to try to scrape his personal information off the site (since Facebook doesn’t provide him with any other way to get it).

I understand that Facebook needs to protect against malicious bots — and they might decide to restore his account once they know what Robert was actually trying to do (though for now all traces of him have vanished) — but do we really want to have to hope for the good will of social sites and beg for our own data every time we want it? Are web site owners the new version of the Proprietors in the early American colonies, who can grant rights as favours when they see fit?

E-mail users fight back

Sunday, December 16th, 2007

A bit over a year ago, I ran into an unusual problem — for several days, I stopped receiving messages from a customer (in the middle of an important project), then I discovered the messages all hidden deep in my (gmail-hosted) spam box. Everything from that domain was suddenly being flagged as spam.

What happened? This customer had a large mailing list that they used for announcements, etc. My guess is that they sent out an announcement, a lot of other gmail-users flagged it as spam, and whatever weighting algorithm gmail uses tipped it over so that the messages were no longer considered legit by default. I was able to train gmail not to treat those messages as spam (for me, specifically), but it took a week or two before I could trust that some of them weren’t being sent to the spam box.

Hard-core spammers have always had to deal with this kind of thing, and they spend a lot of time trying to figure out a way around it. What’s happening now, though, is that companies with legit (or semi-legit) e-mail lists are also starting to get into trouble, because web-mail makes it possible for hundreds or thousands of people to get together and all vote your e-mail to be undesirable.

The letter of the law isn’t enough

Note that this isn’t a legal thing. It doesn’t matter at all if your e-mail list is opt-in or opt-out, if the “Send me announcements” checkbox was checked by default or not, or if the recipient originally clicked 10 screens of disclaimers before buying your product/signing up for your service. If they don’t like the e-mail you’re sending them, they’ll just click “Spam”, even if you had a legal right to send it; and if enough of them do it, the e-mail value of your domain fast approaches nil.

You’d better make sure that your mass e-mails have stuff that people actually want to read:

  • I don’t care that your company just won five awards — SPAM! (even if I said before that it was OK to send me e-mails)
  • I probably do care that someone wants to connect with me on a social networking site that I actually use.
  • I don’t care that a merchant I did business with 2 years ago has a Christmas special on something I’d never buy — SPAM!
  • I don’t care that your web site has a new look — SPAM!
  • I don’t care that your company has a training session coming up in Tulsa, since I don’t live anywhere near there (and probably wouldn’t go anyway) — SPAM!
  • Yes, I am interested in the tracking info for the books I just ordered. Thanks.
  • I do care that there’s a substantive change to a site that I use a lot.
  • I don’t care about a change on a site I haven’t logged into for a year — SPAM!

And so on.

This new collaboration is an unexpected side-effect of the shift from desktop e-mail clients to web mail, and it would be foolish for companies not to pay attention. If you consider your domain name to be a valuable part of your corporate identity, don’t piss it away by sending out poorly-targeted mass e-mails, because no matter what prior permission you have, people now can … and will … punish you. After all, it takes only a single mouse click.

Amazon SimpleDB (not very Codd-y)

Friday, December 14th, 2007

This might be of interest:

Amazon SimpleDB

Amazon’s announcement

Dear AWS Developers,

This is a short note to let a subset of our most active developers know about an upcoming limited beta of our newest web service: Amazon SimpleDB, which is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud.

Traditionally, this type of functionality has been accomplished with a clustered relational database that requires a sizable upfront investment, brings more complexity than is typically needed, and often requires a DBA to maintain and administer. In contrast, Amazon SimpleDB is easy to use and provides the core functionality of a database - real-time lookup and simple querying of structured data - without the operational complexity.

We’re excited about this upcoming service and wanted to let you know about it as soon as possible. We anticipate beginning the limited beta in the next few weeks. In the meantime, you can read more about the service, and sign up to be notified when the limited beta program opens and a spot becomes available for you. To do so, simply click the “Sign Up For This Web Service” button on the web site below and we will record your contact information.

Not much there, though

It’s not SQL, or even SQL-like, though, supporting only the operators “=, !=, <, >, <=, >=, STARTS-WITH, AND, OR, NOT, INTERSECTION AND UNION”. I’m no relational expert, but I don’t think Codd would have been impressed. A distributed database is one of the big missing pieces from Amazon’s services, but I’m not sure if this will be it.

How to spend all your free money

Tuesday, November 27th, 2007

Update: the site shopping cart is broken, and doesn’t properly remove items from the total owing — too bad.

Here’s one easy way: via TechCrunch, Deutsche Grammophon, the gold standard in renaissance/ baroque/ classical/ romantic/ orchestral/ opera/ etc. music (often confusingly referred to collectively as “classical”, roughly equivalent to calling all popular music since 1890 “rap”), will start selling their catalogue as unprotected MP3s at midnight German time tonight (6:00 pm in New York City) at their new site dgwebshop.com.

As a teenager in the late 1970s, I used to visit the House of Sound in Kingston (Canada), where they had thousands of DG records — probably most of the catalogue — packed in tight on shelves lining a wall of the store. I couldn’t always afford them, but I loved being able just to pull them out and take a look at the covers of the different famous recordings. These days, the so-called classical music section of any but a couple of specialized stores in big cities like New York or London has maybe one or two rows of worthless classical-pop compilations hidden behind the DVDs of TV series nobody watched in the 1980s — no wonder people don’t shop at record stores any more.

We tech types have been claiming for a while that music companies could make more money selling unprotected digital music, so here’s the test. I plan to give them a lot of my own money if the site actually works, though I should note a couple of caveats:

  1. Many current DG buyers are audiophiles who won’t be satisfied with the sound quality of MP3s (which are optimized more for boom-boom music), so this will probably open a new market for DG rather than leaching their current one.
  2. DG’s market is mostly affluent people outside the intense social environment of high school or university, so people will be less likely to share these MP3s — and even if they do, it will probably just act as a promo for the higher quality recordings.

I hope the site can handle the traffic. Rock on, Deutsche Grammophon!

First looks at OpenSocial: part 4 (content for persistence data)

Thursday, November 8th, 2007

Earlier postings:

I didn’t have time to look at the OpenSocial API yesterday, so I’m continuing today looking at the data format for the last major area, persistence data.

A vision thing?

My first impression of the persistence data API is that it doesn’t belong in v.1 of OpenSocial — unlike the member/friends and activities APIs, it doesn’t seem to be solving a core problem for social-site app writers (I have no way to get at a friends list except through the site’s API, but I can store my own data, thanks). I can see only two reasons that it’s here, neither of them very admirable:

  1. Because someone has a vision of a world where people can write social apps that run entirely on the client side with HTML/CSS/JavaScript, using only resources provided by the social site itself.
  2. Because the GData group in Google co-opted the designers to promote GData in the spec, the same way that the Blu-Ray group in Sony co-opted the PS3 to advance their agenda.

I’ll give Google the benefit of the doubt and assume that it’s a vision thing, but that’s still very unhealthy — specs should solve the real problems of the present, not the speculative problems of the future, especially bare-bones v.1 specs like this.

The format

Now that that’s out of my system, let’s take a look at what you get back from a URL like http://{DOMAIN}/feeds/apps/{appId}/persistence/global (and its many variants). From the spec, here’s what you get when you request a single piece of information from the API:

<entry xmlns='http://www.w3.org/2005/Atom'>
<title type="text">somekey</title>
<content type="text">somevalue</content>
</entry>

Or, in non-XML terms,

$globals{'somekey'} = 'somevalue'

That comes from a URL like http://{DOMAIN}/feeds/apps/{appId}/persistence/global/somekey which requests a single value. Using the first URL mentioned gets you a feed of name=value pairs, sort-of like an associative array:

<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<id>http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global</id>
<updated>2007-10-30T20:53:20.086Z</updated>
<title>Persistence</title>
<link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml'
  href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global'/>
<link rel='http://schemas.google.com/g/2005#post' type='application/atom+xml'
  href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global'/>
<link rel='self' type='application/atom+xml'
  href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global'/>
<generator version='1.0' uri='/feeds'>Orkut</generator>
<entry>
  <id>http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global/somekey</id>
  <title>somekey</title>
  <content>somevalue</content>
  <link rel='self' type='application/atom+xml'
    href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global/somekey'/>
  <link rel='edit' type='application/atom+xml'
    href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global/somekey'/>
</entry>
</feed>

There’s only one entry in the spec’s example, but there could be a lot more. Basically, this is the equivalent of something like

$globals = { 'somekey' => 'somevalue' }

The comparison isn’t quite fair, because there are also some links explaining what you can do to modify this information, etc., but it still seems like a lot of markup for not much value (pun intended). I wonder if this would be a good place to use JSON instead of Atom+XML? After all, the serious apps will be doing their own data storage anyway, and the client-only apps will probably use a JavaScript API that hides the Atom from the developer.
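To make the comparison concrete, here is a minimal sketch that reads a persistence feed into a plain dictionary and then serializes the same data as JSON. It assumes a trimmed-down copy of the spec's example (links, ids, and dates removed), and `feed_to_dict` is a hypothetical helper name, not part of any OpenSocial library.

```python
import json
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Trimmed copy of the spec's example feed (assumption: ids, dates,
# and links are omitted here for brevity).
FEED_XML = """<feed xmlns='http://www.w3.org/2005/Atom'>
<title>Persistence</title>
<entry>
  <title>somekey</title>
  <content>somevalue</content>
</entry>
</feed>"""


def feed_to_dict(feed_xml: str) -> dict:
    """Map each entry's title (the key) to its content (the value)."""
    root = ET.fromstring(feed_xml)
    return {
        entry.findtext(ATOM + "title"): entry.findtext(ATOM + "content")
        for entry in root.iter(ATOM + "entry")
    }


pairs = feed_to_dict(FEED_XML)
as_json = json.dumps(pairs)
print(as_json)                      # {"somekey": "somevalue"}
print(len(FEED_XML), len(as_json))  # character counts, Atom vs. JSON
```

Even against the trimmed feed, the JSON form is several times shorter, although it also drops the self and edit links that the Atom version carries.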

Scope

As hinted at, at least, in my URL posting, there are several different data scopes:

  • All (global) data for this application on this social site (equivalent of static global variables?).
  • Data for this instance of the application only (equivalent of local or object variables?).
  • Data for this user in this application (i.e. your own profile info about the user, available every time your app runs).
  • Data for this user’s friends in this application (i.e. your own profile info about the friends, available every time your app runs).

It seems like a reasonable division of scope, especially since the app can’t get anything out that it didn’t put in.

Final thought (for now)

I do believe that, eventually, many web apps will be able to outsource storage as a service instead of having to maintain their own databases and database clusters — in fact, Amazon’s S3 and its competitors already provide precisely this service, though they might not be optimized for a lot of name=value lookups. I’m surprised, though, that this could be considered a key feature of a social app spec, when so much else was left out.

First looks at OpenSocial: part 3 (content for activities)

Tuesday, November 6th, 2007

Earlier postings:

This is the third part of a series where I’m working through the OpenSocial specs as I write — that means that I haven’t preread and predigested this stuff, but am creating a record of how I approach a new set of specifications and try to understand them. First, I looked at the basic URLs for data access, since they provide the best high-level description of the OpenSocial capabilities (read-only info on members and their friends, read/write info on a member’s activity notifications, and a simple data-storage API). Next, I looked at the data format for the most important content, the member profile and friends lists. This time, I’ll look at the format for activity notifications, which is also based on the Atom syndication format.

Activities

To get a list of a member’s recent activities (uploaded a photo, poked a friend, got a new job, or stuff like that, I guess) an OpenSocial application uses the URL pattern http://{DOMAIN}/activities/feeds/activities/user/{userId} according to the specs, though I suspect that might be intended to be http://{DOMAIN}/feeds/activities/user/{userId} for consistency with the other data-access URLs — it’s hard to be certain. The host should return an Atom feed of activities, like this template example lifted from the spec:

<atom:feed xmlns:atom='http://www.w3.org/2005/Atom'
    xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'
    xmlns:gact='http://schemas.google.com/activities/2007'>
  <atom:id>http://www.google.com/activities/feeds/activities/user/userID/source/sourceID</atom:id>
  <atom:updated>1970-01-01T00:00:00.000Z</atom:updated>
  <atom:category scheme='http://schemas.google.com/g/2005#kind'
    term='http://schemas.google.com/activities/2007#activity'/>
  <atom:title>Feed title</atom:title>
  <atom:link rel='alternate' type='text/html' href='http://sourceID.com/123'/>
  <atom:link rel='http://schemas.google.com/g/2005#feed'
    type='application/atom+xml'
    href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID'/>
  <atom:link rel='http://schemas.google.com/g/2005#post'
    type='application/atom+xml'
    href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID'/>
  <atom:link rel='self' type='application/atom+xml'
    href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID'/>
  <atom:author>
    <atom:name>unknown</atom:name>
  </atom:author>
  <openSearch:totalResults>1</openSearch:totalResults>
  <openSearch:startIndex>1</openSearch:startIndex>
  <openSearch:itemsPerPage>25</openSearch:itemsPerPage>
  <atom:entry>
    <atom:id>http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1</atom:id>
    <atom:updated>2007-10-27T19:41:51.574Z</atom:updated>
    <atom:category scheme='http://schemas.google.com/g/2005#kind'
      term='http://schemas.google.com/activities/2007#activity'/>
    <atom:title>Activity title</atom:title>
    <atom:link rel='self' type='application/atom+xml'
      href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1'/>
    <atom:link rel='edit' type='application/atom+xml'
      href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1'/>
    <gact:received>2007-10-27T19:41:51.478Z</gact:received>
  </atom:entry>
</atom:feed>

There’s a lot of front-matter in this, so it’s hard to realize at first glance that it lists only a single activity (in the atom:entry element near the bottom). The entry itself uses mostly standard Atom elements, except for one extension element from the Google activities namespace, giving the date that the notification was received (received date is also important in the news industry, so maybe this is something Atom needs to add to its core). Other than that, the activity itself is easy enough to understand: it has a unique id, a couple of dates, a title (which seems also to serve as the sole description), and web links for viewing and editing.
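The fields described above can be pulled out of the feed with a few lines of Python. This is a sketch under stated assumptions: the sample is a shortened copy of the spec's template (most feed-level front-matter removed), and `list_activities` is a hypothetical helper, not an OpenSocial API.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "gact": "http://schemas.google.com/activities/2007",
}

# Shortened copy of the spec's template feed (assumption: feed-level
# links, categories, and OpenSearch elements are left out).
FEED_XML = """<atom:feed xmlns:atom='http://www.w3.org/2005/Atom'
    xmlns:gact='http://schemas.google.com/activities/2007'>
  <atom:title>Feed title</atom:title>
  <atom:entry>
    <atom:id>http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1</atom:id>
    <atom:updated>2007-10-27T19:41:51.574Z</atom:updated>
    <atom:title>Activity title</atom:title>
    <atom:link rel='edit' type='application/atom+xml'
      href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1'/>
    <gact:received>2007-10-27T19:41:51.478Z</gact:received>
  </atom:entry>
</atom:feed>"""


def list_activities(feed_xml: str) -> list:
    """Return one dict per atom:entry, skipping the feed-level front-matter."""
    root = ET.fromstring(feed_xml)
    activities = []
    for entry in root.findall("atom:entry", NS):
        edit = entry.find("atom:link[@rel='edit']", NS)
        activities.append({
            "id": entry.findtext("atom:id", namespaces=NS),
            "title": entry.findtext("atom:title", namespaces=NS),
            "updated": entry.findtext("atom:updated", namespaces=NS),
            "received": entry.findtext("gact:received", namespaces=NS),
            "edit_url": edit.get("href") if edit is not None else None,
        })
    return activities


for activity in list_activities(FEED_XML):
    print(activity["title"], activity["received"])
```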

Unlike the member and friends info, which was read-only, OpenSocial allows apps to post new activities and edit or delete existing ones, but only in what is called a “source-level feed” — that’s a list of a user’s activities limited to a single source (which, I assume, is the application), using the URL pattern http://{DOMAIN}/activities/feeds/activities/user/{userId}/source/{sourceId} (which, again, may be a typo with an extra “activities” path element at the beginning). In other words, an application can read activities from any source, but it can mess around only with the ones it created. I’m not sure yet how the application knows its source id, or how the host verifies the app’s identity, but I’ll be looking at those issues in a later posting.

For members and friends, I noted that the spec’s example included the OpenSearch namespace but didn’t use it. This time, the namespace is used for the totalResults, startIndex, and itemsPerPage elements. These suggest that it’s possible to page through long lists of activities, though I could find no mention of that in the spec. Again, I don’t know much about Atom, but I think that the Atom-blessed way to handle paging would involve using “first”, “next”, and “last” links.
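If paging does work the way the OpenSearch elements suggest, a client could work out where the next page starts from those three values alone. The sketch below assumes a 1-based startIndex (as in the spec's example) and only computes the index; how the next page would actually be requested is not documented, so no request is made. `PAGE_TEMPLATE` and `next_start_index` are hypothetical names.

```python
import xml.etree.ElementTree as ET

OPENSEARCH = "{http://a9.com/-/spec/opensearchrss/1.0/}"

# Hypothetical minimal feed carrying only the three paging elements.
PAGE_TEMPLATE = """<atom:feed xmlns:atom='http://www.w3.org/2005/Atom'
    xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'>
  <openSearch:totalResults>{total}</openSearch:totalResults>
  <openSearch:startIndex>{start}</openSearch:startIndex>
  <openSearch:itemsPerPage>{per_page}</openSearch:itemsPerPage>
</atom:feed>"""


def next_start_index(feed_xml: str):
    """Return the 1-based index of the next page, or None on the last page."""
    root = ET.fromstring(feed_xml)
    total = int(root.findtext(OPENSEARCH + "totalResults"))
    start = int(root.findtext(OPENSEARCH + "startIndex"))
    per_page = int(root.findtext(OPENSEARCH + "itemsPerPage"))
    candidate = start + per_page
    return candidate if candidate <= total else None


# The spec's example: 1 result, starting at 1, 25 per page.
print(next_start_index(PAGE_TEMPLATE.format(total=1, start=1, per_page=25)))   # None
# A made-up longer list: 60 results, first page of 25.
print(next_start_index(PAGE_TEMPLATE.format(total=60, start=1, per_page=25)))  # 26
```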

Still learning

I’m not deeply into social networking myself — with my adolescent children using Facebook, my joining that site would be like showing up in a leather jacket at their high-school dance, and 99% of the time I spend on the more grown-up sites like Plaxo, LinkedIn, and Dopplr is spent approving connection requests. As a result, I wasn’t aware of how important activity notifications were for a social-networking site.

Whatever happens with OpenSocial, I have found it to be a good architectural introduction to social networking in 2007, though I suspect that the next thing I’m going to look at — the persistence data API — has more to do with Google’s business requirements than with social networking itself.

First looks at OpenSocial: part 2 (content for members and friends)

Monday, November 5th, 2007

See also First looks at OpenSocial: part 1 (URLs)

This is the second part of a series of postings describing how I’m trying to understand the technical specs for the new Google-led OpenSocial initiative. In the first part, I cut down through all the text in the specs to get at the basic URLs, which represent the raw skeleton of services defined by the spec. This time, I’m going to look at the data formats, starting with the real bread and butter of social networking, people and their friends.

The atomic age

The content format for OpenSocial is always the Atom syndication format, a competitor to RSS for syndicating blogs and other similar information. I haven’t spent very much time with Atom yet — I appreciate that it’s more fully-specified than RSS 2.0, but I already know RSS and have run into no practical problems with it (though I’m aware of the potential ones) — so I’m probably not going to notice if or where the OpenSocial specs are violating the spirit or even letter of the Atom specs. I’ve occasionally seen complaints from Atom-heads about Atom-compliance in Google’s GData, and assume those apply to OpenSocial as well.

People

When you ask an OpenSocial provider for information about a member (using the URL pattern http://{DOMAIN}/feeds/people/{userId}), the spec says you get back something like this, assuming you’re authorized to make the request (lifted straight from the spec, and not namespace-compliant):

<entry xmlns='http://www.w3.org/2005/Atom'
  xmlns:georss='http://www.georss.org/georss'
  xmlns:gd='http://schemas.google.com/g/2005'>
  <id>http://sandbox.orkut.com:80/feeds/people/14358878523263729569</id>
  <updated>2007-10-28T14:01:29.948-07:00</updated>
  <title>Elizabeth Bennet</title>
  <link rel='thumbnail' type='image/*'
    href='http://img1.orkut.com/images/small/1193601584/115566312.jpg'/>
  <link rel='alternate' type='text/html'
    href='http://orkut.com/Profile.aspx?uid=17583631990196664929'/>
  <link rel='self' type='application/atom+xml'
    href='http://sandbox.orkut.com:80/feeds/people/14358878523263729569'/>
  <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
      <gml:pos>51.668674 -0.066235</gml:pos>
    </gml:Point>
  </georss:where>
  <gd:extendedProperty name='lang' value='en-US'/>
  <gd:postalAddress/>
</entry>

Aside from the fact that the tech writer is a Jane Austen fan, a couple of other points jump out:

  1. In addition to the Atom namespace, they’re using the GeoRSS namespace to provide lat/lon information (so that you could place the person on a map, for example), the GML namespace (which the example forgets to declare), and the GData namespace for generally unimportant information like the postal address (who gives that out?).
  2. The two most important pieces of information seem to be the thumbnail picture/buddy icon and the member’s HTML profile page, both of which are the targets of typed links.

Of course, in reality, the most important information about a member is the member’s friends list, but that information comes through a separate URL, http://{DOMAIN}/feeds/people/{userId}/friends.
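As a rough illustration of how an application might read a person entry, the sketch below extracts the name, the two typed links, and the lat/lon pair. It assumes a shortened copy of the entry as printed above (id, date, and GData elements removed), and `parse_person` is a hypothetical helper, not something defined by the spec.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
    "gml": "http://www.opengis.net/gml",
}

# Shortened copy of the person entry shown above.
PERSON_XML = """<entry xmlns='http://www.w3.org/2005/Atom'
  xmlns:georss='http://www.georss.org/georss'>
  <title>Elizabeth Bennet</title>
  <link rel='thumbnail' type='image/*'
    href='http://img1.orkut.com/images/small/1193601584/115566312.jpg'/>
  <link rel='alternate' type='text/html'
    href='http://orkut.com/Profile.aspx?uid=17583631990196664929'/>
  <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
      <gml:pos>51.668674 -0.066235</gml:pos>
    </gml:Point>
  </georss:where>
</entry>"""


def parse_person(entry: ET.Element) -> dict:
    """Collect the name, typed links, and position from one Atom entry."""
    # Index the links by their rel attribute (thumbnail, alternate, ...).
    links = {
        link.get("rel"): link.get("href")
        for link in entry.findall("atom:link", NS)
    }
    pos = entry.findtext("georss:where/gml:Point/gml:pos", namespaces=NS)
    lat, lon = (float(x) for x in pos.split()) if pos else (None, None)
    return {
        "name": entry.findtext("atom:title", namespaces=NS),
        "thumbnail": links.get("thumbnail"),
        "profile": links.get("alternate"),
        "lat": lat,
        "lon": lon,
    }


person = parse_person(ET.fromstring(PERSON_XML))
print(person["name"], person["lat"], person["lon"])
```

The same function could be applied to each entry in a friends feed, since those entries have the same shape.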

Friends

This example is also lifted from the spec (and is still missing the declaration for the GML namespace):

<feed xmlns='http://www.w3.org/2005/Atom'
  xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'
  xmlns:georss='http://www.georss.org/georss'
  xmlns:gd='http://schemas.google.com/g/2005'>
  <id>http://sandbox.orkut.com:80/feeds/people/14358878523263729569/friends</id>
  <updated>2007-10-28T21:01:03.690Z</updated>
  <title>Friends</title>
  <link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml'
    href='http://sandbox.orkut.com:80/feeds/people/14358878523263729569/friends'/>
  <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/14358878523263729569/friends'/>
  <author><name>Elizabeth Bennet</name></author>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/people/02938391851054991972</id>
    <updated>2007-10-28T14:01:03.690-07:00</updated>
    <title>Jane Bennet</title>
    <link rel='thumbnail' type='image/*' href='http://img1.orkut.com/images/small/null'/>
    <link rel='alternate' type='text/html' href='http://sandbox.orkut.com:80/Profile.aspx?uid=574036770800045389'/>
    <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/02938391851054991972'/>
    <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
    <gml:pos>51.668674 -0.066235</gml:pos></gml:Point></georss:where>
    <gd:extendedProperty name='lang' value='en-US'/>
    <gd:postalAddress/>
  </entry>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/people/12490088926525765025</id>
    <updated>2007-10-28T14:01:03.691-07:00</updated>
    <title>Charlotte Lucas</title>
    <link rel='thumbnail' type='image/*' href='http://img2.orkut.com/images/small/null'/>
    <link rel='alternate' type='text/html' href='http://sandbox.orkut.com:80/Profile.aspx?uid=5799256900854924919'/>
    <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/12490088926525765025'/>
    <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
    <gml:pos>0.0 0.0</gml:pos></gml:Point></georss:where>
    <gd:extendedProperty name='lang' value='en-US'/>
    <gd:postalAddress/>
  </entry>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/people/15827776984733875930</id>
    <updated>2007-10-28T14:01:03.692-07:00</updated>
    <title>Fitzwilliam Darcy</title>
    <link rel='thumbnail' type='image/*' href='http://img3.orkut.com/images/small/1193603277/115555466.jpg'/>
    <link rel='alternate' type='text/html' href='http://sandbox.orkut.com:80/Profile.aspx?uid=14256507824223085777'/>
    <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/15827776984733875930'/>
    <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
    <gml:pos>53.017016 -1.424363</gml:pos></gml:Point>
    </georss:where>
    <gd:extendedProperty name='lang' value='en-US'/>
    <gd:postalAddress/>
  </entry>
</feed>

Again, very straightforward, if not namespace-compliant (due to the missing GML namespace declaration). There’s also a declaration of an OpenSearch namespace URI that’s never used, suggesting a feature that was removed in haste just before release. The friends list is simply a feed of person entries, just like the single entry returned for the member query, with a title, date, etc. at the top. Note that you always get the full friends list — there’s no support for filtering — so this might not be fun for someone who has 10,000+ friends.
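As a rough sketch of how a client might read a friends feed like this one, the following Python uses only the standard library to pull the name, id, and position out of each entry. The function name `parse_friends` and the trimmed-down `SAMPLE_FEED` (two entries, most links dropped) are assumptions made for illustration, not anything defined by the spec.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the example feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
    "gml": "http://www.opengis.net/gml",
}

# Cut-down copy of the example feed, kept short for illustration.
SAMPLE_FEED = """<feed xmlns='http://www.w3.org/2005/Atom'
  xmlns:georss='http://www.georss.org/georss'>
  <title>Friends</title>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/people/02938391851054991972</id>
    <title>Jane Bennet</title>
    <georss:where>
      <gml:Point xmlns:gml='http://www.opengis.net/gml'>
        <gml:pos>51.668674 -0.066235</gml:pos>
      </gml:Point>
    </georss:where>
  </entry>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/people/15827776984733875930</id>
    <title>Fitzwilliam Darcy</title>
    <georss:where>
      <gml:Point xmlns:gml='http://www.opengis.net/gml'>
        <gml:pos>53.017016 -1.424363</gml:pos>
      </gml:Point>
    </georss:where>
  </entry>
</feed>"""


def parse_friends(feed_xml):
    """Return one dict per <entry> with id, name, and lat/lon (or None)."""
    root = ET.fromstring(feed_xml)
    friends = []
    for entry in root.findall("atom:entry", NS):
        pos = entry.findtext("georss:where/gml:Point/gml:pos", namespaces=NS)
        # gml:pos holds "latitude longitude" separated by whitespace.
        lat, lon = (float(v) for v in pos.split()) if pos else (None, None)
        friends.append({
            "id": entry.findtext("atom:id", namespaces=NS),
            "name": entry.findtext("atom:title", namespaces=NS),
            "lat": lat,
            "lon": lon,
        })
    return friends


for friend in parse_friends(SAMPLE_FEED):
    print(friend["name"], friend["lat"], friend["lon"])
```

Because the whole list arrives in one document, a client with a very large friends list would have to parse all of it before using any of it, which is the cost of having no filtering.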

What I don’t see, either in the example or the spec, is a way to provide typed relationships, like “spouse”, “colleague”, “classmate”, etc. I don’t know how important that is to application developers — simply getting the list of friends is probably the most important thing.

First looks at OpenSocial: part 1 (URLs)

Saturday, November 3rd, 2007

In a year or two, we’ll know whether the Google-led OpenSocial initiative was a turning point in the social web or just a weak shot fired across Facebook’s bow. In the meantime, I think it’s worth taking some time to digest the API docs, which are still pretty rough.

I don’t know what I’m talking about…

Instead of reading and understanding everything first and then posting from a (virtual) podium, I’m going to try to work out my own understanding of the APIs right here on the web. That means that I’ll be asking questions that I’ll find the answers for later, that I’ll be making incorrect assumptions, and that I’ll be deferring hard stuff (like authorization/authentication) until I understand the basics. This is not, then, an OpenSocial primer by any stretch, since I don’t actually know what I’m talking about, but it might be useful as a snapshot of how a developer approaches a new API.

It’s the URLs, stupid

OpenSocial is designed so that any app can get information from any site as long as it has permission (I’ll figure out how that works later) — to accomplish that, it uses standard URL patterns on every site returning Atom entries and feeds. So after digging through a lot of crackerjack, I finally found the prize buried in the docs. Here are the URL patterns:

Information about a person
http://{DOMAIN}/feeds/people/{userId}
GET only.
List of a person’s friends
http://{DOMAIN}/feeds/people/{userId}/friends
GET only.
List of a person’s activities
(Wrong?) http://{DOMAIN}/activities/feeds/activities/user/{userId}
GET only.
List of a person’s activities from a single source
(Wrong?) http://{DOMAIN}/activities/feeds/activities/user/{userId}/source/{sourceId}
GET, POST, PUT, DELETE
Application-global data
http://{DOMAIN}/feeds/apps/{appId}/persistence/global
http://{DOMAIN}/feeds/apps/{appId}/persistence/global/{partKey}
GET, POST, PUT, DELETE
Per-instance data
http://{DOMAIN}/feeds/apps/{appId}/persistence/{userId}/instance/{instanceId}/{partKey}
GET, POST, PUT, DELETE
Shared user data
http://{DOMAIN}/feeds/apps/{appID}/persistence/{userId}/shared/{partKey}
GET, POST, PUT, DELETE
Friends’ shared data
http://{DOMAIN}/feeds/apps/{appID}/persistence/{userId}/friends
GET, POST, PUT, DELETE
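
To show how little machinery the patterns need, here is a minimal Python sketch that fills in a few of them. The dictionary keys and the `build_url` helper are hypothetical names chosen for this example; the pattern strings are copied from the list above, and the domain and user id come from the sandbox example in the spec.

```python
# URL patterns copied from the list above; the keys are made-up labels.
URL_PATTERNS = {
    "person": "http://{DOMAIN}/feeds/people/{userId}",
    "friends": "http://{DOMAIN}/feeds/people/{userId}/friends",
    "app_global": "http://{DOMAIN}/feeds/apps/{appId}/persistence/global",
}


def build_url(kind, **params):
    """Fill in one pattern; raises KeyError if a placeholder is missing."""
    return URL_PATTERNS[kind].format(**params)


print(build_url("friends",
                DOMAIN="sandbox.orkut.com:80",
                userId="14358878523263729569"))
# prints http://sandbox.orkut.com:80/feeds/people/14358878523263729569/friends
```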

Did I miss anything?

Listing all the URLs together like this, instead of spreading them out over pages and pages of docs, is the best way to start with a REST API. For example, you can tell at a glance what kind of information is available and what is and isn’t writable (you can’t add new friends for a user, but you can add a new activity). Sure, Javascript libraries, etc. are nice, but the class hierarchies can obscure how simple the underlying data actually is (or “are”, if you’ve studied Latin). You can also spot possible typos in the docs — for example, what are the odds that the activity URLs are really supposed to start with “/activities/feeds/” when everything else starts with “/feeds/”? It could be poor, inconsistent design, but I suspect cut-and-paste errors.

Next time: content

The next time I get around to looking at OpenSocial, I’ll try to figure out the formats — it shouldn’t be too hard, since they’re all Atom entries or feeds. Then I’ll get into messier stuff like auth/auth, and I may eventually try adding OpenSocial support to my OurAirports hobby site, though it doesn’t even support friends yet.

Two problems with Google Maps for aviation

Wednesday, August 29th, 2007

I love Google Maps and their API, and am using it extensively in my new web site OurAirports. However, there are two problems that keep coming up for using Google Maps with an aviation application:

[Diagram of Mercator projection]

  1. Google Maps uses a Mercator Projection, grossly distorting the northern and southern parts of the world, and cutting off the area near the poles so that a few of the Antarctic airports don’t show up on my maps at all. I can understand the reasons for their choice, with simple panning and tile paging and a rectangular area, but it can make things look pretty silly sometimes (such as Greenland and Africa appearing the same size).

  2. Google Maps does not provide an API call to draw a great-circle path. This seems to me to be almost a no-brainer, and it’s especially important in a Mercator projection, where the apparently straight paths drawn by the API are anything but (especially east-west). After messing with some out-of-date third-party libraries, I finally found some JavaScript at one site that does a good job on efficient, approximate great-circle paths, and am waiting to hear from the author about terms for reuse. Google might want to just go ahead and add this, though.
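
For readers who want to see what an approximate great-circle path involves, here is a minimal Python sketch. It is not the JavaScript mentioned above; it uses the standard spherical interpolation formula, treats the Earth as a perfect sphere, and the function name `great_circle_points` is an assumption made for this example.

```python
import math


def great_circle_points(lat1, lon1, lat2, lon2, segments=16):
    """Return segments + 1 (lat, lon) pairs in degrees along the great circle.

    Assumes a spherical Earth. The result is not meaningful for exactly
    antipodal endpoints, where the path is not unique.
    """
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    # Angular distance between the endpoints (haversine formula).
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    d = 2 * math.asin(min(1.0, math.sqrt(h)))
    if d == 0:
        return [(lat1, lon1)] * (segments + 1)
    points = []
    for i in range(segments + 1):
        f = i / segments  # fraction of the way from start to end
        a = math.sin((1 - f) * d) / math.sin(d)
        b = math.sin(f * d) / math.sin(d)
        # Weighted sum of the two endpoints as 3D unit vectors.
        x = a * math.cos(p1) * math.cos(l1) + b * math.cos(p2) * math.cos(l2)
        y = a * math.cos(p1) * math.sin(l1) + b * math.cos(p2) * math.sin(l2)
        z = a * math.sin(p1) + b * math.sin(p2)
        points.append((math.degrees(math.atan2(z, math.hypot(x, y))),
                       math.degrees(math.atan2(y, x))))
    return points
```

The returned points can be joined with short straight segments using whatever polyline call the mapping API offers; more segments give a smoother curve at the cost of more points to draw.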

[Diagram of Mercator projection]

Aviation charts mostly use a Lambert conformal conic projection, which preserves angles and keeps scale nearly constant close to its standard parallels, so distances measured on the chart stay very close to real-world distances; however, by definition this projection can’t show more than half the world at once, and generally shows much less than that, so it wouldn’t work for something like Google Maps.

[not] Protecting web sites and services from DNS rebinding attacks

Wednesday, August 1st, 2007

Update: Nope, my solution won’t work. As Christian Matthies points out in the comments, it is possible to spoof the HTTP Host header as well (his link in the comment is broken because of an extra comma, but this one works). As a kludge, browsers could be modified to prevent Host header spoofing, but (a) it would take a long time to deploy to the world at large, and (b) it would be only a bandaid for a much bigger problem.

Summary: While there’s no way to protect browsers against the DNS rebinding attack, you can protect web sites and web services by forcing them to check the HTTP Host header with every request. This is easy to do for RESTful services going through a regular web server like Apache — you get it by default with virtual hosts — but might be trickier for WS-* services.

If you or your company is using HTTP-based web services (either WS-* or REST), you might be in trouble — a new exploit allows a web site from outside your firewall to use a web browser as a proxy to read any web site or service inside your firewall.

Artur Bergman at O’Reilly has a posting on the DNS rebinding (aka anti-DNS-pinning) attack that works against all major browsers, including all versions of Firefox and MSIE. There’s no obvious general fix for this, though there’s a Firefox extension that helps a tiny bit.

The attack

In a DNS-rebinding attack, the attacker is able to force your browser to read data from any IP address that your browser has access to, even if you’re behind a router/firewall, by changing the IP address associated with a domain name you’ve connected to. That means that given an IP address, an outside attacker can read your local website (at 127.0.0.1), anything behind your corporate firewall (such as an Intranet accounting page or a web service), or — I think (haven’t tested yet) — a website that you’re logged into using a cookie (HTTP authentication will force a popup, since the browser will see a different domain name, even if you’re logged into the site in another tab/window). If you run a local web server on your computer (say, at 127.0.0.1), you can go to http://www.jumperz.net/index.php?i=2&a=1&b=7, type in the local address, and see jumperz.net use the exploit to display the source of your home page.

The defence

There’s no way to protect the browser yet, but you can protect your HTTP-based sites and services from this attack very easily — in fact, many sites on the web are already unknowingly protected, though I don’t know if most enterprise web services are.

The trick is in the HTTP Host header. While the DNS rebinding attack can associate a new IP address with a hostname, it cannot change the hostname itself, so the browser will still send the original hostname to the new host. Nearly all shared-hosting servers — and many servers at dedicated hosts as well — will check the Host header to decide what pages to serve out. As long as the site does something harmless when it gets an unrecognized hostname (such as returning a “501 Not implemented” HTTP status code), the site will be safe from the attack. In Apache, for example, you use the ServerName directive for each virtual host, and just make sure that there’s a default virtual host that returns an error or at least does nothing harmful.
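
For a service that does not sit behind Apache virtual hosts, the same check can be sketched in a few lines of Python as a WSGI wrapper. The hostnames in `ALLOWED_HOSTS` and the names `host_checked` and `hello` are hypothetical, and, as the update at the top of this post says, a spoofed Host header defeats this check, so treat it as an illustration of the idea rather than a complete defence.

```python
ALLOWED_HOSTS = {"www.example.org", "example.org"}  # hypothetical hostnames


def host_checked(app):
    """Wrap a WSGI app so requests with an unrecognized Host are refused."""
    def wrapper(environ, start_response):
        # Drop any ":port" suffix before comparing (IPv6 literals not handled).
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host not in ALLOWED_HOSTS:
            start_response("501 Not Implemented",
                           [("Content-Type", "text/plain")])
            return [b"Unknown host\n"]
        return app(environ, start_response)
    return wrapper


def hello(environ, start_response):
    """Stand-in for the real site or service."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = host_checked(hello)
```

A request that arrives with a raw IP address or an attacker's domain in the Host header gets the harmless 501 response and never reaches the wrapped application.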

For Web Services, the same thing applies. It’s often tempting to use IP addresses instead of hostnames for web services (including RESTful services), especially during development, but doing so opens you right up to a DNS-rebinding attack, which could be very harmful if you’re using real data for development and testing. To protect your HTTP-based services from this attack, you need to make sure that every web service is accessed via a hostname rather than a raw IP address, and that every service checks its hostname. For RESTful services, this is trivially easy (since you’re probably going through Apache or something similar anyway, just as with a web site); for WS-* services, I don’t know the implementations well enough to be sure, but it should be possible to force them to check the Host header somehow.

Even if you’re not building web services, managing an enterprise intranet, or running a public web site, don’t forget to protect the web server on your local computer, if you have one.

Three simple tips for LAMP web site developers

Saturday, July 21st, 2007

You’ve learned to write some basic HTML, CSS, PHP/Python/Perl and SQL, found a hosting service, and are ready to create your first LAMP web application. You’ve already read a bit about security (you know always to escape user-supplied parameters, etc.). Here are three very simple tips that will help you along right at the start, without getting caught up in religious wars about frameworks, MVC, REST, abstraction, object orientation, etc.:

  1. Keep all the database code together. Put all your database calls into a single source file if you can — functions like mysqli_query (PHP) should never appear anywhere else but in this file — and create neutral functions like get_member() or delete_cart() for the rest of your code to call. The reason for this is not so that you can switch databases in the future (that’s easy enough to fix), but so that you can easily do a search/replace when you rename or modify tables. If all your database code is in the same place, your application will be orders of magnitude easier to maintain and upgrade a few months from now. Seriously.

  2. Make an extra database for junk. If your hosting account allows more than one database, create at least two, say “foo” and “foo_cache” — put all the tables you need to back up into the first one, and all the stuff you don’t need to back up (views, caching tables, session states, etc.) into the second. Write a SQL script to automatically regenerate any required tables in “foo_cache” when you restore. That way, you won’t waste time and bandwidth every day backing up megabytes or gigabytes of stuff you don’t need and can easily regenerate.

  3. Make GET harmless. If you use HTTP GET (e.g. $_GET in PHP) to do things like deleting or modifying records, bad things will happen to your application — search engines will start randomly changing your database by following links (robots.txt might not be enough to protect you), browsers will delete records by trying to precache pages, etc. Always use POST (normally from a form button) for anything that can make a change. More here.
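
The first tip can be sketched as a single small module. This is a minimal Python illustration, using the standard-library `sqlite3` module as a stand-in for MySQL so that it runs without a server; the table layout and the function names other than `get_member` are assumptions made for this example.

```python
import sqlite3

# Contents of a hypothetical db.py: the only file where SQL appears.


def connect(path=":memory:"):
    """Open the database and make sure the members table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS members "
        "(id INTEGER PRIMARY KEY, name TEXT)"
    )
    return conn


def add_member(conn, name):
    """Insert a member and return the new id."""
    cur = conn.execute("INSERT INTO members (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def get_member(conn, member_id):
    """Return the member as a dict, or None if there is no such id."""
    row = conn.execute(
        "SELECT id, name FROM members WHERE id = ?", (member_id,)
    ).fetchone()
    return None if row is None else {"id": row[0], "name": row[1]}


def delete_member(conn, member_id):
    """Remove a member; does nothing if the id is unknown."""
    conn.execute("DELETE FROM members WHERE id = ?", (member_id,))
    conn.commit()
```

The rest of the application calls only these neutral functions, so renaming the table or a column means editing this one file.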

My biggest problem with Wikipedia

Friday, June 22nd, 2007


Summary: You can’t partition a web site’s users into discrete groups by language.

I don’t worry much about Wikipedia’s objectivity or reliability — no sources (especially not newspapers or Britannica) are objective or reliable, and at least Wikipedia preserves its conflicts and controversies in comments and edit history — but I do have one big problem with the project: WHY THE *^%*& DON’T THEY HAVE SINGLE-SIGNON?

I usually edit in English, but I can also make at least minor contributions to Wikipedia in French, German, Spanish, Italian, and Latin, and sometimes also contribute to Wikimedia. Every one of those requires me to create a separate account! It is absurd that my username and password for en.wikipedia.org won’t work for fr.wikipedia.org.

Don’t make this mistake with your own webapps, kids. Lots of people in the world are comfortable working in more than one language, even if they’re not fluent in all. It’s good to make a site available in more than one language, but don’t expect language to partition your users into discrete groups. Don’t lock them into a single language with a cookie, or limit their accounts to one language domain — multilingualism is extremely common around the world, even in the U.S. (how many American users would want to be able to use a site in English and Spanish if given the opportunity?)

REST, the Lost Update Problem, and the Sneakernet Test

Saturday, June 9th, 2007

Dare Obasanjo is giving a bit of pushback on the Atom Publishing Protocol, but the part that caught my attention was the section on the Lost Update Problem. This doesn’t have to do with REST per se as much as with the choice not to use resource locking, but since REST people tend to like their protocols lightweight, the odds are that you won’t see exclusive locks on RESTful resources all that often (it also applies to some kinds of POST updates as well as PUT).

How to lose a REST update

  • I check out a resource about “John Smith” (as a web form or an XML document, for example), and correct the first name field to “Jon”.
  • You check out the same resource, and correct the last name field to “Smyth”.
  • I check in my changes.
  • You check in your changes.

You have corrected the last name to “Smyth”, but have inadvertently overwritten my correction of the first name with the old value “John”, because you never saw my update.

Detection, not avoidance

Without exclusive locks, there’s no way to avoid this problem, but it is possible to detect it. What happens after detection depends on the application — if it’s interactive, for example, you might redisplay the form with both versions side by side. I don’t mean to diminish the difficulty of dealing with check-in conflicts and merges — it’s a brutally hard problem — but it’s one that you’ll have whenever you choose not to use exclusive resource locks (and even with resource locks, the problem still comes up if someone’s lock expires or is overridden). Managing multi-user resource locks properly can require a lot of extra infrastructure, and they have all kinds of other problems (ask an enterprise developer about the stale lock problem), so there are often good reasons to avoid them.

State goes in the resource, not the HTTP header

Dare points to an old W3C doc that talks about doing lost-update detection using all kinds of HTTP-header magic, requiring built-in support in the client (such as a web browser). That doesn’t make sense to me. A better alternative is to include version information directly in the resource itself. For example, if I check out the record as XML, why not just send me something like this?

<record version="18">
  <given-name>John</given-name>
  <family-name>Smith</family-name>
</record>

If I check it out as an HTML form, my browser should get something like this:

<form method="post" action="/actions/update">
  <div>
    <input type="hidden" name="version" value="18" />
    Given name: <input name="given-name" value="John" />
    Family name: <input name="family-name" value="Smith" />
    <button>Save changes</button>
  </div>
</form>

When you check out the resource, you’ll also get version 18. However, when I check in my changes (using PUT or POST), the server will bump the resource version to 19. When you try to check in your copy (still at version 18), the server will detect the conflict and reject the check-in. Again, what happens after that depends on your application.
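
The server side of that check can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions: the class names `RecordStore` and `VersionConflict` are made up for the example, and a real application would keep the records in a database and compare versions inside a transaction.

```python
class VersionConflict(Exception):
    """Raised when a check-in is based on an out-of-date version."""


class RecordStore:
    """In-memory store that rejects check-ins made against a stale version."""

    def __init__(self):
        self._records = {}

    def create(self, record_id, fields, version=1):
        self._records[record_id] = {"version": version, **fields}

    def check_out(self, record_id):
        # Hand out a copy so that clients cannot modify the stored record.
        return dict(self._records[record_id])

    def check_in(self, record_id, submitted):
        current = self._records[record_id]
        if submitted["version"] != current["version"]:
            raise VersionConflict(
                "record is at version %s, check-in was based on version %s"
                % (current["version"], submitted["version"])
            )
        updated = dict(submitted)
        updated["version"] = current["version"] + 1  # bump on every change
        self._records[record_id] = updated
        return updated["version"]


store = RecordStore()
store.create("smith", {"given-name": "John", "family-name": "Smith"}, version=18)

mine = store.check_out("smith")
yours = store.check_out("smith")

mine["given-name"] = "Jon"
print(store.check_in("smith", mine))  # prints 19

yours["family-name"] = "Smyth"
try:
    store.check_in("smith", yours)
except VersionConflict as err:
    print("conflict:", err)
```

The second check-in is refused instead of silently restoring the old first name, and the stored record keeps the first correction.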

The Sneakernet Test

I think that this is far better than the old W3C solution, because (1) it’s already compatible with existing browsers, and (2) it passes what I call the Sneakernet Test — I can take a copy of the XML (or JSON, or CSV, or whatever) version of the resource to a machine that’s not connected to the net, edit it (say, on the plane), then check it back in from a different computer — I can copy it onto a USB stick, take it to the beach, edit it on my laptop, then take it back to work and check it back in — all the state is in the resource, not hidden away in cryptic HTTP headers.

By the way, if you don’t trust programmers to be honest when designing their clients, you can use a non-serial, pseudo-random version so that they can’t just guess the next version and avoid the merge problem, but serial version numbers should be fine most of the time.
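
A non-serial version can be as simple as a random token that the server stores with the resource and replaces on every successful check-in. The sketch below uses Python's standard `secrets` module; the function names are hypothetical, and the token length is an arbitrary choice.

```python
import secrets


def new_version_token():
    """Return an unguessable version token (32 hexadecimal characters)."""
    return secrets.token_hex(16)


def tokens_match(submitted, current):
    """Compare the submitted token with the stored one in constant time."""
    return secrets.compare_digest(submitted, current)


current = new_version_token()
print(tokens_match(current, current))              # prints True
print(tokens_match(new_version_token(), current))  # almost certainly False
```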

Country codes: a spreadsheet-sharing experiment

Monday, April 23rd, 2007

I’ve just uploaded a spreadsheet of country codes (plain HTML view) to Google documents and spreadsheets. The spreadsheet includes ISO 3166-1 alpha-2, alpha-3, and numeric codes together with FIPS 10-4 codes, and the country names as provided in each spec. I originally created it to help me map FIPS to ISO codes from some air navigation data.

I’m interested in online data collaboration — what tools people need, how it will work in practice, etc. — and this seems like an easy way to experiment. If you’d like to make any corrections to the spreadsheet, let me know, and I’ll add you as a collaborator. I might also upload some spreadsheets of general geodata in the future, where there’s more opportunity for contributions.

Open Data matters more than Open Source

Wednesday, March 28th, 2007

Dare Obasanjo just put up a posting with the title Open Source is Dead. Dare does happen to be a Microsoft employee, but his posting is none of the standard anti-Linux/OpenOffice/Apache/Firefox FUD. Instead, he voices a question that’s been floating around for a while:

… how much value do you think there is to be had from a snapshot of the source code for eBay or Facebook being made available? This is one area where Open Source offers no solution to the problem of vendor lock-in.

Let me out!!!

In other words, as the Web replaces Microsoft Windows as the world’s favorite desktop/laptop software platform (it may be there already), what good is Open Source to the ordinary computer user? Even if a web site happens to be built on Open Source software (like the LAMP stack), I’m still locked in:

  • How can I move my address book and archived e-mail from Hotmail to Yahoo or GMail?
  • How can I move my blog (with all postings and comments) from Blogger to Bloglines or WordPress?
  • How can someone move her contact list and comments from MySpace to Facebook?
  • How can a buyer in Yahoo’s auction thingy verify my reputation on eBay?
  • How can I move my old flight plans from Aeroplanner to FBOWeb?
  • How can I move my sales contacts and data from Salesforce.com to Highrise?
  • How can I move my pictures with their tags from Flickr to Smugmug?

A crack of light under the door

These are huge problems, and the solution is probably going to have a lot more to do with Open Data than with Open Source. There are already a couple of minor successes:

  • Blog reading sites almost universally support OPML import and export, so that you can save the list of blogs you read from one site and move it to another.
  • Online wordprocessors and spreadsheets, of course, support the Microsoft Office formats and/or the OpenDocument formats and/or RTF and CSV.

That’s not much, though. Open Source (and its predecessor buzzword, Free Software) has been very important over the past couple of decades, giving us choices beyond the Microsoft/Apple duopoly that chained our desktops (and forcing the duopoly to open up a lot) and smashing the big-iron vendor cartel that owned our servers, but as the world shifts from desktop to web-hosted software, it can’t take us much further.

REST: the quick pitch

Thursday, February 15th, 2007

Now that the Java world is noticing REST, the low-pain alternative to RPC standards like WS-*, people are starting to blog about it again. Gossip with other IT folks also tells me that people’s customers are actually asking for REST explicitly (rather than having to be convinced to use it). With that in mind, I’m going to try to explain what I think matters about REST, and what you can safely ignore.

The elevator pitch

With REST, every piece of information has its own URL.

If you just do that and nothing else, you’ve got 90%+ of REST’s benefits right off the bat. You can cache, bookmark, index, and link your information into a giant, well, web. It works — you’re reading this, after all, aren’t you? Betcha got here by following a link somewhere, not by parsing a WSDL to find what ports and services were available.

Real best practices

If you want to do REST well (rather than just doing REST), you can spend 2-3 minutes after your elevator ride learning a few very simple best practices to get most of the remaining 10% of REST’s benefits:

Use HTTP POST to update information. Here’s the simple rule: GET to read, POST to change. That way, nobody deletes or modifies something by accident when trying to read it.
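
As a minimal sketch of that rule, here is a tiny Python WSGI application in which GET can only read and any change has to arrive as a POST. The in-memory `ITEMS` dictionary and the choice to make POST delete the item are assumptions made to keep the example short.

```python
ITEMS = {"1": "first item"}  # hypothetical in-memory data


def app(environ, start_response):
    """GET reads an item; POST is the only method allowed to change anything."""
    method = environ["REQUEST_METHOD"]
    item_id = environ.get("PATH_INFO", "").strip("/")
    if method == "GET":
        # Reading never has side effects, so crawlers and prefetchers are safe.
        if item_id in ITEMS:
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [ITEMS[item_id].encode("utf-8")]
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    if method == "POST":
        # In this sketch a POST to the item's URL deletes the item.
        ITEMS.pop(item_id, None)
        start_response("204 No Content", [])
        return []
    start_response("405 Method Not Allowed", [("Allow", "GET, POST")])
    return []
```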

Make sure your information contains links (URLs) for retrieving related information. That’s how search engines index the web, and it can work for other kinds of information (XML, PDF, JSON, etc.) as well. Once you have one thing, you can follow links to find just about everything else (assuming that you understand the file format).

Try to avoid request parameters (the stuff after the question mark). It’s much better to have a URL like

http://www.example.org/systems/foo/components/bar/

than

http://www.example.org/get-component.asp?system=foo&component=bar

Search engines are more likely to index it, you’re less likely to end up with duplicates in caches and hash tables (e.g. if someone lists the request parameters in a different order), URLs won’t change when you refactor your code or switch to a different web framework, and you can always switch to static, pregenerated files for efficiency if you want to. Exceptions: searches (http://www.example.org/search?q=foo) and paging through long lists (http://www.example.org/systems/?start=1000&max=200) — in both of these cases, it’s really OK to use the request parameters instead of tying yourself in a knot trying to avoid them.
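
The duplicate problem is easy to demonstrate. In this short Python sketch, two URLs that differ only in parameter order compare as different strings, which is how a naive cache or hash table would see them, until the query string is sorted. The helper name `canonical_url` is an assumption made for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url):
    """Return the URL with its request parameters sorted by name and value."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )


a = "http://www.example.org/get-component.asp?system=foo&component=bar"
b = "http://www.example.org/get-component.asp?component=bar&system=foo"

print(a == b)                                # prints False
print(canonical_url(a) == canonical_url(b))  # prints True
```

A path-style URL such as http://www.example.org/systems/foo/components/bar/ has only one spelling, so no normalization step is needed.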

Avoid scripting-language file extensions. If your URLs end with “.php”, “.asp”, “.jsp”, “.pl”, “.py”, etc., (a) you’re telling every cracker in the world what exploits to use against you, and (b) the URLs will change when your code does. Use Apache mod_rewrite or equivalent to make your resources look like static files, ending in “.html”, “.xml”, etc.

Avoid cookies and URL rewriting. Well, maybe you can’t, but the idea of REST is that the state is in the thing the server has returned to you (an HTML or XML file, for example) rather than in a session object on the server. This can be tricky with authentication, so you won’t always pull it off, but HTTP authentication (which doesn’t require cookies or session IDs tacked onto URLs) will work surprisingly often. Do what you have to do to make your app work, but don’t use sessions just because your web framework tells you to (they also tie up a lot of resources on your server).

Speculative stuff (skip this)

The strength of REST is that it’s been proven through almost two decades of use on the Web, but not everything that some of the hard-core RESTafarians (and others) try to make us do has been part of that trial. Stop reading now if you just want to go ahead and do something useful with REST. Really, stop! Some of this stuff is moderately interesting, but it won’t really help you, and will probably just mess up your project, or at least make it slower and more expensive.

[maybe some day] Use HTTP PUT to create a resource, and DELETE to get rid of one. These sound like great ideas, and they add a nice symmetry to REST, but they’re just not used enough for us to know if they’d really work on a web scale, and firewalls often block them anyway. In real-life REST applications, rightly or wrongly, people just use POST for creation, modification, and deletion. It’s not as elegant, but we know it works.

[don't bother] Use URLs to point to resources rather than representations. Huh? OK, a resource is a sort-of Platonic ideal of something (e.g. “a picture of Cairo”), while a representation is the resource’s physical manifestation (e.g. “an 800×600 24-bit RGB picture of Cairo in JPEG format”). Yes, as you’d guess, it was people with or working on Ph.D.’s who thought of that. For a long time, the W3C pushed the idea of URLs like “http://www.example.org/pics/cairo” instead of “http://www.example.org/pics/cairo.jpg”, under the assumption that web clients and servers could use content negotiation to decide on the best format to deliver. I guess that people hated the fact that HTTP was so simple, and wanted to find ways to make it more complicated. Fortunately, there were very few nibbles, and this is not a common practice on the web. Screw Plato! Viva materialism! Go ahead and put “.xml” at the end of your URLs.

[blech] Use URNs instead of URLs. I think even the hard-core URN lovers have given up on this now — it’s precisely the kind of excessive abstraction that sent people running screaming from WS-* into REST’s arms in the first place (see also “content negotiation”, above), and it would be a shame to scare them away from REST as well. URLs are fine, as long as you make some minor efforts to ensure that they don’t change.

[n/a] REST needs security, reliable messaging, etc. The RESTafarians don’t say this, but I’m worried that the JSR (the Java REST group) will. We already have a secure version of HTTP (HTTP over TLS/SSL), and it works fine for hundreds of thousands or millions of web sites. Reliable messaging can be handled fine in the application layer, since everyone’s requirements are different anyway, or maybe we want a reliable-messaging spec for HTTP in general. In either case, please don’t pile this stuff on REST.

So to sum up, just give every piece of information its own URL, then have fun.