Quoderat

Archive for the 'web' Category

Country codes: a spreadsheet-sharing experiment

Monday, April 23rd, 2007

I’ve just uploaded a spreadsheet of country codes (plain HTML view) to Google documents and spreadsheets. The spreadsheet includes ISO 3166-1 alpha-2, alpha-3, and numeric codes together with FIPS 10-4 codes, and the country names as provided in each spec. I originally created it to help me map FIPS to ISO codes from some air navigation data.
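
As a rough illustration of the FIPS-to-ISO mapping, here is a minimal Python sketch. It assumes the spreadsheet has been exported as CSV with hypothetical column names (`fips`, `alpha2`, `alpha3`, `numeric`, `name`); the actual column layout may differ, and the three rows are only a tiny sample.

```python
import csv
import io

# Tiny sample in the assumed CSV layout (column names are hypothetical).
SAMPLE_CSV = """fips,alpha2,alpha3,numeric,name
GM,DE,DEU,276,Germany
AS,AU,AUS,036,Australia
AU,AT,AUT,040,Austria
"""

def load_fips_to_iso(csv_text):
    """Return a dict mapping FIPS 10-4 codes to ISO 3166-1 alpha-2 codes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["fips"]: row["alpha2"] for row in reader}

fips_to_iso = load_fips_to_iso(SAMPLE_CSV)

# FIPS "AU" is Austria, while ISO "AU" is Australia, so the codes
# cannot simply be copied from one scheme to the other.
print(fips_to_iso["AU"])  # AT
```

The Austria/Australia collision is the kind of thing that makes a lookup table worth having rather than guessing.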

I’m interested in online data collaboration — what tools people need, how it will work in practice, etc. — and this seems like an easy way to experiment. If you’d like to make any corrections to the spreadsheet, let me know, and I’ll add you as a collaborator. I might also upload some spreadsheets of general geodata in the future, where there’s more opportunity for contributions.

Open Data matters more than Open Source

Wednesday, March 28th, 2007

Dare Obasanjo just put up a posting with the title Open Source is Dead. Dare does happen to be a Microsoft employee, but his posting is none of the standard anti-Linux/OpenOffice/Apache/Firefox FUD. Instead, he voices a question that’s been floating around for a while:

… how much value do you think there is to be had from a snapshot of the source code for eBay or Facebook being made available? This is one area where Open Source offers no solution to the problem of vendor lock-in.

Let me out!!!

In other words, as the Web replaces Microsoft Windows as the world’s favorite desktop/laptop software platform (it may be there already), what good is Open Source to the ordinary computer user? Even if a web site happens to be built on Open Source software (like the LAMP stack), I’m still locked in:

  • How can I move my address book and archived e-mail from Hotmail to Yahoo or GMail?
  • How can I move my blog (with all postings and comments) from Blogger to Bloglines or WordPress?
  • How can someone move her contact list and comments from MySpace to Facebook?
  • How can a buyer in Yahoo’s auction thingy verify my reputation on eBay?
  • How can I move my old flight plans from Aeroplanner to FBOWeb?
  • How can I move my sales contacts and data from Salesforce.com to Highrise?
  • How can I move my pictures with their tags from Flickr to Smugmug?

A crack of light under the door

These are huge problems, and the solution is probably going to have a lot more to do with Open Data than with Open Source. There are already a couple of minor successes:

  • Blog reading sites almost universally support OPML import and export, so that you can save the list of blogs you read from one site and move it to another.
  • Online wordprocessors and spreadsheets, of course, support the Microsoft Office formats and/or the OpenDocument formats and/or RTF and CSV.
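
To make the OPML case concrete, here is a small Python sketch of exporting and re-importing a subscription list. The function names and the feed URL are made up for illustration, and real blog readers may write extra attributes; this only shows the basic `outline` structure with `text` and `xmlUrl`.

```python
import xml.etree.ElementTree as ET

def build_opml(title, feeds):
    """Build an OPML subscription list from (name, feed_url) pairs."""
    opml = ET.Element("opml", version="1.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in feeds:
        ET.SubElement(body, "outline", type="rss", text=name, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")

def read_opml(opml_text):
    """Recover the (name, feed_url) pairs from an OPML document."""
    root = ET.fromstring(opml_text)
    return [(o.get("text"), o.get("xmlUrl")) for o in root.iter("outline")]

# Hypothetical subscription list: export from one site, import into another.
feeds = [("Example Blog", "http://www.example.org/feed.xml")]
exported = build_opml("My subscriptions", feeds)
print(read_opml(exported) == feeds)  # True
```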

That’s not much, though. Open Source (and its predecessor buzzword, Free Software) has been very important over the past couple of decades, giving us choices beyond the Microsoft/Apple duopoly that chained our desktops (and forcing the duopoly to open up a lot) and smashing the big-iron vendor cartel that owned our servers, but as the world shifts from desktop to web-hosted software, it can’t take us much further.

REST: the quick pitch

Thursday, February 15th, 2007

Now that the Java world is noticing REST, the low-pain alternative to RPC standards like WS-*, people are starting to blog about it again. Gossip with other IT folks also tells me that people’s customers are actually asking for REST explicitly (rather than having to be convinced to use it). With that in mind, I’m going to try to explain what I think matters about REST, and what you can safely ignore.

The elevator pitch

With REST, every piece of information has its own URL.

If you just do that and nothing else, you’ve got 90%+ of REST’s benefits right off the bat. You can cache, bookmark, index, and link your information into a giant, well, web. It works — you’re reading this, after all, aren’t you? Betcha got here by following a link somewhere, not by parsing a WSDL to find what ports and services were available.

Real best practices

If you want to do REST well (rather than just doing REST), you can spend 2-3 minutes after your elevator ride learning a few very simple best practices to get most of the remaining 10% of REST’s benefits:

Use HTTP POST to update information. Here’s the simple rule: GET to read, POST to change. That way, nobody deletes or modifies something by accident when trying to read it.
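
Here is one way the GET/POST rule might look in code: a minimal Python WSGI sketch. The `NOTES` store, the `app` and `request` names, and the URL layout are all hypothetical; the only point is that GET never changes anything and POST is the one place where changes happen.

```python
import io

# In-memory store standing in for real data (hypothetical).
NOTES = {"1": "hello"}

def app(environ, start_response):
    """Minimal WSGI app: GET reads a note, POST changes it."""
    note_id = environ["PATH_INFO"].strip("/")
    method = environ["REQUEST_METHOD"]
    if method == "GET":
        if note_id not in NOTES:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [NOTES[note_id].encode("utf-8")]
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        NOTES[note_id] = environ["wsgi.input"].read(length).decode("utf-8")
        start_response("204 No Content", [])
        return []
    start_response("405 Method Not Allowed", [("Allow", "GET, POST")])
    return [b"method not allowed"]

def request(method, path, body=b""):
    """Call the app directly, without a real server; return (status, body)."""
    seen = []
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = app(environ, lambda status, headers: seen.append(status))
    return seen[0], b"".join(chunks)

print(request("GET", "/1"))               # reading changes nothing
print(request("POST", "/1", b"goodbye"))  # only POST modifies the note
print(request("GET", "/1"))
```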

Make sure your information contains links (URLs) for retrieving related information. That’s how search engines index the web, and it can work for other kinds of information (XML, PDF, JSON, etc.) as well. Once you have one thing, you can follow links to find just about everything else (assuming that you understand the file format).

Try to avoid request parameters (the stuff after the question mark). It’s much better to have a URL like

http://www.example.org/systems/foo/components/bar/

than

http://www.example.org/get-component.asp?system=foo&component=bar

Search engines are more likely to index it, you’re less likely to end up with duplicates in caches and hash tables (e.g. if someone lists the request parameters in a different order), URLs won’t change when you refactor your code or switch to a different web framework, and you can always switch to static, pregenerated files for efficiency if you want to. Exceptions: searches (http://www.example.org/search?q=foo) and paging through long lists (http://www.example.org/systems/?start=1000&max=200) — in both of these cases, it’s really OK to use the request parameters instead of tying yourself in a knot trying to avoid them.
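
The duplicate problem with request parameters is easy to demonstrate. In this Python sketch (the `canonical_url` helper is my own illustration, not part of any framework), two URLs that mean the same thing compare as different strings until the parameters are sorted, which is extra work a path-style URL never needs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def canonical_url(url):
    """Sort query parameters so equivalent URLs compare equal."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query)))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )

a = "http://www.example.org/get-component.asp?system=foo&component=bar"
b = "http://www.example.org/get-component.asp?component=bar&system=foo"

print(a == b)                                # False: two cache keys, one resource
print(canonical_url(a) == canonical_url(b))  # True once parameters are sorted
```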

Avoid scripting-language file extensions. If your URLs end with “.php”, “.asp”, “.jsp”, “.pl”, “.py”, etc., (a) you’re telling every cracker in the world what exploits to use against you, and (b) the URLs will change when your code does. Use Apache mod_rewrite or equivalent to make your resources look like static files, ending in “.html”, “.xml”, etc.

Avoid cookies and URL rewriting. Well, maybe you can’t, but the idea of REST is that the state is in the thing the server has returned to you (an HTML or XML file, for example) rather than in a session object on the server. This can be tricky with authentication, so you won’t always pull it off, but HTTP authentication (which doesn’t require cookies or session IDs tacked onto URLs) will work surprisingly often. Do what you have to do to make your app work, but don’t use sessions just because your web framework tells you to (they also tie up a lot of resources on your server).
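
For a sense of how little machinery HTTP authentication needs, here is a Python sketch of the Basic scheme, where every request carries its own credentials and the server keeps no session. The helper names are hypothetical, and since Basic credentials are only base64-encoded, this assumes the connection itself is encrypted.

```python
import base64

def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def parse_basic_auth(header_value):
    """Return (username, password) from a Basic header, or None if invalid."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # covers bad base64 and bad UTF-8
        return None
    username, sep, password = decoded.partition(":")
    return (username, password) if sep else None

header = basic_auth_header("Aladdin", "open sesame")
print(header)                    # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(parse_basic_auth(header))  # ('Aladdin', 'open sesame')
```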

Speculative stuff (skip this)

The strength of REST is that it’s been proven through almost two decades of use on the Web, but not everything that some of the hard-core RESTafarians (and others) try to make us do has been part of that trial. Stop reading now if you just want to go ahead and do something useful with REST. Really, stop! Some of this stuff is moderately interesting, but it won’t really help you, and will probably just mess up your project, or at least make it slower and more expensive.

[maybe some day] Use HTTP PUT to create a resource, and DELETE to get rid of one. These sound like great ideas, and they add a nice symmetry to REST, but they’re just not used enough for us to know if they’d really work on a web scale, and firewalls often block them anyway. In real-life REST applications, rightly or wrongly, people just use POST for creation, modification, and deletion. It’s not as elegant, but we know it works.

[don't bother] Use URLs to point to resources rather than representations. Huh? OK, a resource is a sort-of Platonic ideal of something (e.g. “a picture of Cairo”), while a representation is the resource’s physical manifestation (e.g. “an 800×600 24-bit RGB picture of Cairo in JPEG format”). Yes, as you’d guess, it was people with or working on Ph.D.’s who thought of that. For a long time, the W3C pushed the idea of URLs like “http://www.example.org/pics/cairo” instead of “http://www.example.org/pics/cairo.jpg”, under the assumption that web clients and servers could use content negotiation to decide on the best format to deliver. I guess that people hated the fact that HTTP was so simple, and wanted to find ways to make it more complicated. Fortunately, there were very few nibbles, and this is not a common practice on the web. Screw Plato! Viva materialism! Go ahead and put “.xml” at the end of your URLs.

[blech] Use URNs instead of URLs. I think even the hard-core URN lovers have given up on this now: it’s precisely the kind of excessive abstraction that sent people running screaming from WS-* into REST’s arms in the first place (see also “content negotiation”, above), and it would be a shame to scare them away from REST as well. URLs are fine, as long as you make some minor efforts to ensure that they don’t change.

[n/a] REST needs security, reliable messaging, etc. The RESTafarians don’t say this, but I’m worried that the JSR (the Java REST group) will. We already have a secure version of HTTP (HTTPS, over TLS/SSL), and it works fine for hundreds of thousands or millions of web sites. Reliable messaging can be handled fine in the application layer, since everyone’s requirements are different anyway, or maybe we want a reliable-messaging spec for HTTP in general. In either case, please don’t pile this stuff on REST.

So to sum up, just give every piece of information its own URL, then have fun.

MSIE MIA?

Wednesday, January 24th, 2007

What happened to Windows Internet Explorer?

Browser stats

I just took a peek at my server stats for megginson.com (I’m pretty lazy about following them) and had a huge surprise. I’ve adjusted these to exclude “Unknown”, which I assume are mostly spiders and blog aggregators:

  • MS Internet Explorer: 42%
  • Firefox: 37%
  • Mozilla: 7%
  • NetNewsWire: 6%
  • Opera: 3%
  • Safari: 2%
  • Netscape: 1%

I cut off the list at 1%. MSIE is still in the lead, but it has suffered a huge drop from a few months ago — could it be that my ISP’s version of AWStats doesn’t recognize MSIE 7 and is lumping it with “Unknown”, or is there a chance that the movement to Firefox is becoming a stampede? The last time I remember changes this fast was when MSIE was crushing Netscape in the late 1990s.

Operating system stats

My site is not primarily a Linux or Open-Source software site, and it does not seem to attract a disproportionately high share of non-Windows users. Again, excluding “Unknown”, here is the OS distribution for visitors:

  • MS Windows: 74%
  • Linux: 15%
  • MacOS: 12%

Linux might be a touch high here, but megginson.com is no SlashDot. Something’s going on — either Firefox is doing to MSIE what MSIE did to Netscape (knocking off a stale browser sitting smugly on its assumed monopoly), or, as I mentioned, it’s just a reporting glitch.

XML 2006 pickled and preserved

Friday, January 19th, 2007

The XML 2006 site is now pickled and preserved for long-term storage. Almost all of the presenters got their papers or slides in for the proceedings, if not on time, at least in time. Unfortunately, if you want to see a paper or slides from one of the few who didn’t send us anything, you’ll now have to pester them directly.

Recipe for pickling a web site

The original site was a hand-rolled LAMP implementation, but it was designed from the start to be amenable to a static copy. To pickle it, I started by doing a recursive slurp of the live site using wget (with the -m option) — that generated permanent, static HTML copies of the dynamic, database-driven pages on the site. At that point, I had an almost, but not quite perfect static copy of the site, because there were two things that wget missed:

  1. Images referred to only in CSS stylesheets (such as the banner).
  2. CSS stylesheets referred to by other CSS stylesheets.

It took only a few minutes to add all of that by hand, and the site was ready to go.
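
If you would rather not hunt for those missing files by eye, something like the following Python sketch could list what a stylesheet refers to. It is not what I actually ran; the regular expressions are a simplification that ignores CSS comments and escapes, and the file names in the sample are invented.

```python
import re

# Matches url(...) references, with or without quotes.
URL_RE = re.compile(r"""url\(\s*['"]?([^'")\s]+)['"]?\s*\)""", re.IGNORECASE)
# Matches the quoted form @import "file.css"; the url(...) form is caught above.
IMPORT_RE = re.compile(r"""@import\s+['"]([^'"]+)['"]""", re.IGNORECASE)

def css_references(css_text):
    """Return the sorted list of files a stylesheet refers to."""
    refs = set(URL_RE.findall(css_text))
    refs.update(IMPORT_RE.findall(css_text))
    return sorted(refs)

# Invented stylesheet covering both cases from the list above.
sample = """
@import "layout.css";
@import url(print.css);
#banner { background: url('images/banner.png') no-repeat; }
"""

print(css_references(sample))
# ['images/banner.png', 'layout.css', 'print.css']
```

Each name it prints is a candidate to fetch by hand and drop into the static copy.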

Why it worked

This will be old news to a lot of people reading, but a few simple advance steps (during site design) made later static preservation easy. Here’s what I did:

  • Every page has its own URL, period, end of discussion. No AJAX, no POST.
  • Every page (or at least, every page that we want to archive) is reachable, directly or indirectly, from the home page.
  • Script names are not shown to the public, so there are no URLs ending in “php” (hint: exposed script extensions like “php”, “asp”, or “jsp” are signs of gross incompetence in web design).
  • No web pages rely on exposed GET request parameters: for example, the URLs looked like /programme/presentations/123.html, not /programme/presentation?code=123, or even worse, /show-presentation.php?code=123.

And that’s it. Of course, if the site had included live forms, I would have had to remove those as well (and any links to them), but that wouldn’t have been much extra work.

On a final note, while the live site was hosted on an Apache server (the “A” in “LAMP”), the pickled site is hosted on a Microsoft IIS server. It made no difference at all — that’s the way Web standards are supposed to work.

Who’s searching for “XML”?

Tuesday, January 9th, 2007

Here are the top ten locations as of January 9 2007, according to Google trends:

  1. Pune, India
  2. Bangalore, India
  3. Hyderabad, India
  4. Chennai, India
  5. Mumbai, India
  6. Singapore, Singapore
  7. Delhi, India
  8. Tokyo, Japan
  9. Chiyoda, Japan
  10. Hong Kong, Hong Kong

Note that the top cities are all Asian. A search for “J2EE” returns almost exactly the same list. Now, compare the list for a representative new, trendy technology, Ruby on Rails:

  1. San Francisco, CA, USA
  2. Austin, TX, USA
  3. Pleasanton, CA, USA
  4. Seattle, WA, USA
  5. Salt Lake City, UT, USA
  6. Portland, OR, USA
  7. Vancouver, Canada
  8. Denver, CO, USA
  9. Oslo, Norway
  10. Auckland, New Zealand

This time, it’s 80% North American and 0% Asian, and more interestingly, all of those cities are west of the Mississippi. The easiest interpretation of this very small sample is that the Asian companies concentrate on established technologies that they can be paid for using, while the North American west coast companies are disproportionately interested in new, unproven technologies. What about a new technology that’s designed to work with an older one? Could we expect a mix of Asian and North American west coast cities? Here are the top cities searching for “XQuery”:

  1. San Jose, CA, USA
  2. Bangalore, India
  3. Singapore, Singapore
  4. Chennai, India
  5. San Francisco, CA, USA
  6. Mumbai, India
  7. Pleasanton, CA, USA
  8. San Diego, CA, USA
  9. Washington, DC, USA
  10. Hong Kong, Hong Kong

The implication of this very unscientific survey is that you can determine the relative maturity of a technology by looking at the weighting of search origins between western North America and eastern Asia.

Templating languages and XML

Saturday, December 23rd, 2006

Erich Schubert is talking about web templating languages. He’s looking for a pure-XML templating solution, but that might not be necessary for simple web-page design, where we don’t need all the extra benefits of heavy-duty transformation standards like XSLT.

Keeping it simple

For PHP-driven web sites, I’m a big fan of Smarty, which uses braces (“{” and “}”) to delimit template constructions. Braces have no special meaning to XML parsers (they’re just character data), so it’s possible to put a template expression inside an attribute value (for example), while keeping the template itself as well-formed XML and not requiring the elaborate paraphrastic expressions you need to set up attribute values in XSLT:

<p id="x-{$myvalue|escape}">Hello, world!</p>
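
To check the well-formedness claim, the template line above can be fed to an XML parser before any substitution happens. This Python sketch does that; the `render` function is only a crude stand-in for a template engine (it is not Smarty, and it handles just this one expression).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

template = '<p id="x-{$myvalue|escape}">Hello, world!</p>'

# The unexpanded template already parses: braces are plain characters to XML.
element = ET.fromstring(template)
print(element.get("id"))  # x-{$myvalue|escape}

def render(template_text, myvalue):
    """Stand-in for the template engine: substitute one escaped value."""
    safe = escape(myvalue, {'"': "&quot;"})
    return template_text.replace("{$myvalue|escape}", safe)

# The rendered output is still well-formed, even with awkward characters.
rendered = render(template, 'a<b & "c"')
print(ET.fromstring(rendered).get("id"))  # x-a<b & "c"
```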

Concurrent markup resurrected

Really, Smarty adds a second set of concurrent markup on top of the XHTML. Smarty constructs don’t have to balance with XML element boundaries, and with only a little care, I’ve never ended up with a Smarty template that wasn’t well-formed. JSP’s mistake was using something that looks like XML but isn’t quite, messing up parsers. Even the old SGML CONCUR feature would not have allowed markup inside attribute values. Sometimes there’s something to be said for using two different syntaxes when you’re trying to represent two different things.

Yahoo stands firm behind its search API

Saturday, December 23rd, 2006

Early in the week, I posted about the end of the Google search API, and speculated that — since everyone else tends to copy Google — it might be the start of a general trend away from open data APIs and in favour of server-side AJAX widgets. In response, Amit Kumar of Yahoo sent me an e-mail message (after failing to get past Spam Karma in the comment system for my blog):

You don’t have to worry. We just posted a blog entry on this topic. Yahoo Search APIs are going strong - we welcome developers to use our APIs.

http://www.ysearchblog.com/archives/000393.html

Amit Kumar
Manager, Site Explorer

Thanks, Amit. Fortunately, megginson.com isn’t popular enough that it will break Yahoo’s 5,000 queries/day quota.

SOAP, REST, JSON, XML, and Serialized PHP

Note that Yahoo has a REST interface that can deliver results in XML, JSON, or serialized PHP, so if you get tired of the REST vs. SOAP perma-debate, there’s some alternative material for you here (if you want a good roaring debate, be careful to avoid reading Tim Bray’s carefully balanced view).

Beginning of the end for open web data APIs?

Monday, December 18th, 2006

[Update: hacking the Google Search AJAX API — see below.]

[Update #2: Don Box is thinking along the same lines as I am.]

[Update #3: Rob Sayre points out that there is, in fact, a published browser-side JavaScript API underlying the AJAX widget.]

Over on O’Reilly Radar, Brady Forrest mentioned that Google is shutting down its SOAP-based search API. Another victory for REST over WS-*? Nope — Google doesn’t have a REST API to replace it. Instead, something much more important is happening, and it could be that REST, WS-*, and the whole of open web data and mash-ups all end up on the losing side.

It’s not about SOAP

Forget about the SOAP vs. REST debate for a second, since most of the world doesn’t care. Google’s search API let you send a search query to Google from your web site’s backend, get the results, then do anything you want with them: show them on your web page, mash them up with data from other sites, etc. The replacement, Google AJAX API, forces you to hand over part of your web page to Google so that Google can display the search box and show the results the way they want (with a few token user configuration options), just as people do with Google AdSense ads or YouTube videos. Other than screen scraping, like in the bad old days, there’s no way for you to process the search results programmatically — you just have to let Google display them as a black box (so to speak) somewhere on your page.

A precedent for widgets instead of APIs

An AJAX interface like this is a great thing for a lot of users, from bloggers to small web site operators, because it allows them to add search to their sites with a few lines of JavaScript and markup and no real coding at all; however, the gate has slammed shut and the data is once again locked away outside the reach of anyone who wanted to do anything else.

Of course, there are alternatives still available, such as the Yahoo! Search API (also available in REST), but how long will they last? Yahoo! has its own restructuring coming up, and if Nelson Minar’s suggestion (via Forrest) is right — that Google is killing their search API for business rather than technical reasons — this could set a huge precedent for other companies in the new web, many of whom look to Google as a model. Most web developers will probably prefer the AJAX widgets anyway because they’re so much less work, so by switching from open APIs to AJAX widgets, you keep more users happy and keep your data more proprietary. What’s an investor or manager not to like?

What next?

Data APIs are not going to disappear, of course. AJAX widgets don’t allow mash-ups, and some sites have user bases including many developers who rely on being able to combine data from different sources (think CraigsList). However, the fact that Google has decided that there’s no value playing in the space will matter a lot to a lot of people. If you care about open data, this would be a good time to start thinking of credible business cases for companies to (continue) offer(ing) it.

Update: Hacking the Google AJAX API (or, back to Web ‘99)

The AJAX API is designed to allow interaction with JavaScript on the client browser, but not with the server; however, as Davanum Srinivas demonstrates, it’s possible to hack on the API to get programmatic access from the server backend. This violates Google’s terms of service, and obviously, they can make incompatible changes at any time to try to kill it, but at least there’s a back door for now. Thanks, Davanum.

Personally, I was planning to use the Yahoo (REST) search API for site search even before all this broke, because I didn’t want to waste time trying to figure out how to use SOAP in PHP. I’m glad now I didn’t waste any time on Google’s API, and I’ll just keep my fingers crossed that Yahoo’s API survives.

Good/bad/good/good news

Monday, December 11th, 2006

Good news: the XML 2006 web site was far more popular than we anticipated.

Bad news: the site was so popular during the conference that we exceeded our bandwidth limit and went off line.

Good news: the site didn’t go down until two days after the conference was finished.

More good news: the site is back up now.

Apologies to everyone for the inconvenience. In a couple of weeks, we’ll be putting the proceedings online, and I’ll watch bandwidth closely in case we get linked to from somewhere popular. Maybe next year we should use Amazon’s Elastic Compute Cloud (EC2) instead of conventional shared hosting.

Now, hurry up and get your proposals in for XTech 2007, because the deadline is only four days away (if you liked Boston in December, you’ll love Paris in May).

XML hot topics: the 10 most viewed XML 2006 presentation summaries

Saturday, November 25th, 2006

With the XML 2006 conference just over a week away, I took another look at the server logs to see what presentation summaries were getting the most page views:

  1. Web Services Policy Expression Alternatives
  2. W3C XML Schema Patterns for Databinding
  3. Social Semantic Mashups: Exploring Social Networks with Microformats and GRDDL
  4. XQueryP: An XML Application Development Language
  5. Getting There — The XML/XQuery Ecosystem (opening keynote)
  6. The ODF Plugin for MS Office
  7. Panel: XML Pipeline Processing
  8. Panel: XML Project Management Best Practices
  9. Making the Most of XML with Adobe InCopy and InDesign
  10. JSON, The Fat-Free Alternative to XML

Vendors, consultants, journalists, and book publishers, take note. XQuery, in particular, makes the list twice (congrats to the W3C working group on its recent release), and there clearly is an intersection between the set of people who care about MS Office and the set of people who care about ODF.
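
For anyone who wants to pull a similar ranking out of their own logs, here is a rough Python sketch. It assumes Apache-style access log lines and a URL prefix for the pages of interest; the log lines below are made up, and this is not the script behind the list above.

```python
import re
from collections import Counter

# Request line and status code inside an Apache-style log entry.
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3})')

def top_pages(log_lines, prefix, limit=10):
    """Count successful GET requests under `prefix`, most viewed first."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        path, status = match.groups()
        if status == "200" and path.startswith(prefix):
            hits[path] += 1
    return hits.most_common(limit)

# Made-up log lines for illustration only.
sample_log = [
    '192.0.2.1 - - [25/Nov/2006:10:00:00 -0500] "GET /programme/presentations/123.html HTTP/1.1" 200 5120',
    '192.0.2.2 - - [25/Nov/2006:10:00:05 -0500] "GET /programme/presentations/45.html HTTP/1.1" 200 4800',
    '192.0.2.3 - - [25/Nov/2006:10:00:09 -0500] "GET /programme/presentations/123.html HTTP/1.1" 200 5120',
    '192.0.2.4 - - [25/Nov/2006:10:00:12 -0500] "GET /programme/presentations/999.html HTTP/1.1" 404 210',
    '192.0.2.5 - - [25/Nov/2006:10:00:15 -0500] "GET /index.html HTTP/1.1" 200 3000',
]

for path, count in top_pages(sample_log, "/programme/presentations/"):
    print(count, path)
```

Note that this counts raw requests, spiders included, so the numbers say more about traffic than about human interest.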

Wikipedia and trust

Monday, November 6th, 2006

Update: corrected Encyclopedia Britannica link.

A lot of people — publishers, the press, public figures, and bloggers — spend a lot of time agonizing over Wikipedia, and the general conclusion is either (a) Wikipedia is dangerously untrustworthy (from its detractors), or (b) Wikipedia is great, but don’t trust everything you read there (from its supporters).

Here’s a different perspective: don’t trust anything you read or hear anywhere, guys. If you have the stomach for it, take a look at the 1911 Encyclopedia Britannica article for NEGRO, remembering that this edition was published within living memory, 48 years after the American Emancipation Proclamation, and 104 years after the abolition of the slave trade in the British Empire, in what was probably the world’s most authoritative and trusted reference source. What do you think the odds are that our grandchildren will react with the same disgust and disbelief when they look back at how our mainstream media and other publications covered the issues of our day, from their almost total ignorance of Iran (guys in black with long beards and nuclear bombs) to their glorification of war (support our troops, too bad about [non-first-world] victims) to their lazy republishing of the spin and just simple lies from the press releases of just about every public-interest pressure group (from the environmental to the gun lobby, from the gay rights movement to the fundamentalist Christian movement)?

If the occasional (and rare) error or vandalism in Wikipedia finally teaches people that they are responsible for verifying everything they read, that will be a good thing. Wikipedia is still usually my first source for information, but nothing is ever my last source. Overall, however, because Wikipedia has an international authorship, I find that the information in it is generally of a much higher quality than I can get from the mainstream North American publishers or media (and I’m not talking only about Fox News).

XML 2006: most viewed presentation and tutorial summaries

Monday, October 30th, 2006


It’s just over a month now until XML 2006, so make sure you register and reserve your room soon.

Web site stats

For a slightly different look at the conference, I dug through the web site’s server logs to see which individual paper descriptions were being viewed the most. Note that these are not necessarily the best presentations, or even the ones that will have the highest attendance, but they are attracting some web traffic. Here are the top five as of yesterday:

  1. Prud’hommeaux and Le Hegaret, Web Services Policy Expression Alternatives (3,491 hits)
  2. Halpin, Social Semantic Mashups: Exploring Social Networks with Microformats and GRDDL (756 hits)
  3. Edson and Stevenson, Making the Most of XML with Adobe InCopy and InDesign (158 hits)
  4. Chamberlin, XQueryP: An XML Application Development Language (147 hits)
  5. Hahn, Peaceful Coexistence: The SGML/XML Transition at Cessna Aircraft (137 hits)

Several other paper summaries have attracted more than 100 hits, as have two of the tutorials:

Do you think your presentation or tutorial should have been in one of these lists? Then do something about it — talk about it in your blog or on mailing lists, post the link to your company intranet, etc., and make sure that people who would want to come and hear you know about it.

What’s popular, and why?

So what can we conclude from all this? Certainly, given the disproportionately high number of hits for Eric Prud’hommeaux’s and Philippe Le Hegaret’s presentation (about four to five times as many as the second-place one), there’s a lot more interest out there in Web Services than some of us might have suspected. Otherwise, the presentations seem to be spread nicely among the three thematic tracks, publishing, web, and enterprise, suggesting that the conference will be a good meeting place for people coming from those three different worlds.

Announcing Newmatica Barcode (testers needed)

Friday, October 6th, 2006

[Update: I've shut down the site after nearly a year of inactivity. No regrets — it was a good learning experience, and cost very little (aside from spare time).]

Newmatica Barcode

This summer I had an idea for a site where people could tag and discuss basic consumer items (like, say, boxes of pens, or breakfast cereal) the same way that they can discuss books or CDs on Amazon or tag web pages on del.icio.us. The first version of the site is now online here:

http://www.newmatica.com/ (no longer available)

I’d be grateful if a few of my blog readers could try it out and give me their opinions (either in comments or by private e-mail). Here are some good starting points:

  • Search for food
  • View the tag cereal
  • Look up the product with the barcode 0 11361 50506 6
  • Grab a Web-2.0-y XML view of a product, with XLink links suitable for web crawling.
  • Subscribe to the RSS 2.0 feed for a product’s comments.

Anti-Goals

I don’t intend this site as a competitor to the Internet UPC Database: my main goal is to let people share their own observations and opinions about consumer products (e.g. “these diapers don’t leak”, “brand X chocolate tastes better”), and to classify and, effectively, vote for products by tagging them, and the barcodes are only secondary to that. Likewise, sites like ScanBuy and Qode, which concentrate on using cell phones for on-the-spot price comparison, are also working in a different area: I want to give people a chance to share their own information, not provide point-of-sale information to them.

Definitely not a stealth startup

In his article Stealth Startups Suck, Bloglines founder Mark Fletcher wrote that “stealth mode for a web start-up is the kiss of death.” I’m taking Mark’s advice to heart: I first sketched out the idea for Newmatica Barcode on a notebook (the paper kind) in a cabin in Perce, Quebec on the Gaspe Peninsula 10 weeks ago, and now here’s a fully-functioning site for people to try. There has been no pre-announcement, no careful dispatch of advance information to industry mavens and investors, or anything like that — you, my blog readers, are the very first people outside my immediate family to hear of this project.

So please, try out the site, create an account, enter some products, do some commenting and tagging, and be patient if you encounter any bugs (I promise to fix them as fast as possible). I’m looking forward to hearing back from you soon.

Stephens vs. Wikipedia

Thursday, August 3rd, 2006

Stephen Dubner is the co-author of Freakonomics, a book that stands out for its ability to move past conventional wisdom and commonplaces to look at evidence that others either ignored or couldn’t understand. Dubner recently posted a blog entry about Stephen Colbert’s attack on Wikipedia.

On his show, Colbert edited Wikipedia to introduce deliberately false information into the article about his show, and then encouraged his viewers to do the same for articles about elephants. Many viewers took Colbert up on his offer.

Is that proof that Wikipedia is undependable, as Dubner suggests? In fact, all of the incorrect information was almost immediately removed, some articles were temporarily locked to avoid vandalism, and Colbert’s account was suspended. Wikipedia can be temporarily undependable, but (at least for any frequently-read article) it is quickly self-correcting — its biggest problem is the articles that are rarely read, where vandalism or errors can last for a longer time. Conventional encyclopedias have no (or extremely little) deliberate vandalism, but their information is usually out of date, they have significantly less coverage (a tiny fraction of Wikipedia’s), and unintentional errors can take years or decades to correct.

I’m a bit disappointed that Dubner was satisfied simply to repeat the obvious, commonplace criticisms about Wikipedia without any critical thought — that’s not the Freakonomics way.

Firefox vs. PRG

Wednesday, May 31st, 2006

[Update: it's working now, after upgrading Ubuntu. Here's an online test for your own browser.]

Post/Redirect/Get (PRG) is a common web-application design pattern, where a server responds to an HTTP POST request not by generating HTML immediately, but by redirecting the browser to GET a different page. At the cost of an extra request, PRG allows users safely to bookmark, reload, etc.
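
For readers who haven’t met the pattern, here is a bare-bones Python WSGI sketch of PRG. The `/orders` URLs, the in-memory `ORDERS` list, and the helper names are all hypothetical, and I’ve used a 303 status for the redirect, though plenty of applications use 302 for the same job.

```python
import io

ORDERS = []  # hypothetical in-memory store

def prg_app(environ, start_response):
    """POST records an order, then redirects; GET shows the result."""
    method = environ["REQUEST_METHOD"]
    path = environ["PATH_INFO"]
    if method == "POST" and path == "/orders":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        ORDERS.append(environ["wsgi.input"].read(length).decode("utf-8"))
        # Redirect instead of rendering HTML, so a reload repeats only the GET.
        location = f"/orders/{len(ORDERS)}"
        start_response("303 See Other", [("Location", location)])
        return []
    if method == "GET" and path.startswith("/orders/"):
        number = path.rsplit("/", 1)[1]
        if number.isdigit() and 1 <= int(number) <= len(ORDERS):
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [ORDERS[int(number) - 1].encode("utf-8")]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]

def request(method, path, body=b""):
    """Call the app without a server; return (status, headers, body)."""
    seen = {}
    def start_response(status, headers):
        seen["status"] = status
        seen["headers"] = dict(headers)
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = prg_app(environ, start_response)
    return seen["status"], seen["headers"], b"".join(chunks)

status, headers, _ = request("POST", "/orders", b"one sports car")
# Follow the redirect as a browser would, then "reload" the page.
first = request("GET", headers["Location"])
reloaded = request("GET", headers["Location"])
print(status, headers["Location"], len(ORDERS))
```

Reloading the final page repeats only the GET, so the order is recorded once no matter how many times the page is refreshed.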

When someone attempts to reload a page generated by a POST request, browsers will generally pop up a warning that reloading will cause a form to be resubmitted, possibly causing you to purchase two sports cars (etc.); that warning is a good thing. Strangely, however, Firefox 1.5.0.3 will pop up the same warning after a PRG operation, when reloading should not cause anything bad to happen. I can think of a few possible reasons:

  1. Firefox wants to repeat the entire PRG operation rather than just the final GET.
  2. Because the GET was the (indirect) result of a POST operation, Firefox still wants to warn you that there might be something fishy.
  3. An obscure bug.

I’m leaning towards #3, but I’m curious about whether anyone thinks that Firefox is doing the right thing here, and whether other browsers (MSIE, Opera, Safari, etc.) act the same way.

Continuations, cont’d

Saturday, May 20th, 2006

[Update: see further contributions to the discussion from Ian Griffiths, Avi Bryant, James Robertson, and Joe Duffy; note also John Cowan's excellent comment below, pointing out that hidden fields work with the back button but not with bookmarks.]

It looks like continuations are back on the discussion board (Gilad Bracha, Tim Bray, and Don Box). I spent some time with Scheme a decade ago and continuations were one of the new features I had to try to understand. Then, as now, I found them more clever than practical.

Gilad sets up a use case for continuations before he goes on to oppose them: in essence, a web application could use continuations to maintain separate stacks, so that as a user hits the back button and then starts down new paths, the web application would not become confused, selling the user a trip to Hawaii instead of Alaska. I can see how continuations would work for that, just as I can see how a bulldozer could turn over the sod in my garden, but I’m far from convinced that either is the right tool for what is really a much simpler problem.

Explicit state

First, a continuation preserves the entire state of a program, including the stack, instruction counter, local variables, etc. How much of that do you really need for a hypothetical travel web app? In reality, you probably need, maybe, 1-5 variable values to restore a previous state in the travel app, so why not just save those explicitly? It would be faster, more secure (less information being saved), and much easier to performance tune and debug (since no magic is happening behind the scenes). Save those variables in a database, in a hash table, in an XML or CSV file, in memcached, or wherever happens to be most convenient. You may be looking at under 100 bytes for each saved state, so if you really want to do this, it’s not going to hurt too badly.
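
To make that concrete, here is a minimal sketch of saving state explicitly in a hash table. The function names, the `SAVED_STATES` dictionary, and the particular fields are assumptions made up for illustration; a database or memcached would serve just as well.

```python
import uuid

# Hypothetical store: a plain dict standing in for a database or memcached.
SAVED_STATES = {}

def save_state(destination, package):
    """Save only the values needed to restore this point in the app."""
    token = uuid.uuid4().hex  # opaque key that can be handed to the browser
    SAVED_STATES[token] = {"destination": destination, "package": package}
    return token

def restore_state(token):
    """Return the saved values, or None if the token is unknown."""
    return SAVED_STATES.get(token)
```

Each saved state is a couple of short strings rather than a whole stack, and every earlier state stays available under its own token, so going back and starting down a new path does not overwrite anything.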

REST

But do you really want to do this? Most of the discussion around REST has focussed on the use of persistent URLs and how to use HTTP verbs like GET, POST, PUT, and DELETE, but there’s another, perhaps more critical idea behind REST: that the resource you retrieve (a web page, XML document, or what-have-you) contains its own transition information.

Let’s say that you load a web page into your browser, load more web pages, then use the back button to return to the original one. Now, select a link. What happens? Did your browser have to go back to the original web server, which was using continuations (or other kinds of saved state) to keep track of the links from every page you visited, so that it won’t send you to the wrong one? Of course not. The web page that you originally downloaded already included a list of all its transitions (links), and intuitive things just happen naturally when you hit the back button.

The web is stateless, but web application toolkits maintain pseudo-sessions (using cookies, URL rewriting, or what-have-you) that make them look stateful, and that makes programmers lazy. Obviously, you don’t want to stick information like ‘isauthenticated’ on a web page, since it could be forged; likewise, you don’t want to put a credit-card number there. But it is trivially simple to make sure that forms, like links, go to the right place even when you hit the back button: just make the transitions fully independent of any session stored on the server side. For example, consider this:

<form method="post" action="/actions/book-trip">
  <button>Book this trip!</button>
</form>

Presumably, the trip the person was looking at is stored somewhere in a session variable on the server. DON’T DO THIS! As Gilad pointed out, someone hitting the back button might end up booking the wrong trip. There are gazillions of ways to push all of the context-sensitive stuff into the web page itself, where it belongs. Here’s one example:

<form method="post" action="/actions/book-trip">
  <label>Book your economy trip to Alaska!</label>
  <input type="hidden" name="destination" value="alaska"/>
  <input type="hidden" name="package" value="economy"/>
  <button>Book it.</button>
</form>

Here’s another:

<form method="post" action="/actions/book-trip/alaska/economy">
  <label>Book your economy trip to Alaska!</label>
  <button>Book it.</button>
</form>

This is 100% backbutton-proof and it’s trivially simple to implement. It took me a while after reading Gilad’s (admittedly, strawman) example to realize that there are people who do not develop webapps this way. If they do this much damage just with a Session stack, how much pain will they be able to cause with continuations?
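
On the server side, either form can be handled without consulting a session at all. The following is a minimal sketch, assuming the request path and the posted fields have already been parsed; the function name `trip_from_request` is hypothetical.

```python
PREFIX = "/actions/book-trip"

def trip_from_request(path, form):
    """Recover (destination, package) from the request alone, with no session."""
    if path != PREFIX and not path.startswith(PREFIX + "/"):
        raise ValueError("not a booking request")
    # Path style: /actions/book-trip/alaska/economy
    parts = [p for p in path[len(PREFIX):].split("/") if p]
    if len(parts) == 2:
        return parts[0], parts[1]
    # Hidden-field style: destination and package arrive in the POST body.
    if not parts and "destination" in form and "package" in form:
        return form["destination"], form["package"]
    raise ValueError("booking request does not say which trip")
```

Because the trip is named by the request itself, whichever page the user submits from (including one reached with the back button) books the trip that page actually describes.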

The REST people are right, at least on this point: there’s no need to drive a continuation bulldozer through your webapp, when a little REST garden spade will work quite nicely (and won’t tear up your lawn in the process). Don suggests that there may be other, more legitimate use cases for continuations outside of web applications, and I have no reason to disagree, but I would like to look at them pretty carefully.

Getting the point of Skype and chat

Tuesday, May 16th, 2006

I signed up for Skype a while ago, put EUR 10 into my account, and made a few calls. It was cute, it worked, but after a couple of experiments I couldn’t see the big deal. After all, Skype lags by a second or two (like the old trans-Atlantic cables), it has poor sound quality even compared to my cell phone, and phone calls in North America, even long distance, are so close to free that Skype hardly matters.

Over here in Amsterdam this week, it’s a different story. My North American cell phone doesn’t work (of course), calling from the hotel is ridiculously expensive (even calling a toll-free number), and there are very few public phones. All I need is a wireless Internet signal, though, and I can call home on Skype to my heart’s content for (literally) pennies. Even more importantly, I can call North American toll-free numbers directly, something that’s not otherwise possible at any cost from Europe. OK, now I get the excitement around VoIP.

Instant messaging (aka chat) has been around in various guises a lot longer than Skype, but I’m in my forties, and thus, a little too old ever to have used it socially. What finally changed that is Google’s integration of IM right into their webmail service. While I’m reading my GMail, a little green light goes on when anyone I know is reading at the same time (the joys of AJAX). After midnight Amsterdam time last night, I ended up with three chat windows open (one for my spouse, and one for each of my kids), carrying on three separate private conversations about how their days had gone. I could have called on Skype, of course, but I couldn’t have talked privately to all three at once, and I wouldn’t have known when they were all free without those little green lights. While typing furiously and switching among windows, I got perhaps a tiny taste of what it’s like to be a hyperactive 16-year-old girl.

XTech, AJAX, and Rails

Speaking of AJAX, IDEAlliance staff has told me that the AJAX developer’s day here at XTech 2006 has been so popular that it’s almost overwhelmed, with a huge number of last-minute walk-ins. The Rails tutorials have also been popular. There’s obviously a lot of demand for AJAX and Rails information over here — good job, Edd.

Now, back to age-appropriate communications. Where’s the penknife to sharpen my quill? …

Giving thanks

Monday, May 15th, 2006

Over on XML.com, David Peterson gives Microsoft some well-deserved thanks for implementing and popularizing the XMLHttpRequest object that’s so useful in modern web development. He also thanks them for not charging for it, but of course, if they had tried to charge, it never would have become popular (from SAX, I know that paradox well).

Omissions

There are a couple of problems with giving thanks to inventors, though. The first is that you inevitably leave people out. David, for example, thanked Microsoft for all of AJAX and modern web development in general. AJAX doesn’t consist solely of XMLHttpRequest, however; it also needs JavaScript and a DOM (both pioneered by Netscape) to manipulate the client display, and something like XML (W3C) or JSON (Douglas Crockford) to encode the messages. Most modern web developers also want CSS (Håkon Wium Lie and Bert Bos). And then, of course, there’s HTML and HTTP (Tim Berners-Lee). To illustrate my point, I’ve certainly left out a lot more that I could have included here, and have likely misassigned at least some credit.

Death of the inventor

The second problem is that it almost never makes sense to assign credit to individual people or companies. Who should get credit for SAX? Me, because I coordinated it? James Clark, because I based many of the ideas on his earlier SGML interfaces (and he suggested many of SAX’s features)? Tim Bray, because he thought up a catchy name? The dozens of other xml-dev members who contributed most of the core ideas? The major software vendors who actually decided to use SAX, giving it credibility outside of the xml-dev community?

The same applies to just about every other technology we use. Not only do these technologies depend on other innovations (the Web without TCP/IP? SAX without XML?), but the successful innovations are almost always simple and obvious, so their main value comes not from any particular technical brilliance but from the brute-force fact that lots of people use them. In other words, community-building is more important than innovation. Microsoft imitated Netscape’s level-0 DOM, and then the W3C standardized it so that it would work across browsers, then browser developers agreed to follow along, then web developers decided it was safe to start using it. Microsoft initially failed to build a community for XMLHttpRequest (which was a proprietary ActiveX component), so it languished mostly unused for years, until other browsers like Mozilla/Firefox, Safari, and Opera decided to support it as well; it was only then that we started to see a real community grow, and high-profile sites like Gmail and Google Maps take off. Tim Berners-Lee’s original HTML would hardly have mattered if the early Mosaic browser hadn’t shown how to make it user-friendly. Etc., etc. While Netscape introduced some good ideas like the DOM and JavaScript, they also introduced some that flopped (does anyone else remember CORBA in the browser?): no community of users, no success.

Thank the users

The moral of the story is that technology success is not something that a person or company gives to the net, but something that comes back from it, as if you threw a stone at a tree without knowing whether an avalanche of silver or of bird dung would shower down from the branches onto your head. A complex, brilliant idea with no users is worthless; a simple, mediocre idea with lots of users is a treasure.

Kudos for Google

Friday, January 20th, 2006

(Updated to include MSN response; updated again for the China thing.)

According to this CBC article, Yahoo, MSN, and AOL have all willingly handed over search records to the U.S. government (they claim that no personal information is included, but personal information can often be inferred from search URLs). Google said ‘no’, and is now taking the fight to court.

The request is unrelated to national security — instead, the government is gathering background evidence to defend an anti-porn law in court.

Update: Ken Moss defends MSN’s action (via Dare Obasanjo). Ken’s comment repeats the point made in the CBC article that MSN believes it released no personal information.

Update #2: And now, Google has agreed to censor search results for China. I guess this pulls Google back down to a karmic break-even: defender of privacy rights in North America, but anti-free-speech collaborator in Asia.