Quoderat

Archive for the 'web' Category

One app store to rule them all …

Sunday, January 10th, 2010

During my university studies, I first encountered the idea of the Myth of Progress: in the 19th and 20th centuries, a lot of people assumed that the world generally gets better each generation (aside from occasional blips like depressions or world wars), with less bigotry, better medicine, new technology, etc. But there’s no guarantee that any next generation will build on and improve the accomplishments of the previous one, and history’s movement may be more akin to a random walk.

Case in point: in the information technology world, the greatest accomplishment of the most talented coders and business people in GenX was replacing the Baby Boomers’ nasty old platform-dependent shrink-wrapped computer applications with open web applications that could run anywhere, from a Windows desktop to a Linux cell phone. Write once, run all over the place on any hardware/OS you want. GMail instead of Eudora. Wikipedia instead of Encarta. Cool, eh?

So GenY comes along and says “hey: instead of encouraging people to browse the web with open standards, let’s build proprietary applications that run on only one type of mobile phone. And we’ll allow only one store to sell those applications for each type of phone, and every proprietary, platform-specific app will have to be preapproved and precensored by the phone manufacturer, who will extort^H^H^H^H^H^H be gladly offered a cut of sales.” Even Microsoft in its monopolistic heyday (before it became the toothless lion it is today) never had the balls to try anything like that with Windows apps.

Who, ten years ago, would have predicted an IT catastrophe like this after so much progress and hope? It’s enough to make a person cry. Let’s encourage those GenY’ers who have taken up the torch and continue to work on the dream of an open web.

A distinctly Canadian kind of fame

Tuesday, March 3rd, 2009

In Canada, people who have served time for a wrongful murder conviction become famous — very famous — and stay that way for years and decades. Steven Truscott, Donald Marshall Jr., David Milgaard, and Guy Paul Morin are arguably household names, better known than many celebrities (including most medal-winning Canadian Olympic athletes, award-winning musicians, etc.)

Truscott’s initial wrongful conviction took place nearly 50 years ago (in 1959), but if anything, he’s better known now than at any time before. Three men with more recently overturned convictions (Robert Baltovich, Bill Mullins-Johnson, and James Driskell) are also getting ongoing press coverage, TV documentaries, etc.

I had never thought anything unusual about this phenomenon, until the Mullins-Johnson article was suddenly deleted from Wikipedia, with no debate — the Wikipedia editor had assumed that a wrongful conviction was so obviously unnotable that no discussion was required, but when I objected, he did restore the article and start a proper RFD debate.

When Wikipedia has articles about minor, imaginary videogame characters, it seemed unimaginable to me at first that editors would try to delete an article about a real, famous person, but so far, there seems little support for keeping the article. Thinking about it, I suddenly realized that the wrongfully-convicted aren’t famous in the U.S. Sure, I could Google around and find a few names, but in the U.S., serving 10 years for a murder you didn’t commit does not automatically make you a household name — in fact, it might not even result in a national news story.

Perhaps there’s a strong feeling of discomfort around the issue in a country that still executes so many of its citizens. Or perhaps, because the wrongly-convicted often have prior criminal records, Americans don’t feel that their convictions were such a serious injustice. Many U.S. jurisdictions (all?) have very small limits on the compensation you can receive for a wrongful conviction, while in Canada, someone who has been in jail for years could receive well over $1M — a big news story in itself.

I have a conflict of interest with the RFD for the Bill Mullins-Johnson article because I was the original author (though many others have since contributed), but if the article is going to be deleted, I’d hate it to be simply from lack of debate. So, Wikipedia users, whether you agree or disagree with me, please visit the RFD page and have your say.

Mapping people, money, and land through airports

Friday, January 30th, 2009

OurAirports lets members tag airports to create different kinds of maps. I’ve created two maps that show very vividly where the intersections of people, money, and land occur in the world.

Welcome to the club …

The first tag, top150, shows the world’s 150 busiest airports by passenger traffic (as of 2007). Central Africa has lots of people, but not much money, so it’s empty. Australia and Canada have high per-capita incomes but a low population density, so they also appear mostly empty in the map, with only a handful of top-150 airports each. The U.S. has a lot of land but also a lot of money and a lot of people, so it’s very full. India and the Persian Gulf countries are starting to fill up, as incomes rise and more people travel.

… but not this club

The second tag, top30, shows a much more exclusive club, the world’s 30 busiest airports by passenger traffic. These are the absolute busiest hubs, and it takes a rich and populous city or country to support one. Not by accident, fully half of these airports (15) are in the United States, and 8 more are in Western Europe, leaving only 7 for the rest of the world to share.

In this club, the 1.3 billion citizens of China are represented by only two airports (including Hong Kong), and the 1.2 billion citizens of India are not represented at all. Canada and Australia also don’t make the cut (too few people).

Of course, there are other considerations: aside from its money, land, and people, the heavy air passenger traffic in the U.S. may also reflect its horrendous rail system.

sorry.google.com

Wednesday, October 1st, 2008

See the update below. I was right: Google’s new bot detection is overly naive, and I’m not the only one having problems.

See also John Cowan’s comment below, for a different (personal) interpretation of Google’s terms of service.

Google Maps won’t show me satellite imagery this morning.

Google has recently set up a system to try to autodetect and block bots scraping their system, and it isn’t working very well — people are getting blocked even from Google Search simply because they have too many (human-generated) queries passing through the same proxy.

This morning, I suddenly discovered a different problem: the satellite view in Google Maps has stopped working for me — I get the “don’t have imagery at this zoom level for this region” error everywhere, at every zoom level. I can still see maps and terrain, but not satellite pics, and I noticed the host sorry.google.com setting a lot of cookies.

Is Google’s satellite imagery down for everyone else this morning, or has their software decided that I’m a bot trying to scrape satellite imagery?

Update

I was right: Google’s software had decided that I was a bot. They have a test link that goes directly to a satellite tile, so you can see if you’re being blocked:

http://khm0.google.com/kh?v=31&hl=en&x=0&y=0&z=1&s=

It took me to this page. I was able to re-enable access simply by entering a CAPTCHA.

What happened?

I wrote a couple of months ago about how to detect overzoom in Google Maps. My guess is that the overzoom protection in OurAirports (automatically zooming out every 4 seconds until there were actual satellite tiles available) triggered the bot alert, and I’ve disabled the feature for now.

That’s very bad news for any mashup that uses JavaScript to do more sophisticated things with Google Maps, like, say, panning at regular intervals. Google’s bot detection seems to be extremely naive, and any repeated action at regular intervals will fire it off.
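To make the guess concrete, here is a minimal sketch of the kind of naive check that would catch a timer-driven mashup. Everything in it is an assumption for illustration: the function name, the thresholds, and the idea of measuring how uniform the gaps between requests are. Google's real detection logic isn't public, and this is not it.

```python
from statistics import mean, pstdev

def looks_automated(timestamps, min_requests=5, max_jitter_ratio=0.05):
    """Flag a request series whose gaps are suspiciously uniform.

    Hypothetical heuristic for illustration only; it is not Google's
    actual detection logic, which is not public.
    """
    if len(timestamps) < min_requests:
        return False
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    avg = mean(gaps)
    if avg <= 0:
        return False
    # Coefficient of variation: near zero means metronome-like timing.
    return pstdev(gaps) / avg <= max_jitter_ratio

# A timer firing every 4 seconds, like the overzoom check described above.
timer_driven = [0, 4, 8, 12, 16, 20]
# Irregular, human-paced requests.
human_paced = [0, 3, 11, 12, 30, 41]

print(looks_automated(timer_driven))  # True
print(looks_automated(human_paced))   # False
```

A check this simple can't tell a scraper from a legitimate script that pans or zooms on a fixed timer, which is exactly the problem for mashups.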

Widgets vs. Portlets

Monday, July 14th, 2008

Widgets are web pages embedded in larger web pages, generally using iFrames — the content comes via a separate HTTP connection and has its own CSS stylesheet, cookies, etc. Final composition takes place in the user’s browser.

Portlets are software modules that produce fragments of HTML markup that are assembled into a single HTML page, sharing common CSS stylesheet, cookies, etc. Final composition takes place on a portal server, and a single page is delivered to the client browser.
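The difference in where composition happens can be sketched in a few lines. This is a toy illustration, not a real portal server or the portlet API: the function names, the example.com URLs, and the portal.css stylesheet are all made up for the example.

```python
from html import escape

def render_widget_page(widget_urls):
    """Client-side composition: the page only references each widget;
    the browser fetches every iframe over its own HTTP connection."""
    frames = "\n".join(
        f'<iframe src="{escape(url, quote=True)}"></iframe>' for url in widget_urls
    )
    return f"<html><body>\n{frames}\n</body></html>"

def render_portal_page(portlets, stylesheet="portal.css"):
    """Server-side composition: each portlet returns an HTML fragment and
    the portal assembles one page that shares a single stylesheet."""
    fragments = "\n".join(
        f'<div class="portlet">{render()}</div>' for render in portlets
    )
    return (
        f'<html><head><link rel="stylesheet" href="{stylesheet}"></head>'
        f"<body>\n{fragments}\n</body></html>"
    )

# Two hypothetical portlets that each produce a fragment of markup.
def news_portlet():
    return "<h2>News</h2><p>Nothing new today.</p>"

def calendar_portlet():
    return "<h2>Calendar</h2><p>No meetings.</p>"

widget_page = render_widget_page(
    ["http://news.example.com/widget", "http://cal.example.com/widget"]
)
portal_page = render_portal_page([news_portlet, calendar_portlet])

print(widget_page.count("<iframe"))   # 2
print(portal_page.count("<iframe"))   # 0
```

The widget page costs the browser two extra HTTP requests and gets two independently styled documents; the portal page arrives as one document with one stylesheet.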

Features

Portlets have a lot of features that iFrames don’t: they require fewer HTTP connections, they allow for common styling (one CSS stylesheet can style all the portlets on a page), and they can communicate with each other and take advantage of common authentication/authorization, etc. (so that a user doesn’t have to sign on to each portlet separately).

Portlets use a window-manager metaphor, allowing the portlet server to resize them, expand them etc. They also have modes, like edit and view, all of which can be accessed through a common interface. All of this happens on the server side.

iFrame-based widgets don’t normally do any of that, but they don’t require special portal servers, they can be embedded in more creative ways, and they offload the processing from the server to the client. They also introduce potential security holes, but only if they’re hosted somewhere that’s not under the original company’s control (the same applies to remote portlets using WSRP).

Users

Portlets are used mainly in intranets, to provide a collection of enterprise apps on a single web page for employees (e.g. a news feed, calendar, expense forms, bug reports, etc.).

Widgets are used everywhere else (e.g. embedding Google Maps, Facebook applications, etc.). While widget authors/consumers don’t tend to know (or care) much about portlets, the portlet people haven’t failed to notice the popularity of widgets: most (if not all) portal servers now have an iFrame portlet that does little more than wrap an iFrame and allow it to be resized, etc.

Future?

Are the extra features of portlets compelling enough to justify the extra cost and hassle of running a portlet server? Now that we have browser tabs, AJAX, etc., do enterprises really need to continue to squish all their apps into a single web page that looks like a 1995 Mac desktop gone bad?

My guess is that the only portlet feature with compelling benefits is common authentication/authorization — once the web community gets behind a solution to that problem (OpenID or something similar), widgets will probably push portlets out completely, even in the enterprise.

Structured community authoring

Tuesday, June 24th, 2008

About 10 months after launching my OurAirports site for air travelers and pilots, I’ve finished the basic infrastructure to allow community authoring. Unlike Wikipedia, OurAirports contains information that is specialized, structured and finite (there are only so many airports in the world), and I’m interested to see the technical and social differences from the Wikipedia world.

More details are available in the announcement on my flying blog. Note, also, that all of the data collected is free for download (public domain).

Set and forget: 335 days and counting …

Wednesday, June 18th, 2008

Late in summer 2007, I set up a dedicated Linux Ubuntu server at a site in San Diego to host OurAirports and my consulting site, megginson.com. The ISP has had some net outages, but the Ubuntu server itself has kept on chugging through. Here’s the uptime:

 11:18:31 up 335 days,  7:12,  1 user,  load average: 0.05, 0.06, 0.01

Since the ISP set the computer up with a minimal Ubuntu install and gave me the access info, it has run continuously — I know I should install an updated kernel some day, but it’s hard to bring myself to do that.

Strange web exploit attempt (?)

Monday, February 4th, 2008

In the search logs for OurAirports, I noticed a series of searches for URLs:

http://www.feliciano.de/Webgalerie/bilder/Italy/une/yiwul/
http://www.unduetretoccaate.it/codice/aseje/wocobo/
http://www.altaiseer-eg.com/ar/articles/jed/umut/

At first, I thought they might be a kind of link spam — some sites display recent searches — but when I checked one of the URLs, I found something totally unexpected:

<?php echo md5("just_a_test");?>

They’re all the same. This is almost certainly related to passwords: is there a known flaw in a PHP content-management system like Drupal, or in the PHP API for a search engine like Lucene, where this would do some damage, or is it just a test probing for weaknesses? Is the PHP code supposed to be served up literally like that, or should I be seeing the MD5 instead?
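One plausible reading (an assumption on my part, not something the logs prove) is that this is a probe for remote file inclusion rather than anything to do with passwords: if a vulnerable script fetched and evaluated one of those URLs, the response would contain the MD5 digest of the test string instead of the literal PHP source, and the prober would only need to look for that digest. The sketch below shows that check; the function name is hypothetical.

```python
import hashlib

PROBE_SOURCE = '<?php echo md5("just_a_test");?>'

def probe_was_executed(response_body):
    """Return True if the body contains the MD5 digest rather than the
    literal PHP source, meaning some server evaluated the probe."""
    expected = hashlib.md5(b"just_a_test").hexdigest()
    return expected in response_body and "<?php" not in response_body

# Served literally, as seen when fetching the URL directly.
print(probe_was_executed(PROBE_SOURCE))  # False
# What a server that evaluated the code would return.
print(probe_was_executed(hashlib.md5(b"just_a_test").hexdigest()))  # True
```

Under that reading, seeing the code served literally from the host is expected: it only turns into a digest if the target site runs it.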

Is the problem Wikipedia, or David Megginson?

Wednesday, January 23rd, 2008

The Wikipedia article about me was vandalized yesterday (vandalized version) by someone from the IP address 24.225.66.95, which seems to be in or near Raleigh, North Carolina.

What should I do?

  1. Edit the article myself to remove the vandalism? — OK, that’s a really bad idea
  2. Go in anonymously and edit the article? — also a bad idea
  3. Rejoice in the fact that my article is important enough to be vandalized?
  4. Despair in the fact that my article is not important enough for anyone else to have noticed and fixed it?
  5. Reconcile myself to the idea that the edits are not vandalism at all, and I am, in truth, “a freaking looser who knows nothing” and “a noob”

I’m leaning towards #5, though I’m disappointed that kids these days seem to have forgotten how to swear properly: “a freaking loser”???

Google analytics for XML 2007

Monday, January 21st, 2008

I forgot that I’d enabled Google Analytics for the XML 2007 web site. Even though the conference is long over, I thought it would be interesting to look and see what some of the trends were from September 2007 to January 2008 (keeping in mind that these stats apply to the kind of web users interested in a tech conference, not to the web at large).

MacOS is still #3

Despite the halo effect from the iPod and the widespread use of Mac notebooks among speakers, MacOS still hasn’t managed to make much of a dent in the visitor logs:

  1. Windows: 80.70%
  2. Linux: 9.57%
  3. MacOS: 9.44%

If MacOS can’t beat Linux on the desktop, I don’t know if it has a bright future.

Internet Explorer below 50%

Firefox is still #2 behind MSIE, but for this crowd, the gap is small:

  1. MSIE: 49.61%
  2. Firefox: 41.14%
  3. Safari: 3.50%
  4. Mozilla: 3.22%
  5. Opera: 1.76%

If you’re designing or maintaining a web site with a tech audience, you’d better be testing on Firefox as well as MSIE.

Screen resolution and colour depth

I know that web designers like big layouts, but the sad fact remains that 1024×768 is still the most common resolution (and remember that the browser window may be much smaller than the screen):

  1. 1024×768: 28.32%
  2. 1280×1024: 25.84%
  3. 1280×800: 10.61%

A long tail of resolutions follows, but it’s worth noting that the classic 800×600 has only 1.96%. Better news comes from colour depth, where almost everyone has 16bpp or better:

  1. 32bpp: 80.29%
  2. 24bpp: 11.89%
  3. 16bpp: 7.37%

Traffic

Search engines, referrers, and direct access were all important traffic sources:

  1. Search engines: 36.77%
  2. Referring sites: 34.97%
  3. Direct traffic: 28.22%

Blogs did show up among the referring sites, but the biggest traffic producers were traditional links from partner organizations (other conferences, IDEAlliance itself, etc.) — these were also the stickiest, since most people coming from these links went on to read more than one page.

As far as search engines go, I was surprised to find that nothing really matters but Google (assuming that Google Analytics isn’t biasing the numbers):

  1. Google: 94.16%
  2. Yahoo!: 3.46%
  3. Live: 1.51%
  4. MSN: 0.45%

I knew that Yahoo! and MSN were behind in search, but I had no idea just how bad it was (at least in the tech crowd). More than half of the people who found the site via a search engine went on to read more than one page.

The top search phrases were rather dull and predictable:

  1. “xml 2007”: 28.50%
  2. “xml conference”: 8.22%
  3. “xml conference 2007”: 3.20%
  4. “xml conferences”: 3.04%

And so on through a very long tail. Individual speakers’ names start appearing soon, but none with more than 10 searches. I trolled through the low-frequency search phrases for something funny (and maybe risqué), but all I came up with was the number “736”, which resulted in three visits. I gave up trying to find the site in the Google results for that number. Does anyone really search for a single three-digit integer, and if so, how many pages of results will that person scroll through?

LAMP stack stability

Thursday, January 10th, 2008

I’m using a single dedicated server to host ourairports.com, megginson.com, and a couple of minor domains. OurAirports is a database-heavy application using (currently) a MySQL v.5 database hosted on the same server. I’ll offload the database to a separate server if traffic keeps increasing, but as long as I’m getting compliments from tech people for my fast response times (mainly thanks to MySQL’s built-in query caching), there’s no point paying for extra hardware.

Uptime

My ISP set up the server for me last summer with a bare-bones Ubuntu distro, then I installed the extra packages I needed using aptitude over ssh. Since then, I’ve done many Ubuntu in-place upgrades, rolled out hundreds of changes and upgrades to the web apps and dozens to the database schema (some very significant), and upgraded WordPress n-teen times. Check this out:

$ uptime
 13:08:31 up 175 days, 10:02,  1 user,  load average: 0.23, 0.06, 0.02

That’s right — since my ISP first set up the server with a basic Ubuntu system, I’ve never had to restart it. In fact, if Apache and mod_php (PHP5) had ‘uptime’ commands, they’d show almost the same amount of time, since I restarted them only to make configuration changes in the first few days of setting up the server (unless apt stopped them to install a newer version during one of my upgrades). I’ve restarted MySQL more recently, but again, only to experiment with configuration changes (especially for fulltext).

-1 for being cool, +10 for having a life

Using reliable old technologies like Linux, Apache, MySQL, and PHP doesn’t win any cool points, but it certainly makes maintaining a web server and its applications easy. I can go on vacation, for example, without worrying about being able to get online to fix or restart my server every couple of days. I don’t have to stay up until 3:00 am on Sunday night so that I can take the server offline to roll out new software versions or bug fixes (aptitude installs any security fixes in place). I spend lots of time with my family. I go to my kids’ school concerts. I learned banjo and mandolin (why not, since I have the free time?).

It’s the developer, not the language

And yes, my PHP web app is easy to maintain and extend, because I designed it to be that way (I can often implement, test and roll out new features in a matter of minutes, even when they require database schema changes) — it’s the developer, not the programming language, that determines the quality and maintainability of an app. A lot of newbies use PHP, so there’s a lot of bad PHP out there, but the same can be said for any language, even Ruby.

Social web sites: the new Proprietors?

Thursday, January 3rd, 2008

Image: Thomas Penn, second proprietor of Pennsylvania, not as nice as his dad William.

Almost a year ago, I wrote that Open data matters more than Open Source — it doesn’t matter (to you, the end user) whether a web site is using Open Source software or not, if they still keep your data locked up.

Here’s a nasty example: Robert Scoble has just had his Facebook account disabled for running a script to try to scrape his personal information off the site (since Facebook doesn’t provide him with any other way to get it).

I understand that Facebook needs to protect against malicious bots, and they might decide to restore his account once they know what Robert was actually trying to do (though for now all traces of him have vanished), but do we really want to have to hope for the goodwill of social sites and beg for our own data every time we want it? Are web site owners the new version of the Proprietors in the early American colonies, who can grant rights as favours when they see fit?

E-mail users fight back

Sunday, December 16th, 2007

A bit over a year ago, I ran into an unusual problem — for several days, I stopped receiving messages from a customer (in the middle of an important project), then I discovered the messages all hidden deep in my (gmail-hosted) spam box. Everything from that domain was suddenly being flagged as spam.

What happened? This customer had a large mailing list that they used for announcements, etc. My guess is that they sent out an announcement, a lot of other gmail-users flagged it as spam, and whatever weighting algorithm gmail uses tipped it over so that the messages were no longer considered legit by default. I was able to train gmail not to treat those messages as spam (for me, specifically), but it took a week or two before I could trust that some of them weren’t being sent to the spam box.

Hard-core spammers have always had to deal with this kind of thing, and they spend a lot of time trying to figure out a way around it. What’s happening now, though, is that companies with legit (or semi-legit) e-mail lists are also starting to get into trouble, because web-mail makes it possible for hundreds or thousands of people to get together and all vote your e-mail to be undesirable.
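Here's a toy model of the kind of weighting I'm guessing at. It is purely hypothetical: the class, the threshold, and the per-user override are my own inventions for illustration, and I have no knowledge of how gmail actually scores senders.

```python
from collections import defaultdict

class DomainReputation:
    """Toy model of crowd-sourced spam filtering (all names hypothetical)."""

    def __init__(self, threshold=0.3, min_votes=10):
        self.threshold = threshold      # spam share that flips the default
        self.min_votes = min_votes      # ignore domains with little data
        self.spam = defaultdict(int)
        self.total = defaultdict(int)
        self.trusted = set()            # (user, domain) overrides

    def record(self, domain, flagged_spam):
        """Count one delivered message and whether the recipient flagged it."""
        self.total[domain] += 1
        if flagged_spam:
            self.spam[domain] += 1

    def trust(self, user, domain):
        """A user marks the domain 'not spam' for their own mailbox."""
        self.trusted.add((user, domain))

    def is_spam(self, user, domain):
        if (user, domain) in self.trusted:
            return False
        if self.total[domain] < self.min_votes:
            return False
        return self.spam[domain] / self.total[domain] >= self.threshold

rep = DomainReputation()
for i in range(20):
    rep.record("customer.example", flagged_spam=(i < 8))  # 8 of 20 flag it

print(rep.is_spam("me", "customer.example"))            # True
rep.trust("me", "customer.example")
print(rep.is_spam("me", "customer.example"))            # False
print(rep.is_spam("someone_else", "customer.example"))  # True
```

The point of the sketch is that the sender never appears in it: only the recipients' clicks decide whether the domain's mail is treated as legit by default.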

The letter of the law isn’t enough

Note that this isn’t a legal thing. It doesn’t matter at all if your e-mail list is opt-in or opt-out, if the “Send me announcements” checkbox was checked by default or not, or if the recipient originally clicked 10 screens of disclaimers before buying your product/signing up for your service. If they don’t like the e-mail you’re sending them, they’ll just click “Spam”, even if you had a legal right to send it; and if enough of them do it, the e-mail value of your domain fast approaches nil.

You’d better make sure that your mass e-mails have stuff that people actually want to read:

  • I don’t care that your company just won five awards: SPAM! (even if I said before that it was OK to send me e-mails)
  • I probably do care that someone wants to connect with me on a social networking site that I actually use.
  • I don’t care that a merchant I did business with 2 years ago has a Christmas special on something I’d never buy: SPAM!
  • I don’t care that your web site has a new look: SPAM!
  • I don’t care that your company has a training session coming up in Tulsa, since I don’t live anywhere near there (and probably wouldn’t go anyway): SPAM!
  • Yes, I am interested in the tracking info for the books I just ordered. Thanks.
  • I do care that there’s a substantive change to a site that I use a lot.
  • I don’t care about a change on a site I haven’t logged into for a year: SPAM!

And so on.

This new collaboration is an unexpected side-effect of the shift from desktop e-mail clients to web mail, and it would be foolish for companies not to pay attention. If you consider your domain name to be a valuable part of your corporate identity, don’t piss it away by sending out poorly-targeted mass e-mails, because no matter what prior permission you have, people now can … and will … punish you. After all, it takes only a single mouse click.

Amazon SimpleDB (not very Codd-y)

Friday, December 14th, 2007

This might be of interest:

Amazon SimpleDB

Amazon’s announcement

Dear AWS Developers,

This is a short note to let a subset of our most active developers know about an upcoming limited beta of our newest web service: Amazon SimpleDB, which is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud.

Traditionally, this type of functionality has been accomplished with a clustered relational database that requires a sizable upfront investment, brings more complexity than is typically needed, and often requires a DBA to maintain and administer. In contrast, Amazon SimpleDB is easy to use and provides the core functionality of a database - real-time lookup and simple querying of structured data - without the operational complexity.

We’re excited about this upcoming service and wanted to let you know about it as soon as possible. We anticipate beginning the limited beta in the next few weeks. In the meantime, you can read more about the service, and sign up to be notified when the limited beta program opens and a spot becomes available for you. To do so, simply click the “Sign Up For This Web Service” button on the web site below and we will record your contact information.

Not much there, though

It’s not SQL, or even SQL-like, though, supporting only the operators “=, !=, <, >, <=, >=, STARTS-WITH, AND, OR, NOT, INTERSECTION AND UNION”. I’m no relational expert, but I don’t think Codd would have been impressed. A distributed database is one of the big missing pieces from Amazon’s services, but I’m not sure if this will be it.
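To get a feel for how little that operator list gives you, here is a rough sketch of a query layer limited to single-attribute comparisons plus set INTERSECTION and UNION. It is not SimpleDB's API or query syntax (I haven't seen either): the select function, the in-memory item store, and the airport records are all assumptions made up for the example.

```python
import operator

# Comparison operators named in the announcement; in this sketch all
# values are plain strings and are compared as strings.
COMPARATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "STARTS-WITH": lambda value, prefix: value.startswith(prefix),
}

def select(items, attribute, op, operand):
    """Return the names of items whose attribute satisfies the predicate."""
    test = COMPARATORS[op]
    return {
        name
        for name, attrs in items.items()
        if attribute in attrs and test(attrs[attribute], operand)
    }

# Illustrative data only: item name -> attribute/value pairs.
items = {
    "CYOW": {"city": "Ottawa", "country": "CA"},
    "CYYZ": {"city": "Toronto", "country": "CA"},
    "KJFK": {"city": "New York", "country": "US"},
}

canadian = select(items, "country", "=", "CA")
t_cities = select(items, "city", "STARTS-WITH", "T")

print(sorted(canadian & t_cities))  # INTERSECTION: ['CYYZ']
print(sorted(canadian | t_cities))  # UNION: ['CYOW', 'CYYZ']
```

Notice what's missing compared with SQL: no joins, no aggregates, no ordering, just filters on one set of items combined with set operations.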

How to spend all your free money

Tuesday, November 27th, 2007

Update: the site shopping cart is broken, and doesn’t properly remove items from the total owing — too bad.

Here’s one easy way: via TechCrunch, Deutsche Grammophon, the gold standard in renaissance/ baroque/ classical/ romantic/ orchestral/ opera/ etc. music (often confusingly referred to collectively as “classical”, roughly equivalent to calling all popular music since 1890 “rap”), will start selling their catalogue as unprotected MP3s at midnight German time tonight (6:00 pm in New York City) at their new site dgwebshop.com.

As a teenager in the late 1970s, I used to visit the House of Sound in Kingston (Canada), where they had thousands of DG records (probably most of the catalogue) packed in tight on shelves lining a wall of the store. I couldn’t always afford them, but I loved being able just to pull them out and take a look at the covers of the different famous recordings. These days, the so-called classical music section of any but a couple of specialized stores in big cities like New York or London has maybe one or two rows of worthless classical-pop compilations hidden behind the DVDs of TV series nobody watched in the 1980s. No wonder people don’t shop at record stores any more.

We tech types have been claiming for a while that music companies could make more money selling unprotected digital music, so here’s the test. I plan to give them a lot of my own money if the site actually works, though I should note a couple of caveats:

  1. Many current DG buyers are audiophiles who won’t be satisfied with the sound quality of MP3s (which are optimized more for boom-boom music), so this will probably open a new market for DG rather than leeching from their current one.
  2. DG’s market is mostly affluent people outside the intense social environment of high school or university, so people will be less likely to share these MP3s; even if they do, it will probably just act as a promo for the higher quality recordings.

I hope the site can handle the traffic. Rock on, Deutsche Grammophon!

First looks at OpenSocial: part 4 (content for persistence data)

Thursday, November 8th, 2007

I didn’t have time to look at the OpenSocial API yesterday, so I’m continuing today looking at the data format for the last major area, persistence data.

A vision thing?

My first impression of the persistence data API is that it doesn’t belong in v.1 of OpenSocial — unlike the member/friends and activities APIs, it doesn’t seem to be solving a core problem for social-site app writers (I have no way to get at a friends list except through the site’s API, but I can store my own data, thanks). I can see only two reasons that it’s here, neither of them very admirable:

  1. Because someone has a vision of a world where people can write social apps that run entirely on the client side with HTML/CSS/JavaScript, using only resources provided by the social site itself.
  2. Because the GData group in Google co-opted the designers to promote GData in the spec, the same way that the Blu-Ray group in Sony co-opted the PS3 to advance their agenda.

I’ll give Google the benefit of the doubt and assume that it’s a vision thing, but that’s still very unhealthy: specs should solve the real problems of the present, not the speculative problems of the future, especially bare-bones v.1 specs like this.

The format

Now that that’s out of my system, let’s take a look at what you get back from a URL like http://{DOMAIN}/feeds/apps/{appId}/persistence/global (and its many variants). From the spec, here’s what you get when you request a single piece of information from the API:

<entry xmlns='http://www.w3.org/2005/Atom'>
<title type="text">somekey</title>
<content type="text">somevalue</content>
</entry>

Or, in non-XML terms,

$globals{'somekey'} = 'somevalue'

That comes from a URL like http://{DOMAIN}/feeds/apps/{appId}/persistence/global/somekey which requests a single value. Using the first URL mentioned gets you a feed of name=value pairs, sort of like an associative array:

<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<id>http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global</id>
<updated>2007-10-30T20:53:20.086Z</updated>
<title>Persistence</title>
<link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml'
  href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global'/>
<link rel='http://schemas.google.com/g/2005#post' type='application/atom+xml'
  href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global'/>
<link rel='self' type='application/atom+xml'
  href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global'/>
<generator version='1.0' uri='/feeds'>Orkut</generator>
<entry>
  <id>http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global/somekey</id>
  <title>somekey</title>
  <content>somevalue</content>
  <link rel='self' type='application/atom+xml'
    href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global/somekey'/>
  <link rel='edit' type='application/atom+xml'
    href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global/somekey'/>
</entry>
</feed>

There’s only one entry in the spec’s example, but there could be a lot more. Basically, this is the equivalent of something like

$globals = { 'somekey' => 'somevalue' }

The comparison isn’t quite fair, because there are also some links explaining what you can do to modify this information, etc., but it still seems like a lot of markup for not much value (pun intended). I wonder if this would be a good place to use JSON instead of Atom+XML? After all, the serious apps will be doing their own data storage anyway, and the client-only apps will probably use a JavaScript API that hides the Atom from the developer.
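To show how much of the feed is actually payload, here is a short sketch that collapses a persistence feed into a plain dictionary and prints the JSON that would carry the same information. The feed_to_dict helper is my own name, not part of OpenSocial, and the input is a trimmed copy of the spec's example with the ids and links left out.

```python
import json
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def feed_to_dict(feed_xml):
    """Collapse an Atom persistence feed into a plain key/value dict,
    using each entry's <title> as the key and <content> as the value."""
    root = ET.fromstring(feed_xml)
    return {
        entry.findtext(f"{ATOM}title"): entry.findtext(f"{ATOM}content")
        for entry in root.iter(f"{ATOM}entry")
    }

# Trimmed version of the spec's example feed (links and ids omitted).
feed_xml = """<feed xmlns='http://www.w3.org/2005/Atom'>
<title>Persistence</title>
<entry>
  <title>somekey</title>
  <content>somevalue</content>
</entry>
</feed>"""

globals_dict = feed_to_dict(feed_xml)
print(json.dumps(globals_dict))  # {"somekey": "somevalue"}
```

The JSON form drops the edit links, so a real replacement would still need some convention for saying where to write changes back.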

Scope

As hinted at, at least, in my URL posting, there are several different data scopes:

  • All (global) data for this application on this social site (equivalent of static global variables?).
  • Data for this instance of the application only (equivalent of local or object variables?).
  • Data for this user in this application (i.e. your own profile info about the user, available every time your app runs).
  • Data for this user’s friends in this application (i.e. your own profile info about the friends, available every time your app runs).

It seems like a reasonable division of scope, especially since the app can’t get anything out that it didn’t put in.
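As a rough illustration, the four scopes could be mapped onto the persistence URL patterns from my URL posting like this. The helper name and the lower-case parameter names are mine, and appending the key as a final path segment is an assumption modelled on the global pattern.

```python
# Base URL for each scope, following the patterns listed in the URL posting.
PERSISTENCE_PATTERNS = {
    "global": "http://{domain}/feeds/apps/{app_id}/persistence/global",
    "instance": "http://{domain}/feeds/apps/{app_id}/persistence/{user_id}/instance/{instance_id}",
    "shared": "http://{domain}/feeds/apps/{app_id}/persistence/{user_id}/shared",
    "friends": "http://{domain}/feeds/apps/{app_id}/persistence/{user_id}/friends",
}

def persistence_url(scope, key=None, **ids):
    """Build the URL for a scope; 'key' (assumed) becomes the last path segment."""
    url = PERSISTENCE_PATTERNS[scope].format(**ids)
    return url if key is None else url + "/" + key

print(persistence_url(
    "global",
    key="somekey",
    domain="sandbox.orkut.com:80",
    app_id="02864641990088926753",
))
```

With those arguments the result is the same URL that appears as the id of the "somekey" entry in the spec's example.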

Final thought (for now)

I do believe that, eventually, many web apps will be able to outsource storage as a service instead of having to maintain their own databases and database clusters; in fact, Amazon’s S3 and its competitors already provide precisely this service, though they might not be optimized for a lot of name=value lookups. I’m surprised, though, that this could be considered a key feature of a social app spec, when so much else was left out.

First looks at OpenSocial: part 3 (content for activities)

Tuesday, November 6th, 2007

Earlier postings:

This is the third part of a series where I’m working through the OpenSocial specs as I write — that means that I haven’t preread and predigested this stuff, but am creating a record of how I approach a new set of specifications and try to understand them. First, I looked at the basic URLs for data access, since they provide the best high-level description of the OpenSocial capabilities (read-only info on members and their friends, read/write info on a member’s activity notifications, and a simple data-storage API). Next, I looked at the data format for the most important content, the member profile and friends lists. This time, I’ll look at the format for activity notifications, which is also based on the Atom syndication format.

Activities

To get a list of a member’s recent activities (uploaded a photo, poked a friend, got a new job, or stuff like that, I guess) an OpenSocial application uses the URL pattern http://{DOMAIN}/activities/feeds/activities/user/{userId} according to the specs, though I suspect that might be intended to be http://{DOMAIN}/feeds/activities/user/{userId} for consistency with the other data-access URLs — it’s hard to be certain. The host should return an Atom feed of activities, like this template example lifted from the spec:

<atom:feed xmlns:atom='http://www.w3.org/2005/Atom'
    xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'
    xmlns:gact='http://schemas.google.com/activities/2007'>
  <atom:id>http://www.google.com/activities/feeds/activities/user/userID/source/sourceID</atom:id>
  <atom:updated>1970-01-01T00:00:00.000Z</atom:updated>
  <atom:category scheme='http://schemas.google.com/g/2005#kind'
    term='http://schemas.google.com/activities/2007#activity'/>
  <atom:title>Feed title</atom:title>
  <atom:link rel='alternate' type='text/html' href='http://sourceID.com/123'/>
  <atom:link rel='http://schemas.google.com/g/2005#feed'
    type='application/atom+xml'
    href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID'/>
  <atom:link rel='http://schemas.google.com/g/2005#post'
    type='application/atom+xml'
    href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID'/>
  <atom:link rel='self' type='application/atom+xml'
    href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID'/>
  <atom:author>
    <atom:name>unknown</atom:name>
  </atom:author>
  <openSearch:totalResults>1</openSearch:totalResults>
  <openSearch:startIndex>1</openSearch:startIndex>
  <openSearch:itemsPerPage>25</openSearch:itemsPerPage>
  <atom:entry>
    <atom:id>http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1</atom:id>
    <atom:updated>2007-10-27T19:41:51.574Z</atom:updated>
    <atom:category scheme='http://schemas.google.com/g/2005#kind'
      term='http://schemas.google.com/activities/2007#activity'/>
    <atom:title>Activity title</atom:title>
    <atom:link rel='self' type='application/atom+xml'
      href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1'/>
    <atom:link rel='edit' type='application/atom+xml'
      href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1'/>
    <gact:received>2007-10-27T19:41:51.478Z</gact:received>
  </atom:entry>
</atom:feed>

There’s a lot of front-matter in this, so it’s hard to realize at first glance that it lists only a single activity (in the atom:entry element near the bottom). The entry itself uses mostly standard Atom elements, except for one extension element from the Google activities namespace, giving the date that the notification was received (received date is also important in the news industry, so maybe this is something Atom needs to add to its core). Other than that, the activity itself is easy enough to understand: it has a unique id, a couple of dates, a title (which seems also to serve as the sole description), and web links for viewing and editing.
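Here is a small Python sketch of how an app might dig the activities out from under all that front-matter. The sample is a cut-down copy of the spec's template, and the dictionary keys are my own choice rather than anything the spec defines.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "gact": "http://schemas.google.com/activities/2007",
}

# Cut-down copy of the spec's template feed.
SAMPLE_ACTIVITIES = """\
<atom:feed xmlns:atom='http://www.w3.org/2005/Atom'
    xmlns:gact='http://schemas.google.com/activities/2007'>
  <atom:title>Feed title</atom:title>
  <atom:entry>
    <atom:id>http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1</atom:id>
    <atom:updated>2007-10-27T19:41:51.574Z</atom:updated>
    <atom:title>Activity title</atom:title>
    <atom:link rel='edit' type='application/atom+xml'
      href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1'/>
    <gact:received>2007-10-27T19:41:51.478Z</gact:received>
  </atom:entry>
</atom:feed>
"""

def list_activities(feed_xml):
    """Return one dict per atom:entry, ignoring the feed-level front-matter."""
    root = ET.fromstring(feed_xml)
    activities = []
    for entry in root.findall("atom:entry", NS):
        links = {
            link.get("rel"): link.get("href")
            for link in entry.findall("atom:link", NS)
        }
        activities.append({
            "id": entry.findtext("atom:id", namespaces=NS),
            "title": entry.findtext("atom:title", namespaces=NS),
            "updated": entry.findtext("atom:updated", namespaces=NS),
            "received": entry.findtext("gact:received", namespaces=NS),
            "edit": links.get("edit"),
        })
    return activities

for activity in list_activities(SAMPLE_ACTIVITIES):
    print(activity["title"], activity["received"])
```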

Unlike the member and friends info, which was read-only, OpenSocial allows apps to post new activities and edit or delete existing ones, but only in what is called a “source-level feed” — that’s a list of a user’s activities limited to a single source (which, I assume, is the application), using the URL pattern http://{DOMAIN}/activities/feeds/activities/user/{userId}/source/{sourceId} (which, again, may be a typo with an extra “activities” path element at the beginning). In other words, an application can read activities from any source, but it can mess around only with the ones it created. I’m not sure yet how the application knows its source id, or how the host verifies the app’s identity, but I’ll be looking at those issues in a later posting.
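A sketch of what preparing a new activity for a source-level feed might look like in Python. It only builds the request and doesn't send anything: I don't yet know which elements the host requires or how authorization works, so the single-title entry and both function names are assumptions. The URL keeps the spec's (possibly mistaken) leading "activities" element.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)  # serialize with the atom: prefix

def source_feed_url(domain, user_id, source_id):
    """URL of the source-level feed, following the pattern given in the spec."""
    return (f"http://{domain}/activities/feeds/activities"
            f"/user/{user_id}/source/{source_id}")

def new_activity_request(domain, user_id, source_id, title):
    """Return (method, url, body) for creating an activity; nothing is sent."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    # Assumption: a title alone is enough for a minimal activity.
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    body = ET.tostring(entry, encoding="unicode")
    return "POST", source_feed_url(domain, user_id, source_id), body

method, url, body = new_activity_request(
    "example.org", "userID", "sourceID", "Uploaded a photo")
print(method, url)
print(body)
```

Editing or deleting would presumably be a PUT or DELETE against the entry's "edit" link instead of a POST to the feed.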

For members and friends, I noted that the spec’s example included the OpenSearch namespace but didn’t use it. This time, the namespace is used for the totalResults, startIndex, and itemsPerPage elements. These suggest that it’s possible to page through long lists of activities, though I could find no mention of that in the spec. Again, I don’t know much about Atom, but I think that the Atom-blessed way to handle paging would involve using “first”, “next”, and “last” links.
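Two small Python sketches of what paging could look like. The first only does the arithmetic implied by the OpenSearch elements; the second follows a "next" link, using a feed I made up (the example.org URLs and the page parameter are hypothetical, since the spec doesn't say how to request a page).

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

def page_starts(total_results, items_per_page, first_index=1):
    """startIndex of every page, given the OpenSearch totals from a feed."""
    return list(range(first_index, first_index + total_results, items_per_page))

# Hypothetical feed with link-based paging; not taken from the spec.
HYPOTHETICAL_FEED = """\
<atom:feed xmlns:atom='http://www.w3.org/2005/Atom'>
  <atom:link rel='self' href='http://example.org/feed?page=1'/>
  <atom:link rel='next' href='http://example.org/feed?page=2'/>
</atom:feed>
"""

def next_page_href(feed_xml):
    """Return the href of the feed-level rel='next' link, or None on the last page."""
    root = ET.fromstring(feed_xml)
    for link in root.findall("atom:link", NS):
        if link.get("rel") == "next":
            return link.get("href")
    return None

print(page_starts(1, 25))   # [1]
print(page_starts(60, 25))  # [1, 26, 51]
print(next_page_href(HYPOTHETICAL_FEED))
```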

Still learning

I’m not deeply into social networking myself: with my adolescent children using Facebook, my joining that site would be like showing up in a leather jacket at their high school dance, and 99% of the time I spend on the more grown-up sites like Plaxo, LinkedIn, and Dopplr is spent approving connection requests. As a result, I wasn’t aware of how important activity notifications were for a social-networking site.

Whatever happens with OpenSocial, I have found it to be a good architectural introduction to social networking in 2007, though I suspect that the next thing I’m going to look at — the persistence data API — has more to do with Google’s business requirements than with social networking itself.

First looks at OpenSocial: part 2 (content for members and friends)

Monday, November 5th, 2007

See also First looks at OpenSocial: part 1 (URLs)

This is the second part of a series of postings describing how I’m trying to understand the technical specs for the new Google-led OpenSocial initiative. In the first part, I cut down through all the text in the specs to get at the basic URLs, which represent the raw skeleton of services defined by the spec. This time, I’m going to look at the data formats, starting with the real bread and butter of social networking, people and their friends.

The atomic age

The content format for OpenSocial is always the Atom syndication format, a competitor to RSS for syndicating blogs and other similar information. I haven’t spent very much time with Atom yet. I appreciate that it’s more fully specified than RSS 2.0, but I already know RSS and have run into no practical problems with it (though I’m aware of the potential ones), so I’m probably not going to notice if or where the OpenSocial specs are violating the spirit or even letter of the Atom specs. I’ve occasionally seen complaints from Atom-heads about Atom-compliance in Google’s GData, and assume those apply to OpenSocial as well.

People

When you ask an OpenSocial provider for information about a member (using the URL pattern http://{DOMAIN}/feeds/people/{userId}), the spec says you get back something like this, assuming you’re authorized to make the request (lifted straight from the spec, and not namespace-compliant):

<entry xmlns='http://www.w3.org/2005/Atom'
  xmlns:georss='http://www.georss.org/georss'
  xmlns:gd='http://schemas.google.com/g/2005'>
  <id>http://sandbox.orkut.com:80/feeds/people/14358878523263729569</id>
  <updated>2007-10-28T14:01:29.948-07:00</updated>
  <title>Elizabeth Bennet</title>
  <link rel='thumbnail' type='image/*'
    href='http://img1.orkut.com/images/small/1193601584/115566312.jpg'/>
  <link rel='alternate' type='text/html'
    href='http://orkut.com/Profile.aspx?uid=17583631990196664929'/>
  <link rel='self' type='application/atom+xml'
    href='http://sandbox.orkut.com:80/feeds/people/14358878523263729569'/>
  <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
      <gml:pos>51.668674 -0.066235</gml:pos>
    </gml:Point>
  </georss:where>
  <gd:extendedProperty name='lang' value='en-US'/>
  <gd:postalAddress/>
</entry>

Aside from the fact that the tech writer is a Jane Austen fan, a couple of other points jump out:

  1. In addition to the Atom namespace, they’re using the GeoRSS namespace to provide lat/lon information (so that you could place the person on a map, for example), the GML namespace (which the example forgets to declare), and the GData namespace for generally unimportant information like the postal address (who gives that out?).
  2. The two most important pieces of information seem to be the thumbnail picture/buddy icon and the member’s HTML profile page, both of which are the targets of typed links.
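To show how little there is to pull out of a person entry, here is a minimal Python sketch that extracts the name, the two typed links, and the position. The sample is a shortened copy of the entry as shown here, and the output field names are my own.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
    "gml": "http://www.opengis.net/gml",
}

# Shortened copy of the person entry.
SAMPLE_PERSON = """\
<entry xmlns='http://www.w3.org/2005/Atom'
  xmlns:georss='http://www.georss.org/georss'>
  <title>Elizabeth Bennet</title>
  <link rel='thumbnail' type='image/*'
    href='http://img1.orkut.com/images/small/1193601584/115566312.jpg'/>
  <link rel='alternate' type='text/html'
    href='http://orkut.com/Profile.aspx?uid=17583631990196664929'/>
  <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
      <gml:pos>51.668674 -0.066235</gml:pos>
    </gml:Point>
  </georss:where>
</entry>
"""

def parse_person(entry_xml):
    """Pull the name, thumbnail, profile page, and lat/lon out of one entry."""
    entry = ET.fromstring(entry_xml)
    links = {
        link.get("rel"): link.get("href")
        for link in entry.findall("atom:link", NS)
    }
    pos = entry.findtext("georss:where/gml:Point/gml:pos", namespaces=NS)
    lat, lon = (float(v) for v in pos.split()) if pos else (None, None)
    return {
        "name": entry.findtext("atom:title", namespaces=NS),
        "thumbnail": links.get("thumbnail"),
        "profile": links.get("alternate"),
        "lat": lat,
        "lon": lon,
    }

person = parse_person(SAMPLE_PERSON)
print(person["name"], person["lat"], person["lon"])
```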

Of course, in reality, the most important information about a member is the member’s friends list, but that information comes through a separate URL, http://{DOMAIN}/feeds/people/{userId}/friends.

Friends

This example is also lifted from the spec (and is still missing the declaration for the GML namespace):

<feed xmlns='http://www.w3.org/2005/Atom'
  xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'
  xmlns:georss='http://www.georss.org/georss'
  xmlns:gd='http://schemas.google.com/g/2005'>
  <id>http://sandbox.orkut.com:80/feeds/people/14358878523263729569/friends</id>
  <updated>2007-10-28T21:01:03.690Z</updated>
  <title>Friends</title>
  <link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml'
    href='http://sandbox.orkut.com:80/feeds/people/14358878523263729569/friends'/>
  <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/14358878523263729569/friends'/>
  <author><name>Elizabeth Bennet</name></author>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/people/02938391851054991972</id>
    <updated>2007-10-28T14:01:03.690-07:00</updated>
    <title>Jane Bennet</title>
    <link rel='thumbnail' type='image/*' href='http://img1.orkut.com/images/small/null'/>
    <link rel='alternate' type='text/html' href='http://sandbox.orkut.com:80/Profile.aspx?uid=574036770800045389'/>
    <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/02938391851054991972'/>
    <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
    <gml:pos>51.668674 -0.066235</gml:pos></gml:Point></georss:where>
    <gd:extendedProperty name='lang' value='en-US'/>
    <gd:postalAddress/>
  </entry>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/people/12490088926525765025</id>
    <updated>2007-10-28T14:01:03.691-07:00</updated>
    <title>Charlotte Lucas</title>
    <link rel='thumbnail' type='image/*' href='http://img2.orkut.com/images/small/null'/>
    <link rel='alternate' type='text/html' href='http://sandbox.orkut.com:80/Profile.aspx?uid=5799256900854924919'/>
    <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/12490088926525765025'/>
    <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
    <gml:pos>0.0 0.0</gml:pos></gml:Point></georss:where>
    <gd:extendedProperty name='lang' value='en-US'/>
    <gd:postalAddress/>
  </entry>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/people/15827776984733875930</id>
    <updated>2007-10-28T14:01:03.692-07:00</updated>
    <title>Fitzwilliam Darcy</title>
    <link rel='thumbnail' type='image/*' href='http://img3.orkut.com/images/small/1193603277/115555466.jpg'/>
    <link rel='alternate' type='text/html' href='http://sandbox.orkut.com:80/Profile.aspx?uid=14256507824223085777'/>
    <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/15827776984733875930'/>
    <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
    <gml:pos>53.017016 -1.424363</gml:pos></gml:Point>
    </georss:where>
    <gd:extendedProperty name='lang' value='en-US'/>
    <gd:postalAddress/>
  </entry>
</feed>

Again, very straightforward, if not namespace-compliant (due to the missing GML namespace declaration). There’s also a declaration of an OpenSearch namespace URI that’s never used, suggesting a feature that was removed in haste just before release. The friends list is simply a feed of person entries, just like the single entry returned for the member query, with a title, date, etc. at the top. Note that you always get the full friends list (there’s no support for filtering), so this might not be fun for someone who has 10,000+ friends.
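Since the host won't filter for you, any filtering has to happen on the client after the whole feed has arrived. A rough Python sketch, using a stripped-down version of the friends feed with only the titles kept (the function and its keep parameter are my own invention):

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Stripped-down friends feed: only the entry titles are kept.
SAMPLE_FRIENDS = """\
<feed xmlns='http://www.w3.org/2005/Atom'>
  <title>Friends</title>
  <entry><title>Jane Bennet</title></entry>
  <entry><title>Charlotte Lucas</title></entry>
  <entry><title>Fitzwilliam Darcy</title></entry>
</feed>
"""

def friend_names(feed_xml, keep=lambda name: True):
    """Names of all friends in the feed for which keep(name) is true."""
    root = ET.fromstring(feed_xml)
    names = (entry.findtext(ATOM + "title") for entry in root.findall(ATOM + "entry"))
    return [name for name in names if keep(name)]

print(friend_names(SAMPLE_FRIENDS))
print(friend_names(SAMPLE_FRIENDS, keep=lambda name: name.endswith("Bennet")))
```

The whole feed still has to be downloaded and parsed before the filter runs, which is the part that hurts with 10,000+ friends.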

What I don’t see, either in the example or the spec, is a way to provide typed relationships, like “spouse”, “colleague”, “classmate”, etc. I don’t know how important that is to application developers — simply getting the list of friends is probably the most important thing.

First looks at OpenSocial: part 1 (URLs)

Saturday, November 3rd, 2007

In a year or two, we’ll know whether the Google-led OpenSocial initiative was a turning point in the social web or just a weak shot fired across Facebook’s bow. In the meantime, I think it’s worth taking some time to digest the API docs, which are still pretty rough.

I don’t know what I’m talking about…

Instead of reading and understanding everything first and then posting from a (virtual) podium, I’m going to try to work out my own understanding of the APIs right here on the web. That means that I’ll be asking questions that I’ll find the answers for later, that I’ll be making incorrect assumptions, and that I’ll be deferring hard stuff (like authorization/authentication) until I understand the basics. This is not, then, an OpenSocial primer by any stretch, since I don’t actually know what I’m talking about, but it might be useful as a snapshot of how a developer approaches a new API.

It’s the URLs, stupid

OpenSocial is designed so that any app can get information from any site as long as it has permission (I’ll figure out how that works later) — to accomplish that, it uses standard URL patterns on every site returning Atom entries and feeds. So after digging through a lot of crackerjack, I finally found the prize buried in the docs. Here are the URL patterns:

Information about a person
http://{DOMAIN}/feeds/people/{userId}
GET only.
List of a person’s friends
http://{DOMAIN}/feeds/people/{userId}/friends
GET only.
List of a person’s activities
(Wrong?) http://{DOMAIN}/activities/feeds/activities/user/{userId}
GET only.
List of a person’s activities from a single source
(Wrong?) http://{DOMAIN}/activities/feeds/activities/user/{userId}/source/{sourceId}
GET, POST, PUT, DELETE
Application-global data
http://{DOMAIN}/feeds/apps/{appId}/persistence/global
http://{DOMAIN}/feeds/apps/{appId}/persistence/global/{partKey}
GET, POST, PUT, DELETE
Per-instance data
http://{DOMAIN}/feeds/apps/{appId}/persistence/{userId}/instance/{instanceId}/{partKey}
GET, POST, PUT, DELETE
Shared user data
http://{DOMAIN}/feeds/apps/{appID}/persistence/{userId}/shared/{partKey}
GET, POST, PUT, DELETE
Friends’ shared data
http://{DOMAIN}/feeds/apps/{appID}/persistence/{userId}/friends
GET, POST, PUT, DELETE

Did I miss anything?

Listing all the URLs together like this, instead of spreading them out over pages and pages of docs, is the best way to start with a REST API. For example, you can tell at a glance what kind of information is available and what is and isn’t writable (you can’t add new friends for a user, but you can add a new activity). Sure, JavaScript libraries, etc. are nice, but the class hierarchies can obscure how simple the underlying data actually is (or “are”, if you’ve studied Latin). You can also spot possible typos in the docs: for example, what are the odds that the activity URLs are really supposed to start with “/activities/feeds/” when everything else starts with “/feeds/”? It could be poor, inconsistent design, but I suspect cut-and-paste errors.
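The same at-a-glance checks can be written down as data. This Python sketch copies the path patterns and methods from the list above into a table (the short endpoint names are mine, and I've left out the per-key persistence variants), then asks what is writable and which paths don't start with "/feeds/".

```python
# Path patterns and allowed methods, copied from the list of URL patterns.
READ_ONLY = {"GET"}
READ_WRITE = {"GET", "POST", "PUT", "DELETE"}

ENDPOINTS = {
    "person": ("/feeds/people/{userId}", READ_ONLY),
    "friends": ("/feeds/people/{userId}/friends", READ_ONLY),
    "activities": ("/activities/feeds/activities/user/{userId}", READ_ONLY),
    "source_activities": (
        "/activities/feeds/activities/user/{userId}/source/{sourceId}",
        READ_WRITE,
    ),
    "global_data": ("/feeds/apps/{appId}/persistence/global", READ_WRITE),
}

def allows(name, method):
    """True if the named endpoint accepts the given HTTP method."""
    return method.upper() in ENDPOINTS[name][1]

def odd_prefixes():
    """Endpoints whose path doesn't start with /feeds/ like the rest."""
    return [name for name, (path, _) in ENDPOINTS.items()
            if not path.startswith("/feeds/")]

print(allows("friends", "POST"))            # False
print(allows("source_activities", "POST"))  # True
print(odd_prefixes())                       # ['activities', 'source_activities']
```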

Next time: content

The next time I get around to looking at OpenSocial, I’ll try to figure out the formats — it shouldn’t be too hard, since they’re all Atom entries or feeds. Then I’ll get into messier stuff like auth/auth, and I may eventually try adding OpenSocial support to my OurAirports hobby site, though it doesn’t even support friends yet.

Two problems with Google Maps for aviation

Wednesday, August 29th, 2007

I love Google Maps and their API, and am using it extensively in my new web site OurAirports. However, there are two problems that keep coming up for using Google Maps with an aviation application:

[Diagram of Mercator projection]

  1. Google Maps uses a Mercator Projection, grossly distorting the northern and southern parts of the world, and cutting off the area near the poles so that a few of the Antarctic airports don’t show up on my maps at all. I can understand the reasons for their choice, with simple panning and tile paging and a rectangular area, but it can make things look pretty silly sometimes (such as Greenland and Africa appearing the same size).

  2. Google Maps does not provide an API call to draw a great-circle path. This seems to me to be almost a no-brainer, and it’s especially important in a Mercator projection, where the apparently straight paths drawn by the API are anything but (especially east-west). After messing with some out-of-date third-party libraries, I finally found some JavaScript at one site that does a good job on efficient, approximate great-circle paths, and am waiting to hear from the author about terms for reuse. Google might want to just go ahead and add this, though.
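For reference, an approximate great-circle path can be computed by interpolating along the arc on a spherical Earth and handing the resulting points to the map as an ordinary polyline. This Python sketch is my own illustration of that standard approach, not the JavaScript mentioned above; it assumes a perfect sphere and doesn't handle exactly antipodal endpoints, where the route is ambiguous.

```python
import math

def great_circle_path(start, end, segments=32):
    """Return segments+1 (lat, lon) points in degrees along the great circle."""
    lat1, lon1 = map(math.radians, start)
    lat2, lon2 = map(math.radians, end)

    # Angular distance between the endpoints (haversine formula).
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    d = 2 * math.asin(min(1.0, math.sqrt(h)))
    if d == 0:
        return [tuple(start)] * (segments + 1)

    points = []
    for i in range(segments + 1):
        f = i / segments
        # Weights for spherical interpolation between the two endpoints.
        a = math.sin((1 - f) * d) / math.sin(d)
        b = math.sin(f * d) / math.sin(d)
        x = a * math.cos(lat1) * math.cos(lon1) + b * math.cos(lat2) * math.cos(lon2)
        y = a * math.cos(lat1) * math.sin(lon1) + b * math.cos(lat2) * math.sin(lon2)
        z = a * math.sin(lat1) + b * math.sin(lat2)
        lat = math.atan2(z, math.hypot(x, y))
        lon = math.atan2(y, x)
        points.append((math.degrees(lat), math.degrees(lon)))
    return points

# Two points at 45N on opposite sides of the world: the path crosses the pole.
for lat, lon in great_circle_path((45.0, -90.0), (45.0, 90.0), segments=4):
    print(round(lat, 3), round(lon, 3))
```

More segments give a smoother curve on the Mercator map at the cost of more points in the polyline.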

[Diagram of Mercator projection]

Aviation charts mostly use a Lambert conformal conic projection, which preserves angles and keeps the scale nearly constant near its standard parallels, so distances measured on the chart are very close to the real-world ones; however, by definition this projection can’t show more than half the world at once, and generally shows much less than that, so it wouldn’t work for something like Google Maps.