Quoderat

Archive for the 'programming' Category

LAMP stack stability

Thursday, January 10th, 2008

I’m using a single dedicated server to host ourairports.com, megginson.com, and a couple of minor domains. OurAirports is a database-heavy application using (currently) a MySQL v.5 database hosted on the same server. I’ll offload the database to a separate server if traffic keeps increasing, but as long as I’m getting compliments from tech people for my fast response times (mainly thanks to MySQL’s built-in query caching), there’s no point paying for extra hardware.

Uptime

My ISP set up the server for me last summer with a bare-bones Ubuntu distro, then I installed the extra packages I needed using aptitude over ssh. Since then, I’ve done many Ubuntu in-place upgrades, rolled out hundreds of changes and upgrades to the web apps and dozens to the database schema (some very significant), and upgraded WordPress n-teen times. Check this out:

$ uptime
 13:08:31 up 175 days, 10:02,  1 user,  load average: 0.23, 0.06, 0.02

That’s right — since my ISP first set up the server with a basic Ubuntu system, I’ve never had to restart it. In fact, if Apache and mod_php (PHP5) had ‘uptime’ commands, they’d show almost the same amount of time, since I restarted them only to make configuration changes in the first few days of setting up the server (unless apt stopped them to install a newer version during one of my upgrades). I’ve restarted MySQL more recently, but again, only to experiment with configuration changes (especially for fulltext).

-1 for being cool, +10 for having a life

Using reliable old technologies like Linux, Apache, MySQL, and PHP doesn’t win any cool points, but it certainly makes maintaining a web server and its applications easy. I can go on vacation, for example, without worrying about being able to get online to fix or restart my server every couple of days. I don’t have to stay up until 3:00 am on Sunday night so that I can take the server offline to roll out new software versions or bug fixes (aptitude installs any security fixes in place). I spend lots of time with my family. I go to my kids’ school concerts. I learned banjo and mandolin (why not, since I have the free time?).

It’s the developer, not the language

And yes, my PHP web app is easy to maintain and extend, because I designed it to be that way (I can often implement, test and roll out new features in a matter of minutes, even when they require database schema changes) — it’s the developer, not the programming language, that determines the quality and maintainability of an app. A lot of newbies use PHP, so there’s a lot of bad PHP out there, but the same can be said for any language, even Ruby.

Amazon SimpleDB (not very Codd-y)

Friday, December 14th, 2007

This might be of interest:

Amazon SimpleDB

Amazon’s announcement

Dear AWS Developers,

This is a short note to let a subset of our most active developers know about an upcoming limited beta of our newest web service: Amazon SimpleDB, which is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud.

Traditionally, this type of functionality has been accomplished with a clustered relational database that requires a sizable upfront investment, brings more complexity than is typically needed, and often requires a DBA to maintain and administer. In contrast, Amazon SimpleDB is easy to use and provides the core functionality of a database - real-time lookup and simple querying of structured data - without the operational complexity.

We’re excited about this upcoming service and wanted to let you know about it as soon as possible. We anticipate beginning the limited beta in the next few weeks. In the meantime, you can read more about the service, and sign up to be notified when the limited beta program opens and a spot becomes available for you. To do so, simply click the “Sign Up For This Web Service” button on the web site below and we will record your contact information.

Not much there, though

It’s not SQL, or even SQL-like, though, supporting only the operators “=, !=, <, >, <=, >=, STARTS-WITH, AND, OR, NOT, INTERSECTION AND UNION”. I’m no relational expert, but I don’t think Codd would have been impressed. A distributed database is one of the big missing pieces from Amazon’s services, but I’m not sure if this will be it.

First looks at OpenSocial: part 4 (content for persistence data)

Thursday, November 8th, 2007

I didn’t have time to look at the OpenSocial API yesterday, so I’m continuing today looking at the data format for the last major area, persistence data.

A vision thing?

My first impression of the persistence data API is that it doesn’t belong in v.1 of OpenSocial — unlike the member/friends and activities APIs, it doesn’t seem to be solving a core problem for social-site app writers (I have no way to get at a friends list except through the site’s API, but I can store my own data, thanks). I can see only two reasons that it’s here, neither of them very admirable:

  1. Because someone has a vision of a world where people can write social apps that run entirely on the client side with HTML/CSS/JavaScript, using only resources provided by the social site itself.
  2. Because the GData group in Google co-opted the designers to promote GData in the spec, the same way that the Blu-Ray group in Sony co-opted the PS3 to advance their agenda.

I’ll give Google the benefit of the doubt and assume that it’s a vision thing, but that’s still very unhealthy — specs should solve the real problems of the present, not the speculative problems of the future, especially bare-bones v.1 specs like this.

The format

Now that that’s out of my system, let’s take a look at what you get back from a URL like http://{DOMAIN}/feeds/apps/{appId}/persistence/global (and its many variants). From the spec, here’s what you get when you request a single piece of information from the API:

<entry xmlns='http://www.w3.org/2005/Atom'>
<title type="text">somekey</title>
<content type="text">somevalue</content>
</entry>

Or, in non-XML terms,

$globals{'somekey'} = 'somevalue'

That comes from a URL like http://{DOMAIN}/feeds/apps/{appId}/persistence/global/somekey which requests a single value. Using the first URL mentioned gets you a feed of name=value pairs, sort-of like an associative array:

<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<id>http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global</id>
<updated>2007-10-30T20:53:20.086Z</updated>
<title>Persistence</title>
<link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml'
  href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global'/>
<link rel='http://schemas.google.com/g/2005#post' type='application/atom+xml'
  href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global'/>
<link rel='self' type='application/atom+xml'
  href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global'/>
<generator version='1.0' uri='/feeds'>Orkut</generator>
<entry>
  <id>http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global/somekey</id>
  <title>somekey</title>
  <content>somevalue</content>
  <link rel='self' type='application/atom+xml'
    href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global/somekey'/>
  <link rel='edit' type='application/atom+xml'
    href='http://sandbox.orkut.com:80/feeds/apps/02864641990088926753/persistence/global/somekey'/>
</entry>
</feed>

There’s only one entry in the spec’s example, but there could be a lot more. Basically, this is the equivalent of something like

$globals = { 'somekey' => 'somevalue' }

The comparison isn’t quite fair, because there are also some links explaining what you can do to modify this information, etc., but it still seems like a lot of markup for not much value (pun intended). I wonder if this would be a good place to use JSON instead of Atom+XML? After all, the serious apps will be doing their own data storage anyway, and the client-only apps will probably use a JavaScript API that hides the Atom from the developer.
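To make the "lot of markup for not much value" point concrete, here is a minimal sketch that collapses a persistence feed into a plain dictionary and prints it as JSON. The feed is a trimmed copy of the spec's example (links and generator removed), and the function name feed_to_dict is made up for illustration; it is not part of OpenSocial or any client library.

```python
import json
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Trimmed copy of the spec's example feed (links and generator omitted).
FEED = """<feed xmlns='http://www.w3.org/2005/Atom'>
<title>Persistence</title>
<entry>
  <title>somekey</title>
  <content>somevalue</content>
</entry>
</feed>"""


def feed_to_dict(xml_text):
    """Collapse each Atom entry into one pair: entry title -> entry content."""
    root = ET.fromstring(xml_text)
    return {
        entry.findtext(ATOM + "title"): entry.findtext(ATOM + "content")
        for entry in root.iter(ATOM + "entry")
    }


globals_ = feed_to_dict(FEED)
print(json.dumps(globals_))  # prints {"somekey": "somevalue"}
```

The JSON form drops the self and edit links, so it is only equivalent for a client that already knows the URL patterns.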

Scope

As hinted at, at least, in my URL posting, there are several different data scopes:

  • All (global) data for this application on this social site (equivalent of static global variables?).
  • Data for this instance of the application only (equivalent of local or object variables?).
  • Data for this user in this application (i.e. your own profile info about the user, available every time your app runs).
  • Data for this user’s friends in this application (i.e. your own profile info about the friends, available every time your app runs).

It seems like a reasonable division of scope, especially since the app can’t get anything out that it didn’t put in.

Final thought (for now)

I do believe that, eventually, many web apps will be able to outsource storage as a service instead of having to maintain their own databases and database clusters — in fact, Amazon’s S3 and its competitors already provide precisely this service, though they might not be optimized for a lot of name=value lookups. I’m surprised, though, that this could be considered a key feature of a social app spec, when so much else was left out.

First looks at OpenSocial: part 3 (content for activities)

Tuesday, November 6th, 2007

This is the third part of a series where I’m working through the OpenSocial specs as I write — that means that I haven’t preread and predigested this stuff, but am creating a record of how I approach a new set of specifications and try to understand them. First, I looked at the basic URLs for data access, since they provide the best high-level description of the OpenSocial capabilities (read-only info on members and their friends, read/write info on a member’s activity notifications, and a simple data-storage API). Next, I looked at the data format for the most important content, the member profile and friends lists. This time, I’ll look at the format for activity notifications, which is also based on the Atom syndication format.

Activities

To get a list of a member’s recent activities (uploaded a photo, poked a friend, got a new job, or stuff like that, I guess) an OpenSocial application uses the URL pattern http://{DOMAIN}/activities/feeds/activities/user/{userId} according to the specs, though I suspect that might be intended to be http://{DOMAIN}/feeds/activities/user/{userId} for consistency with the other data-access URLs — it’s hard to be certain. The host should return an Atom feed of activities, like this template example lifted from the spec:

<atom:feed xmlns:atom='http://www.w3.org/2005/Atom'
    xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'
    xmlns:gact='http://schemas.google.com/activities/2007'>
  <atom:id>http://www.google.com/activities/feeds/activities/user/userID/source/sourceID</atom:id>
  <atom:updated>1970-01-01T00:00:00.000Z</atom:updated>
  <atom:category scheme='http://schemas.google.com/g/2005#kind'
    term='http://schemas.google.com/activities/2007#activity'/>
  <atom:title>Feed title</atom:title>
  <atom:link rel='alternate' type='text/html' href='http://sourceID.com/123'/>
  <atom:link rel='http://schemas.google.com/g/2005#feed'
    type='application/atom+xml'
    href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID'/>
  <atom:link rel='http://schemas.google.com/g/2005#post'
    type='application/atom+xml'
    href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID'/>
  <atom:link rel='self' type='application/atom+xml'
    href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID'/>
  <atom:author>
    <atom:name>unknown</atom:name>
  </atom:author>
  <openSearch:totalResults>1</openSearch:totalResults>
  <openSearch:startIndex>1</openSearch:startIndex>
  <openSearch:itemsPerPage>25</openSearch:itemsPerPage>
  <atom:entry>
    <atom:id>http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1</atom:id>
    <atom:updated>2007-10-27T19:41:51.574Z</atom:updated>
    <atom:category scheme='http://schemas.google.com/g/2005#kind'
      term='http://schemas.google.com/activities/2007#activity'/>
    <atom:title>Activity title</atom:title>
    <atom:link rel='self' type='application/atom+xml'
      href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1'/>
    <atom:link rel='edit' type='application/atom+xml'
      href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1'/>
    <gact:received>2007-10-27T19:41:51.478Z</gact:received>
  </atom:entry>
</atom:feed>

There’s a lot of front-matter in this, so it’s hard to realize at first glance that it lists only a single activity (in the atom:entry element near the bottom). The entry itself uses mostly standard Atom elements, except for one extension element from the Google activities namespace, giving the date that the notification was received (received date is also important in the news industry, so maybe this is something Atom needs to add to its core). Other than that, the activity itself is easy enough to understand: it has a unique id, a couple of dates, a title (which seems also to serve as the sole description), and web links for viewing and editing.
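As a rough illustration of how little of that feed an application actually needs, the sketch below pulls the id, title, dates, and edit link out of each entry. The sample is cut down from the spec's template (front-matter removed), and the activities() helper is hypothetical, not an OpenSocial API.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "gact": "http://schemas.google.com/activities/2007",
}

# Cut-down version of the spec's template feed: one activity, no front-matter.
FEED = """<atom:feed xmlns:atom='http://www.w3.org/2005/Atom'
    xmlns:gact='http://schemas.google.com/activities/2007'>
  <atom:title>Feed title</atom:title>
  <atom:entry>
    <atom:id>http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1</atom:id>
    <atom:updated>2007-10-27T19:41:51.574Z</atom:updated>
    <atom:title>Activity title</atom:title>
    <atom:link rel='self' type='application/atom+xml'
      href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1'/>
    <atom:link rel='edit' type='application/atom+xml'
      href='http://www.google.com/activities/feeds/activities/user/userID/source/sourceID/a1'/>
    <gact:received>2007-10-27T19:41:51.478Z</gact:received>
  </atom:entry>
</atom:feed>"""


def activities(xml_text):
    """Return one small dict per atom:entry in an activity feed."""
    root = ET.fromstring(xml_text)
    result = []
    for entry in root.findall("atom:entry", NS):
        # Index the entry's links by their rel attribute ("self", "edit").
        links = {l.get("rel"): l.get("href") for l in entry.findall("atom:link", NS)}
        result.append({
            "id": entry.findtext("atom:id", namespaces=NS),
            "title": entry.findtext("atom:title", namespaces=NS),
            "updated": entry.findtext("atom:updated", namespaces=NS),
            "received": entry.findtext("gact:received", namespaces=NS),
            "edit": links.get("edit"),
        })
    return result


for activity in activities(FEED):
    print(activity["title"], activity["received"])
```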

Unlike the member and friends info, which was read-only, OpenSocial allows apps to post new activities and edit or delete existing ones, but only in what is called a “source-level feed” — that’s a list of a user’s activities limited to a single source (which, I assume, is the application), using the URL pattern http://{DOMAIN}/activities/feeds/activities/user/{userId}/source/{sourceId} (which, again, may be a typo with an extra “activities” path element at the beginning). In other words, an application can read activities from any source, but it can mess around only with the ones it created. I’m not sure yet how the application knows its source id, or how the host verifies the app’s identity, but I’ll be looking at those issues in a later posting.

For members and friends, I noted that the spec’s example included the OpenSearch namespace but didn’t use it. This time, the namespace is used for the totalResults, startIndex, and itemsPerPage elements. These suggest that it’s possible to page through long lists of activities, though I could find no mention of that in the spec. Again, I don’t know much about Atom, but I think that the Atom-blessed way to handle paging would involve using “first”, “next”, and “last” links.
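If paging does work the way those three elements hint, a client could work out where the next page starts with a few lines of arithmetic. This is only a sketch under that assumption: the spec does not describe paging, it does not say how a start index would be passed back to the host, and the names sample_feed and next_start_index are invented here. The calculation treats startIndex as 1-based, as in the spec's example.

```python
import xml.etree.ElementTree as ET

NS = {"openSearch": "http://a9.com/-/spec/opensearchrss/1.0/"}


def sample_feed(total, start, per_page):
    """Build a bare feed carrying only the three OpenSearch paging elements."""
    return (
        "<atom:feed xmlns:atom='http://www.w3.org/2005/Atom' "
        "xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'>"
        f"<openSearch:totalResults>{total}</openSearch:totalResults>"
        f"<openSearch:startIndex>{start}</openSearch:startIndex>"
        f"<openSearch:itemsPerPage>{per_page}</openSearch:itemsPerPage>"
        "</atom:feed>"
    )


def next_start_index(xml_text):
    """Return the 1-based index of the next page, or None on the last page."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("openSearch:totalResults", namespaces=NS))
    start = int(root.findtext("openSearch:startIndex", namespaces=NS))
    per_page = int(root.findtext("openSearch:itemsPerPage", namespaces=NS))
    candidate = start + per_page
    return candidate if candidate <= total else None


print(next_start_index(sample_feed(1, 1, 25)))   # the spec's numbers: prints None
print(next_start_index(sample_feed(60, 1, 25)))  # prints 26
```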

Still learning

I’m not deeply into social networking myself — with my adolescent children using Facebook, my joining that site would be like showing up in a leather jacket at their high school dance, and 99% of the time I spend on the more grown-up sites like Plaxo, LinkedIn, and Dopplr is spent approving connection requests. As a result, I wasn’t aware of how important activity notifications were for a social-networking site.

Whatever happens with OpenSocial, I have found it to be a good architectural introduction to social networking in 2007, though I suspect that the next thing I’m going to look at — the persistence data API — has more to do with Google’s business requirements than with social networking itself.

First looks at OpenSocial: part 2 (content for members and friends)

Monday, November 5th, 2007

See also First looks at OpenSocial: part 1 (URLs)

This is the second part of a series of postings describing how I’m trying to understand the technical specs for the new Google-led OpenSocial initiative. In the first part, I cut down through all the text in the specs to get at the basic URLs, which represent the raw skeleton of services defined by the spec. This time, I’m going to look at the data formats, starting with the real bread and butter of social networking, people and their friends.

The atomic age

The content format for OpenSocial is always the Atom syndication format, a competitor to RSS for syndicating blogs and other similar information. I haven’t spent very much time with Atom yet — I appreciate that it’s more fully-specified than RSS 2.0, but I already know RSS and have run into no practical problems with it (though I’m aware of the potential ones) — so I’m probably not going to notice if or where the OpenSocial specs are violating the spirit or even letter of the Atom specs. I’ve occasionally seen complaints from Atom-heads about Atom-compliance in Google’s GData, and assume those apply to OpenSocial as well.

People

When you ask an OpenSocial provider for information about a member (using the URL pattern http://{DOMAIN}/feeds/people/{userId}), the spec says you get back something like this, assuming you’re authorized to make the request (lifted straight from the spec, and not namespace-compliant):

<entry xmlns='http://www.w3.org/2005/Atom'
  xmlns:georss='http://www.georss.org/georss'
  xmlns:gd='http://schemas.google.com/g/2005'>
  <id>http://sandbox.orkut.com:80/feeds/people/14358878523263729569</id>
  <updated>2007-10-28T14:01:29.948-07:00</updated>
  <title>Elizabeth Bennet</title>
  <link rel='thumbnail' type='image/*'
    href='http://img1.orkut.com/images/small/1193601584/115566312.jpg'/>
  <link rel='alternate' type='text/html'
    href='http://orkut.com/Profile.aspx?uid=17583631990196664929'/>
  <link rel='self' type='application/atom+xml'
    href='http://sandbox.orkut.com:80/feeds/people/14358878523263729569'/>
  <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
      <gml:pos>51.668674 -0.066235</gml:pos>
    </gml:Point>
  </georss:where>
  <gd:extendedProperty name='lang' value='en-US'/>
  <gd:postalAddress/>
</entry>

Aside from the fact that the tech writer is a Jane Austen fan, a couple of other points jump out:

  1. In addition to the Atom namespace, they’re using the GeoRSS namespace to provide lat/lon information (so that you could place the person on a map, for example), the GML namespace (which the example forgets to declare), and the GData namespace for generally unimportant information like the postal address (who gives that out?).
  2. The two most important pieces of information seem to be the thumbnail picture/buddy icon and the member’s HTML profile page, both of which are the targets of typed links.

Of course, in reality, the most important information about a member is the member’s friends list, but that information comes through a separate URL, http://{DOMAIN}/feeds/people/{userId}/friends.
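For illustration, the person entry can be reduced to the handful of fields called out above (name, thumbnail, profile page, lat/lon). The sketch below uses a shortened copy of the entry as printed here, with the gml prefix declared on the Point element; a namespace-aware parser like this one would reject an entry where that declaration was absent. The helper name parse_person is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
    "gml": "http://www.opengis.net/gml",
}

# Shortened copy of the spec's person entry (id, updated, gd:* elements omitted).
ENTRY = """<entry xmlns='http://www.w3.org/2005/Atom'
  xmlns:georss='http://www.georss.org/georss'>
  <title>Elizabeth Bennet</title>
  <link rel='thumbnail' type='image/*'
    href='http://img1.orkut.com/images/small/1193601584/115566312.jpg'/>
  <link rel='alternate' type='text/html'
    href='http://orkut.com/Profile.aspx?uid=17583631990196664929'/>
  <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
      <gml:pos>51.668674 -0.066235</gml:pos>
    </gml:Point>
  </georss:where>
</entry>"""


def parse_person(xml_text):
    """Pick the name, typed links, and position out of one person entry."""
    entry = ET.fromstring(xml_text)
    # The thumbnail and profile page are both typed links, keyed by rel.
    links = {l.get("rel"): l.get("href") for l in entry.findall("atom:link", NS)}
    pos = entry.findtext("georss:where/gml:Point/gml:pos", namespaces=NS)
    lat, lon = (float(x) for x in pos.split()) if pos else (None, None)
    return {
        "name": entry.findtext("atom:title", namespaces=NS),
        "thumbnail": links.get("thumbnail"),
        "profile": links.get("alternate"),
        "lat": lat,
        "lon": lon,
    }


person = parse_person(ENTRY)
print(person["name"], person["lat"], person["lon"])
```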

Friends

This example is also lifted from the spec (and is still missing the declaration for the GML namespace):

<feed xmlns='http://www.w3.org/2005/Atom'
  xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'
  xmlns:georss='http://www.georss.org/georss'
  xmlns:gd='http://schemas.google.com/g/2005'>
  <id>http://sandbox.orkut.com:80/feeds/people/14358878523263729569/friends</id>
  <updated>2007-10-28T21:01:03.690Z</updated>
  <title>Friends</title>
  <link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml'
    href='http://sandbox.orkut.com:80/feeds/people/14358878523263729569/friends'/>
  <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/14358878523263729569/friends'/>
  <author><name>Elizabeth Bennet</name></author>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/people/02938391851054991972</id>
    <updated>2007-10-28T14:01:03.690-07:00</updated>
    <title>Jane Bennet</title>
    <link rel='thumbnail' type='image/*' href='http://img1.orkut.com/images/small/null'/>
    <link rel='alternate' type='text/html' href='http://sandbox.orkut.com:80/Profile.aspx?uid=574036770800045389'/>
    <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/02938391851054991972'/>
    <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
    <gml:pos>51.668674 -0.066235</gml:pos></gml:Point></georss:where>
    <gd:extendedProperty name='lang' value='en-US'/>
    <gd:postalAddress/>
  </entry>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/people/12490088926525765025</id>
    <updated>2007-10-28T14:01:03.691-07:00</updated>
    <title>Charlotte Lucas</title>
    <link rel='thumbnail' type='image/*' href='http://img2.orkut.com/images/small/null'/>
    <link rel='alternate' type='text/html' href='http://sandbox.orkut.com:80/Profile.aspx?uid=5799256900854924919'/>
    <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/12490088926525765025'/>
    <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
    <gml:pos>0.0 0.0</gml:pos></gml:Point></georss:where>
    <gd:extendedProperty name='lang' value='en-US'/>
    <gd:postalAddress/>
  </entry>
  <entry>
    <id>http://sandbox.orkut.com:80/feeds/people/15827776984733875930</id>
    <updated>2007-10-28T14:01:03.692-07:00</updated>
    <title>Fitzwilliam Darcy</title>
    <link rel='thumbnail' type='image/*' href='http://img3.orkut.com/images/small/1193603277/115555466.jpg'/>
    <link rel='alternate' type='text/html' href='http://sandbox.orkut.com:80/Profile.aspx?uid=14256507824223085777'/>
    <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/15827776984733875930'/>
    <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
    <gml:pos>53.017016 -1.424363</gml:pos></gml:Point>
    </georss:where>
    <gd:extendedProperty name='lang' value='en-US'/>
    <gd:postalAddress/>
  </entry>
</feed>

Again, very straightforward, if not namespace-compliant (due to the missing GML namespace declaration). There’s also a declaration of an OpenSearch namespace URI that’s never used, suggesting a feature that was removed in haste just before release. The friends list is simply a feed of person entries, just like the single entry returned for the member query, with a title, date, etc. at the top. Note that you always get the full friends list — there’s no support for filtering — so this might not be fun for someone who has 10,000+ friends.

What I don’t see, either in the example or the spec, is a way to provide typed relationships, like “spouse”, “colleague”, “classmate”, etc. I don’t know how important that is to application developers — simply getting the list of friends is probably the most important thing.

First looks at OpenSocial: part 1 (URLs)

Saturday, November 3rd, 2007

In a year or two, we’ll know whether the Google-led OpenSocial initiative was a turning point in the social web or just a weak shot fired across Facebook’s bow. In the meantime, I think it’s worth taking some time to digest the API docs, which are still pretty rough.

I don’t know what I’m talking about…

Instead of reading and understanding everything first and then posting from a (virtual) podium, I’m going to try to work out my own understanding of the APIs right here on the web. That means that I’ll be asking questions that I’ll find the answers for later, that I’ll be making incorrect assumptions, and that I’ll be deferring hard stuff (like authorization/authentication) until I understand the basics. This is not, then, an OpenSocial primer by any stretch, since I don’t actually know what I’m talking about, but it might be useful as a snapshot of how a developer approaches a new API.

It’s the URLs, stupid

OpenSocial is designed so that any app can get information from any site as long as it has permission (I’ll figure out how that works later) — to accomplish that, it uses standard URL patterns on every site returning Atom entries and feeds. So after digging through a lot of crackerjack, I finally found the prize buried in the docs. Here are the URL patterns:

Information about a person
http://{DOMAIN}/feeds/people/{userId}
GET only.
List of a person’s friends
http://{DOMAIN}/feeds/people/{userId}/friends
GET only.
List of a person’s activities
(Wrong?) http://{DOMAIN}/activities/feeds/activities/user/{userId}
GET only.
List of a person’s activities from a single source
(Wrong?) http://{DOMAIN}/activities/feeds/activities/user/{userId}/source/{sourceId}
GET, POST, PUT, DELETE
Application-global data
http://{DOMAIN}/feeds/apps/{appId}/persistence/global
http://{DOMAIN}/feeds/apps/{appId}/persistence/global/{partKey}
GET, POST, PUT, DELETE
Per-instance data
http://{DOMAIN}/feeds/apps/{appId}/persistence/{userId}/instance/{instanceId}/{partKey}
GET, POST, PUT, DELETE
Shared user data
http://{DOMAIN}/feeds/apps/{appID}/persistence/{userId}/shared/{partKey}
GET, POST, PUT, DELETE
Friends’ shared data
http://{DOMAIN}/feeds/apps/{appID}/persistence/{userId}/friends
GET, POST, PUT, DELETE

Did I miss anything?

Listing all the URLs together like this, instead of spreading them out over pages and pages of docs, is the best way to start with a REST API. For example, you can tell at a glance what kind of information is available and what is and isn’t writable (you can’t add new friends for a user, but you can add a new activity). Sure, JavaScript libraries, etc. are nice, but the class hierarchies can obscure how simple the underlying data actually is (or “are”, if you’ve studied Latin). You can also spot possible typos in the docs — for example, what are the odds that the activity URLs are really supposed to start with “/activities/feeds/” when everything else starts with “/feeds/”? It could be poor, inconsistent design, but I suspect cut-and-paste errors.
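The same "at a glance" view can be written down as a small lookup table, sketched below. The paths and methods are copied from the list above, including the activity URLs that may be typos in the docs; the dictionary keys, the helper names url_for and writable, and the choice to spell every placeholder as {appId} are assumptions made for this example.

```python
# URL patterns and allowed HTTP methods, copied from the list above.
READ_ONLY = frozenset({"GET"})
READ_WRITE = frozenset({"GET", "POST", "PUT", "DELETE"})

ENDPOINTS = {
    "person": ("/feeds/people/{userId}", READ_ONLY),
    "friends": ("/feeds/people/{userId}/friends", READ_ONLY),
    # The leading "/activities" may be a typo in the docs.
    "activities": ("/activities/feeds/activities/user/{userId}", READ_ONLY),
    "activities_by_source": (
        "/activities/feeds/activities/user/{userId}/source/{sourceId}", READ_WRITE),
    "global_data": ("/feeds/apps/{appId}/persistence/global", READ_WRITE),
    "global_item": ("/feeds/apps/{appId}/persistence/global/{partKey}", READ_WRITE),
    "instance_item": (
        "/feeds/apps/{appId}/persistence/{userId}/instance/{instanceId}/{partKey}",
        READ_WRITE),
    "shared_item": (
        "/feeds/apps/{appId}/persistence/{userId}/shared/{partKey}", READ_WRITE),
    "friends_shared": ("/feeds/apps/{appId}/persistence/{userId}/friends", READ_WRITE),
}


def url_for(name, domain, **params):
    """Fill in one URL pattern for a given host."""
    template, _methods = ENDPOINTS[name]
    return "http://" + domain + template.format(**params)


def writable(name):
    """True if the endpoint accepts anything other than GET."""
    _template, methods = ENDPOINTS[name]
    return bool(methods - READ_ONLY)


print(url_for("friends", "example.com", userId="123"))
print([name for name in ENDPOINTS if not writable(name)])
```

With the table as listed, the read-only endpoints come out as the person entry, the friends list, and the all-sources activity list.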

Next time: content

The next time I get around to looking at OpenSocial, I’ll try to figure out the formats — it shouldn’t be too hard, since they’re all Atom entries or feeds. Then I’ll get into messier stuff like auth/auth, and I may eventually try adding OpenSocial support to my OurAirports hobby site, though it doesn’t even support friends yet.

[not] Protecting web sites and services from DNS rebinding attacks

Wednesday, August 1st, 2007

Update: Nope, my solution won’t work. As Christian Matthies points out in the comments, it is possible to spoof the HTTP Host header as well (his link in the comment is broken because of an extra comma, but this one works). As a kludge, browsers could be modified to prevent Host header spoofing, but (a) it would take a long time to deploy to the world at large, and (b) it would be only a bandaid for a much bigger problem.

Summary: While there’s no way to protect browsers against the DNS rebinding attack, you can protect web sites and web services by forcing them to check the HTTP Host header with every request. This is easy to do for RESTful services going through a regular web server like Apache — you get it by default with virtual hosts — but might be trickier for WS-* services.

If you or your company is using HTTP-based web services (either WS-* or REST), you might be in trouble — a new exploit allows a web site from outside your firewall to use a web browser as a proxy to read any web site or service inside your firewall.

Artur Bergman at O’Reilly has a posting on the DNS rebinding (aka anti-DNS-pinning) attack that works against all major browsers, including all versions of Firefox and MSIE. There’s no obvious general fix for this, though there’s a Firefox extension that helps a tiny bit.

The attack

In a DNS-rebinding attack, the attacker is able to force your browser to read data from any IP address that your browser has access to, even if you’re behind a router/firewall, by changing the IP address associated with a domain name you’ve connected to. That means that given an IP address, an outside attacker can read your local website (at 127.0.0.1), anything behind your corporate firewall (such as an Intranet accounting page or a web service), or — I think (haven’t tested yet) — a website that you’re logged into using a cookie (HTTP authentication will force a popup, since the browser will see a different domain name, even if you’re logged into the site in another tab/window). If you run a local web server on your computer (say, at 127.0.0.1), you can go to http://www.jumperz.net/index.php?i=2&a=1&b=7, type in the local address, and see jumperz.net use the exploit to display the source of your home page.

The defence

There’s no way to protect the browser yet, but you can protect your HTTP-based sites and services from this attack very easily — in fact, many sites on the web are already unknowingly protected, though I don’t know if most enterprise web services are.

The trick is in the HTTP Host header. While the DNS rebinding attack can associate a new IP address with a hostname, it cannot change the hostname itself, so the browser will still send the original hostname to the new host. Nearly all shared-hosting servers — and many servers at dedicated hosts as well — will check the Host header to decide what pages to serve out. As long as the site does something harmless when it gets an unrecognized hostname (such as returning a “501 Not implemented” HTTP status code), the site will be safe from the attack. In Apache, for example, you use the ServerName directive for each virtual host, and just make sure that there’s a default virtual host that returns an error or at least does nothing harmful.
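For a service that does not sit behind Apache virtual hosts, the same check can be done in the application. The following is a minimal sketch using Python's standard http.server module; the hostnames in ALLOWED_HOSTS are placeholders, and the 501 response simply follows the example in the paragraph above. As the update at the top of this post says, the Host header can itself be spoofed, so a check like this should not be relied on as a complete defence.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical hostnames this service is meant to answer to.
ALLOWED_HOSTS = {"intranet.example.com", "intranet.example.com:8080"}


def host_is_allowed(host_header, allowed=ALLOWED_HOSTS):
    """True only if the Host header names one of our own hostnames."""
    if not host_header:
        return False
    return host_header.strip().lower() in allowed


class HostCheckingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Refuse anything addressed to a name we do not recognise,
        # including raw IP addresses such as 127.0.0.1.
        if not host_is_allowed(self.headers.get("Host")):
            self.send_error(501, "Unrecognized host")
            return
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To try it locally (blocks until interrupted):
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8080), HostCheckingHandler).serve_forever()
```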

For Web Services, the same thing applies. It’s often tempting to use IP addresses instead of hostnames for web services (including RESTful services), especially during development, but doing so opens you right up to a DNS-rebinding attack, which could be very harmful if you’re using real data for development and testing. To protect your HTTP-based services from this attack, you need to make sure that every web service is accessed via a hostname rather than a raw IP address, and that every service checks its hostname. For RESTful services, this is trivially easy (since you’re probably going through Apache or something similar anyway, just as with a web site); for WS-* services, I don’t know the implementations well enough to be sure, but it should be possible to force them to check the Host header somehow.

Even if you’re not building web services, managing an enterprise intranet, or running a public web site, don’t forget to protect the web server on your local computer, if you have one.

Three simple tips for LAMP web site developers

Saturday, July 21st, 2007

You’ve learned to write some basic HTML, CSS, PHP/Python/Perl and SQL, found a hosting service, and are ready to create your first LAMP web application. You’ve already read a bit about security (you know always to escape user-supplied parameters, etc.). Here are three very simple tips that will help you along right at the start, without getting caught up in religious wars about frameworks, MVC, REST, abstraction, object orientation, etc.:

  1. Keep all the database code together. Put all your database calls into a single source file if you can — functions like mysqli_query (PHP) should never appear anywhere else but in this file — and create neutral functions like get_member() or delete_cart() for the rest of your code to call. The reason for this is not so that you can switch databases in the future (that’s easy enough to fix), but so that you can easily do a search/replace when you rename or modify tables. If all your database code is in the same place, your application will be orders of magnitude easier to maintain and upgrade a few months from now. Seriously.

  2. Make an extra database for junk. If your hosting account allows more than one database, create at least two, say “foo” and “foo_cache” — put all the tables you need to back up into the first one, and all the stuff you don’t need to back up (views, caching tables, session states, etc.) into the second. Write a SQL script to automatically regenerate any required tables in “foo_cache” when you restore. That way, you won’t waste time and bandwidth every day backing up megabytes or gigabytes of stuff you don’t need and can easily regenerate.

  3. Make GET harmless. If you use HTTP GET (e.g. $_GET in PHP) to do things like deleting or modifying records, bad things will happen to your application — search engines will start randomly changing your database by following links (robots.txt might not be enough to protect you), browsers will delete records by trying to precache pages, etc. Always use POST (normally from a form button) for anything that can make a change. More here.
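
To make the first tip concrete, here is a small Python sketch of a single database module. It uses the standard-library sqlite3 module with an in-memory database purely as a stand-in for MySQL, and the table layout and the add_member/add_cart helpers are assumptions made up for the example. Only get_member() and delete_cart() are names from the tip itself. The point is that SQL appears in this one file and nowhere else.

```python
import sqlite3

# db.py: the only module that is allowed to contain SQL.
# An in-memory SQLite database stands in for the real MySQL connection.
_conn = sqlite3.connect(":memory:")
_conn.row_factory = sqlite3.Row


def init_schema():
    """Create the (hypothetical) tables used by this example."""
    _conn.execute(
        "CREATE TABLE IF NOT EXISTS member "
        "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    _conn.execute(
        "CREATE TABLE IF NOT EXISTS cart "
        "(id INTEGER PRIMARY KEY, member_id INTEGER NOT NULL)")
    _conn.commit()


def add_member(name):
    """Insert a member and return the new row id."""
    cur = _conn.execute("INSERT INTO member (name) VALUES (?)", (name,))
    _conn.commit()
    return cur.lastrowid


def get_member(member_id):
    """Return the member as a plain dict, or None if there is no such row."""
    row = _conn.execute(
        "SELECT id, name FROM member WHERE id = ?", (member_id,)).fetchone()
    return dict(row) if row else None


def add_cart(member_id):
    """Create an empty cart for a member and return the cart id."""
    cur = _conn.execute("INSERT INTO cart (member_id) VALUES (?)", (member_id,))
    _conn.commit()
    return cur.lastrowid


def delete_cart(cart_id):
    """Delete a cart; return True if a row was actually removed."""
    cur = _conn.execute("DELETE FROM cart WHERE id = ?", (cart_id,))
    _conn.commit()
    return cur.rowcount > 0
```

The rest of the application calls get_member() or delete_cart() and never sees a table name, so renaming a table means editing one file.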

Coding lessons from university

Wednesday, June 27th, 2007

Dare Obasanjo, smart code guy and occasional punching bag for the anti-Microsoft people, is collecting lists of Three Things I Learned About Software In College. I posted mine in a comment on his blog, but decided to reproduce them here. Note that these are not lessons you learned 10 or 20 years later, but what you discovered back then.

I coded a lot in university — some of it for pay — but fortunately, I didn’t study computer science or engineering. Here are my major lessons:

  1. Readable code goes further and survives longer than optimized code, especially once you’re no longer the one maintaining it (or if you have to come back to it two years later).

  2. If you write code that makes you feel like a genius, throw it out — you’ll realize later that it’s crap. If you write code that makes you feel like a competent tradesman, you’re on the right track.

  3. No matter how smart you are, everyone — even the most incompetent loser of a coder — knows at least one thing you don’t. It’s a good idea to listen.

Note: If you want to record your own list of three things, please leave it as a comment to Dare’s original posting, not here.

Maybe the women are right

Thursday, June 21st, 2007

Summary: Perhaps the women who don’t choose computer programming are making a good choice, especially with the deteriorating working conditions, stagnant or falling salaries, and offshoring.

Recently, we’ve had a few postings about women in computing (or the lack thereof) — see Bray, Wood, Tenison, and Bray (again), all ignited by a piece in devChix.

These postings all assume that we need to do something to pull more women into coding. Why? Do we think there are lots of women who would be happy coding, but aren’t smart enough or motivated enough to choose the right careers for themselves, or are too timid to deal with any barriers unless someone comes along and dismantles them first?

Listen to the market

In an age where we’ve come to trust central planning less and the free market more, why not try to learn from the labour market instead of trying to push it in ways it doesn’t want to go?

If we assume that the majority of working women are smart, strong, motivated, and brave, then we can also assume that they have good reasons for choosing their careers. And in fact, it turns out that their track record isn’t bad. For example, in the 1970s and 1980s, women were grossly underrepresented in manufacturing and overrepresented in lower-paying service-industry jobs like retail. But when manufacturing started offshoring in the 1980s and 1990s, it was the women who were still working (often as managers, at this point), while the men were at home, depressed, collecting welfare cheques or trying to retrain for jobs that paid a fraction of what they used to earn.

Now, while there’s lots of work connected with tech, we see pure coding increasingly being offshored, the same way that manufacturing was 20 years ago. There’s no shortage of women working in jobs connected with computers, but instead of coding, many women choose onsite consulting, training, marketing, and other jobs that are not only social but require face time with customers, and as a result, are much more difficult to offshore.

Of course, if you absolutely love coding, like I do (and most of the people reading this do), you’re going to work hard to try to find a way to keep doing it, whether you’re a man or a woman. But if you don’t feel that burning love, why let yourself be dragged kicking and screaming into an industry where salaries are falling, jobs are fleeing, hours are increasing (bye bye weekends!), and workers are increasingly treated as interchangeable cogs on a development assembly line, without even the (questionable) union protection their parents had in their factory jobs 20-30 years ago?

Ruby on Rails pain at Twitter

Thursday, April 12th, 2007

Josh Kenzer has posted an interview with Alex Payne, a developer for Twitter, which is one of (if not the) biggest Ruby on Rails-based web apps.

A couple of years ago, when I was getting tired of working within the confines of the Java/J2EE bubblesphere, I tried out PHP and Ruby on Rails, intending to like Rails; instead, I surprised myself by preferring PHP, an ugly hack of a language optimized for script kiddies (I’ve been using it ever since). It looks like Payne is coming to the same conclusion, as his team has ended up working to keep Twitter running despite RoR instead of because of it. Here is an excerpt:

All the convenience methods and syntactical sugar that makes Rails such a pleasure for coders ends up being absolutely punishing, performance-wise. Once you hit a certain threshold of traffic, either you need to strip out all the costly neat stuff that Rails does for you (RJS, ActiveRecord, ActiveSupport, etc.) or move the slow parts of your application out of Rails, or both.

There’s lots more in Kenzer’s posting, including Payne’s claim (I don’t know enough to verify) that Rails cannot support more than one database at once, and that “Running on Rails has forced us to deal with scaling issues — issues that any growing site eventually contends with — far sooner than I think we would on another framework.”

In praise of architecture astronauts

Thursday, January 4th, 2007

Six years ago, Joel Spolsky wrote a piece on Architecture Astronauts, people who get so obsessed with the big picture that they miss the important little details that actually make things work. More recently, Dare Obasanjo pointed to Spolsky’s piece in his posting XML Has Too Many Architecture Astronauts.

I’d like to start by agreeing with Dare: XML does have too many architecture astronauts, and almost everything that’s bad, ugly, or simply scary about the huge number of standards built around XML (WS-* springs immediately to mind, but it’s not alone) comes from gross overgeneralization. That said, architecture astronauts do have their place, and we ignore them at our peril.

Case #1: Napster

Let’s start by turning Spolsky’s main example (which Dare cites) on its head. Here are two different perspectives on Napster circa 2001:

Architecture pedestrian: Napster lets people find and download songs.

Architecture astronaut: Peer-to-peer networks let people find and download songs. Napster is (was) a peer-to-peer network.

Spolsky writes about how the architecture astronaut perspective helped to fuel a mini-P2P bubble at the time, with investors pouring wasted money into P2P-everything, when Napster’s success was due not to the fact that it was P2P but to the fact that it let people get songs easily. However, consider what was happening at the same time in the music industry. Rightly or wrongly, they wanted to stop people from sharing songs. The architecture pedestrian perspective (my term, not Spolsky’s) told them that Napster lets people find and download songs, so the industry spent millions of dollars in legal fees, PR, etc. shutting down Napster. The result? People downloaded even more music. After all, as the astronauts said, it was P2P networks that let people share music, not Napster in particular. Since then, the music industry has been fighting the equivalent of an insurgency, putting down one uprising after another with no end in sight.

Case #2: The Netscape IPO

My second example took place over 11 years ago, kicking off the much larger dot.com bubble (the P2P mini-bubble was just a tiny part of its tail). It was around 1995 that most non-techies noticed the web, mostly through the lens of the Netscape browser. Again, the architecture pedestrian and the architecture astronaut looked at this differently:

Architecture pedestrian: Netscape lets people see text and pictures online.

Architecture astronaut: The web allows people to put text and pictures online. Netscape is a web browser.

This time, the investors listened to the architecture pedestrian rather than the architecture astronaut: Netscape was set to open at $14/share, doubled to $28/share, and climbed to $75/share on the first day, and eventually reached a peak market cap of $8 billion. The astronauts knew all along, however, that while people (at the time) thought of the web in terms of the Netscape browser, the web wasn’t Netscape. If Internet Explorer hadn’t knocked Netscape off its perch (resulting in layoffs as early as January 1998), some other browser soon would have.

Case #3: XML

So how does this all apply to XML? I think that there are two ways that architecture astronauts can approach XML, one good and one bad. The bad one is in line with Spolsky’s original piece, where people miss what made XML popular (relative simplicity, no need to create DTDs, etc.) and believe that if a bit of standardization is good, a lot must be even better. The good one is to step back and point out that most of the advantages that appear to come from XML actually come from generic tree markup, and that holy wars between XML, JSON, YAML, etc. are really beside the point. In various situations, one syntax may have an advantage due to software support (for example, web browsers have built-in support for parsing XML or styling it using CSS, and they can convert JSON directly to JavaScript data structures using the eval() function), but when you look at the whole world of generic markup, those are small blips on a very large screen, and all of the markup languages look more or less the same.
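
One way to see the "generic tree markup" point is to parse the same small record from XML and from JSON and compare the results. The Python sketch below uses only the standard library; the trip record is invented for the example, and the XML conversion is intentionally naive (it ignores attributes, repeated sibling names, and mixed content), so treat it as an illustration rather than a general converter.

```python
import json
import xml.etree.ElementTree as ET

# The same made-up record, written in two syntaxes.
XML_DOC = ("<trip><destination>alaska</destination>"
           "<package>economy</package></trip>")
JSON_DOC = '{"trip": {"destination": "alaska", "package": "economy"}}'


def element_to_tree(elem):
    """Turn an element into nested dicts (children) or a string (leaf text)."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    return {child.tag: element_to_tree(child) for child in children}


def xml_to_tree(text):
    """Parse XML text into a {root-name: subtree} dict."""
    root = ET.fromstring(text)
    return {root.tag: element_to_tree(root)}


tree_from_xml = xml_to_tree(XML_DOC)
tree_from_json = json.loads(JSON_DOC)
same_tree = tree_from_xml == tree_from_json
```

Once both documents are in memory, the application code that walks the tree does not need to know which syntax it came from.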

Templating languages and XML

Saturday, December 23rd, 2006

Erich Schubert is talking about web templating languages. He’s looking for a pure-XML templating solution, but that might not be necessary for simple web-page design, where we don’t need all the extra benefits of heavy-duty transformation standards like XSLT.

Keeping it simple

For PHP-driven web sites, I’m a big fan of Smarty, which uses braces (“{” and “}”) to delimit template constructions. Braces have no special meaning to XML parsers (they’re just character data), so it’s possible to put a template expression inside an attribute value (for example), while keeping the template itself as well-formed XML and not requiring the elaborate paraphrastic expressions you need to set up attribute values in XSLT:

<p id="x-{$myvalue|escape}">Hello, world!</p>
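
The well-formedness claim is easy to check. The Python sketch below parses the unexpanded template line above with the standard-library XML parser, then expands it with a toy render() function and parses the result again. The render() function is only a stand-in written for this example: it handles the single {$name|escape} form and is not how Smarty itself works.

```python
import html
import xml.etree.ElementTree as ET

TEMPLATE = '<p id="x-{$myvalue|escape}">Hello, world!</p>'

# The unexpanded template is already well-formed XML:
# to the parser, the braces are ordinary characters in the attribute value.
template_tree = ET.fromstring(TEMPLATE)


def render(template, values):
    """Toy stand-in for a template engine: expand {$name|escape} only."""
    out = template
    for name, value in values.items():
        placeholder = "{$" + name + "|escape}"
        out = out.replace(placeholder, html.escape(str(value), quote=True))
    return out


# Escaping keeps the output well-formed even for awkward values.
rendered = render(TEMPLATE, {"myvalue": 'a"b<c'})
rendered_tree = ET.fromstring(rendered)
```

Both the template and its output go through the same XML parser without complaint, which is what makes it practical to validate templates with ordinary XML tools.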

Concurrent markup resurrected

Really, Smarty adds a second set of concurrent markup on top of the XHTML. Smarty constructs don’t have to balance with XML element boundaries, and with only a little care, I’ve never ended up with a Smarty template that wasn’t well-formed. JSP’s mistake was using something that looks like XML but isn’t quite, messing up parsers. Even the old SGML CONCUR feature would not have allowed markup inside attribute values. Sometimes there’s something to be said for using two different syntaxes when you’re trying to represent two different things.

Gap buffers

Wednesday, June 7th, 2006

Tim Bray updated an old piece on binary search this morning — I missed it the first time around, so I was glad that it popped up in my blog reader. Tim’s taking some flak about data abstraction from people who don’t have his experience in high-performance environments, but what got my attention most was his mention of using gaps in a long array to provide efficient updates.

It turns out that this technique, called a gap buffer [wikipedia], is one of the cornerstones of text editors like GNU Emacs. I’ve been using Emacs for 20 years and have contributed to the main distribution (see derived.el), but never bothered to look at the C code long enough to discover this particular technique. There’s surprisingly little information online (if anyone’s ever bothered to do testing for the optimum gap size, etc., it’s not showing up in Google), but it’s still nice to experience the joy and excitement of a new (to me), simple algorithm that solves a common problem well.

Does anyone have pointers to more detailed research on gap buffers? It seems to me that they’d have applications far beyond text editing, including (perhaps) storing compiled tree data (aka binary xml) on disk.
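
For readers who have not met the technique, here is a minimal gap buffer sketched in Python. It is not a transcription of the Emacs C code: the class name, the default gap size of 16, and the growth policy are all assumptions chosen for the example. The idea is that the text lives in one array with a block of unused slots (the gap) at the editing position, so inserting or deleting there only touches the edges of the gap.

```python
class GapBuffer:
    """Text stored in one list with a movable gap of unused slots."""

    def __init__(self, text="", gap_size=16):
        self._min_gap = max(1, gap_size)
        self._buf = list(text) + [None] * self._min_gap
        self._gap_start = len(text)      # first unused slot
        self._gap_end = len(self._buf)   # first used slot after the gap

    def __len__(self):
        return len(self._buf) - (self._gap_end - self._gap_start)

    def __str__(self):
        return "".join(self._buf[:self._gap_start] + self._buf[self._gap_end:])

    def _move_gap(self, pos):
        """Slide the gap so that it starts at text position pos."""
        if not 0 <= pos <= len(self):
            raise IndexError("position out of range")
        while self._gap_start > pos:     # move gap left, one character at a time
            self._gap_start -= 1
            self._gap_end -= 1
            self._buf[self._gap_end] = self._buf[self._gap_start]
        while self._gap_start < pos:     # move gap right
            self._buf[self._gap_start] = self._buf[self._gap_end]
            self._gap_start += 1
            self._gap_end += 1

    def _grow(self):
        """The gap is empty: open a fresh one at the current position."""
        extra = max(self._min_gap, len(self._buf))
        self._buf[self._gap_start:self._gap_start] = [None] * extra
        self._gap_end = self._gap_start + extra

    def insert(self, pos, text):
        """Insert text at pos by filling the gap from the left."""
        self._move_gap(pos)
        for ch in text:
            if self._gap_start == self._gap_end:
                self._grow()
            self._buf[self._gap_start] = ch
            self._gap_start += 1

    def delete(self, pos, count=1):
        """Delete up to count characters at pos by widening the gap."""
        self._move_gap(pos)
        count = min(count, len(self._buf) - self._gap_end)
        self._gap_end += count
```

A run of edits at the same spot (typing, or backspacing) costs almost nothing, because the gap is already there. The expensive step is moving the gap a long way, which is the trade-off any tuning of gap size would have to weigh.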

Continuations, cont’d

Saturday, May 20th, 2006

[Update: see further contributions to the discussion from Ian Griffiths, Avi Bryant, James Robertson, and Joe Duffy; note also John Cowan's excellent comment below, pointing out that hidden fields work with the back button but not with bookmarks.]

It looks like continuations are back on the discussion board (Gilad Bracha, Tim Bray, and Don Box). I spent some time with Scheme a decade ago and continuations were one of the new features I had to try to understand. Then, as now, I found them more clever than practical.

Gilad sets up a use case for continuations before he goes on to oppose them: in essence, a web application could use continuations to maintain separate stacks, so that as a user hits the back button and then starts down new paths, the web application would not become confused, selling the user a trip to Hawaii instead of Alaska. I can see how continuations would work for that, just as I can see how a bulldozer could turn over the sod in my garden, but I’m far from convinced that either is the right tool for what is really a much simpler problem.

Explicit state

First, a continuation preserves the entire state of a program, including the stack, instruction counter, local variables, etc. How much of that do you really need for a hypothetical travel web app? In reality, you probably need, maybe, 1-5 variable values to restore a previous state in the travel app, so why not just save those explicitly? It would be faster, more secure (less information being saved), and much easier to performance tune and debug (since no magic is happening behind the scenes). Save those variables in a database, in a hash table, in an XML or CSV file, in memcached, or wherever happens to be most convenient. You may be looking at under 100 bytes for each saved state, so if you really want to do this, it’s not going to hurt too badly.

REST

But do you really want to do this? Most of the discussion around REST has focussed on the use of persistent URLs and how to use HTTP verbs like GET, POST, PUT, and DELETE, but there’s another, perhaps more critical idea behind REST: that the resource you retrieve (a web page, XML document, or what-have-you) contains its own transition information.

Let’s say that you load a web page into your browser, load more web pages, then use the back button to return to the original one. Now, select a link. What happens? Did your browser have to go back to the original web server, which was using continuations (or other kinds of saved state) to keep track of the links from every page you visited, so that it won’t send you to the wrong one? Of course not. The web page that you originally downloaded already included a list of all its transitions (links), and intuitive things just happen naturally when you hit the back button.

The web is stateless, but web application toolkits maintain pseudo-sessions (using cookies, URL rewriting, or what-have-you) that make them look stateful, and that makes programmers lazy. Obviously, you don’t want to stick information like ‘isauthenticated’ on a web page, since it could be forged; likewise, you don’t want to put a credit-card number there. But it is trivially simple to make sure that forms, like links, go to the right place even when you hit the back button: just make the transitions fully independent of any session stored on the server side. For example, consider this:

<form method="post" action="/actions/book-trip">
  <button>Book this trip!</button>
</form>

Presumably, the trip the person was looking at is stored somewhere in a session variable on the server. DON’T DO THIS! As Gilad pointed out, someone hitting the back button might end up booking the wrong trip. There are gazillions of ways to push all of the context-sensitive stuff into the web page itself, where it belongs. Here’s one example:

<form method="post" action="/actions/book-trip">
  <label>Book your economy trip to Alaska!</label>
  <input type="hidden" name="destination" value="alaska"/>
  <input type="hidden" name="package" value="economy"/>
  <button>Book it.</button>
</form>

Here’s another:

<form method="post" action="/actions/book-trip/alaska/economy">
  <label>Book your economy trip to Alaska!</label>
  <button>Book it.</button>
</form>

This is 100% backbutton-proof and it’s trivially simple to implement. It took me a while after reading Gilad’s (admittedly, strawman) example to realize that there are people who do not develop webapps this way. If they do this much damage just with a Session stack, how much pain will they be able to cause with continuations?
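
On the server side, the handler for the hidden-field version of the form only needs to read what the form sent. The Python sketch below shows the shape of it, using the standard-library parse_qs to decode the POST body. The book_trip function, the KNOWN_TRIPS set, and the status codes returned are assumptions for the example, not part of any particular framework.

```python
from urllib.parse import parse_qs

# Hypothetical catalogue of (destination, package) pairs that can be booked.
KNOWN_TRIPS = {("alaska", "economy"), ("hawaii", "economy")}


def book_trip(method, body):
    """Handle /actions/book-trip using only the data in the request itself."""
    if method != "POST":
        # Anything that changes data must not happen on GET.
        return 405, "Use POST to book a trip."
    fields = parse_qs(body)
    destination = fields.get("destination", [""])[0]
    package = fields.get("package", [""])[0]
    # Never trust the form blindly: check the values against known trips.
    if (destination, package) not in KNOWN_TRIPS:
        return 400, "Unknown trip."
    return 200, "Booked the {} trip to {}.".format(package, destination)
```

Because nothing is looked up in a server-side session, the same form submitted from a page reached with the back button books exactly the trip that page describes.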

The REST people are right, at least on this point: there’s no need to drive a continuation bulldozer through your webapp, when a little REST garden spade will work quite nicely (and won’t tear up your lawn in the process). Don suggests that there may be other, more legitimate use cases for continuations outside of web applications, and I have no reason to disagree, but I would like to look at them pretty carefully.

How many environments?

Tuesday, May 2nd, 2006

Assume that you are a lone developer, maintaining a small web site in a shared hosting account. How many software environments do you need from development to production?

One environment

On the simplest level, you could develop directly in your ISP account, loading and saving files remotely via SFTP, WebDAV, etc.; in other words, your development and production environment would be the same. For anything non-trivial, that’s a pretty hairy way to work, since you have no way to test changes before they’re rolled out into the world.

Two environments

I normally use the two-environment approach that (I suspect) is the most common one for single-developer LAMP sites: I maintain a development environment on my notebook, and periodically upload changes to the production environment at the ISP. I try to run roughly the same version of Apache, PHP, MySQL, etc. as my ISP, but otherwise, I take no special steps to replicate the production environment. On my notebook, I set up the development directory as its own virtual host (e.g. http://localhost:8001/, etc.) so that I can test changes literally as I type.

Three environments?

Even though there are no other developers working with me right now, I sometimes wonder if it would make sense to start using a third environment between development and production (a separate directory and virtual host on my notebook). A third environment would allow me to run major experiments and restructuring in the development code, while still making small bug fixes, typo corrections, etc. in the stable code before uploading them to my ISP production environment.

While this sounds like a good idea initially, there is a major coordination problem involved in backporting fixes from the middle environment to the development environment, and the middle environment will still become unstable while new changes are rolled out into it, tempting me to create another environment — complexity is, sadly, highly contagious. Have any other lone developers had success (or failure) with this approach?

More

Big organizations use an enormous number of environments to build and roll out a system:

  • Each developer’s desktop, where code generally lives for a few hours.
  • The development server, typically a single server running database, application server, etc. as well as version control and unit/regression tests.
  • One or more test environments, covering integration testing, system testing, user-acceptance testing, etc. (these can range from single servers to small clusters to near-duplicates of the full production environment).
  • The staging environment, which is typically very similar or identical to the production environment.
  • The production environment, where the system runs.

I’m still undecided about whether enterprises help or hurt themselves by making things so complicated. Coordinating a lot of people on a big system is hard, but it’s even harder to imagine an agile process functioning under so many layers of pain.

Tired of frameworks

Sunday, April 23rd, 2006

I’m tired of software-development frameworks — they seem always to be optimized for the way someone else works, or for the way someone else thinks I should work. Granted, it’s fun to write frameworks, and it’s almost as much fun to learn them, but as soon as I try to do anything non-trivial they either get in my way or lure me off the road into blind alleys.

Is this a serious defect in my own skills and practices, or do others feel the same way? To paraphrase an almost-famous saying by Tim Bray, here’s what I want most from a development environment:

The REST schism and the REST contradiction

Saturday, March 25th, 2006

Update: a proposal for a better name.

Don Box got people talking last week in a posting where he distinguishes between two kinds of REST: lo-REST, which uses only HTTP GET and POST, and hi-REST, which also uses HTTP PUT and DELETE.

The schism

If this distinction doesn’t seem very important, don’t worry — it’s not. Tim Bray captured the most important point, that Don Box (who is heavily involved in REST’s nemesis, Web Services) is talking positively about REST at all. For the RESTafarians and some of their friends, however, Box’s heresy was even worse than his former non-belief, because heresy can easily lead the faithful astray: witness strong reactions from Dimitri Glazkov, Jonnay (both via Dare Obasanjo), and Dare Obasanjo himself. There is even a holy scripture, frequently cited to clinch arguments.

The contradiction

I do not yet have a strong opinion on which approach is better, but I do see a contradiction between the two arguments I hear most often from REST supporters:

  1. REST is superior to Web Services/SOAP/SOA because it’s been proven to work on the Web.
  2. Almost nobody on the Web uses REST correctly.

Pick one, and only one, of these arguments, please. As far as I can see, apart from a few rare exceptions (like WebDAV), Don’s lo-REST (HTTP GET and POST only) is what’s been proven on the web. The pure Book of Fielding, hi-REST GET/POST/PUT/DELETE version is every bit as speculative and unproven as Web Services/SOAP/SOA themselves (that’s not to say that it’s wrong; simply that it’s unproven). Some REST supporters, like Ryan Tomayko, acknowledge this contradiction.

(Update) A better name?

Tim Bray proposes throwing out the REST name altogether and talking instead about Web Style. I like that idea, though the REST name may be too sticky to get rid of by now. Dumping the REST dogma along with the name would clear up a lot of confusion: HTTP GET and POST have actually been proven to work and scale across almost unimaginable volumes; on the other hand, like the WS-* stack, using HTTP PUT and DELETE remains a clever design idea that still needs to be proven practical and scalable.

RFC: (Java) SAX exceptions and new minor SAX version

Sunday, March 12th, 2006

(Note that this is not a major API change, and does not affect non-Java versions of SAX.)

Over on the sax-devel mailing list, Norman Walsh, who is involved with JAXP at Sun, has requested a small change to the SAXException class (see the archived thread).

When we were designing SAX quite a few years back, we needed the ability to embed an exception in another exception but Java did not support that, so we designed our own support. Starting with JDK 1.4, Java has supported embedded exceptions through the getCause method. Implementing getCause in SAXException would allow for more accurate stack traces and debugging, among other things.

Unfortunately, there is never such a thing as a perfectly backwards-compatible change. Chris Burdess pointed out that this change will break Java code that was calling initCause manually, and obviously, there will be some other differences in behaviour depending on which version of SAX people use. I believe that bringing SAX in line with modern Java usage (JDK 1.4 has itself been around for a while) is worth the trouble, and that very few applications would experience problems, but I’d like to see some wider discussion before I decide to put out a minor SAX release. Please let me know what you think, either by subscribing to the sax-devel list, posting a comment here, or posting your own blog entry and pinging this one.

Programming languages of distinction

Monday, March 6th, 2006

Via Ongoing, I read some interesting discussions of programming languages — mainly Python vs. Ruby, with most people happily dumping on Java.

Steve Yegge, in particular, argues that language success is based mainly on marketing, and that Python is doomed to obscurity because of the community’s lack of marketing savvy.

The programming language cycle

While I agree that Python probably is doomed to perpetual obscurity at this point, I think that Yegge’s focus on marketing is oversimplistic; instead, I’d argue that there’s a self-perpetuating cycle at work for successful programming languages:

  1. Elite (guru) developers notice too many riff-raff using their current programming language, and start looking for something that will distinguish them better from their mediocre colleagues.
  2. Elite developers take their shopping list of current annoyances and look for a new, little-known language that apparently has fewer of them.
  3. Elite developers start to drive the development of the new language, contributing code, writing libraries, etc., then evangelize the new language.
  4. Sub-elite (senior) developers follow the elite developers to the new language, creating a market for books, training, etc., and also accelerating the development and testing of the language.
  5. Sub-elite developers, who have huge influence (elite developers tend to work in isolation on research projects rather than on production development teams), begin pushing for the new language in the workplace.
  6. The huge mass of regular developers realize that they have to start buying books and taking courses to learn a new language.
  7. Elite developers notice too many riff-raff using their current programming language, and start looking for something that will distinguish them better from their mediocre colleagues.

You’ll notice that there’s no step here called “marketing”; instead, there are several distinct stages of evangelization and community building. Major vendors (other than the language’s owner, if it’s a vendor) will start to notice the language once the second wave (sub-elite) developers arrive, and IT managers will notice it because of books, magazine articles, and pressure from the high-end developers. Some — possibly a lot — of marketing will come out of those steps, but it is as much a result of the language’s success as a cause.

Points of failure

In this cycle, there are a few highly probable points of failure:

  • Timing: A new language might not be at the right stage of development (too raw, or too stale) at the time when elite developers decide to make a mass migration.
  • Features: If the new language’s features don’t answer the elite developers’ annoyance list, not enough of them will migrate to it.
  • Openness: Elite developers are used to having a lot of influence, and if the new language’s development process does not allow them sufficient say in the new language’s evolution, they will leave before they attract enough sub-elite developers.
  • Tools: Sub-elite developers might find the language unsuitable for day-to-day production use, especially if enough basic tools are not available (libraries, testing, debugging, GUI tools, performance measurement, etc.).
  • General acceptance: Regular developers might object to the new language and sabotage projects using it, either by producing poor-quality code or by missing deadlines (and blaming the new language in both cases).

Most programming languages stumble over one or more of these — it’s as much luck as clever design when a language like C++ or Java makes it past the hurdles and into the workplace. Success tends to draw more success, money draws more money, etc.

The final and most important point here is that a programming language’s perceived coolness will always suffer from its success. Java cannot possibly still be cool when there are thousands of regular developers slaving away in the bowels of ACME Widgets using it to write enterprise applications. If, in fact, Ruby displaces Java in the enterprise (which may not happen, since Ruby has no advantage over Java to match Java’s memory-management advantage over C++), it will suffer precisely the same fate, and we can expect Bruce Tate to write a book Beyond Ruby in five years or so.

By that measure, Python’s very failure is a kind of success: as long as it never really takes hold in the workplace, it will always carry a small degree of distinction with it, and at least a few elite developers won’t feel pressured to move on. Like a movie or band that never becomes too popular, Python will hang onto its snob appeal.