02
Feb

Are deletionists harming Wikipedia?

There’s a discussion over on the Colemak forums at the moment about the Wikipedia problem. It seems that, not content with having the article deleted on the grounds of non-notability a while ago, some Wikipedians are trying to eradicate every last mention of the layout from anywhere on the site. The deletion decision had eventually ended up as a redirect to a section on the Keyboard layout article, but it seems that even that’s been removed now, by a particularly argumentative individual who is rigidly and inflexibly applying his interpretation of the Reliable Sources policy.

Now as a satisfied Colemak typist I may be somewhat biased on this matter, but this one should be obvious. Colemak may be a pretty niche subject, but it has been covered a couple of times in the media—not a lot, but usually sufficient to at least get a “no consensus” decision in an AfD debate, which automatically defaults to “keep.” On top of that, it is included in X11 and every Linux distribution going. It’s one of only about half a dozen options for keyboard layout variant displayed on the installation screens of Ubuntu. It’s right in your face, not tucked away in some obscure and dangerous config file. Everyone who installs Ubuntu will be aware of it. Some of them will want to find out more about it. And they will expect Wikipedia to say something about it. But it won’t.

Of course, if it were just Colemak that were affected, I’m sure you could just dismiss this as a fanboy rant on my part, but this actually illustrates a much wider problem. With over three million articles, on everything from minor league ice hockey players to fictional foods in Babylon 5, Wikipedia is now the first place people turn to for information on anything obscure and only marginally notable. Wikipedia’s end users expect it to be an indiscriminate collection of information. Yet an indiscriminate collection of information is one of the things that Wikipedians are adamant that Wikipedia is not.

This is like being told that a problem in Sage or QuickBooks that is causing your tax return to be filled out with gibberish is not a bug, but a feature.

The problem is that there is a massive disconnect between Wikipedia’s users—casual visitors who often don’t even bother to create an account—and its overlords—the regular, active Wikipedians with edit counts in the thousands or even tens of thousands and an encyclopaedic knowledge and understanding of its policies. It is at its most striking in the whole inclusionist versus deletionist debate. And the deletionists are alienating a lot of would-be Wikipedians.

It turns out that this is one of the biggest criticisms levelled at Wikipedia by occasional editors. People come onto the site knowing nothing of Wikipedia’s policies, but plenty about some (possibly very niche) subject. They make half a dozen or so edits, then return a week later to find that their article has been deleted with no apparent explanation. Or perhaps it will be flagged with a deletion debate, crammed full of arcane and cabalistic abbreviations such as WP:NFT, WP:NOTE, WP:V, WP:WAX, WP:SOAP, WP:IAR, and so on, all pointing to Wikipedia’s byzantine and convoluted policies, guidelines and procedures. What kind of impression does this leave on the casual editor? That Wikipedia is a hideout for a bunch of antisocial, bureaucratic teenage control freaks: a kind of online equivalent to the kids on the beach who kick the sandcastle you’ve just spent three hours building into your face. And since first impressions count the most, they will go off, never contribute anything else, and rant on blogs and forums about how insular and out of touch with Real Life these Wikipedians are.

Why is this harming Wikipedia? Because these are the people who contribute the overwhelming majority of substantive, meaningful content to the site.

This study by Aaron Swartz will be particularly enlightening to anyone who doubts this claim. His research on a data dump of Wikipedia indicated that most contributions of actual substantive content are made by new and casual users, many of whom never even create an account and most of whom only make a handful of edits to the site. Regular Wikipedians, on the other hand, tend to spend most of their time tidying things up—moving text around, correcting spelling mistakes, wikifying things—and deleting stuff.

I’ve sometimes looked at these deletion debates and wondered how many of the people voting for deletion with reference to obscure areas of Wiki policy even begin to understand the subject matter of the article under discussion itself. Some of the arguments for deletion of Colemak are laughable for starters. They’d have us believe that nobody uses it (a brief glance at the activity on the forums and the Facebook group and even the AfD debate itself will quickly dispel this notion); that X11 is an anarchic free-for-all where you could submit a patch containing a rootkit backdoor and it would be accepted; and that the only way to enable Colemak in Ubuntu is to edit some obscure and dangerous config file where it’s buried in a list of gazillions of options and a slight typo will make your computer unbootable.

Certainly, searches for reliable sources are usually cursory: no hits on Google News, no hits on Google Scholar, so delete. Blogs are automatically not considered reliable sources, even if they’re written by experts in the industry such as Tim Bray, Simon Willison or Jeff Atwood. In fact, Jeff Atwood’s Wikipedia entry also fell foul of the deletionists a year ago, when Stack Overflow was in public beta, which shows just how completely out of touch with reality they are. (Incidentally, web development is one area in particular where WP:RS is a very bad metric for notability, simply because it’s an industry where a lot of key activity happens at the grassroots level. The sources that web developers regard as reliable enough for practical purposes are generally high profile blogs like Jeff’s, while the academics writing papers on how to use lines of code per day as a productivity metric are frequently regarded as an irrelevance at best and harmful at worst.)

There’s also a lot of bluster and bullying that goes on when the deletionists crop up. Throwing acronyms around sends a signal to newbies that they’re not welcome. If you Twitter about a deletion debate, you’re accused of canvassing and booed off. Anonymous accounts and new users are often regarded with suspicion as potential sock puppets. Most people find it hostile and intimidating, and perhaps even a bit childish, but the deletionists don’t care. They’re so obsessed with making Wikipedia what they think it should be that they’ve completely lost sight of the end users.

08
Sep

Paths and file locations in ASP.NET

There are loads of ways to find the path — either the URL or the physical path — to a page, user control or other file in an ASP.NET application. Unfortunately, however, the documentation doesn’t do a brilliant job of explaining them to you. There are also several different scenarios, depending on whether you are using conventional web forms, or URL rewriting, or Server.Transfer, or ASP.NET MVC. So I thought I’d better write down an overview of some of them for reference.

Case 1: Direct request for a web form

Just suppose for a minute that you have been contracted to rewrite Wikipedia in ASP.NET. So, for instance, you may end up with the page “What Wikipedia is Not” (aka “WP:NOT” or “Wikipedia’s attempt to get into the Guinness Book of Records for the most lies per kilobyte on a web page”) at http://en.wikipedia.org/wiki.aspx/WP:NOT.

In this case, you have several different properties of HttpContext.Current.Request containing different representations of it.

  • Request.RawUrl = "/wiki.aspx/WP:NOT" represents the path and query string parts of the URL. In this case, of course, there is no query string, but if there were, you might see it set to something like "/wiki.aspx/WP:NOT?mode=edit".
  • Request.Path = "/wiki.aspx/WP:NOT" represents the path part of the URL.
  • Request.FilePath = "/wiki.aspx" represents the part of the path to the file (in this case wiki.aspx) that is handling the request.
  • Request.PathInfo = "/WP:NOT" is a diff of Request.Path and Request.FilePath, giving the extraneous bit of the path that does not refer to a file in the file system.
  • Request.PhysicalPath = "c:\inetpub\wwwroot\wiki.aspx" is the physical path to the file that is servicing the request.
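The relationship between these properties is easier to see with a toy model. The following is a minimal Python sketch, not ASP.NET code: `split_request` and the rule "the file path ends at the first segment ending in .aspx" are my own assumptions for illustration, and the real runtime resolves the handler against the file system rather than by extension.

```python
def split_request(raw_url, handler_ext=".aspx"):
    """Model how RawUrl breaks down into Path, FilePath, PathInfo and the query string.

    Hypothetical helper: it treats the first path segment ending in
    handler_ext as the file that handles the request.
    """
    # RawUrl = path + optional "?query"
    path, _, query = raw_url.partition("?")
    file_path, path_info = path, ""
    segments = path.split("/")
    for i, segment in enumerate(segments):
        if segment.endswith(handler_ext):
            # Everything up to and including the handler is FilePath;
            # whatever is left over is PathInfo.
            file_path = "/".join(segments[: i + 1])
            path_info = path[len(file_path):]
            break
    return {
        "RawUrl": raw_url,
        "Path": path,
        "FilePath": file_path,
        "PathInfo": path_info,
        "QueryString": query,
    }


for name, value in split_request("/wiki.aspx/WP:NOT?mode=edit").items():
    print(name, "=", repr(value))
```

Note that in this model Path is always FilePath followed by PathInfo, which is the "diff" relationship described in the list above.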

Case 2: Server.Transfer() and Server.Execute()

Sometimes, you may want to transfer control from one file to another. Let us suppose, for instance, that you decide to use several Web forms: one for articles, one for special pages, and one for article history. You do a few simple checks in wiki.aspx and decide to transfer control to another file, say, article.aspx, using Server.Transfer(). Then, another property of Request comes into play.

  • Request.CurrentExecutionFilePath = "/article.aspx" represents the path to the file that is currently handling the current part of the request.
  • Request.FilePath = "/wiki.aspx", however, remains unchanged.
  • Request.PhysicalPath = "c:\inetpub\wwwroot\wiki.aspx" also remains unchanged.
  • Request.AppRelativeCurrentExecutionFilePath = "~/article.aspx" is the same as Request.CurrentExecutionFilePath, but relative to the root of the web application, as defined in IIS. If your application were rooted at, say, "/wiki" then Request.CurrentExecutionFilePath would be "/wiki/article.aspx", while the app-relative version would still be "~/article.aspx".
  • Everything else remains unchanged.

Note that Request.CurrentExecutionFilePath is always in use: if there has been no call to Server.Transfer it will be the same as Request.FilePath.
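As a rough illustration of the behaviour described above, here is a small Python model. It is only a sketch of the bookkeeping, not of ASP.NET itself: the `RequestModel` class and its attribute names are hypothetical stand-ins for the real properties.

```python
class RequestModel:
    """Toy model of which path properties change on a transfer (hypothetical)."""

    def __init__(self, file_path, app_root="/"):
        self.app_root = app_root.rstrip("/")
        self.file_path = file_path
        # With no transfer, the current execution path equals the file path.
        self.current_execution_file_path = file_path

    def transfer(self, target):
        # Models Server.Transfer(): only the *current execution* path moves.
        self.current_execution_file_path = target

    @property
    def app_relative_current_execution_file_path(self):
        # Replace the application root with "~".
        return "~" + self.current_execution_file_path[len(self.app_root):]


request = RequestModel("/wiki.aspx")
request.transfer("/article.aspx")
print(request.file_path)
print(request.current_execution_file_path)
print(request.app_relative_current_execution_file_path)
```

The point of the model is simply that a transfer touches one property and leaves the others alone.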

Case 3: URL rewriting

So you have this lovely new ASP.NET version of Wikipedia up and running, it works much more smoothly, has much less downtime, and runs on only a dozen or so servers rather than a hundred. Then, you start getting hate mail from irate Wikipedians, many of whom are open source zealots who are definitely not NPOV on Microsoft Windows. Jimbo and the Arbitration Committee get involved and demand you rewrite those URLs to cover up the fact that the Wikimedia Foundation has gone over to the Dark Side.

So, you take the original URL http://en.wikipedia.org/wiki/WP:NOT and transmogrify it into http://en.wikipedia.org/wiki.aspx?ns=Wikipedia&pg=What_Wikipedia_is_Not using a discreet call to Context.RewritePath.

Suddenly, everything changes!

  • Request.RawUrl = "/wiki/WP:NOT" represents the original path and query string parts of the URL. In actual fact, Request.RawUrl always represents exactly what you typed into your browser.
  • Request.Path = "/wiki.aspx" represents the path part of the URL.
  • Request.FilePath = "/wiki.aspx" represents the part of the path to the file (in this case wiki.aspx) that is handling the request.
  • Request.PathInfo is blank. When you use URL rewriting you have to point to a real file: you can’t use a PathInfo — that’s why you need to use a query string instead.
  • Request.CurrentExecutionFilePath = "/wiki.aspx" until you call Server.Transfer, when it changes.
  • Request.QueryString = "ns=Wikipedia&pg=What_Wikipedia_is_Not" is of course changed after the URL rewrite.
  • Request.PhysicalPath = "c:\inetpub\wwwroot\wiki.aspx" is, again, the physical path to the file that is servicing the request.
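To make the rewrite itself concrete, here is a minimal Python sketch of the kind of mapping you would compute before handing the result to Context.RewritePath. Everything in it is an assumption for illustration: the `rewrite` function, the `SHORTCUTS` table and the rule for splitting "Namespace:Page" are mine, not anything Wikipedia or ASP.NET provides.

```python
from urllib.parse import urlencode

# Hypothetical shortcut table: maps a shortcut title to (namespace, page).
SHORTCUTS = {"WP:NOT": ("Wikipedia", "What_Wikipedia_is_Not")}


def rewrite(raw_path):
    """Map a pretty /wiki/... path to the wiki.aspx path plus query string."""
    prefix = "/wiki/"
    if not raw_path.startswith(prefix):
        return raw_path  # not one of ours, leave it alone
    title = raw_path[len(prefix):]
    if title in SHORTCUTS:
        namespace, page = SHORTCUTS[title]
    elif ":" in title:
        namespace, page = title.split(":", 1)
    else:
        namespace, page = "", title
    # The page goes in the query string because a rewritten URL
    # has to point at a real file.
    return "/wiki.aspx?" + urlencode({"ns": namespace, "pg": page})


print(rewrite("/wiki/WP:NOT"))
```

After a rewrite like this, the original string is what stays in Request.RawUrl, and the rewritten one is what feeds Request.Path and Request.QueryString.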

Case 4: ASP.NET MVC

So how on earth, you may be asking, does all this work with ASP.NET MVC? After all, it doesn’t use Web forms in the same way — URLs map to controllers, which then decide which views to render themselves.

Well here’s the skinny:

  • Request.RawUrl = "/wiki/WP:NOT" contains the raw URL (path and query string) as before.
  • Request.Path, Request.FilePath, and Request.CurrentExecutionFilePath, all contain the “path” part of the URL without the query string. They will all be set to "/wiki/WP:NOT"
  • Request.PathInfo is blank. ASP.NET MVC handles path info through the routing engine and passes it in the parameters for your controller.
  • Request.PhysicalPath = "c:\inetpub\wwwroot\wiki\WP:NOT" is NOT the physical path to the file that is servicing the request. Controllers may decide to render one of any number of views or other results, and they need not even be Web forms — they could be raw text content (from a ContentResult), or a redirect (from a RedirectResult or a RedirectToRouteResult) or a JSON string (from a JsonResult) and they aren’t associated with a physical file on the filesystem at all.

Case 5: ASP.NET MVC with URL rewriting and/or Server.Transfer

I shall leave this one as an exercise for the reader. No doubt there is someone, somewhere, who is doing this, for reasons that completely befuddle me. After all, I’d have thought that the whole MVC pattern renders URL rewriting and Server.Transfer pretty much redundant.

Case 6: Requests for a directory’s home page

This is much the same as the above, except that ASP.NET inserts the name of the home page — typically default.aspx — into Request.RawUrl, and, by extension, everything else. Obviously, this does not apply to ASP.NET MVC.

06
Aug

Don’t stuff beans up your nose

Wikipedia will never cease to amaze me. Its instructions include such gems as:

Or best of all:

How can you possibly take an encyclopedia seriously when it has editorial policies such as those?

26
Nov

Is it time to kill off wikitext?

Anyone who has ever tried to edit Wikipedia will have encountered wikitext, the rather esoteric syntax used for markup on its pages.

Wikitext is, in theory at least, simpler than HTML. Two single quotes delimit ''italics'', while three single quotes indicate '''bold text'''. [Square brackets] indicate external links, [[double square brackets]] indicate internal links, and so on. A lot of other wiki software uses similar syntax. For example, Trac, a popular open source bug tracking system, uses a very similar markup language, and since you can also embed HTML in it, and even use a fairly sophisticated macro language, it allows very fine-grained control of the contents of the page. For the novice, there is a helpful toolbar at the top of the edit box, so that you can easily mark up various parts of the text as bold, italics, hyperlinks, and so on.
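For the curious, the handful of rules just described can be sketched in a few lines of Python. This is a toy converter under my own assumptions (the `wikitext_to_html` name, the "/wiki/" link prefix and the choice of `<b>` and `<i>` tags are all illustrative), and it is nowhere near what MediaWiki's real parser does.

```python
import re
from html import escape


def _internal_link(match):
    title = match.group(1)
    # Assumed URL scheme: spaces become underscores under /wiki/.
    return '<a href="/wiki/{}">{}</a>'.format(title.replace(" ", "_"), title)


def wikitext_to_html(text):
    """Convert bold, italics and simple links from wikitext to HTML."""
    out = escape(text, quote=False)  # keep the single quotes intact
    # Three quotes must be handled before two, or bold is eaten as italics.
    out = re.sub(r"'''(.+?)'''", r"<b>\1</b>", out)
    out = re.sub(r"''(.+?)''", r"<i>\1</i>", out)
    # Likewise, double brackets before single brackets.
    out = re.sub(r"\[\[(.+?)\]\]", _internal_link, out)
    out = re.sub(r"\[(\S+?)\]", r'<a href="\1">\1</a>', out)
    return out


print(wikitext_to_html("''italics'', '''bold text''' and [[Main Page]]"))
```

Even in this toy version the order of the substitutions matters, which hints at why hand-written wikitext trips people up.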


However, in late 2007, it somehow feels wrong. As wrong as it felt not being able to get broadband in late 2005.

Perhaps there is a place for wikitext, as a fallback to improve accessibility when JavaScript is not available. And some things are simply not possible (yet) without it, such as typesetting mathematical equations. However, in terms of usability, it sucks. Apart from having to navigate away from the main article page, you have to scroll through the box to find the part of the wikitext corresponding to where you want to make the change (not obvious in an article with a lot of footnotes, references, tables and the like). It also creates a distinct range of systemic biases, which is a problem that Wikipedia itself acknowledges. How much nicer it would be, if clicking on “edit” on a section of a wiki page were to bring up an in-line rich text editor where what you see is what you get.

Web browsers have now had rich text editing capabilities for over seven years. This feature was first introduced in July 2000 in Internet Explorer 5.5, and nowadays every major browser supports it one way or another. It needs a lot of fiddling about with JavaScript in order to work properly on all of them, of course, but there are several popular and mature libraries and components such as FreeTextBox, TinyMCE and FCKeditor that handle this very well, so that’s pretty much a solved problem. Even cleaning Word HTML and producing valid XHTML — once common objections to rich text editors — are solved problems too.

There are many rich Internet applications these days that raise the bar significantly in terms of quality of user experience. Slick, good looking, easy to use sites are becoming more and more commonplace, and while ones such as Google Maps or EyeOS still have a bit of a “wow” factor, it’s getting easier all the time to develop them. With libraries like jQuery, for instance, you can implement a Google Suggest-style Ajax search facility in a couple of hours.

With it becoming increasingly easy to create elegant rich Internet applications, and the tools to do so being readily available, free and open source, having such an awkward and clunky way of editing content is beginning to look very last millennium. It’s time it went the way of the dinosaurs.

08
Nov

Can we live without Wikipedia?

It’s now about two months since I decided to quit editing Wikipedia, and I think it’s been the best decision I’ve made so far this year. Wikipedia can be pretty distracting if you take it too seriously, and in fact the best advice I can give to anyone thinking of becoming a regular Wikipedian is: don’t.

I’ve decided that I’m not going to make any anonymous edits either. When I see stuff on Wikipedia that is blatantly biased, untrue and even downright stupid, it takes a lot of restraint to avoid clicking the "Edit" button, but I’ve decided that the best thing to do is just resign myself to the fact that Wikipedia is a soapbox, it is a social networking site, it is an indiscriminate collection of information, and it is pretty much everything else that it claims it isn’t, and trying to keep it right is like painting the Forth Bridge.

For a while I’ve been wondering on and off whether I could just dispense with Wikipedia altogether. As an experiment, I’ve added an entry for en.wikipedia.org to my hosts file on my work computer to block it off completely. It gets a little bit frustrating when I come across a link on someone’s blog to a Wikipedia entry on something I don’t properly understand, but hey, there’s always Google to help me seek out more reliable sources. It’ll be interesting to see how long I can go without it, but I rather suspect that before too long I won’t even notice it.

11
Sep

On leaving Wikipedia

Regular readers of my blog will no doubt be aware that over the past eighteen months or so I have been fairly active on Wikipedia, notching up just under a thousand edits. This may sound like a lot but it is not uncommon to find Wikipedians with edit counts reaching well into the tens of thousands, so I am not the most active by a long shot.

However, I have decided that it is time to call it a day.

The main reason is to help me stay focused. Editing Wikipedia and participating in all the discussions etc can be fun, but it can also be pretty distracting if you are not careful. It is also all too easy to get lost in all the masses and masses of trivia that are to be found there: if you have ever gone on to it looking for information about cryptographic hash algorithms and ended up with a dozen articles open at pages such as Jennifer Aniston, Self Diagnosis and Eta Carinae after five hours of fascinated clicking, having totally forgotten what you went on to Wikipedia for in the first place, you will know exactly what I mean.

“Computer Joe” Anderson wrote a blog entry recently in which he attempted to debunk some “myths” about Wikipedia (or “the Wikipedia” as he calls it). It gives an interesting insight into Wikipedian culture that most casual readers have no idea about, such as the recent changes patrol and the Arbitration Committee. However, while his arguments are all factually accurate, I must disagree with his conclusions. Despite all its processes, policies, procedures and patrollers, Wikipedia still has very much the feel of being the Wild West of encyclopedia territory: chaotic, anarchic and at times pretty bewildering.

I must admit to finding it curious that Wikipedia policy states in no uncertain terms that “Wikipedia is not an indiscriminate collection of information.” As far as indiscriminate collections of information go, Wikipedia has few rivals. After all, when you encounter articles like Globus Cassus (an obscure book that you’ve almost certainly never heard of, which outlines a completely whacked out and bizarre proposal to dismantle the earth and use it to build a flying saucer the size of Saturn) and then discover that it survived not just one but two deletion debates, you just have to shake your head and say to yourself, “Only on Wikipedia”. But it turns out that it featured at one of those highfalutin modern art exhibitions where you expect to see cows in formaldehyde, piles of bricks, and avant garde paintings by elephants and chimpanzees. Apparently that makes it Notable. Go figure.

22
Nov

WordPress not notable?!!?

What does this guy think he’s playing at? He thinks WordPress — the world’s most popular open source blogging engine — is not notable enough for Wikipedia.

Okay, let’s come up with a few other articles that, by the same standard, are not notable enough for Wikipedia. How about, er, this one, this one, this one, or this one?

18
May

Errors in Wikipedia Considered Harmful

I have just spent the whole morning battling with a bug in an RTF encoding class in one of our applications. For some reason, Unicode characters were causing Microsoft Word to either consider that the file was corrupted or else to drop characters seemingly at random.

It turned out that the reason for the bug was that I was using some incorrect information on Wikipedia. Now I must confess to being a bit of a Wikipedia fan, but it turns out that this particular article omitted some important information, as I discovered on a closer reading of the MSDN documentation.
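I won't reproduce the application code here, but the rules for Unicode in RTF, as I understand them from the specification, can be sketched in Python. This is a minimal sketch, not the class from our application: `rtf_escape` is a hypothetical name, it assumes the default of one fallback character after each escape (the \uc1 setting), and it ignores control characters and code pages entirely.

```python
def rtf_escape(text):
    """Escape a string for RTF body text using \\uN? Unicode escapes."""
    parts = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and need escaping.
            parts.append("\\" + ch)
        elif ord(ch) < 128:
            parts.append(ch)
        else:
            # RTF works in UTF-16 code units, so characters beyond
            # U+FFFF become two escapes (a surrogate pair).
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                # The number after \u is a *signed* 16-bit value.
                if unit > 32767:
                    unit -= 65536
                # "?" is the fallback shown by readers without Unicode support.
                parts.append("\\u{}?".format(unit))
    return "".join(parts)


print(rtf_escape("caf\u00e9 {ok}"))
```

The two details that are easiest to miss are the signed 16-bit number and the fallback character after each escape, either of which is the sort of thing that would make Word complain about a corrupted file or swallow characters.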

Since anyone can edit Wikipedia, I have corrected it, but it just goes to show that if you rely too much on the information that you get there, you can run into trouble.