Friday, March 27, 2015

Pernicious link rot

I started Ereignis, the web page, in 1995, and one of my early gripes was that links stopped working because the content had been moved or deleted. When I was informed that a site had moved, or I noticed that a link was broken, I would update or remove it, but I did not regularly test the links and remove the dead ones.

One of the goals with the Gesamtausgabe app is to display only valid links. Toward that goal, I've written a module that checks all the links in the database, as well as the links embedded in the paper and book page content. Once that was working, it was easy to repurpose the code, point it at the Ereignis pages on beyng.com, and have it check the links there.
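A link check of this kind can be sketched in a few lines of Python. This is only an illustration, not the module used for the app: the names `extract_links`, `fetch_status`, and `find_broken` are hypothetical, and it assumes that a link counts as broken when the request fails outright or returns an HTTP status of 400 or higher.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html):
    """Return the absolute links found in a page's HTML, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def fetch_status(url, timeout=10):
    """Return the HTTP status code, or None if the request failed outright."""
    request = Request(url, method="HEAD", headers={"User-Agent": "link-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (URLError, OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, malformed URL


def find_broken(links, fetch=fetch_status):
    """Return the links whose status is missing or 400 and above (assumed rule)."""
    broken = []
    for url in links:
        status = fetch(url)
        if status is None or status >= 400:
            broken.append(url)
    return broken
```

Calling `find_broken(extract_links(page_html))` on a page's HTML returns its dead links. The `fetch` parameter is there so the check can be exercised without touching the network. Some servers reject HEAD requests, so a fuller version would probably retry with GET before declaring a link dead.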

This I did. I skipped the Ereignis pages that list links by subject, since they only repeat links that are already on the general pages. I included the book pages in the bibliography, where the links are mainly to authors, publishers, and reviews.

Out of 2691 hyperlinks on Ereignis, 705 were broken, or 26%. Testing the 2691 links took 80 minutes. The oldest page of links, from the 1990s, had over 80% rotten links. Ten percent of the links from the last year have rotted. The distribution appears linear. Link rot occurs consistently. Surprisingly, links with the most rot were those to people, rather than papers. Links to institutional web sites are likelier to rot than links to individual web sites. Universities and publishers are changing their web hosting software regularly and tossing their old content, while individuals are more likely to ensure that their URLs continue to work.
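For anyone who wants to check the arithmetic, here is a small sketch using only the three figures quoted above. The function names are made up for illustration.

```python
def broken_share(broken, total):
    """Fraction of links that failed the check."""
    return broken / total


def seconds_per_link(minutes, total):
    """Average wall-clock time spent testing one link."""
    return minutes * 60 / total


share = broken_share(705, 2691)
pace = seconds_per_link(80, 2691)
print(f"{share:.1%} broken, {pace:.2f} s per link")
# prints: 26.2% broken, 1.78 s per link
```

So the run averaged a little under two seconds per link, which suggests the links were tested one at a time rather than in parallel, though that is an inference from the timing and not something measured separately.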
