Friday, March 27, 2015

Pernicious link rot

I started Ereignis, the web page, in 1995, and one of the early gripes was that links stopped working because the content had been moved or deleted. When I was told that a site had moved, or I noticed a broken link, I would update or remove it, but I did not test the links regularly or weed out the dead ones.

One of the goals of the Gesamtausgabe app is to display only valid links. Toward that goal, I've written a module that checks all the links in the database, as well as the links embedded in the content of the paper and book pages. Once that was working, it was easy to repurpose the code, point it at the Ereignis pages on beyng.com, and have it check the links on Ereignis.

This I did. I skipped the Ereignis pages that list links by subject, since they only repeat links that are already on the general pages. I included the book pages in the bibliography, where the links are mainly to authors, publishers, and reviews.
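The module itself isn't shown here, so what follows is only a minimal sketch of the same idea in Python, using the standard library; it is not the app's actual code. The names extract_links, is_link_alive, and check_links are hypothetical, and the rules are assumptions: a link counts as alive if it answers (after redirects) with an HTTP status below 400, each request gets a 15-second timeout, and a HEAD request is tried first with one GET retry for servers that refuse HEAD.

```python
import http.client
from html.parser import HTMLParser
from typing import Callable, Iterable
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                url = urljoin(self.base_url, value)
                # Skip mailto:, javascript:, and other non-web schemes.
                if url.startswith(("http://", "https://")):
                    self.links.append(url)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return the web links found in one page of HTML, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def is_link_alive(url: str, timeout: float = 15.0) -> bool:
    """Return True if the URL (after redirects) answers with a status below 400."""
    for method in ("HEAD", "GET"):
        try:
            req = Request(url, method=method,
                          headers={"User-Agent": "link-checker-sketch"})
            with urlopen(req, timeout=timeout) as resp:
                return resp.status < 400
        except HTTPError as err:
            # Some servers refuse HEAD requests; retry once with GET.
            if method == "HEAD" and err.code in (403, 405, 501):
                continue
            return False
        except (OSError, ValueError, http.client.HTTPException):
            # DNS failure, refused connection, timeout, or malformed URL.
            return False
    return False


def check_links(urls: Iterable[str],
                is_alive: Callable[[str], bool] = is_link_alive) -> list[str]:
    """Return the sorted list of broken URLs, testing each distinct URL once."""
    return sorted(url for url in set(urls) if not is_alive(url))
```

To check a site with this sketch, one would fetch each page, pass its HTML to extract_links, and hand the combined list to check_links. Testing each distinct URL only once matters when the same link appears on several pages, as it does on the subject pages skipped above.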

Out of 2691 hyperlinks on Ereignis, 705 were broken, or 26%. Testing the 2691 links took 80 minutes. The oldest page of links, from the 1990s, had over 80% rotten links, while 10% of the links from the last year have already rotted. The distribution appears linear: link rot occurs at a steady rate. Surprisingly, the links with the most rot were those to people, rather than to papers. Links to institutional web sites are likelier to rot than links to individual web sites. Universities and publishers regularly change their web hosting software and toss their old content, while individuals are more likely to ensure that their URLs continue to work.
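As a quick check on those figures, the broken-link rate and the average wall-clock time per link work out as:

```latex
\[
\frac{705}{2691} \approx 0.262 \approx 26\%,
\qquad
\frac{80 \times 60\ \text{s}}{2691\ \text{links}} \approx 1.78\ \text{s per link}
\]
```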

Thursday, March 19, 2015

A helpful suggestion from History Today on broken hyperlinks.
Digital library researchers at Los Alamos National Laboratory found in a survey of three and a half million scholarly articles from scientific journals between 1997 and 2012 that one in five links provided in the footnotes suffered from 'reference rot'. Another survey, this time of law and policy publications, revealed that after six years nearly half of URLs cited had become inaccessible. Historians (perhaps unsurprisingly, given their profession) have been slower to place this most modern of problems at the top of their agenda. They are, however, not immune from its effect. An American study of two leading history journals found that in articles published seven years earlier, 38 percent of web citations were dead. Missing web pages can sometimes be relocated by academics through digital archives, the biggest of them being the Wayback Machine in San Francisco. A good many web pages, however, have not been archived and are permanently irretrievable.
A tool called Perma.cc was launched in beta phase in 2014. Developed by the Harvard Law School Library, it ‘allows users to create citation links that will never break’. If you want to secure the future of an Internet link in your footnotes, you create an archived version of the page you are referring to and anyone later clicking on your link will be taken through to the archived version. This ‘permalink’ does not repair Internet citations that have already decayed, but it does effectively fix the problem going forward. It has already been taken up by law reviews in America.
It would be cool if philosophy papers included links, instead of just citing print editions.