Friday, November 21, 2014

The problem with Google Scholar

Assume you want to find the most cited papers on a subject. There are library databases that are professionally maintained, but they have a limited scope (e.g., they cover only a subset of all journals), and it is difficult to get casual access to them. Then there is Google Scholar, which scans any paper Google can find, plus some databases, and is free to search. So Google Scholar should be a good source for citations.

But Google Scholar doesn't quite work.

For example, if you want to know how many times "Sein und Zeit" has been cited, you find:

"Sein und Zeit" is counted separately from "Sein und Zeit (1927)", "Sein und Zeit [Being and Time]", "sein und Zeit, tübingen", and "Martin Heidegger: Sein und Zeit". Also included are any papers or books that have "Sein und Zeit" in their abstracts or titles, and so on for dozens more pages of results.

The problem is that Google Scholar runs fully automatically, trying to extract citations from texts that are formatted in many different, inconsistent ways. It doesn't have editors who would realize that a set of different entries all refer to a single item and connect them. Algorithms don't understand the meaning of text. But they can get better.
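To make the point concrete, here is a rough PowerShell sketch of the kind of connecting an editor would do by hand. It is not how Google Scholar works; Get-TitleKey is a hypothetical helper, and its rules are assumptions tuned only to the five variants quoted above. They would mangle real titles that contain a colon or a comma.

```powershell
# Hypothetical helper: reduce a citation title to a crude comparison key.
function Get-TitleKey {
    param([string]$Title)
    $key = $Title.ToLowerInvariant()
    # Drop parenthesized or bracketed additions such as "(1927)" or "[Being and Time]".
    $key = $key -replace '[\(\[][^\)\]]*[\)\]]', ''
    # Drop a leading "Author:" prefix.
    $key = $key -replace '^[^:]*:\s*', ''
    # Drop everything after the first comma (e.g. the place of publication).
    $key = $key -replace ',.*$', ''
    # Collapse runs of whitespace and trim the ends.
    ($key -replace '\s+', ' ').Trim()
}

# The variants quoted in the post above.
$variants = @(
    'Sein und Zeit',
    'Sein und Zeit (1927)',
    'Sein und Zeit [Being and Time]',
    'sein und Zeit, tübingen',
    'Martin Heidegger: Sein und Zeit'
)

# Group the variants by their key and show how many entries share each key.
$variants | Group-Object { Get-TitleKey $_ } | Select-Object Count, Name
```

With these rules all five variants end up under the single key "sein und zeit". The hard part is writing rules that don't also merge things that are really different, which is where an editor still beats a regular expression.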

Sunday, November 9, 2014

How to get gesamt.html with PowerShell

This is how to download the file gesamt.html from the Heidegger Gesamtausgabe app at gesam.azurewebsites.net. The app generates a fresh copy from its database. The PowerShell commands follow:

# Requests the sign-in page by running the Invoke-WebRequest cmdlet. The command specifies a value of "fb" for the SessionVariable parameter, and saves the results in the $r variable.
# -UseBasicParsing is left out on purpose: in Windows PowerShell the Forms property used below is only filled in when the page is fully parsed.
$r=Invoke-WebRequest https://gesam.azurewebsites.net/Account/Login -SessionVariable fb

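# Reads the anti-forgery token from the hidden "__RequestVerificationToken" input field of the sign-in page.
$rVerificationToken=($r.InputFields | Where { $_.name -eq "__RequestVerificationToken" }).value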

# Gets the first form in the Forms property of the HTTP response object in the $r variable, and saves it in the $form variable. 
$form = $r.Forms[0]

# The next three commands populate the values of the "Email", "Password" and "__RequestVerificationToken" keys of the hash table in the Fields property of the form. Of course, you can replace the email and password with values that you want to use.
$form.Fields["Email"] = "nemo@nowhere.com"
$form.Fields["Password"] = "password"
$form.Fields["__RequestVerificationToken"] = $rVerificationToken

# This command uses the Invoke-WebRequest cmdlet to sign in to the web service, reusing the session saved in $fb.
$r=Invoke-WebRequest -Uri "https://gesam.azurewebsites.net/Account/Login" -WebSession $fb -Method POST -Body $form.Fields

# The above is based on the Invoke-WebRequest help. Now request gesamt.html in the same signed-in session; the response ends up in $r.
$r=Invoke-WebRequest -Uri "https://gesam.azurewebsites.net/Band/Print" -WebSession $fb -Method POST -Body $form.Fields
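
The commands above leave the page in $r but don't write it to disk. A minimal sketch of that last step follows, assuming the HTML is in the Content property of the response. Save-ResponseContent is a hypothetical helper name, and the output path is whatever you choose.

```powershell
# Hypothetical helper: write the body of a web response to a file as UTF-8 text.
function Save-ResponseContent {
    param(
        [Parameter(Mandatory)] $Response,
        [Parameter(Mandatory)] [string] $Path
    )
    # Content holds the response body as a string for HTML responses.
    Set-Content -Path $Path -Value $Response.Content -Encoding UTF8
}

# Usage, assuming $r holds the Band/Print response from the commands above:
# Save-ResponseContent -Response $r -Path (Join-Path $PWD 'gesamt.html')
```

Invoke-WebRequest also has an -OutFile parameter that writes the response body straight to a file, which may be simpler if you don't need the response object afterwards.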