A few years ago I created a Gesamtausgabe app with the info about the volumes, translations, translators, related books, papers, etc.
The data was hosted in Azure, Microsoft's cloud, in a SQL Server database, and there was an IIS/MVC web site to generate the web pages.
It is still running today.
Its practical problem is that the database and web site go to sleep after 15 minutes of inactivity, and it takes a minute to restart the site, so to look up a GA volume it's quicker to use the simple text file version.
The old GA app required logging in with user IDs in order to keep communications between the web site and the browser secure, because of how SSL was implemented back then. Today, on beyng.com, SSL is always available, anonymously: no logins, no user IDs, no authentication required. Access to restricted documents is now controlled at the source. For example, access to Heidegger Circle proceedings is controlled at heidegger-circle.org; if you are already logged in there, links from the GA app will open automatically. So there is no need for the GA app to manage roles, that is, different levels of authorization according to user ID.
I've written a new GA app. The new version does its processing in the browser instead of in the cloud.
When it starts, it downloads all the data as text files to the browser, and the browser processes the data and generates web pages.
The new version does not depend on anything running in the cloud; it only needs to download static files. I intend to keep updating this app and making improvements.
The new GA app is built on a framework from Microsoft called Blazor, which runs on WebAssembly in the browser.
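A minimal sketch of the "download text, process it in the browser" idea, written in JavaScript rather than Blazor/C# and not taken from the app itself. The tab-separated layout, the field names, the `volumes.txt` file name and the sample rows are all assumptions made for the illustration.

```javascript
// Parse tab-separated lines such as "19\tPlaton: Sophistes" into records.
// The file layout and field names are assumptions for this sketch.
function parseVolumes(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Skip blank lines and "#" comment lines.
    .filter((line) => line.length > 0 && !line.startsWith('#'))
    .map((line) => {
      const [number, title] = line.split('\t');
      return { number: Number(number), title: title ?? '' };
    });
}

// Index the records by volume number so a lookup needs no server round trip.
function indexByNumber(records) {
  return new Map(records.map((r) => [r.number, r]));
}

// In the browser the text would come from a static file, for example:
//   const text = await (await fetch('volumes.txt')).text();
const sample = '# number<TAB>title\n2\tSein und Zeit\n19\tPlaton: Sophistes\n';
const volumes = indexByNumber(parseVolumes(sample));
console.log(volumes.get(19).title); // Platon: Sophistes
```

Once the records are in memory, every page can be generated from them without touching a server again.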
WebAssembly is new and only works with current versions of browsers. The app will change as Blazor evolves.
The principal feature of the earlier cloud-based GA app that hasn't been reproduced in the new GA app is Search. The old cloud-based GA app used Azure for indexing and searching all the GA data, plus documents on beyng.com and files on other sites. Since search depends on the cloud, it is not implemented in the new GA app. I am looking for a way to search just the GA data in the client, and considering a new standalone search service that indexes all the files on beyng.com.
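One possible shape for searching the GA data in the client, sketched in JavaScript. This is only an illustration of the idea, not a planned design: the record fields and sample records are assumptions, and a plain substring filter stands in for a real index.

```javascript
// Fold case and strip diacritics so "Beiträge" matches a query of "beitrage".
function fold(s) {
  return s.normalize('NFD').replace(/\p{M}/gu, '').toLowerCase();
}

// Return the records whose chosen fields contain every word of the query.
function search(records, query, fields = ['title']) {
  const terms = fold(query).split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];
  return records.filter((record) => {
    const haystack = fold(fields.map((f) => record[f] ?? '').join(' '));
    return terms.every((term) => haystack.includes(term));
  });
}

// Sample records; the field names are hypothetical.
const gaRecords = [
  { number: 2, title: 'Sein und Zeit' },
  { number: 19, title: 'Platon: Sophistes' },
  { number: 65, title: 'Beiträge zur Philosophie (Vom Ereignis)' },
];

console.log(search(gaRecords, 'beitrage').map((r) => r.number)); // [ 65 ]
```

A linear scan like this is usually fast enough for a few thousand short records; it would not replace a service that indexes whole documents.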
Thursday, December 26, 2019
Monday, November 11, 2019
Turning texts into apps
A few years ago I wrote an app for the Volpi book, in Angular 5, to make it easier to study the text. For example, double-clicking on a Greek word popped up a screen about that Greek word.
This year I've been working on two texts and trying to create apps for them. I started with the first ~200 pages of GA 19 (Plato's Sophist) and wrote a Blazor app. Blazor is an experimental framework for writing code in browsers using WebAssembly. The app hosts the German pages and the English translation. It has some features like "hover over a Greek word to see glossary look-up". Some features, like responding to a double-click on a selected word, don't quite work in Blazor yet. I need to do some more experimenting with Blazor as it matures.
In the summer I joined a B&T reading group, and created a B&T app, with the first ~100 pages of that text. In addition to the German and English, this app also hosts Tom Sheehan's paraphrastic condensation; users can flip between English translation and paraphrase.
Creating apps for texts is labor intensive if the text is not ready -- e.g., needs OCR corrections.
Saturday, May 4, 2019
How to sort Greek in C#
Found here.
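using System;
using System.Collections.Generic;
using System.Text;

// Compares strings after canonical decomposition (NFD), ignoring case.
class GreekComparer : IComparer<string>
{
    public int Compare(string s1, string s2)
    {
        return String.Compare(s1.Normalize(NormalizationForm.FormD),
                              s2.Normalize(NormalizationForm.FormD),
                              StringComparison.InvariantCultureIgnoreCase);
    }
}

GreekComparer gc = new GreekComparer();
// Assuming wordList is a List<string>: Sort works in place and returns void,
// so there is no result to assign.
wordList.Sort(gc);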
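For comparison, a rough JavaScript counterpart. It uses a different mechanism from the C# snippet: instead of normalizing to NFD by hand, it relies on `Intl.Collator`, which treats canonically equivalent strings as equal. The `'el'` locale, the `'base'` sensitivity and the sample words are choices made for this sketch.

```javascript
// 'base' sensitivity ignores accents, breathings and case when comparing.
const greekCollator = new Intl.Collator('el', { sensitivity: 'base' });

// Sample words, chosen for the illustration.
const greekWords = ['φύσις', 'λόγος', 'ἀλήθεια', 'Λόγος'];

// Copy before sorting, because Array.prototype.sort sorts in place.
const sortedGreekWords = [...greekWords].sort(greekCollator.compare);
console.log(sortedGreekWords);
```

Words that differ only in case or diacritics compare as equal here, so they keep their original relative order.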
Saturday, March 16, 2019
Normalize Unicode
When "Zoë" !== "Zoë". Or why you need to normalize Unicode strings
So important.
In javascript:
const normalized = str.normalize('NFC')
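A short demonstration of why this matters: the same visible name can be stored as two different code point sequences, and they only compare equal after normalization. The variable names are just for the example.

```javascript
const composed = 'Zo\u00EB';    // "ë" as a single code point (U+00EB)
const decomposed = 'Zoe\u0308'; // "e" followed by a combining diaeresis (U+0308)

console.log(composed === decomposed);                                   // false
console.log(composed.normalize('NFC') === decomposed.normalize('NFC')); // true
console.log(composed.length, decomposed.length);                        // 3 4
```

Normalizing both the stored text and the search term to the same form (NFC here) before comparing avoids these silent mismatches.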
Saturday, March 9, 2019
Make a Heidegger tool. Part I: assemble an archive
You're going to need Heidegger's texts. There are 100 volumes in his complete works. More than half of the German volumes are shared on the internet. A few are very good: all the characters in the electronic text are correct. A few are almost useless: the text can't be searched. Most of the usable texts are shared as PDF files.
1. Get the best version of each volume
Most PDFs have images of the pages and the text extracted (OCR) from the images. For our purposes, what matters is the quality of the text, not the quality of the images.
If you have a better OCR, extract better text from the images.
2. Convert the text to HTML files
Export the text from the PDF. Create an HTML page for each page of relevant text in the book. Try to get the most information possible from the PDF, like font (e.g. italics).
You will now have an archive of the best available texts. It'll be 90% reliable for simple words, but only 10% reliable for words with umlauts or Greek.
3. Correct the text
Correct the text in the HTML pages so that it matches what is on the printed page and can be searched reliably. Most of the errors will come from OCR, which makes consistent errors, so you can apply the same corrections across all the text files. Run a spellchecker; you'll need to add Heidegger's neologisms to its dictionary.
4. Put the HTML files on a web server
5. Have search engines index pages
You can look up individual pages or search inside all texts.
The goal is to have 100% of the texts, and to have them 100% correct, so that they can be searched reliably.
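Step 3 relies on OCR errors being consistent, which makes them scriptable. A minimal sketch in JavaScript, assuming a hand-built table of recurring misreadings; the three entries and the sample sentence below are made up for the illustration.

```javascript
// Hypothetical corrections table: recurring OCR misreading -> printed form.
const corrections = new Map([
  ['fiir', 'für'],
  ['daB', 'daß'],
  ['Ubersetzung', 'Übersetzung'],
]);

// Replace whole words only, so "daB" is fixed but a longer word containing it is not.
function correctOcr(text, table) {
  // Normalize first so accented letters are single code points, then match
  // runs of letters in any script (\p{L} covers umlauts and Greek).
  return text
    .normalize('NFC')
    .replace(/\p{L}+/gu, (word) => table.get(word) ?? word);
}

const ocrPage = 'Es ist fiir uns klar, daB die Ubersetzung fehlt.';
console.log(correctOcr(ocrPage, corrections));
// Es ist für uns klar, daß die Übersetzung fehlt.
```

The same table can be run over every page file, and it grows as new recurring errors turn up during proofreading.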
Sunday, February 10, 2019
The Blazor Sofist
There's a new low-level language in browsers called WebAssembly. Microsoft has built an experimental mechanism, called Blazor, for running the .NET runtime on WebAssembly. That means that .NET languages like C# can now be used to write apps for web browsers.
I've written a simple Blazor app for a new, post-codex, "book": the first third of Heidegger's lectures on Plato's Sophist, which deals with Aristotle's Metaphysics and Nicomachean Ethics. The app has the English and German text. I call the app Preliminary Sofist.
I've written the C# code to link Greek words on a page to their wiki entry, if that entry exists, the first time the word appears on a page. I've also written code to decorate Greek words with their English translation, if it appears in the glossary at the back of the book, so that the translation appears when the pointer hovers over the Greek word.
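The linking and decorating logic can be sketched roughly as follows. This is a JavaScript illustration of the idea, not the app's C# code. The glossary entries, the wiki set, the `/wiki/` URL pattern and the use of a `title` attribute for the hover text are all assumptions, and the input is assumed to be plain text rather than HTML.

```javascript
// Hypothetical data; real entries would come from the book's glossary and the wiki.
// Keys are assumed to be stored in NFC.
const glossary = new Map([['λόγος', 'speech'], ['ἀλήθεια', 'truth']]);
const wikiEntries = new Set(['λόγος']);

// Runs of characters from the Greek and Greek Extended blocks.
const greekRun = /[\u0370-\u03FF\u1F00-\u1FFF]+/g;

function decorateGreek(pageText) {
  const linked = new Set(); // words already linked on this page
  return pageText.normalize('NFC').replace(greekRun, (word) => {
    let html = word;
    const translation = glossary.get(word);
    if (translation !== undefined) {
      // The browser shows the title attribute when the pointer hovers.
      html = `<span title="${translation}">${html}</span>`;
    }
    if (wikiEntries.has(word) && !linked.has(word)) {
      linked.add(word); // link only the first appearance on the page
      html = `<a href="/wiki/${encodeURIComponent(word)}">${html}</a>`;
    }
    return html;
  });
}

console.log(decorateGreek('λόγος and again λόγος'));
```

A fresh `linked` set per call gives the "first time the word appears on a page" behavior, as long as the function is called once per page.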
When Blazor's features improve, I want to add a dialog with Greek help, that pops up on double-clicking a Greek word, like I did with the Angular 5 Volpi app last year.
The app only works in Chrome. I use the iframe srcdoc attribute to insert the page content into the app, and Edge doesn't fully support HTML5.
I couldn't figure out how to get the app's URL routing to work from a sub-folder on a web site, so I had to host the app on its own domain.
I still have to proofread and correct OCR errors in two-thirds of the German text, and add all the Greek words to the glossary and wiki links.