Thursday, February 26, 2015

Azure Search and diacritics

I've been experimenting with Azure Search to improve searching the Gesamtausgabe. I've got all the content indexed on Azure, and it's returning decent results: it compensates for misspellings and provides suggestions as you type. I still need to figure out how to integrate it with the Bootstrap typeahead control before I update the website with the new search feature.

One of the features Azure Search doesn't have yet is asciifolding, which would let a search for "αληθεια" return documents containing "ἀλήθεια". Who can remember where the diacritics are on the polytonic Greek keyboard layout? And not every document uses diacritics consistently. If this feature is important to you, you can cast three votes for it here.
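To make the idea concrete, here is a minimal sketch of diacritic folding in Python. It is not how Azure Search or Lucene implements its analyzer; it only illustrates the matching behaviour I'm after, by decomposing each character and dropping the combining marks. The function name fold_diacritics is my own.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return text with accents, breathings and other combining marks removed.

    Illustration only: this is an assumption about what "folding" should do
    for search matching, not the analyzer's actual algorithm.
    """
    # NFD splits each precomposed character into a base letter plus marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have a non-zero canonical combining class; drop them.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual precomposed form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    query = "αληθεια"
    indexed_term = "ἀλήθεια"
    # The unaccented query matches once the indexed term is folded the same way.
    print(fold_diacritics(indexed_term) == fold_diacritics(query))
```

If both the indexed text and the query are folded like this, inconsistent use of diacritics in the documents stops mattering.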

[Update March 9, 2015]
Asciifolding now works with Azure Search, with api-version=2015-02-28-Preview. The new release cadence from Microsoft is much better than the old days of "we'll fix that in the next Windows release". I've rebuilt the indexes, and asciifolding is working in the app version that I'm currently working on. I hope to release it soon, within several weeks.

In the fields you want to be searchable with asciifolding, you set:

Analyzer = "standardasciifolding.lucene"
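If you work against the REST API rather than a client library, the same setting would go on each field in the index definition. The sketch below only builds the JSON body in Python and sends nothing. The index name, the field names and the helper build_index_definition are hypothetical, and the property names ("searchable", "analyzer") are my assumption about the index schema for this preview version, so check them against the documentation.

```python
import json

# The preview API version mentioned above; it is passed as a query-string parameter.
API_VERSION = "2015-02-28-Preview"

# The analyzer name from this post.
FOLDING_ANALYZER = "standardasciifolding.lucene"


def build_index_definition(index_name, folded_fields):
    """Build an index definition whose text fields use the folding analyzer.

    Hypothetical helper: the index and field names are placeholders.
    """
    # Every index needs a key field; it does not need to be searchable.
    fields = [{"name": "id", "type": "Edm.String", "key": True, "searchable": False}]
    for name in folded_fields:
        fields.append(
            {
                "name": name,
                "type": "Edm.String",
                "searchable": True,
                "analyzer": FOLDING_ANALYZER,
            }
        )
    return {"name": index_name, "fields": fields}


if __name__ == "__main__":
    definition = build_index_definition("gesamtausgabe", ["title", "body"])
    print(json.dumps(definition, indent=2, ensure_ascii=False))
```

Because the analyzer is part of the field definition, changing it on an existing field means rebuilding the index, which is why I had to rebuild mine.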
