Friday, July 3, 2015

Character expansion, or handling Eszett

This week I discovered that Azure Search does not handle character expansion: it treats Strasse and Straße as different words. They are alternative spellings of the same word; which one you use depends on which keyboard is ready-to-hand, or which official spelling directive rules your world. If you search for Strasse, you will not find documents containing Straße, and vice versa. That's disappointing. I entered a feature request at the Azure Search site. Vote for this feature if you care!

In addition to using Azure Search to search all the documents, a search for a term in the GA App also searches the glossaries I've added to the database. There I've been able to enable character expansion. With the search term Schluß, the glossary search returns Abschluss, and searching for Schluss returns Schluß.

The trick to getting character expansion to work is to tell the comparison to use the invariant culture (StringComparison.InvariantCultureIgnoreCase); the default for string.Equals is an ordinal comparison.

dw = db.DeWords.Where(w => w.ISBN == isbn).SingleOrDefault(w => w.Word.Equals(q, StringComparison.InvariantCultureIgnoreCase)); 

That is: in the DeWords table, among the records for one book (identified by its ISBN), find the record whose Word field matches the search query q.
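
To try the comparison outside the database, here is a minimal in-memory sketch of the same query. The DeWord class, the sample glossary, and the placeholder ISBN "isbn-1" are hypothetical stand-ins for the real DeWords table. The ordinal line always prints True (no match). Whether the InvariantCultureIgnoreCase line finds Schluß depends on the runtime's globalization data (Windows NLS, ICU on Linux, or invariant globalization mode), so I haven't given an expected result for it. And if db.DeWords is an Entity Framework set, as the name suggests, the comparison may be translated to SQL, in which case the database collation decides.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row of the DeWords table.
public class DeWord
{
    public string ISBN { get; set; }
    public string Word { get; set; }
}

public static class ExactLookup
{
    // Same shape as the GA App query, but run against an in-memory list.
    public static DeWord Find(IEnumerable<DeWord> words, string isbn, string q, StringComparison cmp)
    {
        return words.Where(w => w.ISBN == isbn)
                    .SingleOrDefault(w => w.Word.Equals(q, cmp));
    }

    public static void Main()
    {
        var glossary = new List<DeWord>
        {
            new DeWord { ISBN = "isbn-1", Word = "Schlu\u00DF" },  // "Schluß"
            new DeWord { ISBN = "isbn-1", Word = "Abschluss" },
        };

        // Ordinal comparison: "Schluss" and "Schluß" are different strings.
        Console.WriteLine(Find(glossary, "isbn-1", "Schluss", StringComparison.Ordinal) == null);  // True

        // Culture-aware comparison: the result depends on the runtime's collation data.
        DeWord hit = Find(glossary, "isbn-1", "Schluss", StringComparison.InvariantCultureIgnoreCase);
        Console.WriteLine(hit != null ? "match" : "no match");
    }
}
```

One design point: SingleOrDefault throws an InvalidOperationException if more than one record matches. That becomes possible once Schluß and Schluss compare equal and a glossary happens to contain both spellings; FirstOrDefault avoids it.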

If a glossary has no exact match, GA App then looks for partial matches. Invariant-culture matching happens automatically when searching for substrings, but the search is always case sensitive, so both sides are lowercased:

dw = db.DeWords.Where(w => w.ISBN == isbn).FirstOrDefault(w => w.Word.ToLower().Contains(q.ToLower()));
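
One caveat for in-memory use: .NET documents string.Contains(string) as an ordinal, case-sensitive comparison, so lowercasing both sides handles case but not ß versus ss. (When the query runs in the database, the collation does the matching, which may be why the substring search behaves well in GA App.) Below is a sketch of one deterministic alternative, assuming that expanding ß and capital ẞ to ss is the only folding needed: fold both the glossary word and the query, then call Contains. The Fold and FindPartial names are hypothetical, not part of GA App.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartialLookup
{
    // Hypothetical fold: lower-case, then expand ß (U+00DF) and capital ẞ (U+1E9E) to "ss".
    public static string Fold(string s) =>
        s.ToLowerInvariant().Replace("\u00DF", "ss").Replace("\u1E9E", "ss");

    // First glossary word containing the query, after folding both sides.
    public static string FindPartial(IEnumerable<string> words, string q)
    {
        string fq = Fold(q);
        return words.FirstOrDefault(w => Fold(w).Contains(fq));
    }

    public static void Main()
    {
        var glossary = new List<string> { "Abschluss", "Stra\u00DFe" };

        // Lower-casing alone is not enough: string.Contains is ordinal.
        Console.WriteLine("Abschluss".ToLower().Contains("Schlu\u00DF".ToLower()));  // False

        // Folding turns "Schluß" into "schluss", a substring of "abschluss".
        Console.WriteLine(FindPartial(glossary, "Schlu\u00DF"));                    // Abschluss

        // Works in the other direction too: "Strasse" finds "Straße".
        Console.WriteLine(FindPartial(glossary, "Strasse") == "Stra\u00DFe");        // True
    }
}
```

Folding is cruder than a culture-aware comparison (it knows only about ß), but it gives the same answer on every runtime. A culture-aware alternative is w.IndexOf(q, StringComparison.InvariantCultureIgnoreCase) >= 0, whose result depends on the runtime's collation data.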
