The process of breaking text up into tokens is called tokenization. Analyzers are the components that control tokenization.
Sitecore’s Content Search API is configured with the standard analyzer by default. However, you can configure a synonym analyzer if you need that functionality (that is, a search for a synonym of a word that appears in content returns that content). Sitecore ships with its own implementation of a synonym analyzer: Sitecore.ContentSearch.LuceneProvider.Analyzers.SynonymAnalyzer.
The key to the synonym analyzer is providing it with a list of synonyms, which you define in your own custom XML file. This is because Sitecore includes its own synonym engine implementation, which reads the synonym mappings from XML files.
Configuring the Synonym Analyzer
1. In ContentSearch.Lucene.DefaultIndexConfiguration.config, change the inner defaultAnalyzer parameter reference from the standard analyzer to the synonym analyzer.
2. Unlike the standard analyzer, the synonym analyzer requires an implementation of ISynonymEngine as its parameter:
<param hint="engine" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.XmlSynonymEngine,
3. Sitecore’s implementation of that engine reads from XML files, and it requires the path to the XML file as its only parameter:
<param hint="xmlSynonymFilePath">C:\inetpub\wwwroot\website\Data\synonyms.xml</param>
4. Putting it all together:
<param desc="defaultAnalyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.SynonymAnalyzer, Sitecore.ContentSearch.LuceneProvider">
  <param hint="engine" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.XmlSynonymEngine, Sitecore.ContentSearch.LuceneProvider">
    <param hint="xmlSynonymFilePath">C:\inetpub\wwwroot\yoursite\Data\synonyms.xml</param>
  </param>
</param>
5. Defining Synonyms in XML
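A minimal sketch of a synonyms file follows. It assumes the synonyms/group/syn element layout that Sitecore's XmlSynonymEngine is generally documented to read, so verify the element names against your Sitecore version. The words themselves are only illustrative.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<!-- Assumed layout: a <synonyms> root, one <group> per synonym set, one <syn> per term. -->
<synonyms>
  <group>
    <syn>fast</syn>
    <syn>quick</syn>
    <syn>rapid</syn>
  </group>
  <group>
    <syn>slow</syn>
    <syn>sluggish</syn>
  </group>
</synonyms>
```

Save the file at the location given in the xmlSynonymFilePath parameter.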
All terms listed in the same group are synonyms of each other. For example, if “quick” and “rapid” are in the same group and a content item has the word “quick” in its CMS content, a search for the word “rapid” returns that content item as a result.