Custom Lucene.NET indexes in Sitecore

Reason

Indexes can be configured to contain only certain types of items, based on their template, location and similar characteristics. By implementing a custom database crawler, the filtering can be refined further to allow very granular inclusion and exclusion of items. When dealing with large numbers of items, this can provide an additional performance gain compared to using the standard Quick Search (aka System) index.

The following is an example of…

  • a database crawler which considers base templates when determining whether to index an item or not
  • a repository used to retrieve these items (a modified version of the repository implemented in Sitecore index based search)
  • a variety of custom index configurations

Examples are based on Sitecore 6.6.0 Update-2 and .NET 4.5.

Note: I’ve used the implementation extensively with Sitecore 6.2 Update-5/.NET 3.5 and Sitecore 6.5/.NET 4. It should be compatible with Sitecore 6.3.1 and 6.4.1 as well.

Code

The following configuration causes web database indices to be updated when content is published. Save the contents to a .config-file and place it in the “App_Config/Include”-folder.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <!-- Automatically update web database index when items are being published -->
    <databases>
      <database id="web">
        <Engines.HistoryEngine.Storage>
          <obj type="Sitecore.Data.$(database).$(database)HistoryStorage, Sitecore.Kernel">
            <param connectionStringName="$(id)" />
            <EntryLifeTime>30.00:00:00</EntryLifeTime>
          </obj>
        </Engines.HistoryEngine.Storage>
        <Engines.HistoryEngine.SaveDotNetCallStack>false</Engines.HistoryEngine.SaveDotNetCallStack>
      </database>
    </databases>
  </sitecore>
</configuration>

As mentioned earlier the custom database crawler shown below checks base templates when determining which items to add to an index.

using Sitecore.Configuration;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;
using System.Collections.Generic;
using System.Linq;

public class DatabaseCrawler : Sitecore.Search.Crawlers.DatabaseCrawler
{
  private Item _indexRootItem;

  protected virtual Item IndexRootItem
  {
    get { return _indexRootItem ?? (_indexRootItem = GetIndexRootItem()); }
  }

  private Item GetIndexRootItem()
  {
    Assert.IsNotNullOrEmpty(base.Database, "Database name not set.");
    Database database = Factory.GetDatabase(base.Database);
    Assert.IsNotNull(database, "Database \"{0}\" not found.", base.Database);
    Item indexRootItem;
    if (string.IsNullOrEmpty(base.Root))
      indexRootItem = database.GetRootItem();
    else
      indexRootItem = database.GetItem(base.Root);
    Assert.IsNotNull(indexRootItem, "Index root item not found.");
    return indexRootItem;
  }

  protected override bool IsMatch(Item item)
  {
    Assert.ArgumentNotNull(item, "item");
    return IsIndexRootItemOrDescendantHereof(item) && !UsesExcludedTemplate(item) && UsesIncludedTemplate(item);
  }

  private bool IsIndexRootItemOrDescendantHereof(Item item)
  {
    return IndexRootItem.ID == item.ID || item.Paths.IsDescendantOf(IndexRootItem);
  }

  private bool UsesExcludedTemplate(Item item)
  {
    // "All templates welcome"
    if (!base._hasExcludes)
      return false;

    return GetTemplateIDs(item).Any(IsExcludedTemplate);
  }

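  private IEnumerable<ID> GetTemplateIDs(Item item)
  {
    // An item whose template no longer exists has no template to inspect.
    TemplateItem template = item.Template;
    if (template == null)
      return Enumerable.Empty<ID>();

    return GetTemplateIDs(new[] { template });
  }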

  private IEnumerable<ID> GetTemplateIDs(IEnumerable<TemplateItem> templates)
  {
    HashSet<ID> templateIDs = new HashSet<ID>();
    foreach (TemplateItem templateItem in templates)
    {
      templateIDs.Add(templateItem.ID);
      templateIDs.UnionWith(GetTemplateIDs(templateItem.BaseTemplates));
    }
    return templateIDs;
  }

  private bool IsExcludedTemplate(ID templateID)
  {
    return (from pair in _templateFilter
            where ID.Parse(pair.Key) == templateID
            select pair.Value == false).FirstOrDefault();
  }

  private bool UsesIncludedTemplate(Item item)
  {
    // "All templates welcome"
    if (!base._hasIncludes)
      return true;

    return GetTemplateIDs(item).Any(IsIncludedTemplate);
  }

  private bool IsIncludedTemplate(ID templateID)
  {
    return (from pair in _templateFilter
            where ID.Parse(pair.Key) == templateID
            select pair.Value == true).FirstOrDefault();
  }
}
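
The matching rule used by the crawler can be easier to follow when it is isolated from Sitecore. The following is a minimal sketch of that rule only, not part of the actual implementation: `TemplateNode` and `TemplateFilterDemo` are hypothetical stand-ins for Sitecore's template items and the crawler's filter, and templates are identified by name rather than by ID to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a Sitecore template: a name plus its direct base templates.
public sealed class TemplateNode
{
  public string Name { get; private set; }
  public TemplateNode[] BaseTemplates { get; private set; }

  public TemplateNode(string name, params TemplateNode[] baseTemplates)
  {
    Name = name;
    BaseTemplates = baseTemplates;
  }
}

public static class TemplateFilterDemo
{
  // Collects the name of the template itself and of every template it inherits from.
  public static HashSet<string> GetTemplateNames(TemplateNode template)
  {
    HashSet<string> names = new HashSet<string>();
    Stack<TemplateNode> pending = new Stack<TemplateNode>();
    pending.Push(template);
    while (pending.Count > 0)
    {
      TemplateNode current = pending.Pop();
      if (!names.Add(current.Name))
        continue; // Already visited, e.g. a base template shared by two parents.

      foreach (TemplateNode baseTemplate in current.BaseTemplates)
        pending.Push(baseTemplate);
    }
    return names;
  }

  // Same rule as the crawler's template checks: an excluded template anywhere in the
  // inheritance chain rejects the item; otherwise the item is accepted if no includes
  // are configured or if at least one included template is in the chain.
  public static bool IsMatch(TemplateNode template, ISet<string> includes, ISet<string> excludes)
  {
    HashSet<string> names = GetTemplateNames(template);
    if (names.Overlaps(excludes))
      return false;

    return includes.Count == 0 || names.Overlaps(includes);
  }

  public static void Main()
  {
    TemplateNode browserTitle = new TemplateNode("__BrowserTitle");
    TemplateNode newsPage = new TemplateNode("News Page", browserTitle);
    TemplateNode folder = new TemplateNode("Folder");

    HashSet<string> includes = new HashSet<string> { "__BrowserTitle" };
    HashSet<string> excludes = new HashSet<string>();

    Console.WriteLine(IsMatch(newsPage, includes, excludes)); // True: inherits from __BrowserTitle
    Console.WriteLine(IsMatch(folder, includes, excludes));   // False: unrelated template
  }
}
```

Note that an exclusion always wins over an inclusion, which is also how the `IsMatch` override above combines `UsesExcludedTemplate` and `UsesIncludedTemplate`.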

The ItemRepository has been modified to take advantage of the fact that a certain amount of filtering has already been performed by a database crawler during the indexing process.

using Sitecore;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;
using Sitecore.Globalization;
using Sitecore.Search;
using System;
using System.Collections.Generic;
using System.Linq;

public class ItemRepository
{
  private readonly Sitecore.Search.Index _index;

  public ItemRepository() :
    this(SearchManager.SystemIndex)
  {
  }

  public ItemRepository(string indexID) :
    this(SearchManager.GetIndex(indexID))
  {
  }

  public ItemRepository(Sitecore.Search.Index index)
  {
    Assert.ArgumentNotNull(index, "index");
    _index = index;
  }

  /// <summary>
  /// Get all items which inherit from the specified template (i.e. supports template based polymorphism).
  /// </summary>
  public IEnumerable<Item> GetByTemplate(ID templateId)
  {
    // Base template IDs are stored without dashes and brackets (e.g. "00E76C234BC54EEF86C571A2423A6E6C").
    string formattedId = templateId.Guid.ToString("N");
    return GetByFieldValue(BuiltinFields.AllTemplates, formattedId);
  }

  public IEnumerable<Item> GetByFieldValue(string fieldNameOrFieldID, ID fieldValue)
  {
    Assert.ArgumentNotNull(fieldValue, "fieldValue");
    return GetByFieldValue(fieldNameOrFieldID, fieldValue.Guid);
  }

  public IEnumerable<Item> GetByFieldValue(string fieldNameOrFieldID, Guid fieldValue)
  {
    // Guid is a value type and can never be null, so no null check is needed here.
    string formattedfieldValue = fieldValue.ToString("B").ToLowerInvariant();
    return GetByFieldValue(fieldNameOrFieldID, formattedfieldValue);
  }

  public IEnumerable<Item> GetByFieldValue(string fieldNameOrFieldID, string fieldValue)
  {
    return GetByFieldValue(fieldNameOrFieldID, fieldValue, Context.Language, Context.Database);
  }

  public IEnumerable<Item> GetByFieldValue(string fieldNameOrFieldID, string fieldValue, Language language, Database database)
  {
    Assert.ArgumentNotNullOrEmpty(fieldNameOrFieldID, "fieldNameOrFieldID");
    Assert.ArgumentNotNullOrEmpty(fieldValue, "fieldValue");
    Assert.ArgumentNotNull(language, "language");
    Assert.ArgumentNotNull(database, "database");
    CombinedQuery query = CreateQuery(language, database);
    string fieldName = GetFieldName(fieldNameOrFieldID, database);
    query.Add(CreateFieldClause(fieldName, fieldValue), QueryOccurance.Must);
    using (IndexSearchContext searchContext = _index.CreateSearchContext())
    {
      return GetQueryResult(searchContext, query);
    }
  }

  private CombinedQuery CreateQuery(Language language, Database database)
  {
    FieldQuery languageQuery = new FieldQuery(BuiltinFields.Language, language.Name);
    FieldQuery databaseQuery = new FieldQuery(BuiltinFields.Database, database.Name);
    CombinedQuery query = new CombinedQuery();
    query.Add(languageQuery, QueryOccurance.Must);
    query.Add(databaseQuery, QueryOccurance.Must);
    return query;
  }

  private string GetFieldName(string fieldNameOrFieldID, Database database)
  {
    if (ID.IsID(fieldNameOrFieldID))
    {
      Assert.ArgumentNotNull(database, "database");
      Item fieldItem = database.GetItem(fieldNameOrFieldID);
      Assert.IsNotNull(fieldItem, "Unable to find field item \"{0}\" in database \"{1}\".", fieldNameOrFieldID, database.Name);
      fieldNameOrFieldID = fieldItem.Name;
    }
    return fieldNameOrFieldID;
  }

  private QueryBase CreateFieldClause(string fieldName, string fieldValue)
  {
    return new FieldQuery(fieldName.ToLowerInvariant(), fieldValue);
  }

  private IEnumerable<Item> GetQueryResult(IndexSearchContext searchContext, QueryBase query)
  {
    // When using later versions of Sitecore this method should be
    // replaced by a call to "Search(Query query, int maxResults)"
    SearchHits hits = searchContext.Search(query);
    if (hits == null)
      return Enumerable.Empty<Item>();

    if (hits.Length == 0)
      return Enumerable.Empty<Item>();

    SearchResultCollection results = hits.FetchResults(0, hits.Length);
    return results.Select(result => result.GetObject<Item>()).Where(item => item != null).ToArray();
  }

  public IEnumerable<Item> GetAll()
  {
    return GetAll(Context.Language, Context.Database);
  }

  public IEnumerable<Item> GetAll(Language language, Database database)
  {
    QueryBase query = CreateQuery(language, database);
    using (IndexSearchContext searchContext = _index.CreateSearchContext())
    {
      return GetQueryResult(searchContext, query);
    }
  }
}
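
The repository formats IDs in two different ways before querying the index: `GetByTemplate` strips dashes and braces, while the `Guid` overload of `GetByFieldValue` keeps them. The following small sketch, which only uses the .NET `Guid` type, shows what the two formats look like. `IdFormatDemo` and its method names are made up for illustration, and the GUID is the News Page template ID used in the index configuration further down.

```csharp
using System;

public static class IdFormatDemo
{
  // Format used by GetByTemplate: 32 hex digits without dashes or braces.
  public static string ToShortForm(Guid id)
  {
    return id.ToString("N");
  }

  // Format used by GetByFieldValue(string, Guid): braces and dashes, lower-cased.
  public static string ToBracedForm(Guid id)
  {
    return id.ToString("B").ToLowerInvariant();
  }

  public static void Main()
  {
    Guid newsPage = Guid.Parse("23FBF944-C3B6-4891-9488-0BD1BF979FC2");
    Console.WriteLine(ToShortForm(newsPage));  // 23fbf944c3b6489194880bd1bf979fc2
    Console.WriteLine(ToBracedForm(newsPage)); // {23fbf944-c3b6-4891-9488-0bd1bf979fc2}
  }
}
```

`Guid.ToString` already produces lower-case hex digits, so the `ToLowerInvariant` call is a safeguard rather than a necessity. Whether upper- or lower-case values match at query time depends on how the field was indexed, so it is worth checking the stored values with an index viewer if a lookup unexpectedly returns nothing.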

Shown below are various examples of index configurations. They should be saved to .config-files and placed in the “App_Config/Include”-folder as usual.

The configuration shown below will cause the System index to contain items from the web database in addition to items found in the master database (configured by default).

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <search>
      <configuration>
        <indexes>
          <index id="system">
            <locations>
              <!-- Index the web database -->
              <web type="Sitecore.Search.Crawlers.DatabaseCrawler, Sitecore.Kernel">
                <Database>web</Database>
                <Tags>web content</Tags>
              </web>
            </locations>
          </index>
        </indexes>
      </configuration>
    </search>
  </sitecore>
</configuration>

The “News” index simply contains all items in the web database that are based on the News Page template.

News Page index

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <search>
      <configuration>
        <indexes>
          <index id="News" type="Sitecore.Search.Index, Sitecore.Kernel">
            <param desc="name">$(id)</param>
            <!-- Index files are stored in this subfolder of the "IndexFolder" specified in Web.config -->
            <param desc="folder">News</param>
            <Analyzer ref="search/analyzer"/>
            <locations hint="list:AddCrawler">
              <!-- Index the web database -->
              <web type="Sitecore.Search.Crawlers.DatabaseCrawler,Sitecore.Kernel">
                <Database>web</Database>
                <!-- Specify where to look for items -->
                <Root>/sitecore/content</Root>
                <include hint="list:IncludeTemplate">
                  <!-- The standard database crawler will only include items created from this template -->
                  <NewsPage>{23FBF944-C3B6-4891-9488-0BD1BF979FC2}</NewsPage>
                </include>
                <Tags>web news</Tags>
              </web>
            </locations>
          </index>
        </indexes>
      </configuration>
    </search>
  </sitecore>
</configuration>

The “Products” index is configured to include items from both the master and the web database, which can be useful when implementing e.g. administrative applications available from within the Sitecore shell.

Product Page index

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <search>
      <configuration>
        <indexes>
          <index id="Products" type="Sitecore.Search.Index, Sitecore.Kernel">
            <param desc="name">$(id)</param>
            <param desc="folder">Products</param>
            <Analyzer ref="search/analyzer"/>
            <locations hint="list:AddCrawler">
              <!-- Index the master database -->
              <master type="Sitecore.Search.Crawlers.DatabaseCrawler,Sitecore.Kernel">
                <Database>master</Database>
                <Root>/sitecore/content</Root>
                <include hint="list:IncludeTemplate">
                  <ProductPage>{AAED2406-A86A-4D5E-AF58-27CD4767BD85}</ProductPage>
                </include>
                <Tags>master products</Tags>
              </master>
              <!-- Index the web database -->
              <web type="Sitecore.Search.Crawlers.DatabaseCrawler,Sitecore.Kernel">
                <Database>web</Database>
                <Root>/sitecore/content</Root>
                <include hint="list:IncludeTemplate">
                  <!-- The standard database crawler will only include items created from this template -->
                  <ProductPage>{AAED2406-A86A-4D5E-AF58-27CD4767BD85}</ProductPage>
                </include>
                <Tags>web products</Tags>
              </web>
            </locations>
          </index>
        </indexes>
      </configuration>
    </search>
  </sitecore>
</configuration>

The “BrowserTitle” index contains News, Product and Content Pages from the web database, as all of these page templates inherit from the “__BrowserTitle” data template.

Browser Title index

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <search>
      <configuration>
        <indexes>
          <index id="BrowserTitle" type="Sitecore.Search.Index, Sitecore.Kernel">
            <!-- Descriptive name -->
            <param desc="name">Anything with a browser title</param>
            <param desc="folder">BrowserTitle</param>
            <Analyzer ref="search/analyzer"/>
            <locations hint="list:AddCrawler">
              <!-- Custom database crawler type - insert namespace and assembly name as needed -->
              <web type="NamespaceName.DatabaseCrawler, AssemblyName">
                <Database>web</Database>
                <Root>/sitecore/content</Root>
                <include hint="list:IncludeTemplate">
                  <!-- The custom database crawler will include all items created or inheriting from this template -->
                  <BrowserTitle>{5C364328-D25A-48EE-81C9-BB6B9C335905}</BrowserTitle>
                </include>
                <Tags>web browsertitle</Tags>
              </web>
            </locations>
          </index>
        </indexes>
      </configuration>
    </search>
  </sitecore>
</configuration>
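
Exclusions can be combined with inclusions in the same crawler definition. The fragment below is a sketch rather than a tested configuration: it assumes the crawler accepts an `ExcludeTemplate` list alongside `IncludeTemplate` (the custom crawler's use of `_hasExcludes` suggests it does), and it reuses the News Page template ID from the “News” index purely for illustration.

```xml
<!-- Crawler element only; it belongs inside <locations> of an index such as "BrowserTitle" -->
<web type="NamespaceName.DatabaseCrawler, AssemblyName">
  <Database>web</Database>
  <Root>/sitecore/content</Root>
  <include hint="list:IncludeTemplate">
    <BrowserTitle>{5C364328-D25A-48EE-81C9-BB6B9C335905}</BrowserTitle>
  </include>
  <!-- Assumption: excluded templates are registered the same way as included ones -->
  <exclude hint="list:ExcludeTemplate">
    <NewsPage>{23FBF944-C3B6-4891-9488-0BD1BF979FC2}</NewsPage>
  </exclude>
  <Tags>web browsertitle</Tags>
</web>
```

With the custom crawler, this would be expected to leave News Pages out of the index while still picking up Product and Content Pages, because an excluded template anywhere in an item's inheritance chain takes precedence over an included one.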

Example

Shown here are some simple lookups based on the indices defined earlier. First off, the index has to be rebuilt using the wizard accessible through the Sitecore control panel. The wizard shows the name of the index rather than its ID.

Note: When using a Sitecore version prior to 6.5 Update-5 / 6.6 Update-1 the rebuild dialog will not display any custom indexes. The Sitecore Index Viewer can be used as an alternative to trigger index rebuilds. Failing these options, republishing any content relevant to the index is probably the easiest way to trigger a rebuild.

Dialog - rebuild the search index

The sample site contains a few news, product and content pages:

Sample site content

Using the ItemRepository shown earlier, the indexed content can be retrieved easily. Indices can simply be iterated using e.g. Sitecore.Search.SearchManager.Indexes.
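
As a rough sketch of such a lookup, the snippet below queries the “News” and “BrowserTitle” indexes through the `ItemRepository` shown earlier. It needs the Sitecore assemblies and a running site context, so it cannot be run on its own. `IndexContentsDemo` and its method names are hypothetical, and the index IDs and the template ID are taken from the configurations above.

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.Data;

public static class IndexContentsDemo
{
  // Hypothetical helper: returns the paths of all items found in the "News" index.
  public static IList<string> GetNewsPaths()
  {
    // "News" is the index ID from the News Page index configuration.
    ItemRepository repository = new ItemRepository("News");

    // GetAll() filters on Context.Language and Context.Database, so the context
    // database must be one that the index actually crawls (here: web).
    return repository.GetAll().Select(item => item.Paths.FullPath).ToList();
  }

  // Hypothetical helper: returns the paths of all items inheriting from "__BrowserTitle".
  public static IList<string> GetBrowserTitlePaths()
  {
    ID browserTitleTemplateId = ID.Parse("{5C364328-D25A-48EE-81C9-BB6B9C335905}");
    ItemRepository repository = new ItemRepository("BrowserTitle");
    return repository.GetByTemplate(browserTitleTemplateId)
                     .Select(item => item.Paths.FullPath)
                     .ToList();
  }
}
```

If a lookup comes back empty, the context database is a likely culprit: an index that only crawls the web database will not return anything when the code runs against master.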

News index contents

Product index contents

Browser Title index contents

System index contents

7 thoughts on “Custom Lucene.NET indexes in Sitecore”

  2. Mr Uli, I have a question for you. How were you able to see the indexes in the wizard (from the control panel)? I can only see the quick search index, so I had to make my own RebuildDatabaseCrawlers page in order to see and rebuild my own defined indexes.

    • The index rebuild dialog doesn’t show custom indexes in Sitecore versions prior to 6.5.0 Update-5 and Sitecore 6.6.0 Update-1 (I’ve updated the post accordingly).
      Indexes will be updated when e.g. items are saved/published even though it’s not possible to manually rebuild them using the dialog.
      If in doubt, open the directory in which the index is located as per your configuration, and look at it while publishing. An index rebuild is very easy to spot because of all the file operations involved.
