Sitecore index-based search

Out of the box, Sitecore uses Lucene.NET to index the contents of the Core and Master databases. This post explains how to enable and populate the same index for the Web database, and provides a working, lightweight facade (“ItemRepository”) that retrieves items using that index. The code samples are based on Sitecore 6.5 and C# 4.0.

Note: Even though it’s possible to use Lucene as a “full-text search a la Google”, that is not the aim of the implementation presented here.

Reason

Using Sitecore's search index (i.e. Lucene.NET) provides a significant performance gain whenever items are retrieved based on field values or template inheritance, compared to horrors like “item.Axes.GetDescendants()” or “//*[@this=’will kill your site’]”.

The example below shows a performance comparison of Sitecore.Data.Query.Query and the aforementioned ItemRepository which uses an index based lookup. The folder used as the query root contains 5000 child items.

private void ProfileQueries()
{
  Item match = null;
  Profiler.StartOperation("ItemRepository.GetByFieldValue");
  match = new ItemRepository().GetByFieldValue("ProductName", "Product123").FirstOrDefault();
  Profiler.EndOperation();
  Assert.IsNotNull(match, "Didn't find item.");

  Item productFolder = Sitecore.Context.Database.GetItem("/sitecore/content/Home/Products");
  Profiler.StartOperation("Query.SelectSingleItem");
  match = Query.SelectSingleItem("child::*[@ProductName='Product123']", productFolder);
  Profiler.EndOperation();
  Assert.IsNotNull(match, "Didn't find item.");
}

Even though all items are cache hits, the index-based search is more than 100 times faster than the customary query mechanism.

Code

By default, Sitecore indexes only the Core and Master databases, which means a so-called “database crawler” must be added for the Web database.
Copy the XML below into a config file (e.g. “ItemRepository.config”) and put it into your “[webroot]/App_Config/Include” folder. When the site starts up, Sitecore patches these settings into the <sitecore> section of its configuration (the Web.config file on disk is not modified).

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <!-- Automatically update web database index when items are being published -->
    <databases>
      <database id="web">
        <Engines.HistoryEngine.Storage>
          <obj type="Sitecore.Data.$(database).$(database)HistoryStorage, Sitecore.Kernel">
            <param connectionStringName="$(id)" />
            <EntryLifeTime>30.00:00:00</EntryLifeTime>
          </obj>
        </Engines.HistoryEngine.Storage>
        <Engines.HistoryEngine.SaveDotNetCallStack>false</Engines.HistoryEngine.SaveDotNetCallStack>
      </database>
    </databases>

    <!-- Enable indexing of web database -->
    <search>
      <configuration>
        <indexes>
          <index id="system">
            <locations>
              <web type="Sitecore.Search.Crawlers.DatabaseCrawler, Sitecore.Kernel">
                <Database>web</Database>
                <Tags>web content</Tags>
              </web>
            </locations>
          </index>
        </indexes>
      </configuration>
    </search>
  </sitecore>
</configuration>

Once the config file has been deployed, rebuild the search index from the Sitecore Control Panel (Sitecore start button -> Control Panel -> Database -> Rebuild the Search Index).
Once the index is up and running, create something like the class shown below for your solution. Check the code comments for usage guidelines.

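// References: Lucene.Net.dll, Sitecore.Kernel.dll
using System.Collections.Generic;
using System.Linq;
using Lucene.Net.Index;          // Term
using Lucene.Net.QueryParsers;   // QueryParser
using Lucene.Net.Search;         // Query, BooleanQuery, TermQuery, BooleanClause
using Sitecore.Data;             // ID, Database
using Sitecore.Data.Items;       // Item
using Sitecore.Diagnostics;      // Assert
using Sitecore.Globalization;    // Language
using Sitecore.Search;           // Index, SearchManager, IndexSearchContext, SearchHits, BuiltinFields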
public class ItemRepository
{
  private readonly Index _index;

  public ItemRepository() :
    this(SearchManager.SystemIndex)
  {
  }

  public ItemRepository(Index index)
  {
    Assert.ArgumentNotNull(index, "index");
    _index = index;
  }

  /// <summary>
  /// Finds all items which inherit from the specified template (i.e. supports template based polymorphism).
  /// </summary>
  public IEnumerable<Item> GetByTemplate(ID templateId)
  {
    // Base template IDs are stored without dashes and brackets (e.g. "00E76C234BC54EEF86C571A2423A6E6C").
    string formattedId = templateId.Guid.ToString("N");
    return GetByFieldValue(BuiltinFields.AllTemplates, formattedId);
  }

  /// <summary>
  /// Finds all items which contain the specified ID in the field <c>fieldName</c>.
  /// Note that <c>LinkDatabase.GetReferrers(Item)</c> performs a lot better than this method.
  /// </summary>
  public IEnumerable<Item> GetByFieldValue(string fieldName, ID id)
  {
    return GetByFieldValue(fieldName, FormatAndEscapeId(id));
  }

  /// <summary>
  /// When looking for IDs in e.g. link fields and treelist fields, the ID must be
  /// formatted to conform with the format used by Sitecore (i.e. "{00E76C23-4BC5-4EEF-86C5-71A2423A6E6C}").
  /// The query wildcard "*" is pre- and appended to allow matches against e.g. the media ID
  /// in the raw image field value "<image mediaid="{1685EDA0-CD61-4061-93E8-33D9125B1F90}" [...] />".
  /// </summary>
  private string FormatAndEscapeId(ID id)
  {
    string formattedId = id.Guid.ToString("B");
    string escapedFormattedId = QueryParser.Escape(formattedId);
    return string.Format("*{0}*", escapedFormattedId);
  }

  public IEnumerable<Item> GetByFieldValue(string fieldName, string fieldValue = "*")
  {
    return GetByFieldValue(fieldName, fieldValue, Sitecore.Context.Language, Sitecore.Context.Database);
  }

  /// <summary>
  /// Find items that contain <c>fieldValue</c> in field <c>fieldName</c>. Asterisk may be used as a wildcard.
  /// </summary>
  public IEnumerable<Item> GetByFieldValue(string fieldName, string fieldValue, Language language, Database database)
  {
    Assert.ArgumentNotNullOrEmpty(fieldName, "fieldName");
    Assert.ArgumentNotNullOrEmpty(fieldValue, "fieldValue");
    Assert.ArgumentNotNull(language, "language");
    Assert.ArgumentNotNull(database, "database");

    Query query = CreateQuery(database, language, fieldName, fieldValue);
    using (IndexSearchContext searchContext = _index.CreateSearchContext())
      return GetQueryResult(searchContext, query);
  }

  private BooleanQuery CreateQuery(Database database, Language language, string fieldName, string fieldValue)
  {
    BooleanQuery query = new BooleanQuery();
    query.Add(CreateTermClause(BuiltinFields.Database, database.Name));
    query.Add(CreateTermClause(BuiltinFields.Language, language.Name));
    query.Add(CreateWildCardClause(fieldName, fieldValue));
    return query;
  }

  private BooleanClause CreateTermClause(string field, string value)
  {
    TermQuery query = new TermQuery(new Term(field.ToLowerInvariant(), value));
    return new BooleanClause(query, BooleanClause.Occur.MUST);
  }

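  private BooleanClause CreateWildCardClause(string field, string value)
  {
    QueryParser queryParser = new QueryParser(field.ToLowerInvariant(), _index.Analyzer);
    queryParser.SetAllowLeadingWildcard(true);
    // Parse the value as-is so that "*" keeps working as a wildcard. Escaping it here
    // would turn every "*" into a literal asterisk and double-escape IDs that
    // FormatAndEscapeId has already escaped. Callers that need literal special
    // characters should pass their value through QueryParser.Escape first.
    Query query = queryParser.Parse(value);
    return new BooleanClause(query, BooleanClause.Occur.MUST);
  }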

  private IEnumerable<Item> GetQueryResult(IndexSearchContext searchContext, Query query)
  {
    SearchHits hits = searchContext.Search(query);
    if (hits == null)
      return Enumerable.Empty<Item>();

    if (hits.Length == 0)
      return Enumerable.Empty<Item>();

    SearchResultCollection results = hits.FetchResults(0, hits.Length);
    return Filter(results).ToArray();
  }

  private IEnumerable<Item> Filter(IEnumerable<SearchResult> results)
  {
    // Skip hits whose item can no longer be loaded (e.g. a stale index entry) and
    // remove Standard Values holders and other non-content items as needed.
    return from result in results
            let item = result.GetObject<Item>()
            where item != null && item.Paths.IsContentItem
            select item;
  }
}
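The class above formats IDs in two different ways, and the difference is easy to miss. The following sketch needs only the .NET base class library: it reproduces the "N" format used by GetByTemplate and the escaped, wildcard-wrapped "B" format built by FormatAndEscapeId, and shows why a value must be escaped exactly once. LuceneEscape is a hypothetical stand-in that assumes the special-character set of Lucene.Net 2.x's QueryParser.Escape; real code should call QueryParser.Escape itself.

```csharp
using System;
using System.Text;

public static class IdFormatDemo
{
    // Assumption: the Lucene.Net 2.x set of query-syntax characters escaped by
    // QueryParser.Escape. This helper is a hypothetical stand-in for that method.
    private const string SpecialChars = "\\+-!():^[]\"{}~*?";

    // Prefixes every query-syntax character with a backslash.
    public static string LuceneEscape(string value)
    {
        var sb = new StringBuilder(value.Length * 2);
        foreach (char c in value)
        {
            if (SpecialChars.IndexOf(c) >= 0)
                sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Same format as GetByTemplate: 32 hex digits, no dashes or braces.
    // Contains only letters and digits, so it needs no escaping.
    public static string TemplateIdTerm(Guid id)
    {
        return id.ToString("N");
    }

    // Same steps as FormatAndEscapeId: braces-and-dashes format, escaped once,
    // then wrapped in unescaped '*' wildcards so it can match inside a longer
    // raw field value such as an image field's XML.
    public static string ReferenceIdPattern(Guid id)
    {
        return string.Format("*{0}*", LuceneEscape(id.ToString("B")));
    }
}
```

For the ID {00E76C23-4BC5-4EEF-86C5-71A2423A6E6C}, TemplateIdTerm returns 00e76c234bc54eef86c571a2423a6e6c and ReferenceIdPattern returns *\{00e76c23\-4bc5\-4eef\-86c5\-71a2423a6e6c\}* (.NET emits the hex digits in lowercase). Passing that pattern through LuceneEscape a second time yields a string starting with \*, i.e. the wrapping wildcards become literal asterisks and the query no longer matches inside longer field values.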

Example

The following code shows an example repository which uses the ItemRepository to retrieve items derived from a simple Product template.

Note that the repository has no knowledge about the hierarchical item structure within Sitecore. It doesn’t matter whether your items are in a single large pile, nested thousandfold or scattered all over – it will retrieve them regardless of where they’re located.

public static class ProductRepository
{
  private static readonly ItemRepository Repository = new ItemRepository();
  private static readonly ID ProductTemplateId = new ID("{00E76C23-4BC5-4EEF-86C5-71A2423A6E6C}");
  private const string CategoryFieldName = "ProductCategory";
  private const string RelatedProductsFieldName = "ProductRelatedProducts";

  public static IEnumerable<Item> GetAll()
  {
    return Repository.GetByTemplate(ProductTemplateId);
  }

  public static IEnumerable<Item> GetByCategory(string category)
  {
    return Repository.GetByFieldValue(CategoryFieldName, category);
  }

  public static IEnumerable<Item> GetRelated(Item product)
  {
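    MultilistField relatedProductsField = product.Fields[RelatedProductsFieldName];
    // The implicit Field-to-MultilistField conversion yields null if the item lacks the field.
    IEnumerable<Item> relatedProducts = relatedProductsField != null
      ? relatedProductsField.GetItems()
      : new Item[0];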

    // Get all products which reference the current product.
    IEnumerable<Item> referencingProducts = Repository.GetByFieldValue(RelatedProductsFieldName, product.ID);
    return relatedProducts.Union(referencingProducts);
  }
}
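GetRelated treats the relation as bidirectional: a product's related products are the ones it lists itself plus the ones that list it, so editors only have to maintain one side of each link. The sketch below models that rule with plain .NET types so it can run outside Sitecore. The Product class and the in-memory product list are hypothetical stand-ins for Sitecore items; the linear scan takes the place of the index lookup that ItemRepository.GetByFieldValue performs, and results are deduplicated by ID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a Product item: its ID and the raw IDs stored
// in its "ProductRelatedProducts" multilist field.
public sealed class Product
{
    public Guid Id { get; private set; }
    public List<Guid> RelatedIds { get; private set; }

    public Product(Guid id, params Guid[] relatedIds)
    {
        Id = id;
        RelatedIds = new List<Guid>(relatedIds);
    }
}

public static class RelatedDemo
{
    // Outgoing links (listed by the product itself) united with incoming links
    // (products whose field contains this product's ID), deduplicated by ID.
    public static IEnumerable<Guid> GetRelatedIds(Product product, IEnumerable<Product> allProducts)
    {
        IEnumerable<Guid> outgoing = product.RelatedIds;

        // In the real repository this scan is replaced by an index query on the field.
        IEnumerable<Guid> incoming = allProducts
            .Where(p => p.RelatedIds.Contains(product.Id))
            .Select(p => p.Id);

        return outgoing.Union(incoming);
    }
}
```

If product A lists B, while B and C both list A, then GetRelatedIds(A, ...) yields B and C exactly once each, even though B appears in both directions.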
