Indexing ALL fields – Sitecore 7 Content Search

Reason

To properly store Sitecore field values in Content Search indices, each template field requires a matching entry in the Content Search configuration; each time a new template field is added to a solution which utilizes the Content Search API, a matching configuration entry must be added as well.
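
For illustration, a static entry for a single field might look roughly like the fragment below. The field name "summary" is a hypothetical example. The attributes mirror those generated by the RuntimeFieldMap code in this article, and the surrounding fieldNames element reflects my understanding of the standard Lucene index configuration, so check it against your own .config files.

```xml
<fieldMap>
  <fieldNames hint="raw:AddFieldByFieldName">
    <!-- One entry like this is needed per template field (hypothetical field name) -->
    <field fieldName="summary" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
  </fieldNames>
</fieldMap>
```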

When working on a solution containing hundreds of fields, it quickly becomes clear that writing and maintaining hundreds of matching configuration entries requires a lot of work as the solution evolves. Making the most of the new Content Search features (e.g. Linq-to-Sitecore) can hence seem difficult.

This article provides a custom FieldMap implementation which creates the required search field configuration entries at runtime, instead of relying on static configuration files.
Template fields both new and old are automatically included and stored in the indices, without developers having to write and maintain matching configuration entries.

Although a similar result can be achieved using the configuration options provided by the standard Content Search API, in my experience with Sitecore 7 so far the implementation below is preferable to the configuration-file-based approach: it suits the majority of template fields, which don’t require special boosting or otherwise altered indexing behavior, and it requires less maintenance. Template fields which require granular control during the indexing process can still be configured in .config files as usual.

Examples are based on .NET 4.5 and Sitecore 7.1 rev. 130926.

Note: Setting indexAllFields to true in Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config DOES index all fields, but DOES NOT store their values in the index (i.e. the same as setting the storageType attribute to NO in a field configuration).
I don’t know whether this is intended behavior or a bug; personally I haven’t found much use for fields without stored values, as the values cannot be read back from Linq-to-Sitecore results.

Code

When instantiated during the crawling process, the custom FieldMap works like this:

  1. Load all “field-type-to-system-type” mappings configured in Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config.
  2. Retrieve all templates relative to a configurable root path, defined via the setting ContentSearch.DataTemplateRootPath.
  3. Create an appropriate Lucene search field configuration for each template field, based on the “field-type-to-system-type” mappings.
  4. Add the search field configuration to the field map.

The end result is a field map which contains both the field configurations from regular .config files and those added at runtime.
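
The steps above can be sketched without any Sitecore dependencies. The following is only an illustration: the class name FieldMapSketch, the three sample mappings and the sample field names are assumptions made for the example, and a plain list stands in for the real field map.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical, Sitecore-free sketch of steps 1, 3 and 4.
public static class FieldMapSketch
{
  // Step 1 stand-in: the real field map reads these pairs from the
  // fieldTypes configuration node; the values here are sample data.
  public static readonly IDictionary<string, string> TypeMappings =
    new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
    {
      { "single-line text", "System.String" },
      { "checkbox", "System.Boolean" },
      { "datetime", "System.DateTime" }
    };

  // Step 3: resolve the system type (falling back to string for
  // unmapped field types) and build the <field> element.
  public static XElement CreateFieldConfiguration(string fieldName, string fieldTypeName)
  {
    string systemType;
    if (!TypeMappings.TryGetValue(fieldTypeName, out systemType))
      systemType = "System.String";

    return new XElement("field",
      new XAttribute("fieldName", fieldName),
      new XAttribute("storageType", "YES"),
      new XAttribute("indexType", "TOKENIZED"),
      new XAttribute("type", systemType));
  }

  public static void Main()
  {
    // Step 4 stand-in: collect the generated entries in a list.
    var fieldMap = new List<XElement>
    {
      CreateFieldConfiguration("Title", "Single-Line Text"),
      CreateFieldConfiguration("Hide From Menu", "Checkbox"),
      CreateFieldConfiguration("Custom Thing", "My Custom Type")
    };
    foreach (XElement entry in fieldMap)
      Console.WriteLine(entry);
  }
}
```

The case-insensitive dictionary plays the same role as the ToLowerInvariant() calls in the full implementation: field type names are matched regardless of casing.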

Configuration

To replace the default FieldMap with the custom implementation, save the configuration shown below to a .config-file (e.g. “RuntimeFieldMap.config”) and place it in a subfolder of the “App_Config/Include”-folder (e.g. “App_Config/Include/CompanyName/”).
Thanks to Stephen Pope for pointing out the option to place include files in a subfolder to achieve a proper inclusion order – it’s simpler than juggling with file name prefixes and has the added benefit of a clean separation between standard Sitecore and custom include files.
Modify namespace and assembly names as needed, and insert a proper value for the ContentSearch.DataTemplateRootPath setting:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <setting name="ContentSearch.DataTemplateRootPath" value="/sitecore/templates/My companies usual template root folder" />
    </settings>

    <contentSearch>
      <configuration>
        <defaultIndexConfiguration>
          <fieldMap>
            <patch:attribute name="type">NamespaceName.RuntimeFieldMap, AssemblyName</patch:attribute>
          </fieldMap>
        </defaultIndexConfiguration>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>

Depending on how your company organizes Sitecore templates, the DataTemplateRootPath setting might or might not make sense to you. Shown below is the template placement in my example solution:

Template hierarchy screenshot

As can be seen above, all my templates are located under a common root, i.e. the “data template root path”. If the concept doesn’t fit with the placement of templates in your solution, simply change the template retrieval logic accordingly in the GetDataTemplates(...)-method shown below.

FieldMap

The following implementation is written for use with Lucene indices.

using Sitecore.Configuration;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Diagnostics;
using Sitecore.ContentSearch.LuceneProvider;
using Sitecore.ContentSearch.LuceneProvider.Analyzers;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Data.Managers;
using Sitecore.Diagnostics;
using Sitecore.Xml;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public class RuntimeFieldMap : IFieldMap
{
  private readonly string _dataTemplateRootPath;
  private readonly IDictionary<string, string> _fieldToSystemTypeMappings = new Dictionary<string, string>();
  private readonly IFieldMap _innerFieldMap = new FieldMap();

  public RuntimeFieldMap()
  {
    _dataTemplateRootPath = Settings.GetSetting("ContentSearch.DataTemplateRootPath");
    Assert.IsNotNullOrEmpty(_dataTemplateRootPath, "The setting 'ContentSearch.DataTemplateRootPath' is null or empty.");
    LoadFieldToSystemTypeMappings();
    AddDataTemplateFields();
  }

  /// <summary>
  /// Load field type mappings from the default index configuration.
  /// </summary>
  private void LoadFieldToSystemTypeMappings()
  {
    XmlNode fieldTypesConfigNode = Factory.GetConfigNode("/sitecore/contentSearch/configuration/defaultIndexConfiguration/fieldMap/fieldTypes", true);
    foreach (XmlNode fieldTypeConfiguration in XmlUtil.GetChildElements("fieldType", fieldTypesConfigNode))
    {
      string fieldTypeName = XmlUtil.GetAttribute("fieldTypeName", fieldTypeConfiguration);
      string systemTypeName = XmlUtil.GetAttribute("type", fieldTypeConfiguration);
      // Use the indexer rather than Add so a duplicate fieldTypeName overwrites instead of throwing.
      _fieldToSystemTypeMappings[fieldTypeName.ToLowerInvariant()] = systemTypeName;
    }
  }

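  /// <summary>
  /// Add a field configuration for every field of every data template
  /// found in the master database.
  /// </summary>
  private void AddDataTemplateFields()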
  {
    IEnumerable<TemplateItem> dataTemplates = GetDataTemplates(Factory.GetDatabase("master"));
    foreach (TemplateItem template in dataTemplates)
    {
      AddTemplateFields(template);
    }
  }

  private IEnumerable<TemplateItem> GetDataTemplates(Database database)
  {
    Assert.ArgumentNotNull(database, "database");
    Assert.IsNotNull(database.Templates, "database.Templates");
    TemplateItem[] templates = database.Templates.GetTemplates(LanguageManager.DefaultLanguage);
    Assert.IsTrue(templates.Any(), "No templates found in database {0}.", database.Name);
    TemplateItem[] dataTemplates = templates.Where(IsDataTemplate).ToArray();
    Assert.IsTrue(dataTemplates.Any(), "No data templates found in database {0}.", database.Name);
    return dataTemplates;
  }

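  /// <summary>
  /// A data template is a template located under the configured root path
  /// which defines at least one field of its own.
  /// </summary>
  private bool IsDataTemplate(TemplateItem templateItem)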
  {
    if (templateItem == null)
      return false;
    if (templateItem.InnerItem == null)
      return false;
    if (!templateItem.InnerItem.Paths.FullPath.StartsWith(_dataTemplateRootPath, StringComparison.InvariantCultureIgnoreCase))
      return false;
    return templateItem.OwnFields.Any();
  }

  private void AddTemplateFields(TemplateItem componentTemplate)
  {
    foreach (TemplateFieldItem field in componentTemplate.OwnFields)
    {
      AddTemplateField(field);
    }
  }

  private void AddTemplateField(TemplateFieldItem field)
  {
    XmlNode node = CreateLuceneFieldConfiguration(field);
    AddFieldByFieldName(node);
  }

  private XmlNode CreateLuceneFieldConfiguration(TemplateFieldItem field)
  {
    XElement fieldConfiguration = new XElement("field");
    fieldConfiguration.Add(new XAttribute("fieldName", field.Name));
    fieldConfiguration.Add(new XAttribute("storageType", "YES"));
    fieldConfiguration.Add(new XAttribute("indexType", "TOKENIZED"));
    fieldConfiguration.Add(new XAttribute("vectorType", "NO"));
    fieldConfiguration.Add(new XAttribute("boost", "1f"));
    fieldConfiguration.Add(new XAttribute("type", GetMappedSystemTypeName(field.Type)));
    fieldConfiguration.Add(new XAttribute("settingType", GetSystemTypeName<LuceneSearchFieldConfiguration>()));
    if (ShouldAddKeywordAnalyzerConfiguration(field))
      AddKeywordAnalyzerConfiguration(fieldConfiguration);
    return XmlUtil.LoadXml(fieldConfiguration.ToString()).FirstChild;
  }

  private string GetMappedSystemTypeName(string fieldTypeName)
  {
    // Fall back to System.String for field types without a configured mapping.
    string systemTypeName;
    if (_fieldToSystemTypeMappings.TryGetValue(fieldTypeName.ToLowerInvariant(), out systemTypeName))
      return systemTypeName;
    return GetSystemTypeName<string>();
  }

  private string GetSystemTypeName<T>()
  {
    return string.Format("{0}, {1}", typeof(T).FullName, typeof(T).Assembly.GetName().Name);
  }

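  /// <summary>
  /// This method assumes that only fields indexed as string content should 
  /// use a keyword analyzer. Only a mapped type name equal to "System.String"
  /// matches; the assembly-qualified fallback name returned for unmapped
  /// field types does not.
  /// </summary>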
  private bool ShouldAddKeywordAnalyzerConfiguration(TemplateFieldItem field)
  {
    string systemTypeName = GetMappedSystemTypeName(field.Type);
    return typeof(string).FullName.Equals(systemTypeName, StringComparison.InvariantCultureIgnoreCase);
  }

  private void AddKeywordAnalyzerConfiguration(XElement fieldConfiguration)
  {
    XAttribute analyzerTypeAttribute = new XAttribute("type", GetSystemTypeName<LowerCaseKeywordAnalyzer>());
    XElement analyzerConfiguration = new XElement("analyzer", analyzerTypeAttribute);
    fieldConfiguration.Add(analyzerConfiguration);
  }

  public void AddFieldByFieldName(XmlNode configNode)
  {
    string message = string.Format("{0}.AddFieldByFieldName: {1}", GetType(), XmlUtil.GetXml(configNode, true));
    CrawlingLog.Log.Debug(message);
    _innerFieldMap.AddFieldByFieldName(configNode);
  }

  public void AddFieldByFieldTypeName(XmlNode configNode)
  {
    string message = string.Format("{0}.AddFieldByFieldTypeName: {1}", GetType(), XmlUtil.GetXml(configNode, true));
    CrawlingLog.Log.Debug(message);
    _innerFieldMap.AddFieldByFieldTypeName(configNode);
  }

  public AbstractSearchFieldConfiguration GetFieldConfiguration(string fieldName)
  {
    return _innerFieldMap.GetFieldConfiguration(fieldName);
  }

  public AbstractSearchFieldConfiguration GetFieldConfiguration(IIndexableDataField field)
  {
    return _innerFieldMap.GetFieldConfiguration(field);
  }
}

Example

Shown below is some of the debug information logged by the RuntimeFieldMap:
Crawling log debug samples

When using Linq-to-Sitecore, the fields that were added at runtime can be retrieved from the Sitecore indices, including supported field values:
Runtime configured search fields example code

Runtime configured search fields example output
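
As a rough sketch of such a query, assuming the standard sitecore_master_index and a hypothetical template field named Title (neither is taken from the example solution, and the names ArticleSearchResult and ArticleSearch are made up for illustration), a result class and query could look like this:

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

// Hypothetical result type; "title" assumes a template field named "Title".
public class ArticleSearchResult : SearchResultItem
{
  [IndexField("title")]
  public string Title { get; set; }
}

public static class ArticleSearch
{
  public static List<ArticleSearchResult> FindByTitle(string title)
  {
    // The index name is an assumption; use the index your solution actually crawls.
    ISearchIndex index = ContentSearchManager.GetIndex("sitecore_master_index");
    using (IProviderSearchContext context = index.CreateSearchContext())
    {
      return context.GetQueryable<ArticleSearchResult>()
        .Where(result => result.Title == title)
        .Take(10)
        .ToList();
    }
  }
}
```

Because the runtime field map configures each field with storageType YES, the Title property is populated directly from the index, without loading the Sitecore item.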

Index all the fields

4 thoughts on “Indexing ALL fields – Sitecore 7 Content Search”

  1. I think there is some confusion here based on what I understand. You are attempting to force the IndexAllFields config setting to STORE all field values in the index. This will bloat the size of your index and is not the intention of the setting.

    There are two main things: indexing a field and storing it.

    Indexing a field allows Lucene to analyze it at indexing time and makes it searchable. So you can use the LINQ layer to filter content by these properties. However you cannot fill the properties and use the POCO to render the data because it is not necessarily stored in the index. You need to go back to the native Sitecore item to get and render out the data.

    Storing a field in the index means that if you get that document as a result, you can actually access the value directly from the index and don’t have to go back to the Sitecore database. I believe this is what you are doing with your code here. In theory, your setting should actually be called StoreAllFieldsInIndex and turning it on will greatly increase the size of your index.

    This blog post may help as well:
    http://www.sitecore.net/Learn/Blogs/Technical-Blogs/Sitecore-7-Development-Team/Posts/2013/05/Sitecore-7-Performance-Tuning-Part-3.aspx

    • Hi Mark,
      You’re right, I didn’t know about the distinction between indexing and storing a field value. Thanks for the clarification!
      The solution where this was implemented (currently) doesn’t suffer from a large index size; not having to write twice the plumbing code by fetching values directly from the index seemed like the easiest way to go at the time.

    • Mark. No offense but I’ve seen your posts before on other sites and you seem to go around attempting to point out what others have done wrong. I’m not a fan of your style.

  2. Pingback: Generic extension methods for Sitecore ContentSearch - Laub plus Co
