Throttling Lucene index optimizations – Sitecore 7 Content Search

Reason

  • Sitecore runs a Lucene index optimization each time an item is updated (e.g. saved).
    Usages of LuceneUpdateContext.Optimize().

  • Optimizing a large index is an I/O intensive task and can take a while to complete.
  • During optimization the index is locked.
  • When editors change content at a higher rate than the optimizations are completed, the optimization tasks queue up, hogging system resources.
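
To get a feel for how quickly the backlog in the last point builds up, here is a minimal sketch. It is a toy model, not Sitecore code: it assumes a single worker processes the optimizations one at a time, that saves arrive at a fixed interval, and that every optimization takes the same time. The class and parameter names are made up for illustration.

```csharp
using System;

public static class OptimizeBacklogSketch
{
    // Returns the number of optimizations still running or waiting at the moment
    // the last save arrives, given one save every saveIntervalSeconds and a single
    // worker that needs optimizeSeconds per optimization (both are assumptions).
    public static int PendingAfter(int saves, double saveIntervalSeconds, double optimizeSeconds)
    {
        double workerFreeAt = 0; // time at which the worker has finished everything queued so far

        for (int i = 0; i < saves; i++)
        {
            double arrival = i * saveIntervalSeconds;
            // An optimization starts when it is queued or when the worker becomes free, whichever is later.
            double start = Math.Max(arrival, workerFreeAt);
            workerFreeAt = start + optimizeSeconds;
        }

        double lastArrival = (saves - 1) * saveIntervalSeconds;
        double remainingWork = Math.Max(0, workerFreeAt - lastArrival);
        return (int)Math.Ceiling(remainingWork / optimizeSeconds);
    }
}
```

With 10 saves arriving 2 seconds apart, `PendingAfter(10, 2.0, 5.0)` returns 7 when an optimization takes 5 seconds, but `PendingAfter(10, 2.0, 1.0)` returns 1 (only the optimization that just started) when it takes 1 second.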

A quick fix for the problem is outlined below. If anyone knows of a better way to solve this issue, please let me know.

  • Override the Sitecore.ContentSearch.LuceneProvider.LuceneIndex implementation used by the indices sitecore_master_index and sitecore_web_index.
  • Override the Sitecore.ContentSearch.LuceneProvider.LuceneUpdateContext implementation and “disable” the Optimize() method.
  • Implement an agent which optimizes the indexes at regular intervals; we’re going to use the existing scheduled index optimization task as inspiration (Sitecore.ContentSearch.Tasks.Optimize).

This article is based on Sitecore 7.5 rev. 150130.

Code

It’s necessary to override the LuceneIndex implementation in order to return our custom IProviderUpdateContext.

using Sitecore.ContentSearch;
using Sitecore.ContentSearch.LuceneProvider;
using Sitecore.ContentSearch.Maintenance;

public class Index : LuceneIndex
{
  private IProviderUpdateContext _updateContext;

  public Index(string name, string folder, IIndexPropertyStore propertyStore, string group)
    : base(name, folder, propertyStore, group)
  {
  }

  public Index(string name, string folder, IIndexPropertyStore propertyStore)
    : base(name, folder, propertyStore)
  {
  }

  protected Index(string name)
    : base(name)
  {
  }

  public override IProviderUpdateContext CreateUpdateContext()
  {
    EnsureInitialized();
    ICommitPolicyExecutor commitPolicyExecutor = (ICommitPolicyExecutor)CommitPolicyExecutor.Clone();
    commitPolicyExecutor.Initialize(this);
    _updateContext = new UpdateContext(this, commitPolicyExecutor);
    return _updateContext;
  }

  protected override void Dispose(bool isDisposing)
  {
    base.Dispose(isDisposing);
    if (_updateContext != null)
      _updateContext.Dispose();
  }
}

The UpdateContext implementation shown below disables the Optimize() method. It still provides the means to run an actual index optimization via the OptimizeIndex() method, which will be called from the index optimization agent.

using System;
using System.Diagnostics;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Diagnostics;
using Sitecore.ContentSearch.LuceneProvider;

public class UpdateContext : LuceneUpdateContext
{
  public UpdateContext(ILuceneProviderIndex index, ICommitPolicyExecutor commitPolicyExecutor) 
    : base(index, commitPolicyExecutor)
  {
  }

  /// <summary>
  /// Does not optimize the index. Use <c>OptimizeIndex()</c> instead. Sparingly.
  /// </summary>
  public override void Optimize()
  {
    StackTrace stackTrace = new StackTrace();
    CrawlingLog.Log.Debug(GetType().FullName + ".Optimize() called:" + Environment.NewLine + stackTrace);
  }

  public void OptimizeIndex()
  {
    base.Optimize();
  }
}

The OptimizeIndex agent works with any implementation of the IProviderUpdateContext interface, but does an additional type check in case the update context is an instance of our custom class.

using System;
using System.Collections.Generic;
using Sitecore;
using Sitecore.ContentSearch;
using Sitecore.Diagnostics;

public class OptimizeIndex
{
  private readonly List<string> _indexNames = new List<string>();

  public List<string> Indexes
  {
    get
    {
      return _indexNames;
    }
  }

  public void Run()
  {
    foreach (string indexName in Indexes)
    {
      try
      {
        ISearchIndex index = ContentSearchManager.GetIndex(indexName);
        using (IProviderUpdateContext context = index.CreateUpdateContext())
        {
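          // Our custom context turns Optimize() into a no-op, so call OptimizeIndex() to run the real optimization.
          UpdateContext customContext = context as UpdateContext;
          if (customContext != null)
            customContext.OptimizeIndex();
          else
            context.Optimize();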

          ReportProgress(indexName);
        }
      }
      catch (Exception ex)
      {
        Log.Error(string.Format("Failed to optimize index '{0}'.", indexName), ex, this);
      }
    }
  }

  private void ReportProgress(string indexName)
  {
    if (Context.Job == null)
      return;
    if (Context.Job.Status == null)
      return;
    Context.Job.Status.Processed++;
    Context.Job.Status.Messages.Add(string.Format("Optimized index '{0}'.", indexName));
  }
}

The configuration shown below overrides the LuceneIndex implementation for the Master and Web indices, and sets up an agent which optimizes these indices every hour.
Save it in a .config file in a subfolder of “App_Config/Include/” (e.g. “App_Config/Include/MyCompany/ScheduledIndexOptimizations.config”).
Modify namespace and assembly names as needed, and adjust the optimization frequency to your needs.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration>
        <indexes>
          <index id="sitecore_master_index">
            <patch:attribute name="type" value="ReasonCodeExample.Index, ReasonCodeExample" />
          </index>
          <index id="sitecore_web_index">
            <patch:attribute name="type" value="ReasonCodeExample.Index, ReasonCodeExample" />
          </index>
        </indexes>
      </configuration>
    </contentSearch>
    <scheduling>
      <agent type="Sitecore.ContentSearch.Tasks.Optimize" method="Run" interval="01:00:00">
        <!-- Uncomment to change the optimization frequency --> 
        <!-- <patch:attribute name="interval" value="06:00:00" /> -->
        <patch:attribute name="type" value="ReasonCodeExample.OptimizeIndex, ReasonCodeExample" />
        <indexes hint="list">
          <master patch:instead="index">sitecore_master_index</master>
          <web>sitecore_web_index</web>
        </indexes>
      </agent>
    </scheduling>
  </sitecore>
</configuration>

Example

Once the optimization agent is running, the following log messages should start appearing:

OptimizeIndex agent log entries.

When the log level is set to “Debug” for the Sitecore crawling log, you can check just how many index optimizations are skipped by looking for the following log entries:

Skipped index optimizations.
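
If you would rather count the skipped optimizations than eyeball them, a small helper along these lines could do it. This is only a sketch: it assumes the debug entries contain the text written by the UpdateContext class above (".Optimize() called:"), and the class name is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkippedOptimizationCounter
{
    // Text that UpdateContext.Optimize() appends to the type name in its debug message.
    public const string Marker = ".Optimize() called:";

    // Counts the log lines that record a skipped optimization.
    // The stack trace lines that follow each entry do not contain the marker,
    // so every skipped optimization is counted exactly once.
    public static int Count(IEnumerable<string> logLines)
    {
        return logLines.Count(line => line.Contains(Marker));
    }
}
```

It can be fed the lines of a crawling log file, for example `SkippedOptimizationCounter.Count(System.IO.File.ReadLines(path))`, where `path` points to whichever crawling log file your installation writes.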

The screenshot below shows the default behavior of a clean Sitecore 7.5 rev. 150130 installation, with one logged-in user (yours truly) making about 5 changes in total.
As can be seen from the crawling log, the Lucene master index is optimized a lot. That is fine when an optimization completes in a millisecond, and horrible when it takes longer to complete than it takes for the next optimization task to be queued.

Index optimizations in a barebone Sitecore 7.5 solution.
