In my previous post I didn't write much about how Server Push works with caching, besides the fact that it does. This can be easily verified by looking at the Network tab in Chrome Developer Tools while making subsequent requests to the sample action.

Chrome Developer Tools Network Tab - Warm Server Push

But this isn't the full picture. The chrome://net-internals/#http2 tab can be used to get more details. The first screenshot below represents the cold scenario while the second represents the warm one.

Chrome Net Internals - Cold Server Push

Chrome Net Internals - Warm Server Push

The highlighted column informs that there were resources pushed from the server which haven't been used. The "View live HTTP/2 sessions" view provides even more detailed debug information, down to the single HTTP/2 frames. Going through those reveals that HTTP2_SESSION_RECV_PUSH_PROMISE, HTTP2_SESSION_RECV_HEADERS and HTTP2_SESSION_RECV_DATA frames are present for both pushed resources. This shouldn't affect latency, as the stream carrying the HTML is supposed to have the highest priority, which means that the server should start sending (and the browser receiving) the data as soon as something is available (even if sending/receiving of the pushed resources is not finished). On the other hand, bandwidth is being lost, and losing bandwidth often means losing money. As nobody wants to be losing money, this is something that needs to be resolved, and Cache Digest aims at doing exactly that.

How Cache Digest solves the issue

The Cache Digest specification proposal (in its current shape) uses a Golomb-Rice coded Bloom filter to provide the server with information about the resources which are already present in the client cache. The algorithm uses URLs and ETags for hashing, which ensures proper identification (and, with correct ETags, also freshness) of the resources. The server can query the digest, and if a resource which was intended for push is present, the push can be skipped.

There is no native browser support for Cache Digest yet (at least none that I know of), but there are two ways in which HTTP/2 servers (http2server, h2o) and third-party libraries are trying to provide it:

  • On the client side, with usage of Service Workers and the Cache API
  • On the server side, by providing a cookie-based fallback

Implementing Cache Digest with Service Worker and Cache API

One of the third-party libraries providing Cache Digest support on the client side is cache-digest-immutable. It combines Cache Digest with the HTTP Immutable Responses proposal in order to avoid overly aggressive caching. This means that only resources with a cache-control header containing the immutable extension will be considered. In order to add such an extension to a pushed resource, the location and customHeaders elements of web.config can be used.

<?xml version="1.0"?>
<configuration>
  ...
  <location path="content/css/normalize.css">
    <system.webServer>
      <httpProtocol>
        <customHeaders>
          <add name="Cache-Control" value="max-age=31536000, immutable" />
        </customHeaders>
      </httpProtocol>
    </system.webServer>
  </location>
  <location path="content/css/site.css">
    <system.webServer>
      <httpProtocol>
        <customHeaders>
          <add name="Cache-Control" value="max-age=31536000, immutable" />
        </customHeaders>
      </httpProtocol>
    </system.webServer>
  </location>
  ...
</configuration>

A quick look at the responses after running the demo application shows that the headers are being added properly. Now the Service Worker can be set up according to the instructions. From this point the subsequent requests to the demo application will contain a cache-digest header. In the case of the demo application its value is CdR2gA; complete. The part before the ; is the digest (Base64 encoded) while the rest contains flags.

The flags can be parsed easily by checking if the corresponding constants are present in the header (for the demo value above only complete will end up being true).

var compareInfo = CultureInfo.InvariantCulture.CompareInfo;
var compareOptions = CompareOptions.IgnoreCase;

bool reset = (compareInfo.IndexOf(cacheDigestHeaderValue, "RESET", compareOptions) >= 0);
bool complete = (compareInfo.IndexOf(cacheDigestHeaderValue, "COMPLETE", compareOptions) >= 0);
bool validators = (compareInfo.IndexOf(cacheDigestHeaderValue, "VALIDATORS", compareOptions) >= 0);
bool stale = (compareInfo.IndexOf(cacheDigestHeaderValue, "STALE", compareOptions) >= 0);

My first attempt to decode the digest failed miserably with an awful exception. The reason is that the Base64 padding is truncated and it must be added back (the demo value CdR2gA has a length of 6, so two '=' characters need to be appended).

int separatorIndex = cacheDigestHeaderValue.IndexOf(DigestValueSeparator);
string digestValueBase64 = cacheDigestHeaderValue.Substring(0, separatorIndex);

int neededPadding = (digestValueBase64.Length % 4);
if (neededPadding > 0)
{
    digestValueBase64 += new string('=', 4 - neededPadding);
}
byte[] digestValue = Convert.FromBase64String(digestValueBase64);

Now the digest query algorithm can be implemented. The algorithm assumes going through the digest value bit by bit, so the first step was changing the array of bytes into an array of bools.

bool[] bitArray = new bool[digestValue.Length * 8];
int bitArrayIndex = bitArray.Length - 1;

for (int byteIndex = digestValue.Length - 1; byteIndex >= 0; byteIndex--)
{
    byte digestValueByte = digestValue[byteIndex];
    for (int byteBitIndex = 0; byteBitIndex < 8; byteBitIndex++)
    {
        bitArray[bitArrayIndex--] = ((digestValueByte % 2 == 0) ? false : true);
        digestValueByte = (byte)(digestValueByte >> 1);
    }
}

The algorithm also requires reading a series of bits as an integer in a few places; the method below takes care of that.

private static uint ReadUInt32FromBitArray(bool[] bitArray, int startIndex, int length)
{
    uint result = 0;

    for (int bitIndex = startIndex; bitIndex < (startIndex + length); bitIndex++)
    {
        result <<= 1;
        if (bitArray[bitIndex])
        {
            result |= 1;
        }
    }

    return result;
}

With this method the count of URLs (N) and the probability (P) can be easily retrieved.

// Read the first 5 bits of digest-value as an integer;
// let N be two raised to the power of that value.
int count = (int)Math.Pow(2, ReadUInt32FromBitArray(bitArray, 0, 5));

// Read the next 5 bits of digest-value as an integer;
// let P be two raised to the power of that value.
int log2Probability = (int)ReadUInt32FromBitArray(bitArray, 5, 5);
uint probability = (uint)Math.Pow(2, log2Probability);
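
As a sanity check, the demo header value decodes nicely (assuming my bit arithmetic is right): CdR2gA padded to CdR2gA== gives the bytes 09 D4 76 80, i.e. the bits 00001001 11010100 01110110 10000000. The first 5 bits (00001) give 1, so N = 2^1 = 2, which matches the two pushed stylesheets, and the next 5 bits (00111) give 7, so P = 2^7 = 128. The remaining bits carry the Golomb-Rice coded hashes, which the loop below extracts.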

The part which reads the hashes requires keeping in mind that there might be additional 0 bits at the end, so there is a risk of going past the array without proper checks. I've decided to keep the hashes in a HashSet for quicker lookups later.

HashSet<uint> hashes = new HashSet<uint>();

// Let C be -1.
long hash = -1;

int hashesBitIndex = 10;
while (hashesBitIndex < bitArray.Length)
{
    // Read ‘0’ bits until a ‘1’ bit is found; let Q bit the number of ‘0’ bits.
    uint q = 0;
    while ((hashesBitIndex < bitArray.Length) && !bitArray[hashesBitIndex])
    {
        q++;
        hashesBitIndex++;
    }

    if ((hashesBitIndex + log2Probability) < bitArray.Length)
    {
        // Discard the ‘1’.
        hashesBitIndex++;

        // Read log2(P) bits after the ‘1’ as an integer. Let R be its value.
        uint r = ReadUInt32FromBitArray(bitArray, hashesBitIndex, log2Probability);

        // Let D be Q * P + R.
        uint d = (q * probability) + r;

        // Increment C by D + 1.
        hash = hash + d + 1;

        hashes.Add((uint)hash);
        hashesBitIndex += log2Probability;
    }
}

The last thing needed to query the digest value is the ability to calculate the hash of a given URL (and optionally an ETag). The important thing here is that the DataView used by the Service Worker to convert the SHA-256 digest to an integer internally uses big-endian byte order, so the implementation must accommodate for that.

// URL should be properly percent-encoded (RFC3986).
string key = url;

// If validators is true and ETag is not null.
if (validators && !String.IsNullOrWhiteSpace(entityTag))
{
    // Append ETag to key.
    key += entityTag;
}

// Let hash-value be the SHA-256 message digest (RFC6234) of key, expressed as an integer.
byte[] hash = new SHA256Managed().ComputeHash(Encoding.UTF8.GetBytes(key));
uint hashValue = BitConverter.ToUInt32(hash, 0);
if (BitConverter.IsLittleEndian)
{
    hashValue = (hashValue & 0x000000FFU) << 24 | (hashValue & 0x0000FF00U) << 8
                | (hashValue & 0x00FF0000U) >> 8 | (hashValue & 0xFF000000U) >> 24;
}

// Truncate hash-value to log2(N*P) bits.
int hashValueLength = (int)Math.Log(count * probability, 2);
hashValue = (hashValue >> (_hashValueLengthUpperBound - hashValueLength))
            & (uint)((1 << hashValueLength) - 1);
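
Before moving on, here is a rough sketch of how the pieces above could be glued together into a single membership check. The method name is mine and I'm assuming that _hashValueLengthUpperBound is simply 32 (the number of bits taken from the SHA-256 digest), so treat it as an illustration rather than the actual implementation.

private static bool QueryDigest(HashSet<uint> hashes, int count, uint probability, string url)
{
    // Hash the (percent-encoded) URL the same way the digest was built.
    byte[] hash = new SHA256Managed().ComputeHash(Encoding.UTF8.GetBytes(url));

    // Take the first 4 bytes as a big-endian unsigned integer.
    uint hashValue = BitConverter.ToUInt32(hash, 0);
    if (BitConverter.IsLittleEndian)
    {
        hashValue = (hashValue & 0x000000FFU) << 24 | (hashValue & 0x0000FF00U) << 8
                    | (hashValue & 0x00FF0000U) >> 8 | (hashValue & 0xFF000000U) >> 24;
    }

    // Truncate to log2(N * P) bits (32 is assumed to be _hashValueLengthUpperBound).
    int hashValueLength = (int)Math.Log(count * probability, 2);
    hashValue = (hashValue >> (32 - hashValueLength)) & (uint)((1 << hashValueLength) - 1);

    // The push can be skipped when the hash is present in the decoded digest.
    return hashes.Contains(hashValue);
}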

I've combined all this logic into the CacheDigestHashAlgorithm, CacheDigestValue and CacheDigestHeaderValue classes, which I've used in a PushPromiseAttribute to conditionally push resources.

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method, ...)]
public class PushPromiseAttribute : FilterAttribute, IActionFilter
{
    ...

    public void OnActionExecuting(ActionExecutingContext filterContext)
    {
        ...
        HttpRequestBase request = filterContext.HttpContext.Request;
        CacheDigestHeaderValue cacheDigest = (request.Headers["Cache-Digest"] != null) ?
            new CacheDigestHeaderValue(request.Headers["Cache-Digest"]) : null;

        IEnumerable<string> pushPromiseContentPaths = _pushPromiseTable.GetPushPromiseContentPaths(
            filterContext.ActionDescriptor.ControllerDescriptor.ControllerName,
            filterContext.ActionDescriptor.ActionName);

        foreach (string pushPromiseContentPath in pushPromiseContentPaths)
        {
            string pushPromiseContentUrl = GetAbsolutePushPromiseContentUrl(
                filterContext.RequestContext,
                pushPromiseContentPath);

            if (!(cacheDigest?.QueryDigest(pushPromiseContentUrl) ?? false))
            {
                filterContext.HttpContext.Response.PushPromise(pushPromiseContentPath);
            }
        }
    }
}

The easiest way to verify the functionality is to debug the demo application or check "View live HTTP/2 sessions" in the chrome://net-internals/#http2 tab. After the initial request the Service Worker will take over fetching of the resources by adding the cache-digest header to the subsequent requests and serving cached resources with the help of the Cache API. On the server side the cache-digest header will be picked up and the push properly skipped.

Chrome Developer Tools Network Tab - Service Worker

Chrome Net Internals - Service Worker

The idea behind the cookie-based fallback is very simple - the server generates the cache digest by itself and passes it between requests as a cookie. The important difference is that only the digest value can be stored (as cookies don't allow semicolons and white spaces in values), so the server needs to assume the flags. The algorithm for generating the digest value is similar to the one for reading it, so I'll skip it here (for those interested, it is split into the CacheDigestValue.FromUrls and CacheDigestValue.ToByteArray methods) and move on to the changes in PushPromiseAttribute.

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method, ...)]
public class PushPromiseAttribute : FilterAttribute, IActionFilter
{
    ...

    public bool UseCookieBasedCacheDigest { get; set; }

    public uint CacheDigestProbability { get; set; }
    ...

    public void OnActionExecuting(ActionExecutingContext filterContext)
    {
        ...
        HttpRequestBase request = filterContext.HttpContext.Request;

        CacheDigestHeaderValue cacheDigest = null;
        if (UseCookieBasedCacheDigest)
        {
            cacheDigest = (request.Cookies["Cache-Digest"] != null) ?
                new CacheDigestHeaderValue(
                    CacheDigestValue.FromBase64String(request.Cookies["Cache-Digest"].Value)) : null;
        }
        else
        {
            cacheDigest = (request.Headers["Cache-Digest"] != null) ?
                new CacheDigestHeaderValue(request.Headers["Cache-Digest"]) : null;
        }

        IDictionary<string, string> cacheDigestUrls = new Dictionary<string, string>();

        IEnumerable<string> pushPromiseContentPaths = _pushPromiseTable.GetPushPromiseContentPaths(
            filterContext.ActionDescriptor.ControllerDescriptor.ControllerName,
            filterContext.ActionDescriptor.ActionName);

        foreach (string pushPromiseContentPath in pushPromiseContentPaths)
        {
            string pushPromiseContentUrl = GetAbsolutePushPromiseContentUrl(
                filterContext.RequestContext,
                pushPromiseContentPath);

            if (!(cacheDigest?.QueryDigest(pushPromiseContentUrl) ?? false))
            {
                filterContext.HttpContext.Response.PushPromise(pushPromiseContentPath);
            }
            cacheDigestUrls.Add(pushPromiseContentUrl, null);
        }

        if (UseCookieBasedCacheDigest)
        {
            HttpCookie cacheDigestCookie = new HttpCookie("Cache-Digest")
            {
                Value = CacheDigestValue.FromUrls(cacheDigestUrls, CacheDigestProbability)
                    .ToBase64String(),
                Expires = DateTime.Now.AddYears(1)
            };
            filterContext.HttpContext.Response.Cookies.Set(cacheDigestCookie);
        }
    }
}

There shouldn't be anything surprising here, but the code exposes a couple of limitations when the cookie-based fallback is being used.

First of all, there is no way to recreate the full list of pushed resources from the digest value. The code above creates a new cookie value during every request, which is good enough if we always attempt to push the same resources, but if we need to push something different for a specific action, we will lose the information about resources which are not related to that action. One possible optimization here would be to store in the digest only resources which are supposed to be pushed almost always, and in the case of actions which require a smaller subset, keep the information that all the needed resources were already in the digest (which means that the value doesn't have to be changed). Another approach would be to keep a registry of every resource pushed for a given session or client, but with such a detailed registry, additionally using a cookie becomes questionable.

The other thing is checking resources for freshness. The above implementation works well if names can be used for freshness control (for example they contain versions), but in other cases we will not know that a resource has expired. In such scenarios the validators flag should be set to true and ETags provided.

Final thoughts

Cache Digest seems to be the missing piece which allows fine tuning of HTTP/2 Server Push. There are a number of open issues around the proposal so things might change, but the idea is already being adopted by some of the web servers with HTTP/2 support. When the proposal starts closing in on its final shape, hopefully browsers will start providing native support.

One of the new features in HTTP/2 is Server Push. It allows the server to send resources to the browser without having to wait for the browser to request them. Normally the browser requests the needed resources after receiving and parsing the HTML, which adds the cost of additional Round Trip Times. If we push critical CSS and JS during the initial request, the cost of additional Round Trip Times can be minimized. Also, the pushed resources can be cached (which was impossible in cases where inlining was being used for the same purpose).

I don't intend to provide comprehensive information regarding HTTP/2 Server Push here, as there are a lot of resources on the web doing that already; all I want is to play a little with Server Push in the context of ASP.NET MVC powered by IIS. For that purpose I have created a demo application consisting of the following model, view and controller.

public interface IStarWarsCharacters
{
    IEnumerable<string> GetCharacters();
}

public class LazyStarWarsCharacters : IStarWarsCharacters
{
    public IEnumerable<string> GetCharacters()
    {
        foreach(Character character in StarWarsContext.Characters)
        {
            yield return String.Format("{0} ({1})", character.Name,
                character.HomeworldId.HasValue ? StarWarsContext.Planets.First(p => p.Id == character.HomeworldId.Value).Name : "N/A");
            System.Threading.Thread.Sleep(10);
        }
    }
}

@model IStarWarsCharacters
<!DOCTYPE html>
<html>
    <head>
        <link rel="stylesheet" href="~/content/normalize.css">
        <link rel="stylesheet" href="~/content/site.css">
    </head>
    <body>
        <ul>
            @foreach (string starWarsCharacter in Model.GetCharacters())
            {
                <li>@starWarsCharacter</li>
            }
        </ul>
        ...
    </body>
</html>

public class DemoController : Controller
{
    [ActionName("server-push")]
    public ActionResult ServerPush()
    {
        return View("ServerPush", new LazyStarWarsCharacters());
    }
}

For this demo application I've captured the basic timings, as visible on the screenshot below.

Chrome Developer Tools Network Tab - No Server Push

Proof of Concept HtmlHelper

The IIS built-in support for Server Push is exposed to ASP.NET through the HttpResponse.PushPromise method, which takes the virtual path of the resource which is supposed to be pushed. My first idea for using this method in ASP.NET MVC was an HtmlHelper. The idea is simple: whenever a link element is being rendered, the helper registers the push as well.

public static class PushPromiseExtensions
{
    public static IHtmlString PushPromiseStylesheet(this HtmlHelper htmlHelper, string contentPath)
    {
        UrlHelper urlHelper = new UrlHelper(htmlHelper.ViewContext.RequestContext);

        htmlHelper.ViewContext.RequestContext.HttpContext.Response.PushPromise(contentPath);

        TagBuilder linkTagBuilder = new TagBuilder("link");
        linkTagBuilder.Attributes.Add("rel", "stylesheet");
        linkTagBuilder.Attributes.Add("href", urlHelper.Content(contentPath));

        return new HtmlString(linkTagBuilder.ToString());
    }
}

The only thing which needs to be done is replacing the links to the resources which should be pushed with calls to the helper.

@model IStarWarsCharacters
<!DOCTYPE html>
<html>
    <head>
        @Html.PushPromiseStylesheet("~/content/normalize.css")
        @Html.PushPromiseStylesheet("~/content/site.css")
    </head>
    <body>
        <ul>
            @foreach (string starWarsCharacter in Model.GetCharacters())
            {
                <li>@starWarsCharacter</li>
            }
        </ul>
        ...
    </body>
</html>

The screenshot below represents the timings after the change.

Chrome Developer Tools Network Tab - Server Push During Render (Lazy Data Access)

Even with a local connection and small resources the difference is visible. The browser receives the initial part of the push early, which results in very short resource retrieval times.

"When to push?" and "What to push?" are very important

To be honest, I've cheated - I've prepared the demo application in a way which supported my initial idea. The model has been prepared so that the time-expensive processing is moved to the render phase. Building web applications like that is a very good practice, but often not possible. Usually the data is grabbed from the database, processed and then passed to the view. The model below is closer to that scenario.

public class EagerStarWarsCharacters : IStarWarsCharacters
{
    IEnumerable<string> _characters;

    public EagerStarWarsCharacters()
    {
        List<string> characters = new List<string>();

        foreach (Character character in StarWarsContext.Characters)
        {
            characters.Add(String.Format("{0} ({1})", character.Name,
                character.HomeworldId.HasValue ? StarWarsContext.Planets.First(p => p.Id == character.HomeworldId.Value).Name : "N/A"));
            System.Threading.Thread.Sleep(10);
        }

        _characters = characters;
    }

    public IEnumerable<string> GetCharacters()
    {
        return _characters;
    }
}

A small change to the action will show the impact on the timings.

[ActionName("server-push")]
public ActionResult ServerPush()
{
    return View("ServerPush", new EagerStarWarsCharacters());
}

Chrome Developer Tools Network Tab - Server Push During Render (Eager Data Access)

Because the helper is being executed after the time-consuming processing and nothing interrupts the view during writing to the response stream, the push is being sent after the HTML. We can still see a gain when it comes to single resource retrieval times (no Round Trip Times), but the overall improvement depends on what is being pushed. When large resources (especially in high numbers) are being pushed, a situation can be reached where the classic approach, with the browser utilizing multiple parallel connections, turns out to be more efficient.

Moving to ActionFilter

Knowing that pushing too late might be an issue, I've started thinking of a way to push as soon as possible. An ActionFilter was a natural choice. First I needed some kind of registration table with mappings between actions and resources to be pushed. I also needed to be able to get all resources for a given action quickly. A simple abstraction over a nested dictionary seemed good enough to start with.

public class PushPromiseTable
{
    private readonly IDictionary<string, IDictionary<string, ICollection<string>>> _pushPromiseTable =
        new Dictionary<string, IDictionary<string, ICollection<string>>>();

    public void MapPushPromise(string controller, string action, string contentPath)
    {
        if (!_pushPromiseTable.ContainsKey(controller))
        {
            _pushPromiseTable.Add(controller, new Dictionary<string, ICollection<string>>());
        }

        if (!_pushPromiseTable[controller].ContainsKey(action))
        {
            _pushPromiseTable[controller].Add(action, new List<string>());
        }

        _pushPromiseTable[controller][action].Add(contentPath);
    }

    internal IEnumerable<string> GetPushPromiseContentPaths(string controller, string action)
    {
        IEnumerable<string> pushPromiseContentPaths = Enumerable.Empty<string>();

        if (_pushPromiseTable.ContainsKey(controller))
        {
            if (_pushPromiseTable[controller].ContainsKey(action))
            {
                pushPromiseContentPaths = _pushPromiseTable[controller][action];
            }
        }

        return pushPromiseContentPaths;
    }
}

The attribute implementation was pretty straightforward. All that needed to be done was getting the resources from the registration table based on the controller and action, and then pushing them all.

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method, Inherited = true, AllowMultiple = false)]
public class PushPromiseAttribute : FilterAttribute, IActionFilter
{
    private PushPromiseTable _pushPromiseTable;

    public PushPromiseAttribute(PushPromiseTable pushPromiseTable)
    {
        if (pushPromiseTable == null)
        {
            throw new ArgumentNullException(nameof(pushPromiseTable));
        }

        _pushPromiseTable = pushPromiseTable;
    }

    public void OnActionExecuted(ActionExecutedContext filterContext)
    { }

    public void OnActionExecuting(ActionExecutingContext filterContext)
    {
        if (filterContext == null)
        {
            throw new ArgumentNullException(nameof(filterContext));
        }

        IEnumerable<string> pushPromiseContentPaths = _pushPromiseTable.GetPushPromiseContentPaths(
            filterContext.ActionDescriptor.ControllerDescriptor.ControllerName,
            filterContext.ActionDescriptor.ActionName);

        foreach (string pushPromiseContentPath in pushPromiseContentPaths)
        {
            filterContext.HttpContext.Response.PushPromise(pushPromiseContentPath);
        }
    }
}

The only thing left to do was registering the ActionFilter with the proper configuration and reverting the view to its original form.

PushPromiseTable pushPromiseTable = new PushPromiseTable();
pushPromiseTable.MapPushPromise("Demo", "server-push", "~/content/normalize.css");
pushPromiseTable.MapPushPromise("Demo", "server-push", "~/content/site.css");

GlobalFilters.Filters.Add(new PushPromiseAttribute(pushPromiseTable));

@model IStarWarsCharacters
<!DOCTYPE html>
<html>
    <head>
        <link rel="stylesheet" href="~/content/normalize.css">
        <link rel="stylesheet" href="~/content/site.css">
    </head>
    <body>
        <ul>
            @foreach (string starWarsCharacter in Model.GetCharacters())
            {
                <li>@starWarsCharacter</li>
            }
        </ul>
        ...
    </body>
</html>
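
The registration above would typically live in Application_Start; a minimal sketch, assuming the default ASP.NET MVC project template (Global.asax, RouteConfig and so on), could look like this:

public class MvcApplication : System.Web.HttpApplication
{
    protected void Application_Start()
    {
        AreaRegistration.RegisterAllAreas();
        RouteConfig.RegisterRoutes(RouteTable.Routes);

        // Map the resources which should be pushed for the Demo/server-push action
        // and register the filter globally.
        PushPromiseTable pushPromiseTable = new PushPromiseTable();
        pushPromiseTable.MapPushPromise("Demo", "server-push", "~/content/normalize.css");
        pushPromiseTable.MapPushPromise("Demo", "server-push", "~/content/site.css");
        GlobalFilters.Filters.Add(new PushPromiseAttribute(pushPromiseTable));
    }
}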

Thanks to the usage of the ActionFilter, the improvement is visible in the timings again.

Chrome Developer Tools Network Tab - ActionFilter Server Push

What about ASP.NET Core

Unfortunately, similar mechanisms are not available in ASP.NET Core yet, as the required API is not exposed. There are issues created for both Kestrel and the HTTP abstractions to provide support, but there is no information on planned delivery.

Some time ago I wrote about an HtmlHelper providing support for Content Security Policy. The solution presented worked well, but it required quite nasty markup and usage of the <text> pseudo element.

@using (Html.BeginCspScript())
{
    <text>
    $(document).ready(function () {
        ...
    });
    </text>
}

ASP.NET Core MVC has introduced a new way for server-side code to participate in rendering HTML elements - TagHelpers. The key aspect of TagHelpers is that they attach to HTML elements and allow for modifying them. This is exactly the functionality which should allow improving the markup of the previously mentioned solution, so I've decided to try creating one. I've started with a class inheriting from TagHelper.

public class ContentSecurityPolicyTagHelper : TagHelper
{
    public override void Process(TagHelperContext context, TagHelperOutput output)
    {
    }
}

The Process method is the one which will be executed when there is an HTML element matching the TagHelper. The default matching is based on a naming convention which looks for the TagHelper by class name (without the TagHelper suffix). This behavior can be adjusted with the help of HtmlTargetElementAttribute. A single instance of HtmlTargetElementAttribute is treated as a logical "and" of all its conditions, while multiple instances are treated as a logical "or". In this case I've decided that it would be nice to support two custom elements (<csp-script> and <csp-style>) as well as the standard <script> and <style> elements with an asp-csp attribute.

[HtmlTargetElement("csp-style")]
[HtmlTargetElement("style", Attributes = "asp-csp")]
[HtmlTargetElement("csp-script")]
[HtmlTargetElement("script", Attributes = "asp-csp")]
public class ContentSecurityPolicyTagHelper : TagHelper
{
    public override void Process(TagHelperContext context, TagHelperOutput output)
    {
    }
}

The custom elements need to be translated into the standard ones as soon as processing starts. Also, the custom attribute should be removed.

[HtmlTargetElement("csp-style")]
[HtmlTargetElement("style", Attributes = "asp-csp")]
[HtmlTargetElement("csp-script")]
[HtmlTargetElement("script", Attributes = "asp-csp")]
public class ContentSecurityPolicyTagHelper : TagHelper
{
    public override void Process(TagHelperContext context, TagHelperOutput output)
    {
        SetTagName(output);

        output.Attributes.RemoveAll("asp-csp");
    }

    private void SetTagName(TagHelperOutput output)
    {
        if (output.TagName == "csp-style")
        {
            output.TagName = "style";
        }
        else if (output.TagName == "csp-script")
        {
            output.TagName = "script";
        }
    }
}

Now the actual Content Security Policy related processing can be added. The ContentSecurityPolicyAttribute from the previous post puts all the relevant information into HttpContext.Items, from which I needed to retrieve it. ASP.NET Core MVC provides the ViewContextAttribute, which informs the framework that a property should be set with the current ViewContext when the TagHelper is being created - this is how access to the HttpContext can be achieved.

...
public class ContentSecurityPolicyTagHelper : TagHelper
{
    [ViewContext]
    public ViewContext ViewContext { get; set; }

    public override void Process(TagHelperContext context, TagHelperOutput output)
    {
        ...
    }

    ...
}

After retrieving the needed values from the HttpContext, support for nonce can be provided without much effort.

...
public class ContentSecurityPolicyTagHelper : TagHelper
{
    private static IDictionary<string, string> _inlineExecutionContextKeys =
        new Dictionary<string, string>
        {
            { "script", "ScriptInlineExecution" },
            { "style", "StyleInlineExecution" }
        };

    ...

    public override void Process(TagHelperContext context, TagHelperOutput output)
    {
        ...

        ContentSecurityPolicyInlineExecution currentInlineExecutionPolicy =
            (ContentSecurityPolicyInlineExecution)ViewContext
            .HttpContext.Items[_inlineExecutionContextKeys[output.TagName]];

        if (currentInlineExecutionPolicy == ContentSecurityPolicyInlineExecution.Nonce)
        {
            output.Attributes.Add("nonce", (string)ViewContext.HttpContext.Items["NonceRandom"]);
        }
        else if (currentInlineExecutionPolicy == ContentSecurityPolicyInlineExecution.Hash)
        {
        }
    }

    ...
}

For the hash support, the content of the element is needed. The TagHelperOutput provides the GetChildContentAsync method and its Content property exposes GetContent for this purpose. In order to decide which one to use, a check should be done to see if the content has been modified by another TagHelper (in other words, if the content should be accessed directly or from the buffer). After obtaining the content, the hash can be calculated and added to the list which ContentSecurityPolicyAttribute will use later to generate the response headers. As the content is being obtained by an asynchronous method, the asynchronous version of the Process method (ProcessAsync) must be used.

...
public class ContentSecurityPolicyTagHelper : TagHelper
{
    private static IDictionary<string, string> _inlineExecutionContextKeys =
        new Dictionary<string, string>
        {
            { "script", "ScriptInlineExecution" },
            { "style", "StyleInlineExecution" }
        };

    private static IDictionary<string, string> _hashListBuilderContextKeys =
        new Dictionary<string, string>
        {
            { "script", "ScriptHashListBuilder" },
            { "style", "StyleHashListBuilder" }
        };

    ...

    public override async Task ProcessAsync(TagHelperContext context, TagHelperOutput output)
    {
        ...

        ContentSecurityPolicyInlineExecution currentInlineExecutionPolicy =
            (ContentSecurityPolicyInlineExecution)ViewContext
            .HttpContext.Items[_inlineExecutionContextKeys[output.TagName]];

        if (currentInlineExecutionPolicy == ContentSecurityPolicyInlineExecution.Nonce)
        {
            output.Attributes.Add("nonce", (string)ViewContext.HttpContext.Items["NonceRandom"]);
        }
        else if (currentInlineExecutionPolicy == ContentSecurityPolicyInlineExecution.Hash)
        {
            string content = output.Content.IsModified ?
                output.Content.GetContent() : (await output.GetChildContentAsync()).GetContent();

            content = content.Replace("\r\n", "\n");
            byte[] contentHashBytes = SHA256.Create().ComputeHash(Encoding.UTF8.GetBytes(content));
            string contentHash = Convert.ToBase64String(contentHashBytes);

            ((StringBuilder)ViewContext.HttpContext.Items[_hashListBuilderContextKeys[output.TagName]])
                .AppendFormat(" 'sha256-{0}'", contentHash);
        }
    }

    ...
}

After registering the new TagHelper with the addTagHelper directive (typically in _ViewImports.cshtml), it can be used through the new custom elements or by simply adding the attribute to the standard ones.

<script asp-csp>
    $(document).ready(function() {
        ...
    });
</script>

The full version of the code can be found on GitHub and it is ready to use together with ContentSecurityPolicyAttribute as part of Lib.AspNetCore.Mvc.Security.

In the previous post I've shown how a temporary table can be generated based on an IQueryable and then reused in subsequent queries. I have also pointed out a serious limitation: the described approach won't work if the projection contains columns not coming directly from any tables. In such a case the generated query will contain column names like C2 which don't match the property names. One possible workaround is overriding the column names in the configuration class by using HasColumnName("C2"), but this is far from perfect. Something more generic would be desired.

What I did know is that after ToTraceString Entity Framework keeps the query plan in a cache. As part of this plan there should be information on how properties are mapped to columns. The only thing I was able to find was this StackOverflow question. Apparently Entity Framework stores only the positions of the properties. This actually makes sense, as access by index is the primary method provided by ADO.NET, but unfortunately this is not information which can be used to properly build the SELECT clause. After some thinking I decided I would use those positions to properly order the columns in the INSERT INTO list.

The first thing to change was my extension method for getting the entity columns, because now I needed to get the names of the corresponding properties as well.

internal static class DbContextExtensions
{
    internal static IEnumerable<ScalarPropertyMapping> GetEntityPropertyMappings<TEntity>(this DbContext dbContext)
    {
        // Get the metadata
        MetadataWorkspace metadata = ((IObjectContextAdapter)dbContext).ObjectContext.MetadataWorkspace;

        // Get the space within the metadata which contains information about CLR types
        ObjectItemCollection clrSpace = ((ObjectItemCollection)metadata.GetItemCollection(DataSpace.OSpace));

        // Get the entity type from the metadata that maps to the CLR type
        EntityType entityEntityType = metadata.GetItems<EntityType>(DataSpace.OSpace).Single(e => clrSpace.GetClrType(e) == typeof(TEntity));

        // Get the entity set that uses this entity type
        EntitySet entityEntitySet = metadata.GetItems<EntityContainer>(DataSpace.CSpace).Single().EntitySets.Single(s => s.ElementType.Name == entityEntityType.Name);

        // Get the mapping between conceptual and storage model for this entity set
        EntitySetMapping entityEntitySetMapping = metadata.GetItems<EntityContainerMapping>(DataSpace.CSSpace).Single().EntitySetMappings.Single(m => m.EntitySet == entityEntitySet);

        // Get the entity columns
        return entityEntitySetMapping.EntityTypeMappings.Single().Fragments.Single().PropertyMappings.OfType<ScalarPropertyMapping>();
    }
}

That was the easy part. The next step was getting the property positions out of the IQueryable. Based on the answer to the question mentioned above I have extended my ObjectQuery extensions.

internal static class ObjectQueryExtensions
{
    internal static ObjectQuery<TQueryProjection> GetObjectQuery<TQueryProjection>(this IQueryable<TQueryProjection> query)
    {
        object internalQuery = GetField(query, "_internalQuery");

        ObjectQuery<TQueryProjection> objectQuery = GetField(internalQuery, "_objectQuery") as ObjectQuery<TQueryProjection>;

        return objectQuery;
    }

    internal static IReadOnlyDictionary<string, int> GetQueryPropertyPositions<TQueryProjection>(this IQueryable<TQueryProjection> query)
    {
        ObjectQuery<TQueryProjection> objectQuery = query.GetObjectQuery();

        objectQuery.ToTraceString();

        return GetQueryPropertyPositions(objectQuery);
    }

    internal static IReadOnlyDictionary<string, int> GetQueryPropertyPositions<TQueryProjection>(this ObjectQuery<TQueryProjection> objectQuery)
    {
        IDictionary<string, int> propertyPositions = new Dictionary<string, int>();

        // Get the query state.
        object objectQueryState = GetProperty(objectQuery, "QueryState");

        // Get the cached query execution plan.
        object cachedPlan = GetField(objectQueryState, "_cachedPlan");

        // Get the command definition.
        object commandDefinition = GetField(cachedPlan, "CommandDefinition");

        // Get the column map generator.
        Array columnMapGenerators = GetField(commandDefinition, "_columnMapGenerators") as Array;
        object columnMapGenerator = ((columnMapGenerators != null) && (columnMapGenerators.Length == 1)) ? columnMapGenerators.GetValue(0) : null;

        // Get the column map.
        object columnMap = GetField(columnMapGenerator, "_columnMap");

        // Get the record column map.
        object columnMapElement = GetProperty(columnMap, "Element");

        // Get column map properties.
        Array properties = GetProperty(columnMapElement, "Properties") as Array;
        if (properties != null)
        {
            for (int propertyIndex = 0; propertyIndex < properties.Length; propertyIndex++)
            {
                object property = properties.GetValue(propertyIndex);
                propertyPositions.Add(GetProperty(property, "Name") as String, (int)GetProperty(property, "ColumnPos"));
            }
        }

        return new ReadOnlyDictionary<string, int>(propertyPositions);
    }

    private static object GetProperty(object objectInstance, string propertyName)
    {
        object propertyValue = null;

        if (objectInstance != null)
        {
            PropertyInfo property = objectInstance.GetType().GetProperty(propertyName, BindingFlags.NonPublic | BindingFlags.Instance);
            if (property != null)
            {
                propertyValue = property.GetValue(objectInstance, new object[0]);
            }
        }

        return propertyValue;
    }

    private static object GetField(object objectInstance, string fieldName)
    {
        object fieldValue = null;

        if (objectInstance != null)
        {
            FieldInfo field = objectInstance.GetType().GetField(fieldName, BindingFlags.NonPublic | BindingFlags.Instance);

            if (field != null)
            {
                fieldValue = field.GetValue(objectInstance);
            }
        }

        return fieldValue;
    }
}

The change to the actual temporary table creation code was pretty straightforward with the usage of the methods above. Some ordering and a couple of StringBuilders did the trick.

public IQueryable<TTemporaryEntity> AsTemporarySnapshotQueryable<TQueryProjection, TTemporaryEntity>(IQueryable<TQueryProjection> query)
{
    IQueryable<TTemporaryEntity> snapshotQueryable = null;

    ObjectQuery<TQueryProjection> temporarySnapshotObjectQuery = query.GetObjectQuery();
    IDictionary<string, EdmProperty> temporarySnapshotColumns = _dbContext.GetEntityPropertyMappings<TTemporaryEntity>().ToDictionary(p => p.Property.Name, p => p.Column);

    if ((temporarySnapshotObjectQuery != null) && temporarySnapshotColumns.Any())
    {
        string temporarySnapshotTableName = "#" + typeof(TTemporaryEntity).Name;
        string temporarySnapshotQuerySql = temporarySnapshotObjectQuery.ToTraceString();
        IOrderedEnumerable<KeyValuePair<string, int>> temporarySnapshotObjectQueryColumnsPositions = temporarySnapshotObjectQuery.GetQueryPropertyPositions().OrderBy(cp => cp.Value);

        StringBuilder temporarySnapshotCreateColumnsListBuilder = new StringBuilder();
        StringBuilder temporarySnapshotFillColumnsListBuilder = new StringBuilder();
        foreach (KeyValuePair<string, int> temporarySnapshotObjectQueryColumnPosition in temporarySnapshotObjectQueryColumnsPositions)
        {
            EdmProperty temporarySnapshotColumn = temporarySnapshotColumns[temporarySnapshotObjectQueryColumnPosition.Key];

            temporarySnapshotCreateColumnsListBuilder.Append(GetTemporarySnapshotColumnCreateSql(temporarySnapshotColumn));
            temporarySnapshotFillColumnsListBuilder.AppendFormat("[{0}],", temporarySnapshotColumn.Name);
        }
        temporarySnapshotCreateColumnsListBuilder.Length -= 1;
        temporarySnapshotFillColumnsListBuilder.Length -= 1;

        // We need to handle "1 AS [C1]" column here
        if (temporarySnapshotObjectQueryColumnsPositions.First().Value == 1)
        {
            temporarySnapshotCreateColumnsListBuilder.Insert(0, "[RESERVED_EF_INTERNAL] INT,");
            temporarySnapshotFillColumnsListBuilder.Insert(0, "[RESERVED_EF_INTERNAL],");
        }

        string temporarySnapshotCreateSqlCommand = String.Format("IF OBJECT_ID('tempdb..{0}') IS NOT NULL BEGIN DROP TABLE {0} END{1}CREATE TABLE {0} ({2})", temporarySnapshotTableName, Environment.NewLine, temporarySnapshotCreateColumnsListBuilder);
        string temporarySnapshotFillSqlCommand = String.Format("INSERT INTO {0}({1}) (SELECT * FROM ({2}) AS [TemporarySnapshotQueryable])", temporarySnapshotTableName, temporarySnapshotFillColumnsListBuilder, temporarySnapshotQuerySql);
        object[] temporarySnapshotFillSqlCommandParameters = temporarySnapshotObjectQuery.Parameters.Select(parameter => new SqlParameter(parameter.Name, parameter.Value)).ToArray();

        // We are opening connection manually here because since Entity Framework 6 it will not be automatically closed until context disposal - this way the temporary table will be visible for other queries.
        _dbContext.Database.Connection.Open();
        _dbContext.Database.ExecuteSqlCommand(temporarySnapshotCreateSqlCommand);
        _dbContext.Database.ExecuteSqlCommand(temporarySnapshotFillSqlCommand, temporarySnapshotFillSqlCommandParameters);

        snapshotQueryable = _dbContext.Set<TTemporaryEntity>().AsNoTracking().AsQueryable();
    }

    return snapshotQueryable;
}

One thing worth explaining in the code above is the if statement commented with 'We need to handle "1 AS [C1]" column here'. If you happen to look at the SQL Server Profiler log when Entity Framework queries are being performed, you would notice that for a lot of them a 1 AS [C1] column is being generated. To my knowledge this allows Entity Framework to distinguish between empty tables and selecting only nullable fields (especially in sub query scenarios), but for our use case this is one more thing we have to account for. The above code does it in the simplest way possible - if the properties' column positions start at 1, it creates a dummy column which will store the 1 AS [C1].

This modified solution has fewer limitations than the previous one, but in order to achieve that a number of "internal" dependencies had to be introduced. This can easily be broken just by updating Entity Framework, so the solution should somehow be protected against this. The absolute minimum which should be done is unit tests which will guard the most important aspects.

Unit Testing

The thing that can be broken most easily is the GetQueryPropertyPositions method, as it is entirely based on non-public APIs. So the first test should check if that method returns anything.

[TestMethod]
public void TemporarySnapshot_GetStudentsIdentities_GetQueryPropertyPositions_ReturnsNonEmptyCollection()
{
    using (var temporarySnapshotTestContext = new FakeDbContext())
    {
        StudentsManager studentsManager = PrepareStudentsManager(temporarySnapshotTestContext);
        IQueryable<StudentIdentity> studentsIdentitiesQueryable = studentsManager.GetStudentsIdentities(...);

        IReadOnlyDictionary<string, int> studentsIdentitiesQueryableColumns = studentsIdentitiesQueryable.GetQueryPropertyPositions();

        Assert.IsTrue((studentsIdentitiesQueryableColumns != null) && (studentsIdentitiesQueryableColumns.Any()));
    }
}

For consistency, the GetEntityPropertyMappings method should be checked in the same way.

[TestMethod]
public void TemporarySnapshot_GetEntityPropertyMappings_ReturnsNonEmptyCollection()
{
    using (var temporarySnapshotTestContext = new FakeDbContext())
    {
        IEnumerable<ScalarPropertyMapping> temporarySnapshotColumns = temporarySnapshotTestContext.GetEntityPropertyMappings<TemporaryStudentIdentity>();

        Assert.IsTrue((temporarySnapshotColumns != null) && (temporarySnapshotColumns.Any()));
    }
}

Now we can test if (at least theoretically) the code can create a correct temporary table. There are four preconditions which are the absolute minimum for that:

  • The column positions returned from GetQueryPropertyPositions must be in a sequence
  • The first column index has to be either 0 or 1
  • All properties from the entity representing the temporary table exist in the query
  • The number of property mappings in the entity and the query is the same

The following unit tests cover those requirements.

[TestMethod]
public void TemporarySnapshot_GetStudentsIdentities_GetQueryPropertyPositions_ColumnsInSequence()
{
    bool inSequence = false;

    using (var temporarySnapshotTestContext = new FakeDbContext())
    {
        StudentsManager studentsManager = PrepareStudentsManager(temporarySnapshotTestContext);
        IQueryable<StudentIdentity> studentsIdentitiesQueryable = studentsManager.GetStudentsIdentities(...);

        IReadOnlyDictionary<string, int> studentsIdentitiesQueryableColumns = studentsIdentitiesQueryable.GetQueryPropertyPositions();

        IEnumerable<int> orderedPositions = studentsIdentitiesQueryableColumns.Values.OrderBy(position => position);
        inSequence = orderedPositions.Zip(orderedPositions.Skip(1), (previousPosition, position) => (previousPosition + 1) == position).All(positionsInSequence => positionsInSequence);
    }

    Assert.IsTrue(inSequence);
}

[TestMethod]
public void TemporarySnapshot_GetStudentsIdentities_GetQueryPropertyPositions_CorrectFirstColumnIndex()
{
    bool correctFirstColumnIndex = false;

    using (var temporarySnapshotTestContext = new FakeDbContext())
    {
        StudentsManager studentsManager = PrepareStudentsManager(temporarySnapshotTestContext);
        IQueryable<StudentIdentity> studentsIdentitiesQueryable = studentsManager.GetStudentsIdentities(...);

        IReadOnlyDictionary<string, int> studentsIdentitiesQueryableColumns = studentsIdentitiesQueryable.GetQueryPropertyPositions();

        int firstPosition = studentsIdentitiesQueryableColumns.Values.OrderBy(position => position).First();
        correctFirstColumnIndex = (firstPosition == 0) || (firstPosition == 1);
    }

    Assert.IsTrue(correctFirstColumnIndex);
}

[TestMethod]
public void TemporarySnapshot_GetEntityPropertyMappings_GetSupervisees_RetrievePhoto_CheckIsSupervisor_AllColumnsExists()
{
    bool allColumnsExists = true;

    using (var temporarySnapshotTestContext = new FakeDbContext())
    {
        StudentsManager studentsManager = PrepareStudentsManager(temporarySnapshotTestContext);
        IQueryable<StudentIdentity> studentsIdentitiesQueryable = studentsManager.GetStudentsIdentities(...);

        IReadOnlyDictionary<string, int> studentsIdentitiesQueryableColumns = studentsIdentitiesQueryable.GetQueryPropertyPositions();
        IEnumerable<ScalarPropertyMapping> temporarySnapshotColumns = temporarySnapshotTestContext.GetEntityPropertyMappings<TemporaryStudentIdentity>();

        foreach (ScalarPropertyMapping temporarySnapshotColumn in temporarySnapshotColumns)
        {
            allColumnsExists = allColumnsExists && studentsIdentitiesQueryableColumns.ContainsKey(temporarySnapshotColumn.Property.Name);
        }
    }

    Assert.IsTrue(allColumnsExists);
}

[TestMethod]
public void TemporarySnapshot_GetEntityPropertyMappings_GetStudentsIdentities_GetQueryPropertyPositions_EqualColumnsCount()
{
    using (var temporarySnapshotTestContext = new FakeDbContext())
    {
        StudentsManager studentsManager = PrepareStudentsManager(temporarySnapshotTestContext);
        IQueryable<StudentIdentity> studentsIdentitiesQueryable = studentsManager.GetStudentsIdentities(...);

        IReadOnlyDictionary<string, int> studentsIdentitiesQueryableColumns = studentsIdentitiesQueryable.GetQueryPropertyPositions();
        IEnumerable<ScalarPropertyMapping> temporarySnapshotColumns = temporarySnapshotTestContext.GetEntityPropertyMappings<TemporaryStudentIdentity>();

        Assert.AreEqual(temporarySnapshotColumns.Count(), studentsIdentitiesQueryableColumns.Count);
    }
}

Conclusion

The solution is complete and, let's face it, it is "hacky". It solves a very specific problem and comes with the cost of relying on Entity Framework internals. Even with unit tests there are maintainability risks here. Still, at least for the specific case mentioned, it was worth it. Also, the journey to the solution was an interesting and satisfying one.

I'm a huge fan of entity based and result set based relational mapping (classic ORMs). I'm also a huge fan of DML based relational mapping (micro ORMs). In general I'm a huge fan of every technology that allows me to get the job done in the best possible way. I believe that one should never limit himself to a single approach within a project - it should always be about choosing the best tool for the job. But sometimes there are real life constraints (licensing, business, politics etc.) which limit the choices. This was one of those situations.

Entity Framework is not the perfect tool for complex reporting, but it can still be used to do it. Recently I was tasked with optimizing such a scenario. The code was building a quite complex report across several tables, all filtered by the same sub query. In general it looked similar to the snippet below.

public IQueryable<StudentItem> GetStudentsItemsQueryable(Filters filters, IEnumerable<StudentIdentity> students)
{
    IQueryable<StudentItem> itemsQueryable = null;

    IQueryable<int> studentsIds = students.Select(s => s.Id);
    foreach (IStudentItemManager itemManager in _itemsManagers)
    {
        IQueryable<StudentItem> itemQueryable = itemManager.GetStudentsItemQueryable(filters, studentsIds);
        if (itemQueryable != null)
        {
            itemsQueryable = (itemsQueryable != null) ? itemsQueryable.Concat(itemQueryable) : itemQueryable;
        }
    }

    ...

    return itemsQueryable;
}

The top level method generates a union from a number of independent item queries. Each item query receives the identifiers of the students for whom the report is being generated.

public IQueryable<StudentItem> GetStudentsItemQueryable(Filters filters, IEnumerable<int> studentsIds)
{
    IQueryable<StudentItem> itemQueryable = null;

    ...

    itemQueryable = itemQueryable.Where(i => studentsIds.Contains(i.StudentId));

    ...

    return itemQueryable;
}

In the end, Entity Framework was generating a SQL query with the pattern visible below.

SELECT ...
FROM (SELECT ...
        FROM ... AS [Extent1]
        WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM (<SQL representation of StudentIdentity enumerable>) AS [Limit1] WHERE [Limit1].[Id] = [Extent1].[StudentId])
      UNION ALL
      SELECT ...
        FROM ... AS [Extent2]
        WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM (<SQL representation of StudentIdentity enumerable>) AS [Limit2] WHERE [Limit2].[Id] = [Extent2].[StudentId])
      UNION ALL
      ...
      UNION ALL
      SELECT ...
        FROM ... AS [ExtentN]
        WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM (<SQL representation of StudentIdentity enumerable>) AS [LimitN] WHERE [LimitN].[Id] = [ExtentN].[StudentId])) AS UnionAll1

The redundancy in the query is immediately visible. Both the students and the items queries were not trivial ones, which resulted in pressure on the database. A quick look at SQL Server Profiler showed a CPU time of ~750ms and an overall duration of ~1700ms. Taking into consideration that this data still had to be rendered and returned to the user, this is "bad experience" territory. How to optimize this? All the sub queries by themselves looked quite good (despite not being trivial), so the initial idea became removing the redundancy.

Changing the shared sub query to an in-memory collection

The first thing which came to mind was grabbing the students' identifiers once and passing them as an in-memory collection to all the sub queries. The code change was in fact easy and cheap.

public IQueryable<StudentItem> GetStudentsItemsQueryable(Filters filters, IEnumerable<StudentIdentity> students)
{
    IQueryable<StudentItem> itemsQueryable = null;

    IEnumerable<int> studentsIds = students.Select(s => s.Id).ToList();
    ...

    return itemsQueryable;
}

The generated SQL query also looked promising.

SELECT ...
FROM (SELECT ...
        FROM ... AS [Extent1]
        WHERE ... AND [Extent1].[StudentId] IN (...)
      UNION ALL
      SELECT ...
        FROM ... AS [Extent2]
        WHERE ... AND [Extent2].[StudentId] IN (...)
      UNION ALL
      ...
      UNION ALL
      SELECT ...
        FROM ... AS [ExtentN]
        WHERE ... AND [ExtentN].[StudentId] IN (...)) AS UnionAll1

Verification with SQL Server Profiler showed that the times had been nicely reduced (CPU time ~345ms and overall duration ~565ms), so the problem should be solved. Unfortunately, the end-to-end stress performance test results didn't improve. The overall latency remained at a similar level and the application CPU usage went up. Why?

The only change that had been done was the introduction of an IN clause representing Enumerable.Contains. This scenario was indeed problematic with Entity Framework 4, as usage of Enumerable.Contains was translated into a tree of OR expressions, but Entity Framework 6 has introduced native support for DbInExpression, so it should be OK. Still, the Visual Studio profiler was pointing at query generation as the reason. A quick look at the Entity Framework performance considerations regarding Autocompiled Queries brought the answer - the IN clause prevents query caching. Every time the code was being executed, Entity Framework had to regenerate all the items queries (and the top query as well), which resulted in high CPU usage. This meant that a different solution was needed.

Moving the optimization to SQL level

The repetition had to be removed, but it couldn't be done with an IN clause with a static identifiers list. The identifiers sub query results had to be gathered once and reused at the SQL level. A potential solution I could think of was a temporary table. The creation and access in the case of a temporary table shouldn't be expensive, and it will also be automatically removed when the session ends. This seemed to be a reasonable approach, so a POC was born.

IF OBJECT_ID('tempdb..#TemporaryStudentIdentity') IS NOT NULL
BEGIN
    DROP TABLE #TemporaryStudentIdentity
END

SELECT * INTO #TemporaryStudentIdentity
FROM (<SQL representation of StudentIdentity enumerable>) AS [TemporarySnapshotQueryable]

SELECT ...
FROM (SELECT ...
        FROM ... AS [Extent1]
        WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM [tempdb].[#TemporaryStudentIdentity] AS [Limit1] WHERE [Limit1].[Id] = [Extent1].[StudentId])
      UNION ALL
      SELECT ...
        FROM ... AS [Extent2]
        WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM [tempdb].[#TemporaryStudentIdentity] AS [Limit2] WHERE [Limit2].[Id] = [Extent2].[StudentId])
      UNION ALL
      ...
      UNION ALL
      SELECT ...
        FROM ... AS [ExtentN]
        WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM [tempdb].[#TemporaryStudentIdentity] AS [LimitN] WHERE [LimitN].[Id] = [ExtentN].[StudentId])) AS UnionAll1

Testing of the POC has shown that this approach is more expensive than the IN clause (CPU time ~550ms and overall duration ~660ms), but way better than the original one (and can be considered acceptable). Now the same thing needed to be done using Entity Framework.

Implementing the temporary table creation

First of all, I needed an entity to represent the temporary table. As the project was using the Code First approach, I've created a POCO with the same properties as the projection DTO (having a separate class allowed me to protect the constructor and property setters).

public class TemporaryStudentIdentity
{
    protected TemporaryStudentIdentity()
    { }

    public virtual int Id { get; protected set; }

    ...
}

In order to map this entity to a temporary table the TableAttribute can be used, but I prefer to keep mappings separated from entities, so I have created a configuration class.

public sealed class TemporaryStudentIdentityConfiguration : EntityTypeConfiguration<TemporaryStudentIdentity>
{
    public TemporaryStudentIdentityConfiguration()
    {
        ToTable("#" + typeof(TemporaryStudentIdentity).Name, "tempdb");

        HasKey(x => x.Id);

        Property(x => x.Id);
        ...
    }
}

Now I needed to get the actual parametrized query out of the IQueryable (I didn't want to hardcode everything, so the solution would stay more generic in the future); this post got me going. First, I've created a small extension method.

internal static class ObjectQueryExtensions
{
    internal static ObjectQuery<TQueryProjection> GetObjectQuery<TQueryProjection>(this IQueryable<TQueryProjection> query)
    {
        object internalQuery = GetField(query, "_internalQuery");

        ObjectQuery<TQueryProjection> objectQuery = GetField(internalQuery, "_objectQuery") as ObjectQuery<TQueryProjection>;

        return objectQuery;
    }

    private static object GetField(object objectInstance, string fieldName)
    {
        object fieldValue = null;

        if (objectInstance != null)
        {
            FieldInfo field = objectInstance.GetType().GetField(fieldName, BindingFlags.NonPublic | BindingFlags.Instance);

            if (field != null)
            {
                fieldValue = field.GetValue(objectInstance);
            }
        }

        return fieldValue;
    }
}

After separating out those nasty internals I could write a nice method which would get the job done.

public IQueryable<TTemporaryEntity> AsTemporarySnapshotQueryable<TQueryProjection, TTemporaryEntity>(IQueryable<TQueryProjection> query)
{
    IQueryable<TTemporaryEntity> snapshotQueryable = null;

    ObjectQuery<TQueryProjection> temporarySnapshotObjectQuery = query.GetObjectQuery();

    if (temporarySnapshotObjectQuery != null)
    {
        string temporarySnapshotTableName = "#" + typeof(TTemporaryEntity).Name;
        string temporarySnapshotSqlCommand = String.Format("IF OBJECT_ID('tempdb..{0}') IS NOT NULL BEGIN DROP TABLE {0} END SELECT * INTO {0} FROM ({1}) AS [TemporarySnapshotQueryable]", temporarySnapshotTableName, temporarySnapshotObjectQuery.ToTraceString());
        object[] temporarySnapshotSqlCommandParameters = temporarySnapshotObjectQuery.Parameters.Select(parameter => new SqlParameter(parameter.Name, parameter.Value)).ToArray();

        // Since Entity Framework 6, a manually opened connection will not be automatically closed until context disposal - this way the temporary table will be visible to other queries.
        _dbContext.Database.Connection.Open();
        _dbContext.Database.ExecuteSqlCommand(temporarySnapshotSqlCommand, temporarySnapshotSqlCommandParameters);

        snapshotQueryable = _dbContext.Set<TTemporaryEntity>().AsNoTracking().AsQueryable();
    }

    return snapshotQueryable;
}

That wasn't that hard. Proud of myself, I quickly changed the method I wanted to optimize.

public IQueryable<StudentItem> GetStudentsItemsQueryable(Filters filters, IQueryable<StudentIdentity> students)
{
    IQueryable<StudentItem> itemsQueryable = null;

    IQueryable<int> studentsIds = AsTemporarySnapshotQueryable<StudentIdentity, TemporaryStudentIdentity>(students).Select(s => s.Id);
    ...

    return itemsQueryable;
}

First test... and it fails. The error log contained an entry saying that #TemporaryStudentIdentity was not present. Did I miss something? I had made sure that the connection stayed open so the session wouldn't change, so what else could be causing my temporary table not to be visible to the query? After a couple of minutes of stupidly staring at the SQL Server Profiler log it hit me - I had simplified my POC by omitting sp_executesql.

When a parametrized query is executed by the ADO.NET provider for SQL Server, it is wrapped in a call to sp_executesql - this is how the parameters are passed. The tricky part of sp_executesql is that the statement is executed in an "inner scope". A temporary table created in the parent scope is visible inside sp_executesql, but one created within sp_executesql is not visible outside of it. In my case this meant that I had to separate creating the table from filling it with data (I didn't want to give up the parametrized query for obvious reasons).
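To illustrate the scoping issue, here is a minimal sketch (the table and parameter names are made up, not the actual application code):

// Hypothetical example - the parameterized command is wrapped in sp_executesql,
// so #ScopedTable is created in the inner scope and dropped as soon as sp_executesql returns.
_dbContext.Database.Connection.Open();
_dbContext.Database.ExecuteSqlCommand(
    "SELECT @p0 AS [Id] INTO #ScopedTable",
    new SqlParameter("p0", 1));

// Fails - #ScopedTable no longer exists in the outer (session) scope.
_dbContext.Database.ExecuteSqlCommand("SELECT COUNT(*) FROM #ScopedTable");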

Adjusting POC to sp_executesql usage

I've read everything I could find regarding temporary tables, but I wasn't able to find a way to create one ahead of time without specifying its full definition, so the POC had to include it.

IF OBJECT_ID('tempdb..#TemporaryStudentIdentity') IS NOT NULL
BEGIN
    DROP TABLE #TemporaryStudentIdentity
END

CREATE TABLE #TemporaryStudentIdentity
(
    [Id] INT NOT NULL,
    ...
)

exec sp_executesql N'INSERT INTO #TemporaryStudentIdentity([Id], ...)
    (SELECT [Id], ...
    FROM (<SQL representation of StudentIdentity enumerable>) AS [TemporarySnapshotQueryable])'

exec sp_executesql N'SELECT ...
    FROM (SELECT ...
            FROM ... AS [Extent1]
            WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM [tempdb].[#TemporaryStudentIdentity] AS [Limit1] WHERE [Limit1].[Id] = [Extent1].[StudentId])
          UNION ALL
          SELECT ...
            FROM ... AS [Extent2]
            WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM [tempdb].[#TemporaryStudentIdentity] AS [Limit2] WHERE [Limit2].[Id] = [Extent2].[StudentId])
          UNION ALL
          ...
          UNION ALL
          SELECT ...
            FROM ... AS [ExtentN]
            WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM [tempdb].[#TemporaryStudentIdentity] AS [LimitN] WHERE [LimitN].[Id] = [ExtentN].[StudentId])) AS UnionAll1'

This worked and was even faster than the previous version (probably because SQL Server didn't have to spend processing on figuring out the definition) - CPU time ~420ms, overall duration ~520ms. The challenge was how to do this from code in as generic a way as possible.

Adjusting the implementation

To put it in simple words, all I had to do was generate a "create table" script based on an entity. This should be doable - Entity Framework must be storing the information I needed somewhere. After some research I found what I was looking for and adapted the information provided into the form of an extension method.

internal static class DbContextExtensions
{
    internal static IEnumerable<EdmProperty> GetEntityPropertyColumns<TEntity>(this DbContext dbContext)
    {
        // Get the metadata
        MetadataWorkspace metadata = ((IObjectContextAdapter)dbContext).ObjectContext.MetadataWorkspace;

        // Get the space within the metadata which contains information about CLR types
        ObjectItemCollection clrSpace = ((ObjectItemCollection)metadata.GetItemCollection(DataSpace.OSpace));

        // Get the entity type from the metadata that maps to the CLR type
        EntityType entityEntityType = metadata.GetItems<EntityType>(DataSpace.OSpace).Single(e => clrSpace.GetClrType(e) == typeof(TEntity));

        // Get the entity set that uses this entity type
        EntitySet entityEntitySet = metadata.GetItems<EntityContainer>(DataSpace.CSpace).Single().EntitySets.Single(s => s.ElementType.Name == entityEntityType.Name);

        // Get the mapping between conceptual and storage model for this entity set
        EntitySetMapping entityEntitySetMapping = metadata.GetItems<EntityContainerMapping>(DataSpace.CSSpace).Single().EntitySetMappings.Single(m => m.EntitySet == entityEntitySet);

        // Get the entity columns
        return entityEntitySetMapping.EntityTypeMappings.Single().Fragments.Single().PropertyMappings.OfType<ScalarPropertyMapping>().Select(p => p.Column);
    }
}

Armed with this extension method I've started refactoring my temporary table generation code.

public IQueryable<TTemporaryEntity> AsTemporarySnapshotQueryable<TQueryProjection, TTemporaryEntity>(IQueryable<TQueryProjection> query)
{
    IQueryable<TTemporaryEntity> snapshotQueryable = null;

    ObjectQuery<TQueryProjection> temporarySnapshotObjectQuery = query.GetObjectQuery();
    IEnumerable<EdmProperty> temporarySnapshotColumns = _dbContext.GetEntityPropertyColumns<TTemporaryEntity>().ToArray();

    if ((temporarySnapshotObjectQuery != null) && temporarySnapshotColumns.Any())
    {
        string temporarySnapshotTableName = "#" + typeof(TTemporaryEntity).Name;

        StringBuilder temporarySnapshotCreateColumnsListBuilder = new StringBuilder();
        StringBuilder temporarySnapshotFillColumnsListBuilder = new StringBuilder();
        foreach (EdmProperty temporarySnapshotColumn in temporarySnapshotColumns)
        {
            temporarySnapshotCreateColumnsListBuilder.Append(GetTemporarySnapshotColumnCreateSql(temporarySnapshotColumn));
            temporarySnapshotFillColumnsListBuilder.AppendFormat("[{0}],", temporarySnapshotColumn.Name);
        }
        temporarySnapshotCreateColumnsListBuilder.Length -= 1;
        temporarySnapshotFillColumnsListBuilder.Length -= 1;

        string temporarySnapshotCreateSqlCommand = String.Format("IF OBJECT_ID('tempdb..{0}') IS NOT NULL BEGIN DROP TABLE {0} END{1}CREATE TABLE {0} ({2})", temporarySnapshotTableName, Environment.NewLine, temporarySnapshotCreateColumnsListBuilder);
        string temporarySnapshotFillSqlCommand = String.Format("INSERT INTO {0}({1}) (SELECT {1} FROM ({2}) AS [TemporarySnapshotQueryable])", temporarySnapshotTableName, temporarySnapshotFillColumnsListBuilder, temporarySnapshotObjectQuery.ToTraceString());
        object[] temporarySnapshotFillSqlCommandParameters = temporarySnapshotObjectQuery.Parameters.Select(parameter => new SqlParameter(parameter.Name, parameter.Value)).ToArray();

        // Since Entity Framework 6, a manually opened connection will not be automatically closed until context disposal - this way the temporary table will be visible to other queries.
        _dbContext.Database.Connection.Open();
        _dbContext.Database.ExecuteSqlCommand(temporarySnapshotCreateSqlCommand);
        _dbContext.Database.ExecuteSqlCommand(temporarySnapshotFillSqlCommand, temporarySnapshotFillSqlCommandParameters);

        snapshotQueryable = _dbContext.Set<TTemporaryEntity>().AsNoTracking().AsQueryable();
    }

    return snapshotQueryable;
}

private static string GetTemporarySnapshotColumnCreateSql(EdmProperty temporarySnapshotColumn)
{
    string typeNameUpperCase = temporarySnapshotColumn.TypeName.ToUpperInvariant();
    string temporarySnapshotColumnCreateSqlSuffix = temporarySnapshotColumn.Nullable ? "," : " NOT NULL,";
    switch (typeNameUpperCase)
    {
        case "NUMERIC":
            return String.Format("[{0}] NUMERIC({1},{2}){3}", temporarySnapshotColumn.Name, temporarySnapshotColumn.Precision, temporarySnapshotColumn.Scale, temporarySnapshotColumnCreateSqlSuffix);
        case "NVARCHAR":
        case "VARCHAR":
            return String.Format("[{0}] {1}({2}){3}", temporarySnapshotColumn.Name, typeNameUpperCase, temporarySnapshotColumn.MaxLength, temporarySnapshotColumnCreateSqlSuffix);
        default:
            return String.Format("[{0}] {1}{2}", temporarySnapshotColumn.Name, typeNameUpperCase, temporarySnapshotColumnCreateSqlSuffix);
    }
}

I've added all the HasColumnType, HasMaxLength etc. to the configuration class and fired away... It worked!!!
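For completeness, the extended configuration could look something along these lines (the column names and types here are only an example, not the actual project mapping):

public sealed class TemporaryStudentIdentityConfiguration : EntityTypeConfiguration<TemporaryStudentIdentity>
{
    public TemporaryStudentIdentityConfiguration()
    {
        ToTable("#" + typeof(TemporaryStudentIdentity).Name, "tempdb");

        HasKey(x => x.Id);

        // The explicit column types and lengths are what GetTemporarySnapshotColumnCreateSql
        // relies on when building the CREATE TABLE script (property names below are hypothetical).
        Property(x => x.Id).HasColumnType("int");
        Property(x => x.LastName).HasMaxLength(100).HasColumnType("nvarchar");
        ...
    }
}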

Limitations

This solution has one serious limitation - the names of the columns in the temporary table must match the names of the corresponding columns in the query generated by Entity Framework. This won't always be the case. If the query contains only columns coming directly from tables this is a safe assumption, but when calculated columns appear they will have names chosen by Entity Framework. In the approach above this would require nasty hardcoding like HasColumnName("C2"), which should rather be avoided.
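As an illustration (a hypothetical projection, not code from the project), a calculated column leaves the naming entirely up to Entity Framework:

// The concatenation below has no source column, so Entity Framework picks its own
// alias (e.g. [C1]) in the generated SQL - the temporary table entity would then need
// a matching HasColumnName mapping to line up with that alias.
var studentsWithFullName = _dbContext.Students
    .Select(s => new { s.Id, FullName = s.FirstName + " " + s.LastName });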

There is a potential way of working around this limitation - I will explore it in part two.
