Some time ago I wrote about an HtmlHelper providing support for Content Security Policy. The solution presented worked well, but it required quite nasty markup and usage of the <text> pseudo element.

@using (Html.BeginCspScript())
{
    <text>
    $(document).ready(function () {
        ...
    });
    </text>
}

ASP.NET Core MVC has introduced a new way for server-side code to participate in rendering HTML elements - TagHelpers. The key aspect of TagHelpers is that they attach to HTML elements and allow modifying them. This is exactly the functionality which should allow improving the markup of the previously mentioned solution, so I've decided to try creating one. I've started with a class inheriting from TagHelper.

public class ContentSecurityPolicyTagHelper : TagHelper
{
    public override void Process(TagHelperContext context, TagHelperOutput output)
    {
    }
}

The Process method is the one which gets executed when an HTML element matches the TagHelper. The default matching is based on a naming convention which looks for the TagHelper by its class name (without the TagHelper suffix). This behavior can be adjusted with the help of HtmlTargetElementAttribute. A single instance of HtmlTargetElementAttribute is treated as a logical "and" of all its conditions, while multiple instances are treated as a logical "or". In this case I've decided that it would be nice to support two custom elements (<csp-script> and <csp-style>) as well as the standard <script> and <style> elements with an asp-csp attribute.

[HtmlTargetElement("csp-style")]
[HtmlTargetElement("style", Attributes = "asp-csp")]
[HtmlTargetElement("csp-script")]
[HtmlTargetElement("script", Attributes = "asp-csp")]
public class ContentSecurityPolicyTagHelper : TagHelper
{
    public override void Process(TagHelperContext context, TagHelperOutput output)
    {
    }
}

The custom elements need to be translated into the standard ones as soon as processing starts. The custom attribute should also be removed.

[HtmlTargetElement("csp-style")]
[HtmlTargetElement("style", Attributes = "asp-csp")]
[HtmlTargetElement("csp-script")]
[HtmlTargetElement("script", Attributes = "asp-csp")]
public class ContentSecurityPolicyTagHelper : TagHelper
{
    public override void Process(TagHelperContext context, TagHelperOutput output)
    {
        SetTagName(output);

        output.Attributes.RemoveAll("asp-csp");
    }

    private void SetTagName(TagHelperOutput output)
    {
        if (output.TagName == "csp-style")
        {
            output.TagName = "style";
        }
        else if (output.TagName == "csp-script")
        {
            output.TagName = "script";
        }
    }
}

Now the actual Content Security Policy related processing can be added. The ContentSecurityPolicyAttribute from the previous post puts all the relevant information into HttpContext.Items, from which it needs to be retrieved. ASP.NET Core MVC provides ViewContextAttribute, which informs the framework that a property should be set to the current ViewContext when the TagHelper is created - this is how access to the HttpContext can be achieved.

...
public class ContentSecurityPolicyTagHelper : TagHelper
{
    [ViewContext]
    public ViewContext ViewContext { get; set; }

    public override void Process(TagHelperContext context, TagHelperOutput output)
    {
        ...
    }

    ...
}

After retrieving the needed values from the HttpContext, support for the nonce can be provided without much effort.

...
public class ContentSecurityPolicyTagHelper : TagHelper
{
    private static IDictionary<string, string> _inlineExecutionContextKeys =
        new Dictionary<string, string>
        {
            { "script", "ScriptInlineExecution" },
            { "style", "StyleInlineExecution" }
        };

    ...

    public override void Process(TagHelperContext context, TagHelperOutput output)
    {
        ...

        ContentSecurityPolicyInlineExecution currentInlineExecutionPolicy =
            (ContentSecurityPolicyInlineExecution)ViewContext
            .HttpContext.Items[_inlineExecutionContextKeys[output.TagName]];

        if (currentInlineExecutionPolicy == ContentSecurityPolicyInlineExecution.Nonce)
        {
            output.Attributes.Add("nonce", (string)ViewContext.HttpContext.Items["NonceRandom"]);
        }
        else if (currentInlineExecutionPolicy == ContentSecurityPolicyInlineExecution.Hash)
        {
        }
    }

    ...
}

For hash support the content of the element is needed. The TagHelperOutput provides the GetChildContentAsync method, and its Content property exposes the GetContent method for this purpose. In order to decide which one to use, a check should be done to see whether the content has been modified by another TagHelper (in other words, whether the content should be accessed directly or from the buffer). After obtaining the content, the hash can be calculated and added to the list which ContentSecurityPolicyAttribute will later use to generate the response headers. As the content is obtained by an asynchronous method, the asynchronous version of the Process method (ProcessAsync) must be used.

...
public class ContentSecurityPolicyTagHelper : TagHelper
{
    private static IDictionary<string, string> _inlineExecutionContextKeys =
        new Dictionary<string, string>
        {
            { "script", "ScriptInlineExecution" },
            { "style", "StyleInlineExecution" }
        };

    private static IDictionary<string, string> _hashListBuilderContextKeys =
        new Dictionary<string, string>
        {
            { "script", "ScriptHashListBuilder" },
            { "style", "StyleHashListBuilder" }
        };

    ...

    public override async Task ProcessAsync(TagHelperContext context, TagHelperOutput output)
    {
        ...

        ContentSecurityPolicyInlineExecution currentInlineExecutionPolicy =
            (ContentSecurityPolicyInlineExecution)ViewContext
            .HttpContext.Items[_inlineExecutionContextKeys[output.TagName]];

        if (currentInlineExecutionPolicy == ContentSecurityPolicyInlineExecution.Nonce)
        {
            output.Attributes.Add("nonce", (string)ViewContext.HttpContext.Items["NonceRandom"]);
        }
        else if (currentInlineExecutionPolicy == ContentSecurityPolicyInlineExecution.Hash)
        {
            string content = output.Content.IsModified ?
                output.Content.GetContent() : (await output.GetChildContentAsync()).GetContent();

            content = content.Replace("\r\n", "\n");
            byte[] contentHashBytes = SHA256.Create().ComputeHash(Encoding.UTF8.GetBytes(content));
            string contentHash = Convert.ToBase64String(contentHashBytes);

            ((StringBuilder)ViewContext.HttpContext.Items[_hashListBuilderContextKeys[output.TagName]])
                .AppendFormat(" 'sha256-{0}'", contentHash);
        }
    }

    ...
}

After registering the new TagHelper with the addTagHelper directive, it can be used through the new custom elements or simply by adding the attribute to the standard ones.
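The registration typically goes into the _ViewImports.cshtml file. A minimal sketch, assuming the TagHelper is compiled into an assembly named Lib.AspNetCore.Mvc.Security (substitute your own assembly name):

```cshtml
@addTagHelper "*, Lib.AspNetCore.Mvc.Security"
```

The wildcard registers all TagHelpers from the assembly; a fully qualified type name can be used instead to register just this one.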

<script asp-csp>
    $(document).ready(function() {
        ...
    });
</script>

The full version of the code can be found on GitHub and it is ready to use together with ContentSecurityPolicyAttribute as part of Lib.AspNetCore.Mvc.Security.

In previous post I've shown how a temporary table can be generated based on IQueryable and then reused in subsequent queries. I have also pointed out a serious limitation. Described approach won't work if projection contains columns not coming directly from any tables. In such case the generated query will contain columns names like C2 which are not matching property names. One possible workaround is overriding columns names in configuration class by using HasColumnName("C2") but this is far from perfect. Something more generic would be desired.
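To illustrate the workaround, the mapping override could look roughly like this (a hypothetical sketch - TemporarySnapshotEntity and its ComputedValue property are stand-ins for whatever entity and computed column your projection produces):

```csharp
public sealed class TemporarySnapshotEntityConfiguration : EntityTypeConfiguration<TemporarySnapshotEntity>
{
    public TemporarySnapshotEntityConfiguration()
    {
        // Pin the property to the artificial column name which Entity Framework
        // generates for a projected value not coming directly from any table.
        Property(x => x.ComputedValue).HasColumnName("C2");
    }
}
```

The obvious drawback is that the C2 name is an implementation detail of the query generator, which is exactly why something more generic is needed.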

What I did know is that after calling ToTraceString, Entity Framework keeps the query plan in its cache. As part of this plan there should be information on how the properties are mapped to columns. The only thing I was able to find was this StackOverflow question. Apparently Entity Framework stores only the positions of the properties. This actually makes sense, as access by index is the primary method provided by ADO.NET, but unfortunately this is not information which can be used to properly build the SELECT clause. After some thinking I decided to use those positions to properly order the columns in the INSERT INTO list.

The first thing to change was my extension method for getting the entity columns, because now I needed the names of the corresponding properties as well.

internal static class DbContextExtensions
{
    internal static IEnumerable<ScalarPropertyMapping> GetEntityPropertyMappings<TEntity>(this DbContext dbContext)
    {
        // Get the metadata
        MetadataWorkspace metadata = ((IObjectContextAdapter)dbContext).ObjectContext.MetadataWorkspace;

        // Get the space within the metadata which contains information about CLR types
        ObjectItemCollection clrSpace = ((ObjectItemCollection)metadata.GetItemCollection(DataSpace.OSpace));

        // Get the entity type from the metadata that maps to the CLR type
        EntityType entityEntityType = metadata.GetItems<EntityType>(DataSpace.OSpace).Single(e => clrSpace.GetClrType(e) == typeof(TEntity));

        // Get the entity set that uses this entity type
        EntitySet entityEntitySet = metadata.GetItems<EntityContainer>(DataSpace.CSpace).Single().EntitySets.Single(s => s.ElementType.Name == entityEntityType.Name);

        // Get the mapping between conceptual and storage model for this entity set
        EntitySetMapping entityEntitySetMapping = metadata.GetItems<EntityContainerMapping>(DataSpace.CSSpace).Single().EntitySetMappings.Single(m => m.EntitySet == entityEntitySet);

        // Get the entity columns
        return entityEntitySetMapping.EntityTypeMappings.Single().Fragments.Single().PropertyMappings.OfType<ScalarPropertyMapping>();
    }
}

That was the easy part. The next step was getting the property positions out of the IQueryable. Based on the answer to the question mentioned above, I have extended my ObjectQuery extensions.

internal static class ObjectQueryExtensions
{
    internal static ObjectQuery<TQueryProjection> GetObjectQuery<TQueryProjection>(this IQueryable<TQueryProjection> query)
    {
        object internalQuery = GetField(query, "_internalQuery");

        ObjectQuery<TQueryProjection> objectQuery = GetField(internalQuery, "_objectQuery") as ObjectQuery<TQueryProjection>;

        return objectQuery;
    }

    internal static IReadOnlyDictionary<string, int> GetQueryPropertyPositions<TQueryProjection>(this IQueryable<TQueryProjection> query)
    {
        ObjectQuery<TQueryProjection> objectQuery = query.GetObjectQuery();

        objectQuery.ToTraceString();

        return GetQueryPropertyPositions(objectQuery);
    }

    internal static IReadOnlyDictionary<string, int> GetQueryPropertyPositions<TQueryProjection>(this ObjectQuery<TQueryProjection> objectQuery)
    {
        IDictionary<string, int> propertyPositions = new Dictionary<string, int>();

        // Get the query state.
        object objectQueryState = GetProperty(objectQuery, "QueryState");

        // Get the cached query execution plan.
        object cachedPlan = GetField(objectQueryState, "_cachedPlan");

        // Get the command definition.
        object commandDefinition = GetField(cachedPlan, "CommandDefinition");

        // Get the column map generator.
        Array columnMapGenerators = GetField(commandDefinition, "_columnMapGenerators") as Array;
        object columnMapGenerator = ((columnMapGenerators != null) && (columnMapGenerators.Length == 1)) ? columnMapGenerators.GetValue(0) : null;

        // Get the column map.
        object columnMap = GetField(columnMapGenerator, "_columnMap");

        // Get the record column map.
        object columnMapElement = GetProperty(columnMap, "Element");

        // Get column map properties.
        Array properties = GetProperty(columnMapElement, "Properties") as Array;
        if (properties != null)
        {
            for (int propertyIndex = 0; propertyIndex < properties.Length; propertyIndex++)
            {
                object property = properties.GetValue(propertyIndex);
                propertyPositions.Add(GetProperty(property, "Name") as String, (int)GetProperty(property, "ColumnPos"));
            }
        }

        return new ReadOnlyDictionary<string, int>(propertyPositions);
    }

    private static object GetProperty(object objectInstance, string propertyName)
    {
        object propertyValue = null;

        if (objectInstance != null)
        {
            PropertyInfo property = objectInstance.GetType().GetProperty(propertyName, BindingFlags.NonPublic | BindingFlags.Instance);
            if (property != null)
            {
                propertyValue = property.GetValue(objectInstance, new object[0]);
            }
        }

        return propertyValue;
    }

    private static object GetField(object objectInstance, string fieldName)
    {
        object fieldValue = null;

        if (objectInstance != null)
        {
            FieldInfo field = objectInstance.GetType().GetField(fieldName, BindingFlags.NonPublic | BindingFlags.Instance);

            if (field != null)
            {
                fieldValue = field.GetValue(objectInstance);
            }
        }

        return fieldValue;
    }
}

The change to the actual temporary table creation code was pretty straightforward with the methods above - some ordering and a couple of StringBuilders did the trick.

public IQueryable<TTemporaryEntity> AsTemporarySnapshotQueryable<TQueryProjection, TTemporaryEntity>(IQueryable<TQueryProjection> query)
{
    IQueryable<TTemporaryEntity> snapshotQueryable = null;

    ObjectQuery<TQueryProjection> temporarySnapshotObjectQuery = query.GetObjectQuery();
    IDictionary<string, EdmProperty> temporarySnapshotColumns = _dbContext.GetEntityPropertyMappings<TTemporaryEntity>().ToDictionary(p => p.Property.Name, p => p.Column);

    if ((temporarySnapshotObjectQuery != null) && temporarySnapshotColumns.Any())
    {
        string temporarySnapshotTableName = "#" + typeof(TTemporaryEntity).Name;
        string temporarySnapshotQuerySql = temporarySnapshotObjectQuery.ToTraceString();
        IOrderedEnumerable<KeyValuePair<string, int>> temporarySnapshotObjectQueryColumnsPositions = temporarySnapshotObjectQuery.GetQueryPropertyPositions().OrderBy(cp => cp.Value);

        StringBuilder temporarySnapshotCreateColumnsListBuilder = new StringBuilder();
        StringBuilder temporarySnapshotFillColumnsListBuilder = new StringBuilder();
        foreach (KeyValuePair<string, int> temporarySnapshotObjectQueryColumnPosition in temporarySnapshotObjectQueryColumnsPositions)
        {
            EdmProperty temporarySnapshotColumn = temporarySnapshotColumns[temporarySnapshotObjectQueryColumnPosition.Key];

            temporarySnapshotCreateColumnsListBuilder.Append(GetTemporarySnapshotColumnCreateSql(temporarySnapshotColumn));
            temporarySnapshotFillColumnsListBuilder.AppendFormat("[{0}],", temporarySnapshotColumn.Name);
        }
        temporarySnapshotCreateColumnsListBuilder.Length -= 1;
        temporarySnapshotFillColumnsListBuilder.Length -= 1;

        // We need to handle "1 AS [C1]" column here
        if (temporarySnapshotObjectQueryColumnsPositions.First().Value == 1)
        {
            temporarySnapshotCreateColumnsListBuilder.Insert(0, "[RESERVED_EF_INTERNAL] INT,");
            temporarySnapshotFillColumnsListBuilder.Insert(0, "[RESERVED_EF_INTERNAL],");
        }

        string temporarySnapshotCreateSqlCommand = String.Format("IF OBJECT_ID('tempdb..{0}') IS NOT NULL BEGIN DROP TABLE {0} END{1}CREATE TABLE {0} ({2})", temporarySnapshotTableName, Environment.NewLine, temporarySnapshotCreateColumnsListBuilder);
        string temporarySnapshotFillSqlCommand = String.Format("INSERT INTO {0}({1}) (SELECT * FROM ({2}) AS [TemporarySnapshotQueryable])", temporarySnapshotTableName, temporarySnapshotFillColumnsListBuilder, temporarySnapshotQuerySql);
        object[] temporarySnapshotFillSqlCommandParameters = temporarySnapshotObjectQuery.Parameters.Select(parameter => new SqlParameter(parameter.Name, parameter.Value)).ToArray();

        // We are opening connection manually here because since Entity Framework 6 it will not be automatically closed until context disposal - this way the temporary table will be visible for other queries.
        _dbContext.Database.Connection.Open();
        _dbContext.Database.ExecuteSqlCommand(temporarySnapshotCreateSqlCommand);
        _dbContext.Database.ExecuteSqlCommand(temporarySnapshotFillSqlCommand, temporarySnapshotFillSqlCommandParameters);

        snapshotQueryable = _dbContext.Set<TTemporaryEntity>().AsNoTracking().AsQueryable();
    }

    return snapshotQueryable;
}

One thing worth explaining in the code above is the if statement with the comment 'We need to handle "1 AS [C1]" column here'. If you look at a SQL Server profiler log while Entity Framework queries are performed, you will notice that for a lot of them a 1 AS [C1] column is generated. To my knowledge this allows Entity Framework to distinguish between empty tables and selecting only nullable fields (especially in sub query scenarios), but for our use case it is one more thing we have to account for. The code above does it in the simplest way possible - if the property column positions start at 1, it creates a dummy column which will store the 1 AS [C1].

This modified solution has fewer limitations than the previous one, but in order to achieve that a number of "internal" dependencies had to be introduced. This can easily be broken just by updating Entity Framework, so the solution should somehow be protected against this. The absolute minimum is unit tests guarding the most important aspects.

Unit Testing

The thing that can be broken most easily is the GetQueryPropertyPositions method, as it is entirely based on non-public APIs, so the first test should check whether that method returns anything at all.

[TestMethod]
public void TemporarySnapshot_GetStudentsIdentities_GetQueryPropertyPositions_ReturnsNonEmptyCollection()
{
    using (var temporarySnapshotTestContext = new FakeDbContext())
    {
        StudentsManager studentsManager = PrepareStudentsManager(temporarySnapshotTestContext);
        IQueryable<StudentIdentity> studentsIdentitiesQueryable = studentsManager.GetStudentsIdentities(...);

        IReadOnlyDictionary<string, int> studentsIdentitiesQueryableColumns = studentsIdentitiesQueryable.GetQueryPropertyPositions();

        Assert.IsTrue((studentsIdentitiesQueryableColumns != null) && (studentsIdentitiesQueryableColumns.Any()));
    }
}

For consistency the GetEntityPropertyMappings method should be checked in the same way.

[TestMethod]
public void TemporarySnapshot_GetEntityPropertyMappings_ReturnsNonEmptyCollection()
{
    using (var temporarySnapshotTestContext = new FakeDbContext())
    {
        IEnumerable<ScalarPropertyMapping> temporarySnapshotColumns = temporarySnapshotTestContext.GetEntityPropertyMappings<TemporaryStudentIdentity>();

        Assert.IsTrue((temporarySnapshotColumns != null) && (temporarySnapshotColumns.Any()));
    }
}

Now we can test whether (at least in theory) the code can create a correct temporary table. There are four preconditions which are the absolute minimum for that:

  • The column positions returned from GetQueryPropertyPositions must form a sequence
  • The first column index has to be either 0 or 1
  • All properties of the entity representing the temporary table must exist in the query
  • The number of property mappings in the entity and in the query must be the same

The following unit tests cover those requirements.

[TestMethod]
public void TemporarySnapshot_GetStudentsIdentities_GetQueryPropertyPositions_ColumnsInSequence()
{
    bool inSequence = false;

    using (var temporarySnapshotTestContext = new FakeDbContext())
    {
        StudentsManager studentsManager = PrepareStudentsManager(temporarySnapshotTestContext);
        IQueryable<StudentIdentity> studentsIdentitiesQueryable = studentsManager.GetStudentsIdentities(...);

        IReadOnlyDictionary<string, int> studentsIdentitiesQueryableColumns = studentsIdentitiesQueryable.GetQueryPropertyPositions();

        IEnumerable<int> orderedPositions = studentsIdentitiesQueryableColumns.Values.OrderBy(position => position);
        inSequence = orderedPositions.Zip(orderedPositions.Skip(1), (previousPosition, position) => (previousPosition + 1) == position).All(positionsInSequence => positionsInSequence);
    }

    Assert.IsTrue(inSequence);
}

[TestMethod]
public void TemporarySnapshot_GetStudentsIdentities_GetQueryPropertyPositions_CorrectFirstColumnIndex()
{
    bool correctFirstColumnIndex = false;

    using (var temporarySnapshotTestContext = new FakeDbContext())
    {
        StudentsManager studentsManager = PrepareStudentsManager(temporarySnapshotTestContext);
        IQueryable<StudentIdentity> studentsIdentitiesQueryable = studentsManager.GetStudentsIdentities(...);

        IReadOnlyDictionary<string, int> studentsIdentitiesQueryableColumns = studentsIdentitiesQueryable.GetQueryPropertyPositions();

        int firstPosition = studentsIdentitiesQueryableColumns.Values.OrderBy(position => position).First();
        correctFirstColumnIndex = (firstPosition == 0) || (firstPosition == 1);
    }

    Assert.IsTrue(correctFirstColumnIndex);
}

[TestMethod]
public void TemporarySnapshot_GetEntityPropertyMappings_GetStudentsIdentities_AllColumnsExist()
{
    bool allColumnsExists = true;

    using (var temporarySnapshotTestContext = new FakeDbContext())
    {
        StudentsManager studentsManager = PrepareStudentsManager(temporarySnapshotTestContext);
        IQueryable<StudentIdentity> studentsIdentitiesQueryable = studentsManager.GetStudentsIdentities(...);

        IReadOnlyDictionary<string, int> studentsIdentitiesQueryableColumns = studentsIdentitiesQueryable.GetQueryPropertyPositions();
        IEnumerable<ScalarPropertyMapping> temporarySnapshotColumns = temporarySnapshotTestContext.GetEntityPropertyMappings<TemporaryStudentIdentity>();

        foreach (ScalarPropertyMapping temporarySnapshotColumn in temporarySnapshotColumns)
        {
            allColumnsExists = allColumnsExists && studentsIdentitiesQueryableColumns.ContainsKey(temporarySnapshotColumn.Property.Name);
        }
    }

    Assert.IsTrue(allColumnsExists);
}

[TestMethod]
public void TemporarySnapshot_GetEntityPropertyMappings_GetStudentsIdentities_GetQueryPropertyPositions_EqualColumnsCount()
{
    using (var temporarySnapshotTestContext = new FakeDbContext())
    {
        StudentsManager studentsManager = PrepareStudentsManager(temporarySnapshotTestContext);
        IQueryable<StudentIdentity> studentsIdentitiesQueryable = studentsManager.GetStudentsIdentities(...);

        IReadOnlyDictionary<string, int> studentsIdentitiesQueryableColumns = studentsIdentitiesQueryable.GetQueryPropertyPositions();
        IEnumerable<ScalarPropertyMapping> temporarySnapshotColumns = temporarySnapshotTestContext.GetEntityPropertyMappings<TemporaryStudentIdentity>();

        Assert.AreEqual(temporarySnapshotColumns.Count(), studentsIdentitiesQueryableColumns.Count);
    }
}

Conclusion

The solution is complete and, let's face it, it is "hacky". It solves a very specific problem and comes at the cost of relying on Entity Framework internals. Even with unit tests there are maintainability risks here. Still, at least for the specific case mentioned, it was worth it. Also, the journey to the solution was an interesting and satisfying one.

I'm a huge fan of entity based and result set based relational mapping (classic ORMs). I'm also a huge fan of DML based relational mapping (micro ORMs). In general I'm a huge fan of every technology that allows me to get the job done in the best possible way. I believe that one should never limit oneself to a single approach within a project - it should always be about choosing the best tool for the job. But sometimes there are real life constraints (licensing, business, politics etc.) which limit the choices. This was one of those situations.

Entity Framework is not the perfect tool for complex reporting, but it can still be used for it. Recently I was tasked with optimizing such a scenario. The code was building a quite complex report across several tables, all filtered by the same sub query. In general it looked similar to the snippet below.

public IQueryable<StudentItem> GetStudentsItemsQueryable(Filters filters, IEnumerable<StudentIdentity> students)
{
    IQueryable<StudentItem> itemsQueryable = null;

    IEnumerable<int> studentsIds = students.Select(s => s.Id);
    foreach (IStudentItemManager itemManager in _itemsManagers)
    {
        IQueryable<StudentItem> itemQueryable = itemManager.GetStudentsItemQueryable(filters, studentsIds);
        if (itemQueryable != null)
        {
            itemsQueryable = (itemsQueryable != null) ? itemsQueryable.Concat(itemQueryable) : itemQueryable;
        }
    }

    ...

    return itemsQueryable;
}

The top level method generates a union of a number of independent item queries. Each item query receives the identifiers of the students for whom the report is being generated.

public IQueryable<StudentItem> GetStudentsItemQueryable(Filters filters, IEnumerable<int> studentsIds)
{
    IQueryable<StudentItem> itemQueryable = null;

    ...

    itemQueryable = itemQueryable.Where(i => studentsIds.Contains(i.StudentId));

    ...

    return itemQueryable;
}

In the end Entity Framework was generating a SQL query with the pattern visible below.

SELECT ...
FROM (SELECT ...
        FROM ... AS [Extent1]
        WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM (<SQL representation of StudentIdentity enumerable>) AS [Limit1] WHERE [Limit1].[Id] = [Extent1].[StudentId])
      UNION ALL
      SELECT ...
        FROM ... AS [Extent2]
        WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM (<SQL representation of StudentIdentity enumerable>) AS [Limit2] WHERE [Limit2].[Id] = [Extent2].[StudentId])
      UNION ALL
      ...
      UNION ALL
      SELECT ...
        FROM ... AS [ExtentN]
        WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM (<SQL representation of StudentIdentity enumerable>) AS [LimitN] WHERE [LimitN].[Id] = [ExtentN].[StudentId])) AS UnionAll1

The redundancy in the query is immediately visible. Neither the students query nor the items queries were trivial, which resulted in pressure on the database. A quick look at SQL Server profiler showed a CPU time of ~750ms and an overall duration of ~1700ms. Taking into consideration that this data still had to be rendered and returned to the user, this is "bad experience" territory. How to optimize this? All the sub queries by themselves looked quite good (despite not being trivial), so the initial idea became removing the redundancy.

Changing the shared sub query to an in-memory collection

The first thing which came to mind was grabbing the student identifiers once and passing them as an in-memory collection to all the sub queries. The code change was in fact easy and cheap.

public IQueryable<StudentItem> GetStudentsItemsQueryable(Filters filters, IEnumerable<StudentIdentity> students)
{
    IQueryable<StudentItem> itemsQueryable = null;

    IEnumerable<int> studentsIds = students.Select(s => s.Id).ToList();
    ...

    return itemsQueryable;
}

The generated SQL query also looked promising.

SELECT ...
FROM (SELECT ...
        FROM ... AS [Extent1]
        WHERE ... AND [Extent1].[StudentId] IN (...)
      UNION ALL
      SELECT ...
        FROM ... AS [Extent2]
        WHERE ... AND [Extent2].[StudentId] IN (...)
      UNION ALL
      ...
      UNION ALL
      SELECT ...
        FROM ... AS [ExtentN]
        WHERE ... AND [ExtentN].[StudentId] IN (...)) AS UnionAll1

Verification with SQL Server profiler showed that the times had been nicely reduced (CPU time ~345ms and overall duration ~565ms), so the problem should have been solved. Unfortunately the end-to-end stress performance test results didn't improve - the overall latency remained at a similar level and the application CPU usage went up. Why?

The only change was the introduction of an IN clause representing Enumerable.Contains. This scenario was indeed problematic in Entity Framework 4, where usage of Enumerable.Contains was translated into a tree of OR expressions, but Entity Framework 6 introduced native support for DbInExpression, so it should have been fine. Still, the Visual Studio profiler was pointing at query generation as the reason. A quick look at the Entity Framework performance considerations regarding autocompiled queries brought the answer - the IN clause prevents query caching. Every time the code was executed, Entity Framework had to regenerate all the items queries (and the top query as well), which resulted in high CPU usage. This meant that a different solution was needed.

Moving the optimization to the SQL level

The repetition had to be removed, but it couldn't be done with an IN clause over a static identifiers list. The identifiers sub query results had to be gathered once and reused at the SQL level. A potential solution I could think of was a temporary table. Creating and accessing a temporary table shouldn't be expensive, and it will also be automatically removed when the session ends. This seemed to be a reasonable approach, so a POC was born.

IF OBJECT_ID('tempdb..#TemporaryStudentIdentity') IS NOT NULL
BEGIN
    DROP TABLE #TemporaryStudentIdentity
END

SELECT * INTO #TemporaryStudentIdentity
FROM (<SQL representation of StudentIdentity enumerable>) AS [TemporarySnapshotQueryable]

SELECT ...
FROM (SELECT ...
        FROM ... AS [Extent1]
        WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM [tempdb].[#TemporaryStudentIdentity] AS [Limit1] WHERE [Limit1].[Id] = [Extent1].[StudentId])
      UNION ALL
      SELECT ...
        FROM ... AS [Extent2]
        WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM [tempdb].[#TemporaryStudentIdentity] AS [Limit2] WHERE [Limit2].[Id] = [Extent2].[StudentId])
      UNION ALL
      ...
      UNION ALL
      SELECT ...
        FROM ... AS [ExtentN]
        WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM [tempdb].[#TemporaryStudentIdentity] AS [LimitN] WHERE [LimitN].[Id] = [ExtentN].[StudentId])) AS UnionAll1

Testing of the POC has shown that this approach is more expensive than the IN clause (CPU time ~550ms and overall duration ~660ms) but way better than the original one (and can be considered acceptable). Now the same thing needed to be done using Entity Framework.

Implementing the temporary table creation

First of all I needed an entity to represent the temporary table. As the project was using the Code First approach, I've created a POCO with the same properties as the projection DTO (having a separate class allowed me to protect the constructor and property setters).

public class TemporaryStudentIdentity
{
    protected TemporaryStudentIdentity()
    { }

    public virtual int Id { get; protected set; }

    ...
}

In order to map this entity to a temporary table TableAttribute can be used, but I prefer to keep the mappings separate from the entities, so I have created a configuration class.
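For comparison, the attribute-based variant would be a one-liner on the entity itself (a sketch using TableAttribute from System.ComponentModel.DataAnnotations.Schema):

```csharp
// Maps the entity directly to the temporary table, without a configuration class.
[Table("#TemporaryStudentIdentity", Schema = "tempdb")]
public class TemporaryStudentIdentity
{
    ...
}
```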

public sealed class TemporaryStudentIdentityConfiguration : EntityTypeConfiguration<TemporaryStudentIdentity>
{
    public TemporaryStudentIdentityConfiguration()
    {
        ToTable("#" + typeof(TemporaryStudentIdentity).Name, "tempdb");

        HasKey(x => x.Id);

        Property(x => x.Id);
        ...
    }
}

Now I needed to get the actual parametrized query out of the IQueryable (I didn't want to hardcode everything, so the solution would be more generic in the future) - this post got me going. First I've created a small extension method.

internal static class ObjectQueryExtensions
{
    internal static ObjectQuery<TQueryProjection> GetObjectQuery<TQueryProjection>(this IQueryable<TQueryProjection> query)
    {
        object internalQuery = GetField(query, "_internalQuery");

        ObjectQuery<TQueryProjection> objectQuery = GetField(internalQuery, "_objectQuery") as ObjectQuery<TQueryProjection>;

        return objectQuery;
    }

    private static object GetField(object objectInstance, string fieldName)
    {
        object fieldValue = null;

        if (objectInstance != null)
        {
            FieldInfo field = objectInstance.GetType().GetField(fieldName, BindingFlags.NonPublic | BindingFlags.Instance);

            if (field != null)
            {
                fieldValue = field.GetValue(objectInstance);
            }
        }

        return fieldValue;
    }
}

After separating those nasty internals I could write a nice method which would get the job done.

public IQueryable<TTemporaryEntity> AsTemporarySnapshotQueryable<TQueryProjection, TTemporaryEntity>(IQueryable<TQueryProjection> query)
{
    IQueryable<TTemporaryEntity> snapshotQueryable = null;

    ObjectQuery<TQueryProjection> temporarySnapshotObjectQuery = query.GetObjectQuery();

    if (temporarySnapshotObjectQuery != null)
    {
        string temporarySnapshotTableName = "#" + typeof(TTemporaryEntity).Name;
        string temporarySnapshotSqlCommand = String.Format("IF OBJECT_ID('tempdb..{0}') IS NOT NULL BEGIN DROP TABLE {0} END SELECT * INTO {0} FROM ({1}) AS [TemporarySnapshotQueryable]", temporarySnapshotTableName, temporarySnapshotObjectQuery.ToTraceString());
        object[] temporarySnapshotSqlCommandParameters = temporarySnapshotObjectQuery.Parameters.Select(parameter => new SqlParameter(parameter.Name, parameter.Value)).ToArray();

        // Since Entity Framework 6, a manually opened connection will not be automatically closed until context disposal - this way the temporary table stays visible to subsequent queries.
        _dbContext.Database.Connection.Open();
        _dbContext.Database.ExecuteSqlCommand(temporarySnapshotSqlCommand, temporarySnapshotSqlCommandParameters);

        snapshotQueryable = _dbContext.Set<TTemporaryEntity>().AsNoTracking().AsQueryable();
    }

    return snapshotQueryable;
}

That wasn't that hard. Proud of myself, I've quickly changed the method I wanted to optimize.

public IQueryable<StudentItem> GetStudentsItemsQueryable(Filters filters, IEnumerable<StudentIdentity> students)
{
    IQueryable<StudentItem> itemsQueryable = null;

    IQueryable<int> studentsIds = AsTemporarySnapshotQueryable<StudentIdentity, TemporaryStudentIdentity>(students.AsQueryable()).Select(s => s.Id);
    ...

    return itemsQueryable;
}

First test... and it fails. The error log contained an entry saying that #TemporaryStudentIdentity was not present. Did I miss something? I had made sure that the connection stayed open so the session didn't change - what else could be causing my temporary table not to be visible to the query? After a couple of minutes of staring at the SQL Server Profiler log it hit me. I had simplified my POC by omitting sp_executesql.

When a parameterized query is executed by the ADO.NET provider for SQL Server, it is wrapped in a call to sp_executesql - this is how the parameters are passed. The tricky part of sp_executesql is the fact that the statement is executed in an "inner scope". A temporary table created by the parent scope is visible inside of sp_executesql, but one created within sp_executesql is not visible outside. In my case this meant that I had to separate the creation of the table from filling it with data (I didn't want to give up the parameterized query for obvious reasons).
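The scoping behavior can be illustrated with a minimal sketch (the table and column names here are made up purely for illustration):

```sql
-- A temporary table created inside sp_executesql lives in the inner scope
-- and is dropped when that scope ends.
EXEC sp_executesql N'SELECT 1 AS [Id] INTO #InnerScoped'
-- This would fail: #InnerScoped no longer exists in the outer scope.
-- SELECT * FROM #InnerScoped

-- A temporary table created in the outer scope IS visible inside sp_executesql.
CREATE TABLE #OuterScoped ([Id] INT NOT NULL)
EXEC sp_executesql N'INSERT INTO #OuterScoped ([Id]) VALUES (1)'
SELECT * FROM #OuterScoped
```

This is exactly why creating the table in the outer scope and only filling it through the parameterized (sp_executesql wrapped) command works.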

Adjusting the POC to sp_executesql usage

I've read all I could find regarding temporary tables, but I wasn't able to find a way to create one ahead of time without specifying its full definition, so the POC had to include it.

IF OBJECT_ID('tempdb..#TemporaryStudentIdentity') IS NOT NULL
BEGIN
    DROP TABLE #TemporaryStudentIdentity
END

CREATE TABLE #TemporaryStudentIdentity
(
    [Id] INT NOT NULL,
    ...
)

exec sp_executesql N'INSERT INTO #TemporaryStudentIdentity([Id], ...)
    (SELECT [Id], ...
    FROM (<SQL representation of StudentIdentity query>) AS [TemporarySnapshotQueryable]'

exec sp_executesql N'SELECT ...
    FROM (SELECT ...
            FROM ... AS [Extent1]
            WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM [tempdb].[#TemporaryStudentIdentity] AS [Limit1] WHERE [Limit1].[Id] = [Extent1].[StudentId])
          UNION ALL
          SELECT ...
            FROM ... AS [Extent2]
            WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM [tempdb].[#TemporaryStudentIdentity] AS [Limit2] WHERE [Limit2].[Id] = [Extent2].[StudentId])
          UNION ALL
          ...
          UNION ALL
          SELECT ...
            FROM ... AS [ExtentN]
            WHERE ... AND EXISTS (SELECT 1 AS [C1] FROM [tempdb].[#TemporaryStudentIdentity] AS [LimitN] WHERE [LimitN].[Id] = [ExtentN].[StudentId])) AS UnionAll1'

This worked and was even faster than the previous version (probably because SQL Server didn't have to invest processing into figuring out the definition) - CPU time ~420ms, overall duration ~520ms. The challenge was how to do that from code in as generic a way as possible.

Adjusting the implementation

Putting it in simple words, all I had to do was generate a "create table" script based on the entity. This should be doable - Entity Framework has to store the information I needed somewhere. After some research I've found what I was looking for. I've adapted the information provided into the form of an extension method.

internal static class DbContextExtensions
{
    internal static IEnumerable<EdmProperty> GetEntityPropertyColumns<TEntity>(this DbContext dbContext)
    {
        // Get the metadata
        MetadataWorkspace metadata = ((IObjectContextAdapter)dbContext).ObjectContext.MetadataWorkspace;

        // Get the space within the metadata which contains information about CLR types
        ObjectItemCollection clrSpace = ((ObjectItemCollection)metadata.GetItemCollection(DataSpace.OSpace));

        // Get the entity type from the metadata that maps to the CLR type
        EntityType entityEntityType = metadata.GetItems<EntityType>(DataSpace.OSpace).Single(e => clrSpace.GetClrType(e) == typeof(TEntity));

        // Get the entity set that uses this entity type
        EntitySet entityEntitySet = metadata.GetItems<EntityContainer>(DataSpace.CSpace).Single().EntitySets.Single(s => s.ElementType.Name == entityEntityType.Name);

        // Get the mapping between conceptual and storage model for this entity set
        EntitySetMapping entityEntitySetMapping = metadata.GetItems<EntityContainerMapping>(DataSpace.CSSpace).Single().EntitySetMappings.Single(m => m.EntitySet == entityEntitySet);

        // Get the entity columns
        return entityEntitySetMapping.EntityTypeMappings.Single().Fragments.Single().PropertyMappings.OfType<ScalarPropertyMapping>().Select(p => p.Column);
    }
}

Armed with this extension method I've started refactoring my temporary table generation code.

public IQueryable<TTemporaryEntity> AsTemporarySnapshotQueryable<TQueryProjection, TTemporaryEntity>(IQueryable<TQueryProjection> query)
{
    IQueryable<TTemporaryEntity> snapshotQueryable = null;

    ObjectQuery<TQueryProjection> temporarySnapshotObjectQuery = query.GetObjectQuery();
    IEnumerable<EdmProperty> temporarySnapshotColumns = _dbContext.GetEntityPropertyColumns<TTemporaryEntity>().ToArray();

    if ((temporarySnapshotObjectQuery != null) && temporarySnapshotColumns.Any())
    {
        string temporarySnapshotTableName = "#" + typeof(TTemporaryEntity).Name;

        StringBuilder temporarySnapshotCreateColumnsListBuilder = new StringBuilder();
        StringBuilder temporarySnapshotFillColumnsListBuilder = new StringBuilder();
        foreach (EdmProperty temporarySnapshotColumn in temporarySnapshotColumns)
        {
            temporarySnapshotCreateColumnsListBuilder.Append(GetTemporarySnapshotColumnCreateSql(temporarySnapshotColumn));
            temporarySnapshotFillColumnsListBuilder.AppendFormat("[{0}],", temporarySnapshotColumn.Name);
        }
        temporarySnapshotCreateColumnsListBuilder.Length -= 1;
        temporarySnapshotFillColumnsListBuilder.Length -= 1;

        string temporarySnapshotCreateSqlCommand = String.Format("IF OBJECT_ID('tempdb..{0}') IS NOT NULL BEGIN DROP TABLE {0} END{1}CREATE TABLE {0} ({2})", temporarySnapshotTableName, Environment.NewLine, temporarySnapshotCreateColumnsListBuilder);
        string temporarySnapshotFillSqlCommand = String.Format("INSERT INTO {0}({1}) (SELECT {1} FROM ({2}) AS [TemporarySnapshotQueryable])", temporarySnapshotTableName, temporarySnapshotFillColumnsListBuilder, temporarySnapshotObjectQuery.ToTraceString());
        object[] temporarySnapshotFillSqlCommandParameters = temporarySnapshotObjectQuery.Parameters.Select(parameter => new SqlParameter(parameter.Name, parameter.Value)).ToArray();

        // Since Entity Framework 6, a manually opened connection will not be automatically closed until context disposal - this way the temporary table stays visible to subsequent queries.
        _dbContext.Database.Connection.Open();
        _dbContext.Database.ExecuteSqlCommand(temporarySnapshotCreateSqlCommand);
        _dbContext.Database.ExecuteSqlCommand(temporarySnapshotFillSqlCommand, temporarySnapshotFillSqlCommandParameters);

        snapshotQueryable = _dbContext.Set<TTemporaryEntity>().AsNoTracking().AsQueryable();
    }

    return snapshotQueryable;
}

private static string GetTemporarySnapshotColumnCreateSql(EdmProperty temporarySnapshotColumn)
{
    string typeNameUpperCase = temporarySnapshotColumn.TypeName.ToUpperInvariant();
    string temporarySnapshotColumnCreateSqlSuffix = temporarySnapshotColumn.Nullable ? "," : " NOT NULL,";
    switch (typeNameUpperCase)
    {
        case "NUMERIC":
            return String.Format("[{0}] NUMERIC({1},{2}){3}", temporarySnapshotColumn.Name, temporarySnapshotColumn.Precision, temporarySnapshotColumn.Scale, temporarySnapshotColumnCreateSqlSuffix);
        case "NVARCHAR":
        case "VARCHAR":
            return String.Format("[{0}] {1}({2}){3}", temporarySnapshotColumn.Name, typeNameUpperCase, temporarySnapshotColumn.MaxLength, temporarySnapshotColumnCreateSqlSuffix);
        default:
            return String.Format("[{0}] {1}{2}", temporarySnapshotColumn.Name, typeNameUpperCase, temporarySnapshotColumnCreateSqlSuffix);
    }
}
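As a rough illustration, for an entity with an INT key and an NVARCHAR column the two commands produced above could look like this (the column list is abbreviated and the Name column is assumed for the example):

```sql
IF OBJECT_ID('tempdb..#TemporaryStudentIdentity') IS NOT NULL BEGIN DROP TABLE #TemporaryStudentIdentity END
CREATE TABLE #TemporaryStudentIdentity ([Id] INT NOT NULL,[Name] NVARCHAR(100))

INSERT INTO #TemporaryStudentIdentity([Id],[Name]) (SELECT [Id],[Name] FROM (<SQL from ToTraceString>) AS [TemporarySnapshotQueryable])
```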

I've added all the HasColumnType, HasMaxLength etc. to the configuration class and fired away... It worked!!!

Limitations

This solution has one serious limitation - the names of the columns in the temporary table must match the names of the corresponding columns in the query generated by Entity Framework. This won't always be true. If the query contains only columns coming directly from tables this should be a safe assumption, but when calculated columns appear they will have names chosen by Entity Framework. In the above approach this would require nasty hard coding like HasColumnName("C2"), which should rather be avoided.
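To visualize the problem, a hypothetical mapping fragment for a calculated column could end up looking like this ("C2" is whatever alias Entity Framework happens to choose, and the property name is made up for the example):

```csharp
// Fragile: "C2" is an alias generated by Entity Framework for a calculated
// column, not a stable column name - it can change when the query changes.
Property(x => x.SomeCalculatedValue).HasColumnName("C2");
```

This is exactly the kind of brittle coupling to generated SQL that makes the approach unsuitable for projections with calculated columns.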

There is a potential way of working around this limitation - I will explore it in part two.

HTTPS is the core mechanism for accessing web resources in a secure way. One of the limitations of HTTPS is the fact that a user can manually provide a URL which doesn't contain the proper scheme. In most cases this will result in the application sending a redirect response which will tell the browser to re-request the resource using HTTPS. Unfortunately this redirect creates a risk of a man-in-the-middle attack. Strict Transport Security is a security enhancement which allows web applications to inform browsers that they should always use HTTPS when accessing a given domain.

Strict Transport Security defines the Strict-Transport-Security header with two directives: the required max-age and the optional includeSubDomains. From the moment the browser receives the Strict-Transport-Security header it should consider the host a Known HSTS Host for the number of seconds specified in the max-age directive. Being a Known HSTS Host means that the browser should always use HTTPS for communication. In the initially described scenario (user providing the HTTP scheme or no scheme at all) the browser should cancel the initial request by itself and change the scheme to HTTPS. Specifying the includeSubDomains directive means that the given rule applies also to all subdomains of the current domain.
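For example, a response telling the browser to treat the host (and all its subdomains) as a Known HSTS Host for one year would carry a header along these lines:

```
Strict-Transport-Security: max-age=31536000; includeSubDomains
```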

In order to implement this behavior in an ASP.NET MVC application we need to fulfill two requirements: issue a redirect when a request is made over HTTP, and send the header when a request is made over HTTPS. The first behavior is already available through RequireHttpsAttribute, so we can inherit from it - we just need to add the second.

public class RequireHstsAttribute : RequireHttpsAttribute
{
    private readonly uint _maxAge;

    public uint MaxAge { get { return _maxAge; } }

    public bool IncludeSubDomains { get; set; }

    public RequireHstsAttribute(uint maxAge)
        : base()
    {
        _maxAge = maxAge;
        IncludeSubDomains = false;
    }

    public override void OnAuthorization(AuthorizationContext filterContext)
    {
        if (filterContext == null)
        {
            throw new ArgumentNullException("filterContext");
        }

        if (filterContext.HttpContext.Request.IsSecureConnection)
        {
            StringBuilder headerBuilder = new StringBuilder();
            headerBuilder.AppendFormat("max-age={0}", _maxAge);

            if (IncludeSubDomains)
            {
                headerBuilder.Append("; includeSubDomains");
            }

            filterContext.HttpContext.Response.AppendHeader("Strict-Transport-Security", headerBuilder.ToString());
        }
        else
        {
            HandleNonHttpsRequest(filterContext);
        }
    }
}

We can now use this attribute, for example by adding it to the global filters collection.

protected void Application_Start()
{
    ...
    GlobalFilters.Filters.Add(new RequireHstsAttribute(31536000) { IncludeSubDomains = true });
}

From this moment our application will be "enforcing" HSTS. But the initial problem still hasn't been fully resolved - there is still that one redirect which can happen if the application is accessed for the first time not over HTTPS. This is why the HSTS Preload List has been created. This service allows for submitting domains which should be hardcoded as Known HSTS Hosts in the browsers - this removes the risk of that one potential redirect. The service is hosted by Google, but all major browser vendors have stated that they will be using the submitted list of domains.

If one wants to include his application on the HSTS Preload List, additional steps need to be taken after submitting the domain. The application must confirm the submission by including the preload directive in the Strict-Transport-Security header and fulfill some additional criteria:

  • Be HTTPS only and serve all subdomains over HTTPS.
  • The value of max-age directive must be at least eighteen weeks.
  • The includeSubDomains directive must be present.

Some small adjustments to our attribute are needed in order to handle this additional scenario.

public class RequireHstsAttribute : RequireHttpsAttribute
{
    ...
    public bool Preload { get; set; }

    public RequireHstsAttribute(uint maxAge)
        : base()
    {
        ...
        Preload = false;
    }

    public override void OnAuthorization(AuthorizationContext filterContext)
    {
        ...

        if (filterContext.HttpContext.Request.IsSecureConnection)
        {
            if (Preload && (MaxAge < 10886400))
            {
                throw new InvalidOperationException("In order to confirm HSTS preload list subscription expiry must be at least eighteen weeks (10886400 seconds).");
            }

            if (Preload && !IncludeSubDomains)
            {
                throw new InvalidOperationException("In order to confirm HSTS preload list subscription subdomains must be included.");
            }

            ...

            if (Preload)
            {
                headerBuilder.Append("; preload");
            }

            filterContext.HttpContext.Response.AppendHeader("Strict-Transport-Security", headerBuilder.ToString());
        }
        else
        {
            HandleNonHttpsRequest(filterContext);
        }
    }
}

Now we have full HSTS support with preloading in the easy-to-use form of an attribute - just waiting to be used in your application. You can find the cleaned up source code here.

Every now and then I have a chance to review a book on this blog. This time it is Mastering jQuery by Alex Libby. I've been using jQuery in my day-to-day work for years, so when I saw mastering in the book title I approached it with a single expectation - to teach me something.

The book has about 400 pages of content divided into 14 chapters. Some of them seem to talk about basics (Installing jQuery or Organizing Your Code) while others go for quite specific and advanced topics (Manipulating Images or Using the Web Performance APIs), but every single one of them hides interesting information. Also, despite jQuery being the main theme of the book, one can learn a little bit about CSS3 or Node.js - there is even a whole chapter dedicated to Node-WebKit (NW.js).

Every piece of knowledge is accompanied by dedicated samples - sometimes even too many of them. On several occasions we will see repetitive examples which show the same technique without visible added value between them. Still, the accompanying code is very well organized. The author nicely references the code files, which allows him to put only the most relevant source lines into the book without depriving the reader of an easy way of trying the solution he is reading about.

The author uses casual language across the book. This casual style goes too far in some cases, which makes it feel unnatural. Luckily that unnatural feeling is not strong enough to prevent the reader from absorbing the content. The thing that was distracting for me is the chapter order. For example, the chapters Animating in jQuery and Using jQuery Effects are separated by Advanced Event Handling, which breaks the logical flow. This forced me to jump around chapters a little bit. But this is only a preference and I'm sure it will not be an issue for everybody.

The book is a comprehensive source of organized knowledge about jQuery. It managed to teach me a couple of things, which I believe is the most important characteristic of a technical book. Reading it really allowed me to master my jQuery skills a little bit more.
