Performance

Cold vs. Warm Query Execution

The very first time any query is made against a given model, the Entity Framework does a lot of work behind the scenes to load and validate the model. We frequently refer to this first query as a "cold" query. Further queries against an already loaded model are known as "warm" queries, and are much faster.

View Generation

The process of computing mapping views based on the specification of the mapping is what we call view generation.

Mapping views are executable representations of the transformations specified in the mapping for each entity set and association. Mapping views can be either:

  • Query views: these represent the transformation necessary to go from the database schema to the conceptual model.
  • Update views: these represent the transformation necessary to go from the conceptual model to the database schema.

As the number of connected entities and tables in your schema increases, the view generation cost increases. Validating the generated views is also costly. Several strategies can reduce this cost:

  1. Use pre-generated views to decrease model load time. To pre-generate views, use either:
    • Entity Framework Power Tools Community Edition
    • T4 Templates
  2. Use foreign key associations to reduce view generation cost.
  3. Move your model to a separate assembly.
  4. Disable validation of an EDMX-based model.

Caching in the Entity Framework

Entity Framework has the following forms of caching built-in:

  1. Object caching – the ObjectStateManager built into an ObjectContext instance keeps track in memory of the objects that have been retrieved using that instance. This is also known as first-level cache. By default when an entity is returned in the results of a query, just before EF materializes it, the ObjectContext will check if an entity with the same key has already been loaded into its ObjectStateManager. If an entity with the same keys is already present EF will include it in the results of the query. Although EF will still issue the query against the database, this behavior can bypass much of the cost of materializing the entity multiple times.
  2. Query Plan Caching - reusing the generated store command when a query is executed more than once. The first time a query is executed, it goes through the internal plan compiler to translate the conceptual query into the store command (for example, the T-SQL which is executed when run against SQL Server). If query plan caching is enabled, the next time the query is executed the store command is retrieved directly from the query plan cache for execution, bypassing the plan compiler.
    • Once the cache contains a set number of entries (800), we start a timer that periodically (once-per-minute) sweeps the cache.
    • During cache sweeps, entries are removed from the cache on a LFRU (Least frequently – recently used) basis. This algorithm takes both hit count and age into account when deciding which entries are ejected.
    • At the end of each cache sweep, the cache again contains 800 entries.
    • Use CompiledQuery to improve performance with LINQ queries
  3. Metadata caching - sharing the metadata for a model across different connections to the same model. This is essentially caching of type information and type-to-database mapping information across different connections to the same model. The Metadata cache is unique per AppDomain.
  4. Results caching - With results caching (also known as "second-level caching"), you keep the results of queries in a local cache. When issuing a query, you first see if the results are available locally before you query against the store. While results caching isn't directly supported by Entity Framework, it's possible to add a second level cache by using a wrapping provider. An example wrapping provider with a second-level cache is Alachisoft's Entity Framework Second Level Cache based on NCache.
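As a sketch of how first-level object caching behaves, consider querying the same entity twice through one context (the `context` variable and `Products` set here follow the samples used elsewhere in this document):

```csharp
// The second query is still sent to the database, but because an entity
// with the same key is already tracked by the ObjectStateManager, EF
// returns the cached instance instead of materializing a new one.
var first = context.Products.First(p => p.ProductID == 1);
var second = context.Products.First(p => p.ProductID == 1);
bool sameInstance = ReferenceEquals(first, second); // true while tracking is on
```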

Autocompiled Queries

When a query is issued against a database using Entity Framework, it must go through a series of steps before actually materializing the results; one such step is Query Compilation.

Starting with Entity Framework 5, the results of query compilation are cached automatically, without the use of a CompiledQuery; we call this feature "autocompiled queries".

Entity Framework detects when a query requires recompilation, and recompiles it when the query is invoked even if it had been compiled before. Common conditions that cause a query to be recompiled are:

  • Using IEnumerable<T>.Contains<T>(T value).
  • Using functions that produce queries with constants.
  • Using the properties of a non-mapped object.
  • Linking your query to another query that requires recompilation.
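For example, a Contains over an in-memory collection produces a store command that depends on the collection's contents, so it cannot be reused from the plan cache (the `context` and model here follow the samples used elsewhere in this document):

```csharp
// Each distinct set of ids yields a different store command, forcing the
// plan compiler to run again on every execution.
var ids = new List<int> { 1, 2, 3 };
var q = context.Products.Where(p => ids.Contains(p.ProductID));
```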

NoTracking Queries

Disabling change tracking to reduce state management overhead

If you are in a read-only scenario and want to avoid the overhead of loading the objects into the ObjectStateManager, you can issue "No Tracking" queries. Change tracking can be disabled at the query level.

Query Execution Options

  • LINQ to Entities.
var q = context.Products.Where(p => p.Category.CategoryName == "Beverages");
  • No Tracking LINQ to Entities.

When the context derives from ObjectContext:

context.Products.MergeOption = MergeOption.NoTracking;
var q = context.Products.Where(p => p.Category.CategoryName == "Beverages");

OR

When the context derives from DbContext:

var q = context.Products.AsNoTracking()
                        .Where(p => p.Category.CategoryName == "Beverages");
  • Entity SQL over an ObjectQuery.
ObjectQuery<Product> products = context.Products.Where("it.Category.CategoryName = 'Beverages'");
  • Entity SQL over an EntityCommand.
  • ExecuteStoreQuery.
var q1 = context.ExecuteStoreQuery<Product>("select * from products");
  • SqlQuery.
var q2 = context.Database.SqlQuery<Product>("select * from products");
  • CompiledQuery.
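The CompiledQuery option (available when the context derives from ObjectContext) can be sketched as follows; NorthwindContext is a hypothetical context class:

```csharp
// Compile once, reuse many times: the LINQ-to-store translation happens a
// single time, when the delegate is first invoked.
static readonly Func<NorthwindContext, string, IQueryable<Product>> byCategory =
    CompiledQuery.Compile((NorthwindContext ctx, string category) =>
        ctx.Products.Where(p => p.Category.CategoryName == category));

var beverages = byCategory(context, "Beverages").ToList();
```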

Design time performance considerations

Inheritance Strategies

Entity Framework supports 3 basic types of inheritance and their combinations:

  • Table per Hierarchy (TPH) – where each inheritance set maps to a table with a discriminator column to indicate which particular type in the hierarchy is being represented in the row.
  • Table per Type (TPT) – where each type has its own table in the database; the child tables only define the columns that the parent table doesn’t contain.
  • Table per Class (TPC) – where each type has its own full table in the database; the child tables define all their fields, including those defined in parent types.
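As a minimal Code First sketch of choosing a strategy, the Fluent API below maps a hypothetical two-type hierarchy as TPT; removing the ToTable calls would fall back to the TPH default:

```csharp
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Instructor : Person
{
    public string Office { get; set; }
}

public class SchoolContext : DbContext
{
    public DbSet<Person> People { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Map each type to its own table (TPT); by default EF would use a
        // single table with a discriminator column (TPH).
        modelBuilder.Entity<Person>().ToTable("People");
        modelBuilder.Entity<Instructor>().ToTable("Instructors");
    }
}
```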

Upgrading from EF4 to improve model generation time

A SQL Server-specific improvement to the algorithm that generates the store-layer (SSDL) of the model is available in Entity Framework 5 and 6, and as an update to Entity Framework 4 when Visual Studio 2010 SP1 is installed.

Splitting Large Models with Database First and Model First

As model size increases, the designer surface becomes cluttered and difficult to use. We typically consider a model with more than 300 entities to be too large to effectively use the designer. The following blog post describes several options for splitting large models: http://blogs.msdn.com/b/adonet/archive/2008/11/25/working-with-large-models-in-entity-framework-part-2.aspx.

Performance considerations with the Entity Data Source Control

We've seen cases in multi-threaded performance and stress tests where the performance of a web application using the EntityDataSource Control deteriorates significantly. The underlying cause is that the EntityDataSource repeatedly calls MetadataWorkspace.LoadFromAssembly on the assemblies referenced by the Web application to discover the types to be used as entities.

The solution is to set the ContextTypeName of the EntityDataSource to the type name of your derived ObjectContext class. This turns off the mechanism that scans all referenced assemblies for entity types.

Setting the ContextTypeName field also prevents a functional problem where the EntityDataSource in .NET 4.0 throws a ReflectionTypeLoadException when it can't load a type from an assembly via reflection. This issue has been fixed in .NET 4.5.

POCO entities and change tracking proxies

POCO Entities

  • Entity Framework enables you to use custom data classes together with your data model without making any modifications to the data classes themselves. This means that you can use "plain-old" CLR objects (POCO), such as existing domain objects, with your data model. These POCO data classes (also known as persistence-ignorant objects), which are mapped to entities that are defined in a data model, support most of the same query, insert, update, and delete behaviors as entity types that are generated by the Entity Data Model tools.
  • Entity Framework can also create proxy classes derived from your POCO types, which are used when you want to enable features such as lazy loading and automatic change tracking on POCO entities.
  • Your POCO classes must meet certain requirements to allow Entity Framework to use proxies, as described here: http://msdn.microsoft.com/library/dd468057.aspx.
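A proxy-friendly POCO can be sketched as follows (the Customer and Order types are illustrative):

```csharp
// For EF to create a proxy, the class must be public and not sealed, and
// the mapped properties must be virtual so the proxy can override them.
public class Customer
{
    public virtual int CustomerId { get; set; }
    public virtual string Name { get; set; }

    // Collection navigation properties should be declared as ICollection<T>
    // so the lazy-loading proxy can substitute its own collection type.
    public virtual ICollection<Order> Orders { get; set; }
}
```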

Change Tracking proxies

Change tracking proxies notify the object state manager each time any property of your entities has its value changed, so Entity Framework always knows the actual state of your entities. This is done by adding notification events to the body of the setter methods of your properties, and having the object state manager process these events.

When a POCO entity does not have a change tracking proxy, changes are found by comparing the contents of your entities against a copy of a previously saved state. This deep comparison can become a lengthy process when you have many entities in your context, or when your entities have a very large number of properties, even if none of them changed since the last comparison took place.

In summary: you’ll pay a performance hit when creating the change tracking proxy, but change tracking will help you speed up the change detection process when your entities have many properties or when you have many entities in your model. For entities with a small number of properties, where the number of entities doesn’t grow too much, having change tracking proxies may not be of much benefit.

Loading Related Entities

Lazy Loading vs. Eager Loading

Eager Loading: the related entities are loaded along with your target entity set. You use an Include statement in your query to indicate which related entities you want to bring in.

Lazy Loading: your initial query only brings in the target entity set. But whenever you access a navigation property, another query is issued against the store to load the related entity.
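Using the Products/Category model from the query samples used elsewhere in this document, the two options can be sketched as:

```csharp
// Eager loading: a single query brings products and their categories.
var products = context.Products
                      .Include(p => p.Category) // using System.Data.Entity;
                      .ToList();

// Lazy loading: only products are fetched; touching the navigation
// property later issues an additional query (requires a virtual Category
// property and proxy creation enabled).
var products2 = context.Products.ToList();
var categoryName = products2.First().Category.CategoryName;
```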

How to choose between Lazy Loading and Eager Loading

  • Consider the tradeoff between multiple requests against the database and a single request that may contain a large payload.
  • If you need to access many (more than three) navigation properties from the fetched entities, use lazy loading. However, if the payload your query brings back is not too big, you may see performance benefits from eager loading.
  • If you do not know exactly what data will be needed at run time, lazy loading is preferred.
  • If the database server is located across the network, take latency into account. In that case eager loading is typically better because it requires fewer round trips. When network latency is not an issue, lazy loading may simplify your code.

Performance concerns with multiple Includes

  • It takes a relatively long time for a query with multiple Include statements in it to go through our internal plan compiler to produce the store command.
  • The majority of this time is spent trying to optimize the resulting query.
  • The generated store command will contain an Outer Join or Union for each Include, depending on your mapping.
  • Queries like this bring in large connected graphs from your database in a single payload, which will exacerbate any bandwidth issues, especially when there is a lot of redundancy in the payload.

You can check for cases where your queries are returning excessively large payloads by accessing the underlying TSQL for the query by using ToTraceString and executing the store command in SQL Server Management Studio to see the payload size.

In such cases you can try to reduce the number of Include statements in your query to just bring in the data you need. Or you may be able to break your query into a smaller sequence of subqueries.
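One way to break up such a query, sketched against hypothetical Customers/Orders/OrderLines sets, is to rely on relationship fix-up: with tracking enabled, entities loaded by a later query are automatically connected to entities already in the context:

```csharp
// First query: the customer and its orders.
var customer = context.Customers
                      .Include(c => c.Orders)
                      .First(c => c.CustomerId == customerId);

// Second, smaller query: the order lines. Loading them into the context is
// enough; EF fixes up the navigation properties automatically.
context.OrderLines
       .Where(ol => ol.Order.CustomerId == customerId)
       .Load(); // using System.Data.Entity;
```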

Workaround to get lazy loading of properties

Entity Framework currently doesn’t support lazy loading of scalar or complex properties, such as large objects stored as BLOBs.

Use table splitting to separate the large properties into a separate entity, and load that entity only when needed.

For example, photo or signature data can then be loaded only when it is actually required.
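A table splitting configuration can be sketched with the Code First Fluent API; the Product and ProductPhoto names are illustrative:

```csharp
// Both entities share the same primary key and map to the same table, so
// the photo column is only read when the Photo navigation property is used.
modelBuilder.Entity<Product>()
            .HasRequired(p => p.Photo)
            .WithRequiredPrincipal();

modelBuilder.Entity<Product>().ToTable("Products");
modelBuilder.Entity<ProductPhoto>().ToTable("Products"); // same table
```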

Other considerations

Server Garbage Collection

Whenever EF is used in a multithreaded scenario, or in any application that resembles a server-side system, make sure to enable Server Garbage Collection.
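Server garbage collection is enabled in the application's configuration file (ASP.NET hosts already default to it); for an executable, the relevant fragment looks roughly like this:

```xml
<!-- MyApp.exe.config: opt into server GC for multithreaded workloads. -->
<configuration>
  <runtime>
    <gcServer enabled="true" />
  </runtime>
</configuration>
```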

AutoDetectChanges

The object cache and the object state manager try to stay as synchronized as possible on each operation performed on a context, so that the produced data is guaranteed to be correct under a wide array of scenarios.

Entity Framework might show performance issues when the object cache has many entities. Certain operations, such as Add, Remove, Find, Entry and SaveChanges, trigger calls to DetectChanges which might consume a large amount of CPU based on how large the object cache has become.

Consider temporarily turning off AutoDetectChanges in the sensitive portion of your code.
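A common pattern, sketched here for a bulk insert on a DbContext-derived `context` with a hypothetical Items set, is to disable the flag and restore it in a finally block:

```csharp
try
{
    // Skip the DetectChanges call that Add would otherwise trigger for
    // every entity already tracked by the context.
    context.Configuration.AutoDetectChangesEnabled = false;
    foreach (var item in itemsToAdd)
    {
        context.Items.Add(item);
    }
}
finally
{
    // Always restore the default so later operations remain correct.
    context.Configuration.AutoDetectChangesEnabled = true;
}
context.SaveChanges();
```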

Context per request

Entity Framework’s contexts are meant to be used as short-lived instances in order to provide the most optimal performance experience. Contexts are expected to be short lived and discarded, and as such have been implemented to be very lightweight and reutilize metadata whenever possible. In web scenarios it’s important to keep this in mind and not have a context for more than the duration of a single request.
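A per-request unit of work can be sketched as follows (NorthwindContext is a hypothetical DbContext-derived class):

```csharp
// One short-lived context per request: cheap to create, metadata is shared
// across instances, and disposal releases tracked entities promptly.
using (var context = new NorthwindContext())
{
    return context.Products
                  .AsNoTracking()
                  .Where(p => p.Category.CategoryName == "Beverages")
                  .ToList();
}
```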

Database null semantics

C# uses two-valued null comparison semantics (null == null is true), while databases use three-valued logic in which comparing NULL to anything yields unknown. By default Entity Framework compensates for this difference by adding extra predicates to the generated SQL, which can make queries over nullable columns noticeably more complex.
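In EF6, a DbContext-derived context can opt into the database's null comparison semantics, which simplifies the generated SQL at the cost of diverging from C# semantics when both operands are null:

```csharp
// EF6: use database (three-valued) null comparison semantics for queries
// issued through this context instance.
context.Configuration.UseDatabaseNullSemantics = true;
```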

Async

Entity Framework 6 introduced support of async operations when running on .NET 4.5 or later. For the most part, applications that have IO related contention will benefit the most from using asynchronous query and save operations.
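A sketch of the async surface (System.Data.Entity in EF6; NorthwindContext is hypothetical):

```csharp
public async Task<List<Product>> GetBeveragesAsync()
{
    using (var context = new NorthwindContext())
    {
        // The thread is released while the database works; note that a
        // single context supports only one async operation at a time.
        return await context.Products
                            .Where(p => p.Category.CategoryName == "Beverages")
                            .ToListAsync();
    }
}
```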

NGEN

Entity Framework 6 does not come in the default installation of .NET framework. As such, the Entity Framework assemblies are not NGEN’d by default which means that all the Entity Framework code is subject to the same JIT’ing costs as any other MSIL assembly. This might degrade the F5 experience while developing and also the cold startup of your application in the production environments. In order to reduce the CPU and memory costs of JIT’ing it is advisable to NGEN the Entity Framework images as appropriate. For more information on how to improve the startup performance of Entity Framework 6 with NGEN, see Improving Startup Performance with NGen.

Code First versus EDMX

Entity Data Model (EDM): the conceptual model (the objects) sits on one side, the storage schema (the database) on the other, and in the middle a mapping bridges the two.

From this EDM, Entity Framework will derive the views to roundtrip data from the objects in memory to the database and back.

EDMX

  1. Since the EDMX file already contains the conceptual model, the storage schema, and the mapping, the model loading stage only has to validate that the EDM is correct,
  2. then generate the views,
  3. then validate the views and have this metadata ready for use,
  4. and only then can a query be executed or new data be saved to the data store.

Code First

  1. Entity Framework has to produce an EDM from the provided code. It does so by:
    • analyzing the classes involved in the model,
    • applying conventions and configuring the model via the Fluent API.
  2. After the EDM is built, it is validated,
  3. then the views are generated,
  4. then the views are validated and this metadata made ready for use,
  5. and only then can a query be executed or new data be saved to the data store.

Thus, building the model from Code First adds extra complexity that translates into a slower startup time for the Entity Framework when compared to having an EDMX file.

When choosing to use EDMX versus Code First, it’s important to know that the flexibility introduced by Code First increases the cost of building the model for the first time. If your application can withstand the cost of this first-time load then typically Code First will be the preferred way to go.