Speed Up Read Only Queries in Entity Framework

Last Updated: March 4, 2021 | Created: February 23, 2021

This is a companion article to the EF Core Community Standup called "Performance tuning an EF Core app" where I apply a series of performance enhancements to a demo ASP.NET Core e-commerce book selling site called the Book App. I start with 700 books, then 100,000 books and finally ½ million books.

This article, plus the EF Core Community Standup video, pulls information from chapters 14 to 16 of my book "Entity Framework Core in Action, 2nd edition" and uses code in the associated GitHub repo https://github.com/JonPSmith/EfCoreinAction-SecondEdition.

NOTE: You can download the code and run the application described in this article/video via the https://github.com/JonPSmith/EfCoreinAction-SecondEdition GitHub repo. Select the Part3 branch and run the project called BookApp.UI. The home page of the Book App has data on how to change the Book App's settings for chapter 15 (four SQL versions) and chapter 16 (Cosmos DB).

Other articles that are relevant to the performance tuning shown in this article:

  • A technique for building high-performance databases with EF Core – describes the SQL (+cached) approach.
  • Building a robust CQRS database with EF Core and Cosmos DB – describes an older CQRS/Cosmos DB approach (the Book App uses integration events)
  • An in-depth study of Cosmos DB and the EF Core 3 to 5 database provider – differences/limitations when using Cosmos DB.
  • Building high performance database queries using Entity Framework Core and AutoMapper – a way to create select queries automatically.

TL;DR – summary

  • The demo e-commerce book selling site displays books with various sort, filter and paging options that you might expect to need. One of the hardest of the queries is to sort the books by their average votes (think Amazon's star ratings).
  • At 700 books a well-designed LINQ query is all you need.
  • At 100,000 books (and ½ million reviews) LINQ on its own isn't good enough. I add three new ways to handle the book display, each one improving performance, but also taking more development effort.
  • At ½ million books (and 2.7 million reviews) SQL on its own has some serious problems, so I swap to a Command Query Responsibility Segregation (CQRS) architecture, with the read-side using a Cosmos DB database (Cosmos DB is a NoSQL database)
  • The use of Cosmos DB with EF Core highlights
    • How Cosmos DB is different from a relational (SQL) database
    • The limitations in EF Core's Cosmos DB database provider
  • At the end I give my view of performance gain against development time.

The Book App and its features

The Book App is a demo e-commerce site that sells books. In my book "Entity Framework Core in Action, 2nd edition" I use this Book App as an example of using various EF Core features. It starts out with about 50 books in it, but in Part3 of the book I spend three chapters on performance tuning and take the number of books up to 100,000 books and then to ½ million books. Here is a screenshot of the Book App running in "Chapter 15" mode, where it shows four different modes of querying a SQL Server database.

The Book App query which I improve has the following Sort, Filter, Page features

  • Sort: Price, Publication Date, Average votes, and primary key (default)
  • Filter: By Votes (1+, 2+, 3+, 4+), By year published, By tag, (defaults to no filter)
  • Paging: Num books shown (default 100) and page num

Note: a book can be soft deleted, which means there is always an extra filter on the books shown.

The book part of the database (the part of the database that handles orders isn't shown) looks like this.

First level of performance tuning – Good LINQ

One way to load a Book with its relationships is by using Includes (see code below)

```csharp
var books = context.Books
    .Include(book => book.AuthorsLink
        .OrderBy(bookAuthor => bookAuthor.Order))
        .ThenInclude(bookAuthor => bookAuthor.Author)
    .Include(book => book.Reviews)
    .Include(book => book.Tags)
    .ToList();
```

But that isn't the best way to load books if you want good performance. That's because a) you are loading a lot of data that you don't need and b) you would need to do sorting and filtering in software, which is slow. So here are my five rules for building fast, read-only queries.

  1. Don't load data you don't need, e.g. use the Select method to pick out what is needed.
    See lines 18 to 24 of my MapBookToDto class.
  2. Don't Include relationships but pick out what you need from the relationships.
    See lines 25 to 30 of my MapBookToDto class.
  3. If possible, move calculations into the database.
    See lines 13 to 34 of my MapBookToDto class.
  4. Add SQL indexes to any property you sort or filter on.
    See the configuration of the Book entity.
  5. Add the AsNoTracking method to your query (or don't load any entity classes).
    See line 29 in the ListBookService class.

NOTE: Rule 3 is the hardest to get right. Just remember that some SQL commands, like Average (SQL AVG), can return null if there are no entries, which needs a cast to a nullable type to make it work.
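To illustrate rules 1 to 3, here is a minimal sketch of the kind of Select projection that a method like MapBookToDto builds; the exact DTO property names are my assumptions, but the nullable cast on the average is the key point:

```csharp
var books = context.Books
    .AsNoTracking()
    .Select(b => new BookListDto
    {
        BookId = b.BookId,          // rule 1: read only the columns needed
        Title = b.Title,
        // rule 2: pick out of the relationship rather than Include it
        ReviewsCount = b.Reviews.Count,
        // rule 3: the average is computed inside the database; the cast
        // to a nullable type is needed because SQL AVG returns NULL
        // when a book has no reviews
        ReviewsAverageVotes = b.Reviews
            .Average(r => (double?)r.NumStars)
    })
    .ToList();
```

Without the nullable cast, EF Core would have to translate Average into something that throws on an empty collection, so the query would fail for books with no reviews.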

So, combining the Select, Sort, Filter and paging, my code looks like this.

```csharp
public async Task<IQueryable<BookListDto>> SortFilterPageAsync
    (SortFilterPageOptions options)
{
    var booksQuery = _context.Books
        .AsNoTracking()
        .MapBookToDto()
        .OrderBooksBy(options.OrderByOptions)
        .FilterBooksBy(options.FilterBy, options.FilterValue);

    await options.SetupRestOfDtoAsync(booksQuery);

    return booksQuery.Page(options.PageNum - 1,
        options.PageSize);
}
```
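The paging at the end of the query above is done with a small extension method. Here is a minimal sketch of how such a Page method can be built from Skip/Take (my sketch, not necessarily the Book App's exact code):

```csharp
public static class PagingExtensions
{
    // Pages an IQueryable by skipping whole pages and taking one page.
    // pageNumZeroStart is zero-based, which is why the caller
    // passes options.PageNum - 1.
    public static IQueryable<T> Page<T>(this IQueryable<T> query,
        int pageNumZeroStart, int pageSize)
    {
        if (pageNumZeroStart != 0)
            query = query.Skip(pageNumZeroStart * pageSize);
        return query.Take(pageSize);
    }
}
```

Because this works on IQueryable, the Skip/Take are translated into SQL (OFFSET/FETCH on SQL Server) rather than being run in memory.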

Using these rules will start you off with a good LINQ query, which is a great starting point. The next sections cover what to do if that doesn't give you the performance you want.

When the five rules aren't enough

The query above is going to work well when there aren't many books, but in chapter 15 I create a database containing 100,000 books with 540,000 reviews. At this point the "five rules" version has some performance issues and I create three new approaches, each of which a) improves performance and b) takes more development effort. Here is a list of the four approaches, with the Good LINQ version as our base performance version.

  1. Good LINQ: This uses the "five rules" approach. We compare all the other versions to this query.
  2. SQL (+UDFs): This combines LINQ with SQL UDFs (user-defined functions) to move concatenations of Author's Names and Tags into the database.
  3. SQL (Dapper): This creates the required SQL commands and then uses the Micro-ORM Dapper to execute that SQL to read the data.
  4. SQL (+caching): This pre-calculates some of the costly query parts, like the averages of the Review's NumStars (referred to as votes).
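To give a feel for the SQL (+UDFs) approach, here is a hedged sketch of how a user-defined function can be registered with EF Core so that LINQ queries can call it; the UDF name and signature are illustrative, not the Book App's exact code:

```csharp
public class BookDbContext : DbContext
{
    // Static method that EF Core maps to a SQL UDF. It is never
    // executed in .NET; EF Core translates calls to it into SQL.
    public static string AuthorsStringUdf(int bookId)
        => throw new NotImplementedException();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Tell EF Core this method maps to a database function
        // of the same name, which you create via a migration.
        modelBuilder.HasDbFunction(
            () => AuthorsStringUdf(default(int)));
    }
}
```

Used inside a Select projection, `BookDbContext.AuthorsStringUdf(b.BookId)` becomes a call to the database function in the generated SQL, so the string concatenation of author names happens in the database rather than in .NET.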

In the video I describe how I built each of these queries and the performance for the hardest query, which is sort by review votes.

NOTE: The SQL (+caching) version is very complex, and I skipped over how I built it, but I have an article called "A technique for building high-performance databases with EF Core" which describes how I did this. Also, chapter 15 of my book "Entity Framework Core in Action, 2nd edition" covers this too.

Here is a chart I showed in the video which provides performance timings for three queries from the hardest (sort by votes) down to a simple query (sort by date).

The other chart I showed was a breakdown of the parts of the simple query, sort by date. I wanted to show this to point out that Dapper (which is a micro-ORM) is only significantly faster than EF Core if you have better SQL than EF Core produces.

Once you have a performance problem just taking a few milliseconds off isn't going to be enough – typically you need to cut its time by at least 33% and often more. Therefore, using Dapper to shave off a few milliseconds over EF Core isn't worth the development time. So, my advice is to study the SQL that EF Core creates and if you know a way to improve the SQL, then Dapper is a good solution.
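For flavour, here is a minimal hedged sketch of the SQL (Dapper) style: hand-written SQL executed by Dapper and mapped straight onto the DTO. The SQL, table/column names and DTO are illustrative, not the Book App's exact code:

```csharp
using Dapper;
using Microsoft.Data.SqlClient;

using var connection = new SqlConnection(connectionString);

// Hand-tuned SQL executed by Dapper. You only gain over EF Core
// if this SQL is better than the SQL EF Core would generate.
var books = connection.Query<BookListDto>(@"
    SELECT TOP (@take) BookId, Title, Price
    FROM Books
    WHERE SoftDeleted = 0
    ORDER BY PublishedOn DESC",
    new { take = 100 });
```

Dapper maps each returned row onto a `BookListDto` by matching column names to property names, so the read path stays almost as simple as EF Core's.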

Going bigger – how to handle ½ million or more books

In chapter 16 I build what is called a Command Query Responsibility Segregation (CQRS) architecture. The CQRS architecture acknowledges that the read side of an application is different from the write side. Reads are often complicated, drawing in data from multiple places, whereas in many applications (but not all) the write side can be simpler, and less onerous. This is true in the Book App.

To build my CQRS system I decided to make the read-side live in a different database to the write-side of the CQRS architecture, which allowed me to use a Cosmos DB for my read-side database. I did this because Cosmos DB is designed for performance (speed of queries) and scalability (how many requests it can handle). The figure below shows this two-database CQRS system.

The key point is that the data saved in the Cosmos DB has as many of the calculations as possible pre-calculated, rather like the SQL (+cached) version – that's what the projection stage does when a Book or its associated relationships are updated.
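A hedged sketch of what such a pre-calculated read-side class might look like; the property names are my assumptions based on the description above, not the Book App's exact class:

```csharp
// Read-side document stored in Cosmos DB. Everything costly to
// compute at display time is pre-calculated by the projection
// stage whenever a Book or its relationships change.
public class CosmosBook
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public decimal ActualPrice { get; set; }
    public DateTime PublishedOn { get; set; }

    // Pre-calculated values: the display query never has to
    // join to Authors or aggregate over Reviews.
    public string AuthorsOrdered { get; set; }
    public int ReviewsCount { get; set; }
    public double? ReviewsAverageVotes { get; set; }
}
```

Sorting by average votes then becomes a simple ORDER BY on a stored property, which is what makes the read side fast.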

If you want to find out how to build a two-database CQRS system using Cosmos DB then my article Building a robust CQRS database with EF Core and Cosmos DB describes one way, while chapter 16 of my book provides another way using events.

Limitations using Cosmos DB with EF Core

It was very interesting to work with Cosmos DB with EF Core as there were two parts to deal with:

  • Cosmos DB is a NoSQL database and works differently to a SQL database (read this Microsoft article for one view)
  • The EF Core 5 Cosmos DB database provider has many limitations.

I had already looked at these two parts back in 2019 and written an article, which I have updated to EF Core 5 and renamed to "An in-depth study of Cosmos DB and the EF Core 3 to 5 database provider".

Some of the issues I encountered, listed with the issues that made the biggest change to my Book App first, are:

  • EF Core 5 limitation: Counting the number of books in Cosmos DB is slow!
  • EF Core 5 limitation: EF Core 5 cannot do subqueries on a Cosmos DB database.
  • EF Core 5 limitation: No relationships or joins.
  • EF Core 5 limitation: Many database functions not implemented.
  • Cosmos difference: Complex queries might need breaking up.
  • Cosmos difference: Skip is slow and expensive.
  • Cosmos difference: By default, all properties are indexed.

I'm not going to go through all of these – the article "An in-depth study of Cosmos DB and the EF Core 3 to 5 database provider" covers most of them.

Because of the EF Core limitation on counting books, I changed the way that paging works. Instead of you picking what page you want you have a Next/Prev approach, like Amazon uses (see figure after the list of query approaches). And to allow a balanced performance comparison between the SQL versions and the Cosmos DB version I added the best two SQL approaches, but with counting turned off too (SQL is slow on that).
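One common way to support Next/Prev paging without a count is to ask for one item more than the page size; if that extra item comes back, you know a Next page exists. A sketch of the idea, under the assumption of the BookListDto query shown earlier (I'm not claiming this is the Book App's exact implementation):

```csharp
// Read PageSize + 1 items; the extra item only tells us whether
// a Next page exists, and is never displayed.
var page = await booksQuery
    .Skip(pageNum * pageSize)
    .Take(pageSize + 1)
    .ToListAsync();

bool hasNextPage = page.Count > pageSize;   // extra item came back
bool hasPrevPage = pageNum > 0;             // no count needed either way
var booksToShow = page.Take(pageSize).ToList();
```

This avoids the expensive COUNT query entirely, at the cost of not being able to show "page N of M".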

It also turns out that Cosmos DB can count very fast so I built another way to query Cosmos DB using its NET (pseudo) SQL API. With this the Book App had four query approaches.

  1. Cosmos (EF): This accesses the Cosmos DB database using EF Core (with some parts using the SQL database where EF Core didn't have the features to implement parts of the query).
  2. Cosmos (Direct): This uses Cosmos DB's NET SQL API and I wrote raw commands – a bit like using Dapper for SQL.
  3. SQL (+cacheNC): This uses the SQL cache approach using the 100,000 books version, but with counting turned off to compare with Cosmos (EF).
  4. SQL (DapperNC): This uses Dapper, which has the best SQL performance, but with counting turned off to compare with Cosmos (EF).
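To show what the Cosmos (Direct) style looks like, here is a minimal hedged sketch using the Microsoft.Azure.Cosmos NET SDK; the database/container names, the sort property and the SQL are illustrative, not the Book App's exact code:

```csharp
using Microsoft.Azure.Cosmos;

var client = new CosmosClient(endpoint, accountKey);
var container = client.GetContainer("BookApp", "Books");

// Raw Cosmos SQL, ordering on a pre-calculated property.
var query = new QueryDefinition(
    "SELECT * FROM c ORDER BY c.ReviewsAverageVotes DESC OFFSET 0 LIMIT 100");

using var iterator = container.GetItemQueryIterator<CosmosBook>(query);
while (iterator.HasMoreResults)
{
    foreach (var book in await iterator.ReadNextAsync())
        Console.WriteLine(book.Title);
}
```

Like Dapper on the SQL side, this bypasses EF Core's query translation, so it can use Cosmos SQL features (such as fast counts) that the EF Core 5 provider doesn't support.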

The following figure shows the Book App in CQRS/Cosmos DB mode with the four query approaches, and the Prev/Next paging approach.

Performance of the CQRS/Cosmos DB version

To test the performance, I used an Azure SQL Server and Cosmos DB service from a local Azure site in London. To compare the SQL performance and the Cosmos DB performance I used databases with a similar cost (and low enough that it didn't cost me too much!). The table below shows what I used.

| Database type    | Azure service name | Performance units      | Price/month |
|------------------|--------------------|------------------------|-------------|
| Azure SQL Server | Standard           | 20 DTUs                | $37         |
| Cosmos DB        | Pay-as-you-go      | manual scale, 800 RUs  | $47         |

I did performance tests on the Cosmos DB queries while I was adding books to the database to see if the size of the database affected performance. It's difficult to get a good test of this as there is quite a bit of variation in the timings.

The chart below compares EF Core calling Cosmos DB, referred to as Cosmos (EF), against using direct Cosmos DB commands via its NET SQL API – referred to as Cosmos (Direct).

This chart (and other timings I took) tells me two things:

  • The increase in the number of books in the database doesn't make much difference to the performance (the Cosmos (Direct) 250,000 result is well within the variation)
  • Counting the books costs ~25 ms, which is much better than the SQL count, which added about ~150 ms.

The important performance test was to look at Cosmos DB against the best of our SQL accesses. I picked a cross-section of sorting and filtering queries and ran them on all four query approaches – see the chart below.

From the timings in the figure, here are some conclusions.

  1. Even the best SQL version, SQL (DapperNC), doesn't work in this application because any sort or filter on the Reviews took so long that the connection timed out at 30 seconds.
  2. The SQL (+cacheNC) version was at parity or better with Cosmos (EF) on the first two queries, but as the query got more complex it fell behind in performance.
  3. The Cosmos (Direct) version, with its book count, was ~25% slower than the Cosmos (EF) version with no count but is twice as fast as the SQL count versions.

Of course, there are some downsides of the CQRS/Cosmos DB approach.

  • The add and update of a book to the Cosmos DB takes a bit longer: this is because the CQRS requires four database accesses (two to update the SQL database and two to update the Cosmos database) – that adds up to about 110 ms, which is more than double the time a single SQL database would take. There are ways around this (see this part of my article about CQRS/Cosmos DB) but it takes more work.
  • Cosmos DB takes longer and costs more if you skip items in its database. This shouldn't be a problem with the Book App as many people would give up after a few pages, but if your application needs deep skipping through data, then Cosmos DB is not a good fit.

Even with the downsides I still think CQRS/Cosmos DB is a good solution, especially when I add in the fact that implementing this CQRS was easier and quicker than building the original SQL (+cache) version. Also, the Cosmos concurrency handling is easier than the SQL (+cache) version.

NOTE: What I didn't test is Cosmos DB's scalability or the ability to have multiple versions of the Cosmos DB around the world. Mainly because it's hard to do and it costs (more) money.

Performance against development effort

In the end it's a trade-off of a) performance gain and b) development time. I have tried to summarise this in the following table, giving a number from 1 to 9 for difficulty (Diff? in table) and performance (Perf? in the table).

The other thing to consider is how much more complexity your performance tuning adds to your application. Badly implemented performance tuning can make an application harder to enhance and extend. That is one reason why I used the event approach on the SQL (+cache) and CQRS/Cosmos DB approaches, because it makes the least changes to the existing code.

Conclusion

As a freelance developer/architect I have had to performance tune many queries, and sometimes writes, on real applications. That's not because EF Core is bad at performance, but because real-world applications have a lot of data and lots of relationships (often hierarchical) and it takes some extra work to get the performance the client needs.

I have already used a variation of the SQL (+cache) approach on a client's app to improve the performance of their "has the warehouse got all the parts for this job?" query. And I wish Cosmos DB had been around when I built a multi-tenant service that needed to cover the whole of the USA.

Hopefully something in this article and video will be useful if (when!) you need to performance tune your application.

NOTE: You might like to look at the article "My experience of using modular monolith and DDD architectures" and its companion article to look at the architectural approaches I used on the Part3 Book App. I found the Modular Monolith architectural approach really nice.

I am a freelance developer who wrote the book "Entity Framework Core in Action". If you need help performance tuning an EF Core application I am available for work. If you want to hire me please contact me to discuss your needs.


Source: https://www.thereformedprogrammer.net/five-levels-of-performance-tuning-for-an-ef-core-query/
