Friday, September 21, 2012

Hierarchical data and Entity Framework 4

When writing any kind of application it is not uncommon to deal with hierarchical data. If these data need to be persisted, then relational databases seem perfectly valid for this purpose. Consider developing an online store application where each category of products has its parent category and subcategories. The requirement is that user can display given category and all subcategories down below.

Category objects can be stored in a single table that is referencing itself. The ParentCategoryId column should contain the id of parent category. Root category will have null value assigned for parent category, every other category should point to another category. The integrity of data can be modeled at database level by using the foreign key constraints.
A Category table with one-to-many relationship (one category can have single parent category and multiple subcategories).
In this blog post I would like to present two options for querying data with parent-children relationship. Both approaches will use EntityFramework as an ORM. In the first technique I will traverse the categories hierarchy using only navigation properties provided by EntityFramework. The second approach will rely on a database stored procedure.

Preparing data.


To populate the table with test data I created a script that inserts 1000 categories, each on separate sublevel.

DECLARE @Level INT = 1
INSERT INTO Category VALUES ('RootCategory', 'Root category', NULL)

WHILE @Level < 1000
BEGIN
 INSERT INTO Category
 SELECT 'Subcategory No. ' + CAST(@Level AS VARCHAR(4)) AS Name,
     '' AS Description,
     SCOPE_IDENTITY() AS ParentCategoryId
 SET @Level += 1
END


RootCategory
 -> Subcategory No. 1
  -> Subcategory No. 2
   -> Subcategory No. 3
     ...
         -> Subcategory No. 999

The data is now prepared for fetching.

Navigation properties.


In the C# application I included ADO.NET Entity Data Model and renamed the entity container to StoreContext. I generated the model from database and included only Category table leaving the option Include foreign key columns in the model checked. The entity type for Category will appear inside of the model (as shown in the picture). I renamed the navigation properties to Subcategories and ParentCategory for better readability.
From that moment I was able to create the database context object and retrieve the root category.

var context = new StoreContext();
var root = context.Category
                  .First(c => c.Name.Equals("RootCategory"));

foreach (var subcategory in root.AllSubcategories())
        Console.WriteLine(subcategory.Name);


If you inspect the Subcategories property of root you will see only one direct subcategory. I am not aware of any built-in mechanisms in Entity Framework that could load any hierarchical data to an arbitratry depth. To load the entire hierarchy I will use the method below to recursively retrieve all subcategories.

public partial class Category
{
  public IEnumerable<Category> AllSubcategories()
  {
    yield return this;
    foreach (var directSubcategory in Subcategories)
      foreach (var subcategory in directSubcategory.AllSubcategories())
      {
        yield return subcategory;
      }
  }
}


The entity classes are generated as partial hence may be freely extended.

Stored procedure.


In the stored procedure approach the entire logic of preparation of the data set is performed at the database level. SQL Server (since the 2005 version) supports recursive common table expressions (CTE). Recursive CTE is a temporary result set that can reference itself and allows for writing concise queries against hierarchical data. The following stored procedure is designed to query all subcategories of a category passed as a parameter. CTE is introduced with the WITH statement.

CREATE PROCEDURE GetCategoriesTree(@RootName NVARCHAR(50))
AS BEGIN

WITH 
  CategoryTree (CategoryId, Name, Description, ParentCategoryId)
  AS
  (
    SELECT CategoryId, Name, Description, ParentCategoryId
    FROM Category
    WHERE Name = @RootName
    UNION ALL
    SELECT cat.CategoryId, cat.Name, cat.Description, cat.ParentCategoryId
    FROM Category cat
      INNER JOIN CategoryTree parent
        ON cat.ParentCategoryId = parent.CategoryId
  )
  
SELECT CategoryId, Name, Description, ParentCategoryId
FROM CategoryTree
OPTION (MAXRECURSION 32767);

END


Entity framework can encapsulate a stored procedure in a very elegant way. Function import allows to select existing stored procedure and map the returned rows to a collection of domain types. To perform seamless conversion, column names should be identical to property names.
After adding function import to the model, the function is accessible directly on the database context object.

var spSubcategories = context.GetCategoriesTree("RootCategory");

foreach (var spSubcategory in spSubcategories)
  Console.WriteLine(spSubcategory.Name);


The effect is identical as in the previous code.

Conclusions.


Both methods do not differ in terms of final results. However, there is a very important issue related to performance. In order to compare the execution times I used Stopwatch.

var context = new StoreContext();

var stopwatch = new Stopwatch();
stopwatch.Start();
var subcategories = context.Category
                    .First(c => c.Name.Equals("RootCategory"))
                    .AllSubcategories()
                    .ToList();
stopwatch.Stop();
Console.WriteLine("Loading {0} cat. with navigation properties took {1} ms",
                  subcategories.Count,
                  stopwatch.ElapsedMilliseconds);

stopwatch.Reset();

stopwatch.Start();
subcategories = context
                    .GetCategoriesTree("RootCategory")
                    .ToList();
stopwatch.Stop();
Console.WriteLine("Loading {0} cat. with stored procedure took {1} ms",
                  subcategories.Count,
                  stopwatch.ElapsedMilliseconds);


//output:
Loading 1000 cat. with navigation properties took 15259 ms
Loading 1000 cat. with stored procedure took 169 ms


The results highlight the difference that is almost two orders of magnitude in favor of the second solution. The test data used was a tree-like structure with 1000 levels. Certainly, if the data was not so nested then the first method would have perform much better (less number of queries to the database). In cases similar to presented, you can get huge performance gains by using the CTE approach.

The complete example can be downloaded from here.

1 comment:

  1. Nosal

    This is even better than Entity Framework:
    https://www.kellermansoftware.com/p-47-net-data-access-layer.aspx

    ReplyDelete