The (n+1) selects problem

Posted by - raja 2010/06/20 18:45:16

How the (n+1) selects problem effectively kills performance in databases and SOA.

A few years ago, one of my numerous job sojourns took me to an interesting project at a telecom company. I was a developer then - as I would like to think of myself today as well - and had to maintain code that connected to numerous databases and published various services. In one of the C++ classes, the code issued a database query that obtained a set of rows from one database table - let us call it T1. It then looped through the result set and, for each of the returned rows, invoked another query to do some checks. That query obtained data from another table T2, which had T1's primary key as a foreign key. The code intrigued me at the time, since I was somewhat mystified as to why the developer had not done a database join to obtain the results from both T1 and T2 at the same time. Instead, he was executing a query on T2 for each of the rows obtained from T1.

I did not know it then but I was staring face to face at the (n+1) selects problem! This is the Loch Ness Monster that has devoured a lot of apps in its time and continues strong in many an enterprise!

Problem statement

Imagine a somewhat heavyweight process that returns a set of rows in response to a query. If, for each of the rows returned, we invoke another heavyweight process to obtain some information about it, then we are in the midst of the (n+1) selects problem!

Let us illustrate this with an example. Imagine a CRM system that returns a set of customers in response to a query. Let us say that for a particular query Q1 (e.g. get all customers whose last name is 'SMITH') the CRM system returned 50 rows with customer IDs C1 through C50. Now, for each of the 50 customers returned, we need to obtain account details. We first issue a query Q2 that returns all the account details for customer C1. Then we issue another query for customer C2, and so on until we get to query Q51, which returns the account details for customer C50. Quite clearly, we have issued 51 queries - the first customer query plus 50 account queries - to process the request. If the code base is liberally sprayed with this kind of epidemic, you can imagine how responsive the system would be! The first query Q1 is what I would call a "looping query", and the subsequent queries (Q2 through Q51) are "looped queries", i.e. queries executed in a loop driven by the result set of the looping query.
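
To make the pattern concrete, here is a minimal JDBC-style sketch contrasting the looped queries with a single joined query. The table and column names (CUSTOMER, ACCOUNT, CUST_ID, LAST_NAME) are illustrative assumptions, not taken from any real system.

    // A hedged JDBC-style sketch: the table and column names are illustrative
    // assumptions, not from any real system.
    import java.sql.*;

    public class NPlusOneDemo {

        // The (n+1) way: one looping query, plus one looped query per returned row.
        static void nPlusOneSelects(Connection con) throws SQLException {
            try (PreparedStatement customers = con.prepareStatement(
                    "SELECT CUST_ID FROM CUSTOMER WHERE LAST_NAME = ?")) {
                customers.setString(1, "SMITH");
                try (ResultSet rs = customers.executeQuery()) {
                    while (rs.next()) {                                      // n rows back...
                        try (PreparedStatement accounts = con.prepareStatement(
                                "SELECT * FROM ACCOUNT WHERE CUST_ID = ?")) {
                            accounts.setString(1, rs.getString("CUST_ID")); // ...one extra query per row
                            try (ResultSet ars = accounts.executeQuery()) {
                                while (ars.next()) { /* check/process the account */ }
                            }
                        }
                    }
                }
            }
        }

        // The joined way: one query returns customers and their accounts together.
        static void singleJoinedSelect(Connection con) throws SQLException {
            try (PreparedStatement joined = con.prepareStatement(
                    "SELECT c.CUST_ID, a.* FROM CUSTOMER c " +
                    "JOIN ACCOUNT a ON a.CUST_ID = c.CUST_ID WHERE c.LAST_NAME = ?")) {
                joined.setString(1, "SMITH");
                try (ResultSet rs = joined.executeQuery()) {
                    while (rs.next()) { /* process one customer-account row */ }
                }
            }
        }
    }

With the join, the database round trips drop from n+1 to 1 - which is essentially what the developer in the story above could have done.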

If we say that the (n+1) selects problem is an impediment to optimization, we would be understating the obvious! I would like to state here that this problem is not constrained to databases alone, although that is where it is most prevalent and most readily recognized. It can occur in any kind of enterprise system. We will discuss this more as we go.

Causes & Mitigations

ORM

Most instances of this epidemic have their origin in Object Relational Mapping (ORM) systems. The purpose of ORM is to abstract database interactions away from the core code. Hence the application code deals with entity objects that are manufactured by the ORM system. These objects may trigger further queries if certain methods on them are invoked by the application code. Let us go back to the Customer example and define a Customer class that has associated accounts (instances of an Account class).

Look at the following code:

    import java.util.List;

    public class Customer {
        private String id;
        // other fields, with getters and setters...
        private List<Account> accounts;

        // getter for the list of accounts (setter omitted for brevity)
        public List<Account> getAccounts() { return accounts; }
    }

Now, let us say that the client gets a Customer object from the database and decides to invoke the getAccounts() method on it. One of two things can happen:

1. The ORM system may fire a query to actually obtain the accounts information - a strategy called "lazy loading", since the accounts are only loaded on demand and not pre-fetched in anticipation of the client eventually needing them.

2. The ORM system might have already obtained the accounts at the time the customer query was fired (by doing a join to the relevant account tables). These pre-fetched accounts are returned without the need to fire any further query.

Obviously, the lazy loading strategy helps when the Customer object is accessed without any need for account-related information. Pre-fetching helps when you know upfront that you will also want the account-related information for each Customer. The (n+1) selects problem springs from the fact that many clients of ORM employ the lazy loading strategy (which is in many cases the default) when a pre-fetching strategy is warranted. They would do well to RTFM instead of relying on sub-optimal default retrieval strategies. Many ORMs provide very fine-grained control over the choice between the two strategies, and pre-fetching eliminates most of the ORM-related (n+1) selects problems.
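
As an illustration, here is a minimal sketch assuming a JPA-style ORM such as Hibernate; the mapping details (the mappedBy = "customer" attribute and the lastName field used in the fetch-join query) are assumptions made for the example.

    // A minimal sketch, assuming a JPA-style ORM (e.g. Hibernate); the mapping
    // details (mappedBy = "customer", the lastName field) are illustrative assumptions.
    import java.util.List;
    import javax.persistence.*;

    @Entity
    public class Customer {
        @Id
        private String id;

        // The JPA default for @OneToMany is LAZY: accounts are fetched only when the
        // collection is first touched, which means one extra query per Customer.
        @OneToMany(mappedBy = "customer", fetch = FetchType.LAZY)
        private List<Account> accounts;

        public List<Account> getAccounts() { return accounts; }
    }

    // When we know the accounts will be needed, a fetch join pre-fetches them in the
    // same query that loads the customers, avoiding the looped queries entirely:
    //
    //   SELECT c FROM Customer c JOIN FETCH c.accounts WHERE c.lastName = :lastName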

Enterprise Architecture and SOA

Let us consider a massive enterprise where the Customer and Account entities are maintained by disparate Systems of Record (SOR). The customer entity is maintained in a CRM system, while the account information is maintained in systems that depend on the type of account (for example, one system may maintain savings accounts, another current accounts, and yet another brokerage accounts).

In such a situation, it is not possible for the CRM system to return the account-related information for each customer. It just does not have it! No single system, in fact, has it all! So we would be forced to go to other systems to obtain the account-related information for each customer under consideration, thereby killing any chance of optimization. So what is the solution?

The solution in a single word is “caching”.

We should build caches that contain the account objects and hence can expedite the account retrieval for each customer. These can be memory caches, or a database cache that combines all the entities into one single database. Such a database serves as a "hub" for data and hence can support a "joined" query that combines account and customer information into one result set.
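
As a rough illustration of the memory-cache flavour (the AccountService interface and its method name are hypothetical, and the refresh mechanism is only hinted at), the looped account lookup could be served from something like this:

    // A minimal sketch of the caching idea, not a prescription. AccountService is a
    // hypothetical client to the account systems of record.
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    interface AccountService {
        List<Account> findAccounts(String customerId);   // expensive call to the SORs
    }

    public class AccountCache {
        private final Map<String, List<Account>> byCustomerId = new ConcurrentHashMap<>();
        private final AccountService accountService;

        public AccountCache(AccountService accountService) {
            this.accountService = accountService;
        }

        // Cheap lookup for the looped part of the request; falls back to the SOR on a miss.
        public List<Account> getAccounts(String customerId) {
            return byCustomerId.computeIfAbsent(customerId, accountService::findAccounts);
        }

        // Called by the refresh mechanism (ETL batch or event listener) to keep entries fresh.
        public void refresh(String customerId, List<Account> accounts) {
            byCustomerId.put(customerId, accounts);
        }
    }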

Caches always have problems with data latency (staleness). Hence the enterprise needs to get realistic about how "real time" the data needs to be. Incremental ETL and Event Driven Architecture (EDA) mitigate the latency problem to an extent. These will be discussed in another article at some point.

Security and Horizontal Requirements

Horizontal (cross-cutting) requirements are usually implemented without touching the functional code, using interception strategies such as AOP. Consider a user who executes a query on Customers. If the user is not authorized to view a certain customer, then he should not be allowed to see that customer's data. This is a security requirement that is usually addressed using an AOP advice that implements domain-level security. For each of the returned customers, the advice may have to fire additional queries to obtain some data, which is used to determine whether the logged-in user is allowed to see that customer. This also causes the (n+1) selects problem, because we are firing additional queries for each Customer.

One crude way of solving this problem is to combine the horizontal and functional requirements and let the functional code implement the domain-level security. This is inelegant, but at least it performs well. However, there are other ways of achieving the goal. Some of these are given below:

1. Query Enrichment
The user query is silently enriched by the AOP security advice to also filter out customers whom the user is not authorized to see. Hence, in addition to the search criteria issued by the user, unauthorized customers get filtered out at the database itself. This strategy is quite elegant and also supports database-based pagination - all the filtering is done in the database, so the number of customers retrieved does not have to be trimmed after the fact. (A minimal sketch of this approach follows the list.)

2. Caching
This is similar to what was advocated for the enterprise SOR problem. The necessary attributes are cached for quick retrieval so that the looped query is not that expensive.

3. Option 1 without database filtering
This is similar to option 1, but here the attributes needed for the security decision are still retrieved from the database via query enrichment, while the filtering logic itself is implemented in code. Hence the filtering logic is repeated for every row returned, but there is no need to issue any more queries. This is a good "middle ground" approach because it allows complex rule-based security filtering logic to live in the application.
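
To make the query enrichment idea concrete, here is a hedged sketch. The AuthorizationService, the region column and the parameter name are all assumptions invented for the illustration, and the wiring into the AOP advice is omitted.

    // A hedged sketch of query enrichment: the AuthorizationService, the "region"
    // column and the parameter name are assumptions invented for the illustration.
    import java.util.List;

    interface AuthorizationService {
        List<String> allowedRegionsFor(String userId);    // which regions the user may see
    }

    public class CustomerSecurityAdvice {
        private final AuthorizationService authz;

        public CustomerSecurityAdvice(AuthorizationService authz) {
            this.authz = authz;
        }

        // Invoked by the AOP interception layer around the customer search.
        // baseQuery might be: "SELECT c FROM Customer c WHERE c.lastName = :lastName"
        public String enrich(String userId, String baseQuery) {
            List<String> allowed = authz.allowedRegionsFor(userId);
            if (allowed.isEmpty()) {
                return baseQuery + " AND 1 = 0";          // the user may see nothing at all
            }
            // Unauthorized rows never leave the database, so pagination still works and
            // no per-row looped queries are needed. The caller binds :allowedRegions.
            return baseQuery + " AND c.region IN (:allowedRegions)";
        }
    }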

Summary

The (n+1) selects problem is one of the most pervasive performance problems in enterprise applications. A proper overall strategy is necessary to mitigate and control it.