KPI - Key Performance Indicators
One of the often overlooked aspects of an application is the capture of Key Performance Indicators (KPI). As an application executes over a period of time, its performance needs to be monitored to determine bottlenecks so that it can be engineered better in future iterations. The capture of KPIs is itself a very challenging task.

Typically, the most readily available KPIs are produced using logging frameworks. Every one of us has sprayed our code with log statements that record the entry and exit times of the various methods invoked. This is important but hard to use to determine anything meaningful. Besides, it violates the Single Responsibility Principle (SRP), since every component would be tracking KPIs in addition to whatever else it does.

The more sophisticated solution is to develop a KPI framework that facilitates the capture of these indicators. This post analyzes the requirements of a typical KPI framework and shows code that implements some of the framework components. The orchestration of these components into the actual application is specific to the particular application and should be handled at the application level. A decent KPI framework must aspire to do the following:
- It should seamlessly integrate with an existing application and be able to log metrics into a persistent store such as a database.
- It should perform minimal writes to the persistent store, preferably in an asynchronous fashion.
- The persistent store must be "query-able" in a flexible manner. Ideally, it should be a schema-less NoSQL data store.
- It should obviate the necessity of individual components doing their own logging.
A desirable feature is to eliminate the multiplicity of log records. Consider a typical situation where a controller assumes charge of a particular request. It might invoke a service residing in the business layer, and the service may in turn use a DAO for database operations. This sequence of invocations leads to six log statements: an entry and an exit statement each from the controller, the service, and the DAO. This multiplicity complicates analysis, since many records need to be examined before any decision can be made. And because each component logs its times separately, it becomes an arduous task to present a consolidated picture of the entry and exit times of each component for a single user request.
The first requirement is that every request, be it a customer interaction or an automated invocation in the business layer or through web services, needs to be individually tracked across the different components it passes through. This is facilitated by assigning a request ID to each request. There should be one record that captures all these different metrics so that it can be persisted in one go to an underlying store. The different classes involved are described below.
- SLARecord: This is the chief domain object of the system. It consists of multiple SLATime records, each with its own start and end time, as shown in the code. An SLAType is associated with each instance of SLATime; it determines what kind of SLA is measured: is it the DAO time, the service time, or the web time? The SLARecord flows end to end with each request and is designed to be enhanced by the various components with their entry and exit timings. So who creates and enhances it? This is where the SLALoggingInterceptor comes in.
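A minimal sketch of how SLARecord and SLATime might look; the exact fields, method names, and the SLAType values shown here are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// What kind of SLA a given SLATime measures; the values are assumptions.
enum SLAType { WEB, SERVICE, DAO }

// One timed span, e.g. the entry and exit of the service layer.
class SLATime {
    final SLAType type;
    long startMillis;
    long endMillis;

    SLATime(SLAType type) { this.type = type; }

    long elapsedMillis() { return endMillis - startMillis; }
}

// The chief domain object: one record per request, identified by a
// request ID and enhanced by each layer with its own timings.
class SLARecord {
    final String requestId = UUID.randomUUID().toString();
    private final List<SLATime> times = new ArrayList<>();

    SLATime start(SLAType type) {
        SLATime t = new SLATime(type);
        t.startMillis = System.currentTimeMillis();
        times.add(t);
        return t;
    }

    void end(SLATime t) {
        t.endMillis = System.currentTimeMillis();
    }

    List<SLATime> times() { return times; }
}
```

Keeping all the SLATime entries inside one SLARecord is what lets the whole request be persisted in a single write later.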
- SLALoggingInterceptor: This is designed to do both the task of creating and that of enhancing the SLARecord for a given request. The SLALoggingInterceptor surrounds every component whose timings are captured. How it surrounds a component depends on where it is hooked into the application: in the web layer it can be implemented as a servlet filter, in Struts2 as a Struts2 interceptor, and in a typical service layer orchestrated using Spring as a Spring AOP interceptor. The SLALoggingInterceptor creates an SLARecord and puts it into a "context" if one does not exist already. How the SLARecord is obtained from and put into a context depends on the actual situation. The diagram shows the end to end flow.
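The create-or-enhance behaviour can be sketched independently of any particular hook (servlet filter, Struts2 interceptor, or Spring AOP advice) as a plain around-wrapper over a thread-local context. The RequestContext helper and the simplified record here are assumptions, not the framework's actual API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Simplified stand-in for the domain object, so the example is self-contained.
class SLARecord {
    final List<String> entries = new ArrayList<>();
}

// Hypothetical per-thread "context"; in a web container it would be
// populated when the request arrives and cleared when the response is written.
final class RequestContext {
    private static final ThreadLocal<SLARecord> HOLDER = new ThreadLocal<>();

    static SLARecord getOrCreate() {
        SLARecord r = HOLDER.get();
        if (r == null) {            // create the record only once per request
            r = new SLARecord();
            HOLDER.set(r);
        }
        return r;
    }

    static void clear() { HOLDER.remove(); }
}

// The interceptor surrounds a component invocation: it fetches (or creates)
// the request's SLARecord and enhances it with entry and exit times.
class SLALoggingInterceptor {
    <T> T around(String slaType, Callable<T> target) throws Exception {
        SLARecord record = RequestContext.getOrCreate();
        long start = System.currentTimeMillis();
        try {
            return target.call();
        } finally {
            long end = System.currentTimeMillis();
            record.entries.add(slaType + ":" + start + "-" + end);
        }
    }
}
```

A servlet filter or Spring AOP advice would wrap its target invocation in exactly this around() shape, with the outermost hook also clearing the context and handing the finished record to the SLALogger.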
- SLALogger: This persists the SLARecord. There should be implementations of SLALogger that accomplish this persistence asynchronously or to a database. I have specific code for most layers.
But even this simple framework can go a long way towards creating the quintessential SLA record. It then becomes possible to use ELK or other tools to consolidate these logs across multiple applications and do meaningful KPI analysis on them.