The Search Problem

Feb 8th, 2009 | Category: design

Almost every application offers some way to search for information. In fact, an application without a search screen is probably too trivial to be justifiably called an application. A search screen lets the user look for “some stuff” by entering search criteria. A new screen then displays the “stuff” that satisfies those criteria. Extensive search capabilities serve as unique selling propositions for an application. The computer’s ability to sift through endless data and unearth the information that satisfies certain criteria has raised the productivity of a multitude of businesses. Ever gone to a bank and been sent back because you did not remember your account number? Not if your bank is automated. Now your account number can be retrieved using various search criteria (last name, first name, address, telephone number, etc.). Search is used to find bank accounts, credit cards, users, products and so on, ad infinitum. The “stuff” that can be retrieved depends on the application.

In this post I am going to cover two types of search: structured search and unstructured search (my terms). In structured search, the information being searched exists in a very structured form (typically, but not necessarily, a relational database). The information can be queried and retrieved using well-documented techniques. The retrieved information is typically structured as well, i.e. there is a well-known schema for the results. For example, when retrieving the names and phone numbers of all customers, the structure of the result is known upfront and the output can be designed using any of the widely used reporting tools.

Unstructured search, on the other hand, tries to attribute structure to an unstructured mass of data. It can also deal with a “gigantic mass of structured data”. In an unstructured search scenario, the intent is to let users obtain information from a large body of data by revealing their intention in some way. We will talk about this a little later.

 

Structured Search – Salient Features

Most applications need to implement structured search. Often, search was so tightly integrated into the application requirements that it was never treated as an isolated problem with potential for reuse. This myopic attitude meant that identical or near-identical search logic was duplicated across multiple applications.

Captured below are the salient commonalities in search logic across applications:

  • Most searches have a “search query” screen and a “search results” screen. The former captures the search criteria and the latter displays the results of the search. Both screens may also be combined into one. AJAX-based web applications and desktop applications have even richer approaches to search that avoid refreshing entire screens. But the idea is still the same: there is a “Search Query” part of the screen and a “Search Results” part of the screen, with a distinct web flow between them. In fact, many web flow frameworks were invented with the search web flow in mind.
  • There may be a third “Detail screen” which displays a single search result, i.e. if the search matched exactly one record, that solitary record is displayed in the detail screen. The web flow would also embrace this screen. For instance, look below for a typical web flow snippet for search.
  • Search results have to be filtered to honor the security constraints on the user. If the user is not allowed to see some results, those must be filtered out automatically.
  • Search may let the user choose which columns are displayed in the results. For instance, if there is a Customer entity with attributes such as name, phone, address etc., the user may only be interested in a few of them (say, the name and the phone number). In such a situation, there is no need to clutter the results screen with the other attributes.
  • Search results would typically need to be sorted by one or more columns.
  • Search results may have to be paginated if the result set exceeds a “certain threshold” size. 
  • Search may also group the results by selected columns and provide results aggregated by groups.
  • Here is a typical search workflow:

        showSearchScreen();
        criteria = waitForInput();
        searchResults = executeSearchQuery(criteria);
        if (count(searchResults) == 1) {
            // Exactly one match: skip the results list.
            showDetailScreen(searchResults[0]);
        } else {
            showSearchResultsScreen(searchResults);
            clickedHyperlink = waitForUserToClickOnOneOfTheHyperlinks();
            showDetailScreen(clickedHyperlink);
        }
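
To make the reuse argument concrete, here is a minimal sketch of how the filtering, security, column selection, sorting and pagination tenets listed above could live in one reusable function instead of being rewritten per application. It is only an illustration: the names SearchRequest, search and can_see, and the sample customer data, are my own assumptions and do not come from any particular framework.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class SearchRequest:
    """Hypothetical container for everything the search query screen captures."""
    criteria: Callable[[Record], bool]  # what the user asked for
    columns: List[str]                  # attributes to show in the results
    sort_by: List[str]                  # sort keys, in priority order
    page: int = 1                       # 1-based page number
    page_size: int = 20                 # the "certain threshold" for pagination


def search(records, request, can_see):
    """Filter, secure, sort, paginate and project in one reusable place."""
    # 1. Apply the user's criteria, then drop rows the user may not see.
    visible = [r for r in records if request.criteria(r) and can_see(r)]
    # 2. Sort by one or more columns.
    visible.sort(key=lambda r: tuple(r[c] for c in request.sort_by))
    # 3. Cut out the requested page.
    start = (request.page - 1) * request.page_size
    page = visible[start:start + request.page_size]
    # 4. Keep only the requested columns.
    rows = [{c: r[c] for c in request.columns} for r in page]
    return rows, len(visible)


# Made-up sample data, purely for illustration.
customers = [
    {"name": "Asha", "phone": "555-0101", "city": "Pune", "branch": "east"},
    {"name": "Bela", "phone": "555-0102", "city": "Pune", "branch": "west"},
    {"name": "Chen", "phone": "555-0103", "city": "Agra", "branch": "east"},
    {"name": "Abel", "phone": "555-0104", "city": "Pune", "branch": "east"},
]

request = SearchRequest(
    criteria=lambda r: r["city"] == "Pune",
    columns=["name", "phone"],
    sort_by=["name"],
    page_size=2,
)
# This user is only allowed to see customers of the "east" branch.
rows, total = search(customers, request, can_see=lambda r: r["branch"] == "east")
```

The second return value (the total number of visible matches) is what the workflow would inspect to decide between the detail screen and the results screen.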

Attempts at Encapsulating Structured Search Logic 

The above remarks might leave the reader with the impression that no attempt was ever made to “standardize” search logic. That would be an erroneous picture, since many strides have been made in encapsulating search. Here are a few of them:

  • The biggest innovation ever made in search is the Structured Query Language (SQL), which is used to talk to databases. The “select” statement in particular has been one of the biggest attempts at standardizing search across multiple databases. It attempts to hide the nuances and idiosyncrasies of each database behind a consistent programming facade. Except for a consistent pagination strategy, the data-retrieval tenets above (filtering by criteria, column selection, sorting, grouping) are all handled by the SQL SELECT statement.
  • The UI workflow described above is of late being handled by web flow frameworks such as Spring Web Flow. Hence another part of the search logic is slowly being standardized, albeit partially, for UI is one of the hardest parts to handle due to the proliferation of frameworks with a wide variety of patterns and paradigms.
  • Security frameworks (such as Acegi Security, now Spring Security) attempt to standardize the security constraints involved in search. Security is a ubiquitous horizontal concern that needs to be carefully incorporated across multiple layers of the application. Hence a standardized way of handling security imparts consistency across the application.
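
As a small illustration of how much the SELECT statement covers, the sketch below runs a few queries against an in-memory SQLite database using Python's standard sqlite3 module. The customer table and its rows are made up for the example. Note that LIMIT/OFFSET is the pagination syntax SQLite accepts; other databases spell pagination differently, which is exactly the inconsistency mentioned above.

```python
import sqlite3

# In-memory database with a made-up customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, phone TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("Asha", "555-0101", "Pune"),
        ("Bela", "555-0102", "Pune"),
        ("Chen", "555-0103", "Agra"),
        ("Dev", "555-0104", "Pune"),
    ],
)

# Column selection, filtering and sorting in one SELECT.
# LIMIT/OFFSET handles pagination here, but the syntax is dialect specific.
page_size, page = 2, 2
page_rows = conn.execute(
    "SELECT name, phone FROM customer "
    "WHERE city = ? ORDER BY name LIMIT ? OFFSET ?",
    ("Pune", page_size, (page - 1) * page_size),
).fetchall()

# Grouping and aggregation: number of customers per city.
per_city = conn.execute(
    "SELECT city, COUNT(*) FROM customer GROUP BY city ORDER BY city"
).fetchall()
```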

Unstructured Search & The Search Funnel

The stratospheric growth of the internet has precipitated the movement of information from small systems into the open. Hitherto, information such as pricing and item availability was available only to users with access to very specific applications. Now it has become widely available, so that anybody with internet access can query it at will. For instance, consider the travel industry. It is now possible for potential consumers to know the prices of all the hotels along with their availability. The same applies to pricing information for cars, appliances etc. In short, there is a lot of information available on the web.

A lot of this information is available in a structured form. For instance, in the travel industry, travel sites host web services that allow people to query ticket availability and the like. These web services have structured input and output formats. Sites like expedia.com use these services to make the information available to a wide audience. But the information is still structured. We know what information we want, we seek it using search criteria, and we get the results.

But what about the information that is widely available on the internet itself? There are web pages that document all kinds of useful information. These web pages don’t follow any semantic structure. It is hard for a program to infer anything useful about a web site by reading the HTML it produces. HTML is used mainly to format data, not to lend any structure to it.

Search engines parse all this data and try to make sense of it in an objective fashion. When consumers query this kind of information, they not only have to indicate what kind of information they are interested in (e.g. cars, travel) but also which part of it would be useful to them (e.g. pricing, model information, manufacturers). So in the case of unstructured search, both the data being queried and the query itself are unknown upfront.

In this case, we enter what I call the search funnel. The top of the funnel is the search query. The bottom of the funnel is a more precise definition of the search criteria, from which the search results are finally obtained.

In the process of arriving at the criteria definition, we have to decipher the intent behind the query (i.e. what the user intends to achieve by issuing it). Unstructured search presents the biggest problem due to the gargantuan nature of everything involved. The data that needs to be handled is massive. The user base that queries this data is huge. The amount of processing needed to decipher the intent of the user, create the criteria definition and finally obtain the results based on that definition is also enormous.

In this respect, unstructured search poses a very big challenge. 
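
A deliberately tiny sketch of the funnel idea follows: a free-text query goes in at the top and a more precise criteria definition comes out at the bottom. Everything here is an assumption made for illustration. The TOPICS and FACETS vocabularies are hand-written, whereas a real search engine would have to infer intent from vastly more data and far richer signals.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical hand-written vocabularies standing in for real intent detection.
TOPICS = {"car": "cars", "cars": "cars", "hotel": "travel",
          "hotels": "travel", "flight": "travel"}
FACETS = {"price": "pricing", "prices": "pricing", "cheap": "pricing",
          "model": "model information", "available": "availability"}


@dataclass
class Criteria:
    """The bottom of the funnel: a more precise definition of the search."""
    topic: Optional[str] = None                        # kind of information wanted
    facets: Set[str] = field(default_factory=set)      # which part of it is useful
    keywords: List[str] = field(default_factory=list)  # everything left over


def funnel(query: str) -> Criteria:
    """Narrow a free-text query (top of the funnel) into structured criteria."""
    criteria = Criteria()
    for word in query.lower().split():
        if word in TOPICS:
            criteria.topic = TOPICS[word]
        elif word in FACETS:
            criteria.facets.add(FACETS[word])
        else:
            criteria.keywords.append(word)
    return criteria


result = funnel("cheap hotels in Pune")
```

For the sample query, the topic resolves to "travel", the facet to "pricing", and the remaining words are kept as plain keywords to search with.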

 

A Hybrid Search Model

Let us say we have a lot of data about different kinds of entities, e.g. customers, bank accounts, employees, prospects, transactions etc. All of this is structured information, i.e. each kind of data has a well-known structure. We want users to be able to obtain this information through “canned queries” that are specified upfront. A canned query defines both the structured information that needs to be captured as input and the shape of the results. This enters the realm of report writers, ad hoc querying systems and the like. A hybrid search model requires the data and also a substantial amount of metadata for the report writers and the ad hoc querying systems to work against.

The search funnel model is also useful for diagnosing the initial intent of the user and then directing them (depending on that intent) to the appropriate “canned query”.
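
One way to picture this routing is sketched below, under heavy assumptions: the CANNED_QUERIES registry, its metadata fields and the trigger words are all invented for this example, and the "intent detection" is nothing more than counting shared words.

```python
# Hypothetical registry of canned queries plus the metadata used to pick one.
CANNED_QUERIES = {
    "customer_contacts": {
        "entity": "customer",
        "columns": ["name", "phone"],
        "trigger_words": {"customer", "customers", "phone", "contact"},
    },
    "account_balances": {
        "entity": "bank_account",
        "columns": ["account_number", "balance"],
        "trigger_words": {"account", "accounts", "balance"},
    },
}


def route(query):
    """Return the name of the canned query sharing the most words with the input.

    Returns None when nothing matches, in which case the user would stay in
    the funnel and be asked to refine the query.
    """
    words = set(query.lower().split())
    best_name, best_score = None, 0
    for name, meta in CANNED_QUERIES.items():
        score = len(words & meta["trigger_words"])
        if score > best_score:
            best_name, best_score = name, score
    return best_name


chosen = route("phone numbers of all customers")
```

Once a canned query is chosen, its metadata (entity and columns) tells the report writer or ad hoc querying system what to run and how to lay out the results.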

The Trend

Increasingly, the internet is making a lot of information available, albeit in an unstructured way, through web pages. Coupled with this, there is a trend to expose most services as web services. Hence unstructured and hybrid search models will become more dominant.


 Raja has been blogging in this forum since 2006. He loves a technical discussion any day. He finds life incomplete without a handy whiteboard and a marker.

