Full Text Search in PostgreSQL

What is wrong with the good old SQL searches?

  • There is no support for common languages. Regular expressions do not suffice because they cannot easily handle words that essentially mean the same, i.e., blocked and blocking is one such example. You might miss documents that contain blocked, although you probably would like to find them when searching for blocking. Now don’t tell me you are going to use OR through all the derived words – that’s not what a good developer would do 😉
  • There is no support for indexes and thus, you have to process the whole document every time you need to search something which is rather tedious and slow.

What is full text search?

As the official documentation defines it – Full Text Searching (or just text search) provides the capability to identify natural-language documents that satisfy a query, and optionally to sort them by relevance to the query. The most common type of search is to find all documents containing given query terms and return them in order of their similarity to the query. Notions of query and similarity are very flexible and depend on the specific application. The simplest search considers query as a set of words and similarity as the frequency of query words in the document.

This essentially means that you can now just search for block and you’re going get all it’s derivatives in the document that you want! Isn’t that cool?

How does it work though?

PostgreSQL has two utility functions that will help us through this quest – to_tsvector() and to_tsquery()

  • The to_tsvector() command will create a set of lexemes using the document provided to it. It’ll conveniently omit any words that have little meaning – words like ‘the’, ‘an’, etc. When you run this,
    SELECT to_tsvector('english', 'Full text search is an awesome feature');

The result is a ‘map’ of words or a dictionary which represents the location of each word.
’awesom’:6’featur’:7’full’:1’search’:3’text’:2

  • The to_tsquery() takes in a list of words that will be searched against the result of our to_tsvector() function.

SELECT to_tsvector(‘Full text search is an awesome feature’) @@ to_tsquery(‘full’);

The query above will give us a result true.

Now how to use it? The most apt use would be to create a tsvector of the columns of your database you want to search on and run a tsquery command against that to select the rows you need. What’s more interesting is that you can use operators like &, |, !, etc in your queries!

The official documentation does an awesome job of explaining things! This is just an overview – for more details please go through this link.

No Comments

Post A Comment