Patent Searching Gets Serious in PatSeer

When we launched PatSeer in April 2012, we thought we had worked out the search tools and capabilities completely, since the syntax supported Boolean, wildcard, fuzzy, proximity, range and term-boosting queries. That held only until professional (and novice) searchers took on PatSeer and drove us back to the drawing board with their feature demands.

Since then we have continuously improved the search engine to include many advanced techniques enjoyed by power users. Some of these are:

1) Complex Boolean within proximity searches

Without this, searching for something like (A or B or X) within 5 words of (C or D or E or F or G or H) would require 18 different search phrases. Thankfully you don’t have to create such long queries now.
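To see where the 18 comes from, here is a minimal Python sketch that expands the two OR groups into one proximity phrase per term pair. The function name `expand_proximity` is hypothetical, and the `"A C"~5` output form is only an assumed Lucene-style notation used for illustration, not necessarily the exact PatSeer syntax.

```python
from itertools import product

def expand_proximity(left_terms, right_terms, distance):
    """Build one proximity phrase for every (left, right) term pair.

    The '"l r"~N' form is an assumed notation for illustration only.
    """
    return [f'"{left} {right}"~{distance}'
            for left, right in product(left_terms, right_terms)]

queries = expand_proximity(["A", "B", "X"],
                           ["C", "D", "E", "F", "G", "H"], 5)
print(len(queries))  # prints 18 (3 left terms x 6 right terms)
print(" OR ".join(queries))
```

Every extra term in either group multiplies the number of phrases, which is why nesting the Boolean inside the proximity operator matters.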

2) Per Word Stemming

A must-have for professional searching. We implemented this in a way that is transparent to searchers and gives them full control of the stemming process. And did we say it supports 6 languages too?

3) Ordered/Unordered Proximity

Earlier we only had ordered proximity, but there are many cases where you need the words to appear in any order, so this was added soon after we got the feedback.
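The difference between the two modes can be sketched in a few lines of Python. This is only an illustration over whitespace-split text, not how the PatSeer engine is implemented; the function name `proximity_match` and the choice to count the words in between as the distance are assumptions.

```python
def proximity_match(text, first, second, max_gap, ordered=True):
    """Return True if `first` and `second` occur with at most
    `max_gap` words between them.

    ordered=True  : `first` must come before `second`.
    ordered=False : the two words may appear in either order.
    """
    words = text.lower().split()
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    for i in first_pos:
        for j in second_pos:
            if i == j:
                continue  # same token, not a pair
            gap = abs(j - i) - 1  # number of words in between
            if gap <= max_gap and (j > i or not ordered):
                return True
    return False

text = "the virus infects the host cell"
print(proximity_match(text, "virus", "host", 3, ordered=True))
print(proximity_match(text, "host", "virus", 3, ordered=True))
print(proximity_match(text, "host", "virus", 3, ordered=False))
```

With ordered matching the second call fails because "host" comes after "virus" in the text, while the unordered call accepts the same pair.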

4) Range bound proximity

A chemical searcher didn’t want results in which two compounds appeared next to each other or with just a single word in between. So we added support for defining a lower bound in the proximity syntax, so that “X Y”~[3 TO 10] matches only those records that have X and Y appearing with 3 to 10 words in between.
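As a rough sketch of what a lower and upper bound mean here, the Python below checks whether two words occur with between `min_gap` and `max_gap` words separating them. The function name `range_proximity_match` is hypothetical, the text is simply split on whitespace, and matching in either order is an assumption, since the post does not say whether the range form is ordered.

```python
def range_proximity_match(text, x, y, min_gap, max_gap):
    """Return True if x and y occur with min_gap..max_gap words between them.

    Assumption: the pair may appear in either order.
    """
    words = text.lower().split()
    x_pos = [i for i, w in enumerate(words) if w == x.lower()]
    y_pos = [i for i, w in enumerate(words) if w == y.lower()]
    for i in x_pos:
        for j in y_pos:
            if i == j:
                continue  # same token, not a pair
            gap = abs(j - i) - 1  # words strictly between the two hits
            if min_gap <= gap <= max_gap:
                return True
    return False

# Adjacent terms (gap 0) and a single word in between (gap 1) are rejected
print(range_proximity_match("X Y", "X", "Y", 3, 10))
print(range_proximity_match("X and Y", "X", "Y", 3, 10))
# Three words in between falls inside the 3 TO 10 range
print(range_proximity_match("X mixed with some Y", "X", "Y", 3, 10))
```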

5) Composite search field for easy cross-language searching

PatSeer stores patent text fields on a per-language basis to avoid false matches and to improve the quality of results. So, for instance, searching in TAC looks up only English content, TACDE searches only German content and TACJA searches only Japanese content. Users wanting to search across all TAC content irrespective of language earlier had to repeat the search term in each field and combine the clauses using OR. However, a composite field character, $, now allows the user to easily search across content in all languages. So TAC$:virus* searches for virus* across TAC of patents in all languages.
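Conceptually, the composite field saves you from writing the OR expansion by hand. The Python sketch below shows that expansion for illustration only; the `LANGUAGE_FIELDS` mapping and the `expand_composite` helper are hypothetical, and the mapping lists just the three fields named above rather than every language PatSeer supports.

```python
import re

# Hypothetical mapping: only the fields mentioned in this post are listed.
LANGUAGE_FIELDS = {"TAC": ["TAC", "TACDE", "TACJA"]}

def expand_composite(query):
    """Rewrite FIELD$:term as an OR across the per-language fields."""
    def replace(match):
        base, term = match.group(1), match.group(2)
        fields = LANGUAGE_FIELDS.get(base, [base])
        return "(" + " OR ".join(f"{field}:{term}" for field in fields) + ")"
    return re.sub(r"\b([A-Z]+)\$:(\S+)", replace, query)

print(expand_composite("TAC$:virus*"))
# prints (TAC:virus* OR TACDE:virus* OR TACJA:virus*)
```

This is the query users previously had to type themselves, once per language field.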

6) Simple Search Form

This one is for R&D professionals who don’t want to construct complex Boolean queries. The simple interface lets users provide just the set of search terms, and the query is constructed by the portal itself.

7) Multi-lingual / Non-Latin Searching

Support for searching in Korean, Chinese (Simplified) and Japanese required rethinking the way non-Latin data is indexed and also the way the query is analyzed. Both have since been corrected with the right rules, and at least one of our Japanese users is quite pleased now!

Still, I am pretty sure we are soon going to meet a patent searcher who needs something not currently present! (Fingers crossed…)