E-Book / Presentation
Going Beyond the Hype
Benchmarking AI Search Quality in Patent Databases
Need of the hour: Benchmarking AI quality
Many companies struggle to benchmark AI search quality, especially when choosing among the available AI-based patent search systems. Without standard metrics, users cannot quantify how AI results compare with professional searches or other result sets. Common metrics such as Precision and Recall fall short because they ignore the order of results, leaving the evaluation incomplete.
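To see why order matters, here is a minimal sketch with two hypothetical result lists (illustrative only, not data from the benchmark): both contain the same relevant documents, so Precision and Recall cannot tell them apart, even though one ranks the relevant hits first.

```python
# Two hypothetical rankings of 5 results (1 = relevant, 0 = not relevant).
# Both retrieve the same 2 relevant documents out of an assumed 4 relevant overall.
ranking_a = [1, 1, 0, 0, 0]  # relevant hits ranked first
ranking_b = [0, 0, 0, 1, 1]  # same hits, buried at the bottom

total_relevant = 4  # assumed size of the full relevant set

for name, ranking in [("A", ranking_a), ("B", ranking_b)]:
    retrieved_relevant = sum(ranking)
    precision = retrieved_relevant / len(ranking)
    recall = retrieved_relevant / total_relevant
    print(f"Ranking {name}: precision={precision:.2f}, recall={recall:.2f}")

# Both rankings print precision=0.40, recall=0.50, even though
# ranking A is clearly more useful to a searcher.
```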
The presentation covers a first-of-its-kind benchmarking exercise comparing AI search with manual Boolean search. It is the result of more than four months of ground-up benchmarking work and introduces new metrics for analysing patent search quality. Here is what is covered:
- 3 real-world search examples, ranging from narrow to broad
- Two expert search techniques: the iterative approach and the “put-in-a-pile” approach
- A side-by-side comparison of PatSeer AI search quality vs. Boolean-only, Boolean + Human, and Boolean + Human + Reranked result sets across 4 quality metrics
- Metrics used: Weighted Average Score, Precision, Overlap %, and Normalised Discounted Cumulative Gain (NDCG) (a minimal NDCG sketch follows this list)
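As a hedged illustration of the one rank-aware metric in the list, here is a minimal NDCG sketch for binary relevance labels, using the standard log2-discounted DCG definition; the exact gain and weighting scheme used in the benchmark itself may differ.

```python
import math

def dcg(relevances):
    """Discounted Cumulative Gain: each gain is discounted by log2 of its rank."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    """Normalised DCG: actual DCG divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Revisiting the hypothetical rankings from the earlier sketch:
print(round(ndcg([1, 1, 0, 0, 0]), 2))  # 1.0  -- relevant results ranked first
print(round(ndcg([0, 0, 0, 1, 1]), 2))  # ~0.50 -- same results, buried lower
```

Unlike Precision and Recall, NDCG separates the two rankings, which is why a rank-aware metric belongs in the benchmark alongside the set-based ones.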
The analysis and the resulting presentation were prepared jointly by Andrea Davis (Bodkin IP) and Manish Sinha (PatSeer).