Evaluation of methods for full-text search in patents

University essay from Lunds universitet/Institutionen för elektro- och informationsteknik

Abstract: In this thesis we have evaluated methods for full-text search in patent documents. The aim of a patent search is to find evidence and relevant documents when an invalidity search is carried out on a patent. Using three different language models (BOW, SPECTER and SBERT), we have evaluated two text segmentation methods, greedy sentence split and paragraph split, and two clustering methods, Euclidean and spherical. We found that spherical clustering outperforms Euclidean clustering and that both segmentation methods work well for finding relevant parts of documents, each with its own advantages and drawbacks. The configurations were evaluated in four stages, where the first three were automatic and the last was a manual evaluation by employees at AWA and Lund University. We conclude that our methods have great potential, but more testing on a better-engineered test set, as well as more data from the manual evaluation, is needed to draw further conclusions.
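
To illustrate the difference between the two clustering methods named in the abstract, the following is a minimal sketch and not the implementation used in the thesis. It assumes a plain k-means loop in which the Euclidean variant assigns points by squared distance, while the spherical variant first scales every vector to unit length and assigns by cosine similarity. The function name `kmeans`, the parameter `init_idx`, and the toy two-dimensional vectors in `docs` are hypothetical and stand in for real text embeddings.

```python
import math


def _normalize(v):
    # Scale a vector to unit length (assumes the vector is non-zero).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def kmeans(points, init_idx, spherical=False, iters=20):
    """Toy k-means; init_idx selects the points used as starting centroids."""
    if spherical:
        pts = [_normalize(p) for p in points]
    else:
        pts = [[float(x) for x in p] for p in points]
    centroids = [list(pts[i]) for i in init_idx]
    labels = [0] * len(pts)
    for _ in range(iters):
        # Assignment step: highest cosine similarity or smallest distance.
        for i, p in enumerate(pts):
            if spherical:
                scores = [_dot(p, c) for c in centroids]
                labels[i] = scores.index(max(scores))
            else:
                dists = [_sq_dist(p, c) for c in centroids]
                labels[i] = dists.index(min(dists))
        # Update step: mean of the members, re-normalized if spherical.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            if not members:
                continue  # keep the old centroid for an empty cluster
            mean = [sum(col) / len(members) for col in zip(*members)]
            centroids[j] = _normalize(mean) if spherical else mean
    return labels


# Hypothetical vectors: two directions, each at a small and a large magnitude.
docs = [[1, 0], [0, 1], [10, 0], [0, 10]]
euclid_labels = kmeans(docs, init_idx=(0, 3))
sphere_labels = kmeans(docs, init_idx=(0, 3), spherical=True)
print("euclidean:", euclid_labels)
print("spherical:", sphere_labels)
```

On this toy data and with this initialization, the spherical variant groups vectors that point in the same direction regardless of their length, whereas the Euclidean variant is also influenced by vector magnitude. This may be one reason direction-based clustering suits text embeddings, though the sketch says nothing about the results reported in the thesis.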
