Definition Extraction From Swedish Technical Documentation : Bridging the gap between industry and academy approaches
Abstract: Terminology is concerned with the creation and maintenance of concept systems, terms and definitions. Automatic term and definition extraction is used to simplify this otherwise manual and sometimes tedious process. This thesis presents an integrated approach of pattern matching and machine learning, utilising feature vectors in which each feature is a Boolean function of a regular expression. The integrated approach is compared with the two more classic approaches, showing a significant increase in recall while maintaining a comparable precision score. Less promising is the negative correlation between the performance of the integrated approach and training size. Further research is suggested.
AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)