Automated Digitization and Summarization of Analog Archives: Comparing summaries made by GPT-3 and a human

University essay from Uppsala universitet/Signaler och system

Author: Maja Linderholm; [2022]


Abstract: This thesis aimed to create a tool that could assist climate researchers in their fieldwork. Dialog with researchers at Stockholm University identified a need for, and an interest in, automated digitization and summarization of their handwritten notes. Climate research often requires fieldwork, during which many researchers prefer to take handwritten notes, and these notes can grow into large physical archives. A downside of purely physical archives is that the data and knowledge stored in them become less accessible: manually digitizing handwritten text is very time-consuming, which raises the threshold for researchers to use the data. The thesis resulted in a software program that automatically digitizes and summarizes handwritten texts to save researchers time. The tool consists of (1) the Google Cloud Vision API, which digitizes a photo of handwritten text using a convolutional neural network (CNN), and (2) the transformer-based language model GPT-3, which summarizes the digitized text. Two GPT-3 engines, Davinci and Curie, were used. The performance of the algorithms was evaluated on a data set of handwritten texts provided by Stockholm University. The results indicated that the performance of the Google Cloud Vision API was strongly correlated with the quality of the image and the style of handwriting. Unusual handwriting led to poor classification of letters, since the algorithm performed badly on unfamiliar shapes. The performance of GPT-3 was evaluated with a survey. The survey received 73 responses, in which the subjects graded five summaries of the same text written by a human and by the GPT-3 engines Davinci and Curie, respectively. The survey results indicated that the performance of the Davinci engine was comparable to that of a human, while Curie was not a preferable option.
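
The two-stage tool described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation. The OCR step uses the `google-cloud-vision` Python client and needs Google Cloud credentials to run. The function names, the prompt format, and the pluggable `summarize` callable are assumptions made for this sketch. The abstract does not state how GPT-3 was called, so that call is left to the caller.

```python
from typing import Callable, Dict


def ocr_with_cloud_vision(image_bytes: bytes) -> str:
    """Digitize a photo of handwritten text with Google Cloud Vision.

    Requires the google-cloud-vision package and valid credentials.
    """
    # Imported lazily so the rest of the sketch runs without the package.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=image_bytes)
    # document_text_detection targets dense text, including handwriting.
    response = client.document_text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text


def build_summary_prompt(text: str) -> str:
    """Hypothetical prompt format; the thesis's actual prompt is not given."""
    return f"{text.strip()}\n\nTl;dr:"


def digitize_and_summarize(
    image_bytes: bytes,
    ocr: Callable[[bytes], str],
    summarize: Callable[[str], str],
) -> Dict[str, str]:
    """Run OCR, then pass the digitized text to a summarizer.

    `summarize` is any function mapping a prompt to a completion,
    for example a thin wrapper around a language-model API.
    """
    text = ocr(image_bytes)
    if not text.strip():
        raise ValueError("OCR returned no text")
    summary = summarize(build_summary_prompt(text))
    return {"text": text, "summary": summary.strip()}


if __name__ == "__main__":
    # Stand-in functions with placeholder text so the pipeline
    # can be tried without any cloud access.
    fake_ocr = lambda _image: "Placeholder field note.\n"
    fake_summarizer = lambda _prompt: " Placeholder summary. "
    print(digitize_and_summarize(b"", fake_ocr, fake_summarizer))
```

Passing the OCR and summarization steps in as functions keeps the two stages independent, so each can be evaluated separately, as the thesis does, or swapped for another engine.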
