Detection of deceptive reviews : using classification and natural language processing features

University essay from Uppsala universitet/Institutionen för teknikvetenskaper

Abstract: With the great growth of open forums online where anyone can givetheir opinion on everything, the Internet has become a place wherepeople are trying to mislead others. By assuming that there is acorrelation between a deceptive text's purpose and the way to writethe text, our goal with this thesis was to develop a model fordetecting these fake texts by taking advantage of this correlation.Our approach was to use classification together with threedifferent feature types, term frequency-inverse document frequency,word2vec and probabilistic context-free grammar. We have managed todevelop a model which have improved all, to us known, results for twodifferent datasets.With machine translation, we have detected that there is apossibility to hide the stylometric footprints and thecharacteristics of deceptive texts, making it possible to slightlydecrease the accuracy of a classifier and still convey a message.Finally we investigated whether it was possible to train and test ourmodel on data from different sources and managed to achieve anaccuracy hardly better than chance. That indicated the resultingmodel is not versatile enough to be used on different kinds ofdeceptive texts than it has been trained on.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)