Prediction of the Type for Web Page : A Practical Application in Classification

University essay from Informationssystem

Author: Mao Qian; [2013]


Abstract: As more and more data are generated in daily life, traditional data analysis methods reach their limits and often fail to discover unknown factors hidden deep inside the data, which has led to the adoption of data mining. Classification, the mapping of data into known groups, is one of the primary tasks of data mining. This thesis studies an automated way to predict whether the value of a web page holds up over time, in other words whether the page is evergreen or not. When recommending web pages to users according to their interests, it is valuable to know which pages are evergreen. The problem is a classification problem, and solving it requires knowledge and techniques from machine learning and web text mining. A number of models (classifiers) are built during the implementation, based on different features and optimizations, and they are evaluated by cross-validation. The best solution in this thesis is an ensemble of several simple models, which achieves the highest prediction accuracy among the models evaluated. Limitations of the solution are also presented, and future improvements are suggested.
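As an illustration of the evaluation approach the abstract describes, the following is a minimal sketch of k-fold cross-validation applied to a majority-vote ensemble of simple models. It is not the thesis's implementation: the abstract does not name the base models, the features, or the number of folds, so the threshold-based base model (`train_stump`), the toy data, and `k=5` are all assumptions made for this example.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_vote(predictions):
    """Return the most common label among the base models' predictions."""
    return Counter(predictions).most_common(1)[0][0]


def train_stump(X, y, feature):
    """Hypothetical base model: threshold one feature at the midpoint
    between the two class means (assumes both classes are present)."""
    pos = [x[feature] for x, t in zip(X, y) if t == 1]
    neg = [x[feature] for x, t in zip(X, y) if t == 0]
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    high_label = 1 if mean_pos >= mean_neg else 0
    return lambda x: high_label if x[feature] >= threshold else 1 - high_label


def train_ensemble(X, y):
    """Train one stump per feature and combine them by majority vote."""
    stumps = [train_stump(X, y, f) for f in range(len(X[0]))]
    return lambda x: majority_vote([stump(x) for stump in stumps])


def cross_validate(train_fn, X, y, k=5):
    """Mean held-out accuracy over k folds.
    train_fn(X_train, y_train) must return a predict(x) function."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        X_train = [x for i, x in enumerate(X) if i not in held]
        y_train = [t for i, t in enumerate(y) if i not in held]
        predict = train_fn(X_train, y_train)
        correct = sum(predict(X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


# Toy data (invented for illustration): label 1 stands for "evergreen",
# label 0 for "not evergreen"; the three numeric features are arbitrary.
y = [i % 2 for i in range(20)]
X = [
    (label + 0.1 * (i % 3), 2.0 * label - 0.1 * (i % 5), 0.5 + 0.2 * label)
    for i, label in enumerate(y)
]

accuracy = cross_validate(train_ensemble, X, y, k=5)
print(f"mean cross-validated accuracy: {accuracy:.2f}")
```

Each fold is held out once while the ensemble is trained on the remaining folds, so every accuracy figure comes from pages the model did not see during training. Real web-page features would come from text mining and would be far noisier than this toy data.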
