Cross Site Product Page Classification with Supervised Machine Learning

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC; School of Computer Science and Communication)

Abstract: This work outlines a possible technique for identifying web pages that contain product specifications. Using support vector machines, a product web page classifier was constructed and tested with various settings. The final classifier achieved a precision of 0.958 and a recall of 0.796 on product pages. These scores suggest that the method could be a valid technique for real-world web classification tasks if additional features and more data were made available.
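The abstract does not state which features, kernel, or SVM implementation were used. As a rough illustration only, the sketch below trains a linear SVM (no bias term) with the Pegasos stochastic subgradient method on hypothetical bag-of-words features. It then reports precision and recall for the product class, the two metrics quoted above. The page texts, the `featurize` tokenizer, and the `lam`/`epochs` settings are all assumptions for illustration and are not the author's setup.

```python
import random
from collections import Counter


def featurize(text):
    # Hypothetical features: lower-cased bag-of-words token counts.
    return Counter(text.lower().split())


def score(w, x):
    # Linear decision value w . x over sparse feature dicts.
    return sum(w[f] * v for f, v in x.items())


def train_linear_svm(samples, labels, lam=0.01, epochs=20, seed=0):
    """Linear SVM (no bias term) trained with Pegasos-style stochastic
    subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = Counter()  # sparse weight vector; missing features read as 0
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x, y = samples[i], labels[i]
            margin = y * score(w, x)  # margin under the current weights
            for f in list(w):
                w[f] *= 1.0 - 1.0 / t  # regularization shrink: (1 - eta * lam)
            if margin < 1:  # hinge loss is active: step toward y * x
                for f, v in x.items():
                    w[f] += eta * y * v
    return w


def predict(w, text):
    return 1 if score(w, featurize(text)) > 0 else -1


def precision_recall(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical training pages: +1 = product page, -1 = other page.
pages = [
    ("price add to cart specifications weight dimensions", 1),
    ("price in stock buy now specifications warranty", 1),
    ("price model number specifications shipping", 1),
    ("blog post travel diary food photos", -1),
    ("contact us company history team", -1),
    ("news article election results coverage", -1),
]
X = [featurize(text) for text, _ in pages]
y = [label for _, label in pages]
w = train_linear_svm(X, y)

train_pred = [predict(w, text) for text, _ in pages]
print(precision_recall(y, train_pred))  # (1.0, 1.0) on this toy set
print(predict(w, "Price and specifications for this laptop"))  # 1
print(predict(w, "Travel blog about food"))  # -1

# Made-up labels: one missed product page and no false alarms.
print(precision_recall([1, 1, 1, 1, -1, -1], [1, 1, 1, -1, -1, -1]))  # (1.0, 0.75)
```

Precision measures how many pages predicted as product pages really are product pages. Recall measures how many of the real product pages were found. The last call shows the same shape as the reported scores: when no other pages are misclassified as product pages but some product pages are missed, precision stays high while recall drops.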
