Less Detectable Web Scraping Techniques

University essay from Linnéuniversitetet/Institutionen för datavetenskap och medieteknik (DM)

Abstract: Web scraping is an efficient way of gathering data, and it has also become much eas- ier to perform and offers a high success rate. People no longer need to be tech-savvy when scraping data since several easy-to-use platform services exist. This study conducts experiments to see if people can scrape in an undetectable fashion using a popular and intelligent JavaScript library (Puppeteer). Three web scraper algorithms, where two of them use movement patterns from real-world web users, demonstrate how to retrieve information automatically from the web. They operate on a website built for this research that utilizes known semi-security mechanisms, honeypot, and activity logging, making it possible to collect and evaluate data from the algorithms and the website. The result shows that it may be possible to construct a web scraper algorithm with less detectability using Puppeteer. One of the algorithms reveals that it is possible to control computer performance using built-in methods in Puppeteer.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)