Towards Realistic Datasets forClassification of VPN Traffic : The Effects of Background Noise on Website Fingerprinting Attacks

University essay from Karlstads universitet/Institutionen för matematik och datavetenskap (from 2013)

Abstract: Virtual Private Networks (VPNs) is a booming business with significant margins once a solid user base has been established and big VPN providers are putting considerable amounts of money into marketing. However, there exists Website Fingerprinting (WF) attacks that are able to correctly predict which website a user is visiting based on web traffic even though it is going through a VPN tunnel. These attacks are fairly accurate when it comes to closed world scenarios but a problem is that these scenarios are still far away from capturing typical user behaviour.In this thesis, we explore and build tools that can collect VPN traffic from different sources. This traffic can then be combined into more realistic datasets that we evaluate the accuracy of WF attacks on. We hope that these datasets will help us and others better simulate more realistic scenarios.Over the course of the project we developed automation scripts and data processing tools using Bash and Python. Traffic was collected on a server provided by our university using a combination of containerisation, the scripts we developed, Unix tools and Wireshark. After some manual data cleaning we combined our captured traffic together with a provided dataset of web traffic and created a new dataset that we used in order to evaluate the accuracy of three WF attacks.By the end we had collected 1345 capture files of VPN traffic. All of the traffic were collected from the popular livestreaming website twitch.tv. Livestreaming channels were picked from the twitch.tv frontpage and we ended up with 245 unique channels in our dataset. Using our dataset we managed to decrease the accuracy of all three tested WF attacks from 90% down to 47% with a WF attack confidence threshold of0.0 and from 74% down to 17% with a confidence threshold of 0.99. Even though this is a significant decrease in accuracy it comes with a roughly tenfold increase in the number of captured packets for the WF attacker.Thesis artifacts are available at github.com/C-Sand/rds-collect.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)