Characterization of phishing website characteristics

University essay from Linköpings universitet/Institutionen för datavetenskap

Abstract: The occurrence of phishing domains is increasing continuously, as attackers can use tool kits that create phishing websites for them. When knowledge of web development is no longer needed, anyone can perform a phishing attack, and existing detection methods cannot seem to keep up. Finding new techniques to identify these malicious domains is crucial to protect potential victims visiting the websites. Many existing methods focus on the visual appearance of the websites. This thesis chooses to focus on the underlying structure instead. Data on style sheets and certificates were collected from both verified phishing domains and benign domains to create a dataset for each type of domain. Using a token-based similarity algorithm on the collected style sheet data, subsets were created based on style sheet similarity. Our analysis focused on three main parts of the results: the characteristics of phishing domains compared to benign domains, the subsets created based on style sheet similarities, and the matching style sheets in two of the subsets. The characteristics of the phishing domains were for the most part rather different from those of the benign domains, except for similarities found in the style sheet data. The subsets created using style sheet similarities were grouped into three datasets based on the number of matching style sheets. The three datasets, despite originating from the same dataset, proved to have distinct differences in characteristics. Of the two chosen subsets, one contained style sheets indicating that the domains in the subset were created by a phishing kit. We conclude that a method based on structural similarities to identify both phishing kits and phishing domains is possible to implement. Our methodology shows the possibilities of this method, but further development and research are required to make it reliable.
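
The abstract does not say which token-based similarity algorithm was used, so the following is only a minimal sketch of the general idea, assuming Jaccard similarity over sets of style sheet tokens. The function names, the example domains, the greedy grouping strategy, and the 0.7 threshold are all hypothetical choices made for illustration and are not taken from the thesis.

```python
import re


def tokenize(css: str) -> set[str]:
    """Split a style sheet into a set of lowercase word-like tokens."""
    return set(re.findall(r"[a-z0-9_-]+", css.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0  # two empty token sets are treated as identical
    return len(a & b) / len(a | b)


def group_by_similarity(sheets: dict[str, str], threshold: float = 0.7) -> list[list[str]]:
    """Greedily place each domain in the first subset whose first member
    has a sufficiently similar style sheet (threshold is an arbitrary assumption)."""
    tokens = {domain: tokenize(css) for domain, css in sheets.items()}
    groups: list[list[str]] = []
    for domain in sheets:
        for group in groups:
            if jaccard(tokens[domain], tokens[group[0]]) >= threshold:
                group.append(domain)
                break
        else:
            # No existing subset is similar enough, so start a new one.
            groups.append([domain])
    return groups


# Made-up style sheets for three hypothetical domains.
sheets = {
    "a.example": "body { margin: 0; color: red; } .login { width: 300px; }",
    "b.example": "body { margin: 0; color: red; } .login { width: 320px; }",
    "c.example": "h1 { font-size: 2em; } nav { display: flex; }",
}

print(group_by_similarity(sheets))
```

In this toy input the first two style sheets differ in a single token and end up in the same subset, while the third shares no tokens with them and forms its own. A set-based measure ignores token order and frequency, which keeps the sketch short; a real pipeline might instead weight tokens or compare normalized rule sequences.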
