Big Data Analytics Using Apache Flink for Cybercrime Forensics on X (formerly known as Twitter)

University essay from Högskolan i Halmstad/Akademin för informationsteknologi

Abstract: The exponential growth of social media usage has led to massive data sharing, posing challenges for traditional systems in managing and analyzing such vast amounts of data. This surge in data exchange has also resulted in an increase in cyber threats from individuals and criminal groups. Traditional forensic methods, such as evidence collection and data backup, become impractical when dealing with petabytes or terabytes of data. To address this, Big Data Analytics has emerged as a powerful solution for handling and analyzing structured and unstructured data. This thesis explores the use of Apache Flink, an open-source tool by the Apache Software Foundation, to enhance cybercrime forensic research. Unlike batch processing engines like Apache Spark, Apache Flink offers real-time processing capabilities, making it well-suited for analyzing dynamic and time-sensitive data streams. The study compares Apache Flink's performance against Apache Spark in handling various workloads on a single node. The literature review reveals a growing interest in utilizing Big Data Analytics, including platforms like Apache Flink, for cybercrime detection and investigation, especially on social media platforms like X (formerly known as Twitter). Sentiment analysis is a vital technique, but challenges arise due to the unique nature of social data. X (formerly known as Twitter), as a valuable source for cybercrime forensics, enables the study of fraudulent, extremist, and other criminal activities. This research explores various data mining techniques and emphasizes the need for real-time analytics to combat cybercrime effectively. The methodology involves data collection from X, preprocessing to remove noise, and sentiment analysis to identify cybercrime-related tweets. The comparative analysis between Apache Flink and Apache Spark demonstrates Flink's efficiency in handling larger datasets and real-time processing. Parallelism and scalability are evaluated to optimize performance. The results indicate that Apache Flink outperforms Apache Spark regarding response time, making it a valuable tool for cybercrime forensics. Despite progress, challenges such as data privacy, accuracy improvement, and cross-platform analysis remain. Future research should focus on refining algorithms, enhancing scalability, and addressing these challenges to further advance cybercrime forensics using Big Data Analytics and platforms like Apache Flink.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)