Essays about: "Text Cleaning"

Showing results 1 - 5 of 14 essays containing the words Text Cleaning.

  1. Optimising Machine Learning Models for Imbalanced Swedish Text Financial Datasets: A Study on Receipt Classification : Exploring Balancing Methods, Naive Bayes Algorithms, and Performance Tradeoffs

    University essay from Linnéuniversitetet/Institutionen för datavetenskap och medieteknik (DM)

    Author : Li Ang Hu; Long Ma; [2023]
    Keywords : Imbalanced datasets; Swedish text financial datasets; Accuracy; Matthews correlation coefficient; Recall; Multinomial Naive Bayes; SMOTE; TomekLinks; Performance optimization;

    Abstract : This thesis investigates imbalanced Swedish text financial datasets, specifically receipt classification using machine learning models. The study explores the effectiveness of under-sampling and over-sampling methods for Naive Bayes algorithms in a controlled experiment conducted in collaboration with Fortnox.

  2. Fake Mass-Produced Advertisements Detection on Global Online Adult Service Websites

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Ernest Pokropek; [2023]
    Keywords : Machine learning; Spam detection; Mass-produced spam; Global adult online services;

    Abstract : A significant number of sex trafficking victims are advertised on online adult services, which are currently being flooded with spam. Investigators rely on online adult services to track cases of sex trafficking; however, the ever-increasing volume of spam makes their task progressively more difficult.

  3. Neural Cleaning of Swedish Textual Data : Using BERT-based methods for Token Classification of Running and Non-Running Text

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Andreas Ericsson; [2023]
    Keywords : Natural Language Processing; Text Cleaning; Transformers; BERT; Token Classification; Deep Learning;

    Abstract : Modern natural language processing methods require large textual datasets to function well. A common way to acquire the needed data is to scrape the internet. This does, however, come with the issue that some of the data may be unwanted, for instance spam websites.

  4. Home Dental Care in Dogs with Periodontitis: A Follow-Up Questionnaire Study after Professional Dental Cleaning (PTR)

    University essay from SLU/Dept. of Clinical Sciences

    Author : John Svärd; [2023]
    Keywords : questionnaire; periodontitis; tooth brushing; home dental care; dog;

    Abstract : Periodontal disease (gingivitis and periodontitis) is one of the most common diseases in dogs. Daily home dental care is the gold standard for prophylaxis, and most dog owners consider their dog's dental health very important. Despite this, several studies show that compliance with advice on tooth brushing is very low.

  5. Preprocessing method comparison and model tuning for natural language data

    University essay from Högskolan Dalarna/Mikrodataanalys

    Author : Peter Tempfli; [2020]
    Keywords : Natural language processing; sentiment analysis; machine learning;

    Abstract : Twitter and other microblogging services are a valuable source for near real-time mining of marketing, public opinion and brand-related consumer information. As such, the collection and analysis of user-generated natural language content is a focus of research on automated sentiment analysis.