
Showing results 1-5 of 13 essays matching the search criteria.

  1. Neural Cleaning of Swedish Textual Data : Using BERT-based methods for Token Classification of Running and Non-Running Text

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Andreas Ericsson; [2023]
    Keywords : Natural Language Processing; Language Technology; Text Cleaning; Transformers; BERT; Token Classification; Deep Learning;

    Abstract : Modern natural language processing methods require big textual datasets to function well. A common method is to scrape the internet to acquire the needed data. This does, however, come with the issue that some of the data may be unwanted – for instance, spam websites.

  2. Less Detectable Web Scraping Techniques

    University essay from Linnéuniversitetet/Institutionen för datavetenskap och medieteknik (DM)

    Author : Fredric Färholt; [2021]
    Keywords : web scraping; data mining; javascript; puppeteer; algorithms; data collection; security mechanisms; honeypot; security tools; undetectability;

    Abstract : Web scraping is an efficient way of gathering data, and it has also become much easier to perform and offers a high success rate. People no longer need to be tech-savvy when scraping data since several easy-to-use platform services exist.

  3. Web Scraping using Machine Learning

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Victor Carle; [2020]

    Abstract : This thesis explores the possibilities of creating a robust Web Scraping algorithm, designed to continuously scrape a specific website even though the HTML code is altered. The algorithm is intended to be used on websites that have a repetitive HTML structure containing data that can be scraped.

  4. Implementation of Collisions in FEMIC for Modelling of the RF Heating with a Lower Hybrid Resonance

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Léo Belleil; [2020]

    Abstract : Fusion energy is a potential sustainable solution to provide clean energy for human society. The fusion reaction requires that the ionised fuel (plasma) is heated to extreme temperatures. Magnetic confinement is used to keep the plasma away from the wall and the components mounted on it.

  5. Scrape off or throw away? : consumer attitudes to mouldy foods at home

    University essay from SLU/Department of Molecular Sciences

    Author : Lotta Jonasson; [2020]
    Keywords : mould; survey; consumer; food waste; mycotoxin;

    Abstract : Knowledge about how consumers handle mouldy food products at home is limited. It is of interest to investigate these aspects more closely, since some moulds can produce mycotoxins and other secondary metabolites that could be harmful to human health.