Decoding communication of non-human species - Unsupervised machine learning to infer syntactical and temporal patterns in fruit-bats vocalizations.

University essay from Stockholms universitet/Institutionen för data- och systemvetenskap

Abstract: Decoding the communication of non-human species offers a unique chance to explore alternative forms of intelligence using machine learning. This master's thesis focuses on discreteness and grammar, two of five linguistic areas that machine learning can support, and tackles the inference of syntax and temporal structure from bioacoustic data annotated with animal behavior. The problem lies in the lack of species-specific linguistic knowledge, time-consuming feature extraction, and the limited availability of data; additionally, unsupervised clustering struggles to discretize vocalizations that sound continuous to human perception, because it is unclear how to tune the parameters used to preprocess audio. This thesis investigates unsupervised learning as a way to generalize the deciphering of syntax and short-range temporal patterns in continuous-type vocalizations, specifically those of fruit bats, and addresses two research questions: How does dimensionality reduction affect unsupervised manifold learning to quantify the size and diversity of the animal repertoire? and How do syntax and temporal structure encode contextual information? An experimental strategy is designed to improve the effectiveness of unsupervised clustering for quantifying the repertoire and to investigate linguistic properties with classifiers and sequence mining; acoustic segments are collected from a dataset of fruit-bat vocalizations annotated with behavior. The methodology keeps the clustering methods constant while varying the dimensionality reduction techniques applied to spectrograms and to their latent representations learnt by autoencoders. Uniform Manifold Approximation and Projection (UMAP) embeds the data into a manifold; density-based clustering is applied to its embeddings and compared with agglomerative-based labels, which are used as a ground-truth proxy to test the robustness of the models. Vocalizations are encoded into label sequences. Syntactic rules and short-range patterns in the sequences are investigated with classifiers (Support Vector Machines, Random Forests), graph analytics, and prefix-suffix trees. Reducing the temporal dimension of Mel-spectrograms outperformed the previous clustering baseline (Silhouette score > 0.5, 95% assignment accuracy). UMAP embeddings from sequential autoencoders showed potential advantages over those from convolutional autoencoders. The study revealed a repertoire of between seven and approximately 20 vocal units characterized by combinatorial patterns: context classification achieved an F1-score > 0.9 even with permuted sequences, and repetition characterized the vocalizations of isolated pups. Vocal-unit distributions were significantly different (p < 0.05) across contexts; a truncated power law (alpha < 2) described the distribution of maximal repetitions. This thesis contributes to unsupervised machine learning in bioacoustics for decoding non-human communication, aiding research in language evolution and animal cognition.
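
The abstract does not define how label sequences are analyzed, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes that a "maximal repetition" is a run of consecutive identical vocal-unit labels, and it uses a made-up sequence of labels (`A`, `B`, `C`) in place of real cluster assignments. The function names `maximal_repetitions` and `unit_counts` are hypothetical. The sketch also shows that unit counts are unchanged by shuffling, which is one possible reading of why classification could remain accurate on permuted sequences.

```python
from collections import Counter
from itertools import groupby
import random


def maximal_repetitions(sequence):
    """Return (label, run_length) for every maximal run of identical
    consecutive labels. Assumed definition, not taken from the thesis."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(sequence)]


def unit_counts(sequence):
    """Count how often each vocal-unit label occurs, ignoring order."""
    return Counter(sequence)


# Hypothetical label sequence standing in for one encoded vocalization.
seq = ["A", "A", "A", "B", "C", "C", "A"]

runs = maximal_repetitions(seq)
longest = max(length for _, length in runs)

# Shuffle a copy with a fixed seed; order changes, unit counts do not.
shuffled = seq[:]
random.Random(0).shuffle(shuffled)
same_counts = unit_counts(seq) == unit_counts(shuffled)
```

Collecting the run lengths over many sequences would give the empirical distribution to which a heavy-tailed model, such as the truncated power law mentioned in the abstract, could then be fitted.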
