Interactionwise Semantic Awareness in Visual Relationship Detection
Abstract: Visual Relationship Detection (VRD) is a relatively young research area whose goal is to develop prediction models for detecting the relationships between objects depicted in an image. A relationship is modeled as a subject-predicate-object triplet, where the predicate (e.g., an action or a spatial relation such as “eat”, “chase” or “next to”) describes how the subject and the object interact in the given image. VRD can be formulated as a classification problem, but it suffers from the effects of a combinatorial output space; some of the major issues to overcome are the long-tail class distribution, class overlap and intra-class variance. Machine learning models have been found effective for the task and, more specifically, many works have shown that combining visual, spatial and semantic features of the detected objects is key to achieving good predictions. This work investigates the use of distributional embeddings, often used to discover and encode semantic information, in order to improve the results of an existing neural network-based architecture for VRD. Several experiments are performed to make the model semantically aware of the classification output domain, namely the predicate classes. Additionally, different word embedding models are trained from scratch to better account for multi-word objects and predicates, and are then fine-tuned on VRD-related text corpora. We evaluate our methods on two datasets. Ultimately, we show that, for some sets of predicate classes, semantic knowledge of the predicates extracted from trained-from-scratch distributional embeddings can be leveraged to greatly improve prediction, and that it is especially effective for zero-shot learning.
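The abstract mentions handling multi-word predicates such as “next to” with distributional embeddings. One common baseline (not necessarily the method used in this work) is to embed a multi-word predicate by averaging the vectors of its constituent words; a minimal sketch with toy vectors, where the dictionary and its 4-dimensional entries are purely illustrative:

```python
import numpy as np

# Toy distributional embeddings (hypothetical 4-d vectors; real models
# such as word2vec or GloVe typically use 100-300 dimensions).
word_vectors = {
    "next": np.array([0.1, 0.3, -0.2, 0.5]),
    "to":   np.array([0.0, 0.1,  0.4, -0.1]),
    "on":   np.array([0.2, 0.2,  0.1,  0.0]),
}

def predicate_embedding(predicate: str) -> np.ndarray:
    """Embed a (possibly multi-word) predicate by averaging its word vectors."""
    tokens = predicate.split()
    return np.mean([word_vectors[t] for t in tokens], axis=0)

emb = predicate_embedding("next to")  # average of "next" and "to"
```

Training embeddings from scratch with multi-word tokens treated as single vocabulary items, as the abstract describes, avoids the information loss that this kind of averaging can introduce.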