Multimodal Learning Based Spatial Relation Identification

Sandeep Kumar Dash, Y. V. Sureshchandra, Yatharth Mishra, Partha Pakray, Ranjita Das, Alexander Gelbukh


Spatial Relation identification is one of the integral parts of Spatial Information Retrieval. It deals with identifying the spatially related objects in view of their physical orientation or placement with respect to each other. The concept is widely used in many fields such as Robotics, Image Caption Generation and many more such areas. In this work the focus is to gather information from multiple modalities such as Image and its corresponding Text so as to strengthen the learning process for the identification of Spatial Relation pairs from a given text. Two different multimodal approaches are proposed in this work. In the first approach, information is explored as a sequential learning process where the individual Spatial Roles are identified as connected entities, which makes the Spatial Relation retrieval easy and efficient enough. To counter the small size of the dataset along with necessity to avoid overfitting, an efficient backward propagation based Neural Network was used to classify the candidate roles and the relations. The feature selection was different for all the classification tasks. Building on the selected feature from the first approach, the second approach uses a transfer learning method that utilizes an existing image caption generation model to retrieve the vital topic based information from image which is then used for the task. Thereby both approaches used information from two modalities which are further used to train the system in the respective approach. The model achieves state-of-the-art performance in terms of Precision for two of the Spatial Roles identification. This validates the advantage of using multimodal learning when compared with other partial-multimodal processes.


Spatial role labeling, spatial relation identification, multimodal learning, multi layer perceptron

Full Text: PDF