Word Embeddings for IoT Based on Device Activity Footprints
Abstract
With the expansion of the IoT ecosystem, the number of devices and sensors, and the volume of data they generate, have grown explosively. However, the tools available to analyze such data are limited. Word embeddings, widely used in the natural language processing (NLP) domain, provide a way to find words similar to a given word. In this paper, we extend the theory of word embeddings to IoT devices, proposing a method to generate embeddings for IoT devices and sensors in a smart home based on their activity. We model IoT devices as vectors using an approach similar to Word2Vec and App2Vec, in which the time between device firings is also taken into account. The resulting embeddings can serve a variety of use cases, such as finding similar devices in an IoT device store or acting as a signature for each type of IoT device. We present the results of a feasibility study on the CASAS dataset and on a private real-world dataset of IoT device activity logs, using our method to identify patterns in the embeddings of various types of IoT devices in a household. On the CASAS dataset, devices of similar types cluster together with a probability of more than 0.65, independent of the session gap value and the embedding vector size. On the private dataset, the corresponding probability is 0.4, again independent of the session gap value and the embedding vector size.
Keywords
Word2Vec, IoT2Vec, word embeddings, smart home, internet of things, natural language processing
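
The abstract mentions a session gap and the time between device firings without giving details. As a minimal sketch of one plausible reading, the snippet below splits a device activity log into sessions whenever the time between consecutive firings exceeds a threshold. Each session could then be treated as a "sentence" of device "words" for a Word2Vec-style trainer. The function name `split_into_sessions`, the device identifiers, and the 60-second gap are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def split_into_sessions(
    events: List[Tuple[float, str]], session_gap: float
) -> List[List[str]]:
    """Group device firings into sessions.

    events: (timestamp_in_seconds, device_id) pairs, in any order.
    session_gap: a new session starts when the time since the previous
        firing is strictly greater than this value (an assumed rule).
    """
    sessions: List[List[str]] = []
    current: List[str] = []
    last_time = None
    for timestamp, device in sorted(events):
        # Close the current session if the pause before this firing is too long.
        if last_time is not None and timestamp - last_time > session_gap:
            sessions.append(current)
            current = []
        current.append(device)
        last_time = timestamp
    if current:
        sessions.append(current)
    return sessions


# Hypothetical activity log: three kitchen firings, then two bedroom firings.
events = [
    (0, "motion_kitchen"),
    (5, "light_kitchen"),
    (12, "door_fridge"),
    (400, "motion_bedroom"),
    (406, "light_bedroom"),
]
sessions = split_into_sessions(events, session_gap=60)
print(sessions)
```

With this input, the kitchen devices fall into one session and the bedroom devices into another, so devices that fire close together in time share a context.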