Towards Product Attributes Extraction in Indonesian e-Commerce Platform

Authors

  • Muhammad Rif’at, Universitas Indonesia, Faculty of Computer Science, Depok
  • Rahmad Mahendra, Universitas Indonesia, Faculty of Computer Science, Depok
  • Indra Budi, Universitas Indonesia, Faculty of Computer Science, Depok
  • Haryo Akbarianto Wibowo, Universitas Indonesia, Faculty of Computer Science, Depok

DOI:

https://doi.org/10.13053/cys-22-4-3073

Keywords:

Attributes extraction, e-commerce, product title, Named-Entity Recognition, Indonesian language

Abstract

Product attribute extraction is an important task in the e-commerce domain. Extracting pairs of attribute labels and values from free-text product descriptions is useful for many tasks, such as product matching, product categorization, faceted product search, and product recommendation. In this paper, we present a study of attribute extraction from Indonesian e-commerce product titles. We annotate 1,721 product titles with 16 attribute labels. We apply a supervised learning technique using the CRF algorithm. We propose a combination of lexical, word embedding, and dictionary features to learn the attributes using a joint extraction model. Our model achieves F1-measures of 47.30% and 68.49% for full-match and partial-match evaluation, respectively. Based on the experiments, we find that simultaneously extracting a larger and more diverse set of attributes does not necessarily give worse results than extracting a smaller number of attributes.
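
The difference between the full-match and partial-match scores can be illustrated with a small sketch. The abstract does not give the exact matching rules, so the code below assumes that a full match requires the same label and identical token boundaries, and that a partial match requires the same label and any token overlap. The BIO tagging scheme, the label names (BRAND, MODEL, CAPACITY, COLOR), and the example title are hypothetical and are not taken from the paper's annotation scheme.

```python
from typing import List, Tuple

# (label, start_token, end_token_exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect labelled spans from a BIO tag sequence (assumed scheme)."""
    spans: List[Span] = []
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i))
            label = None
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


def span_f1(gold: List[Span], pred: List[Span], partial: bool = False) -> float:
    """F1 over spans; the matching rules are assumptions, see the lead-in."""

    def match(a: Span, b: Span) -> bool:
        if a[0] != b[0]:
            return False
        if partial:
            return a[1] < b[2] and b[1] < a[2]  # token ranges overlap
        return a[1:] == b[1:]  # identical boundaries

    hit_pred = sum(any(match(p, g) for g in gold) for p in pred)
    hit_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = hit_pred / len(pred) if pred else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical title: "Samsung Galaxy S8 64GB Midnight Black"
gold_tags = ["B-BRAND", "B-MODEL", "I-MODEL", "B-CAPACITY", "B-COLOR", "I-COLOR"]
pred_tags = ["B-BRAND", "B-MODEL", "O", "B-CAPACITY", "O", "B-COLOR"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)

print(span_f1(gold_spans, pred_spans))                # full match
print(span_f1(gold_spans, pred_spans, partial=True))  # partial match
```

In this toy case the predicted MODEL and COLOR spans cover only part of the gold spans, so they count under the partial-match rule but not under the full-match rule. This is one plausible reason a partial-match score is higher than a full-match score on the same predictions.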

Published

2018-12-30