Soft Cardinality in Semantic Text Processing: Experience of the SemEval International Competitions

Sergio Jimenez, Fabio A. Gonzalez, Alexander Gelbukh

Abstract


Soft cardinality is a generalization of classic set cardinality (i.e., the number of elements in a set) that exploits similarities between elements to provide a “soft” count of the number of elements in a collection. The model is general enough to be used interchangeably as the cardinality function in resemblance coefficients such as Jaccard’s, Dice’s, cosine, and others. Beyond that, cardinality-based features can be extracted from pairs of objects being compared in order to learn adaptive similarity functions from training data. This approach can be used to compare any objects that can be represented as sets or bags. We and other international teams used soft cardinality to address a series of natural language processing (NLP) tasks in the SemEval (semantic evaluation) competitions from 2012 to 2014. Systems based on soft cardinality consistently ranked among the best in every task in which they participated. This paper describes that experience, presenting the general model and some practical techniques for applying soft cardinality to NLP problems.
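As an illustration of the idea described above, the following is a minimal sketch rather than the implementation used in the competition systems. The abstract does not state the formula, so the sketch assumes one common formulation in which each element contributes the reciprocal of the sum of its similarities to all elements of the collection, with a softness exponent `p`. The helper names (`char_bigrams`, `bigram_dice`, `soft_cardinality`, `soft_jaccard`) and the choice of a character-bigram Dice similarity between words are hypothetical and chosen only for the example.

```python
from typing import Callable, Iterable

Similarity = Callable[[str, str], float]


def char_bigrams(word: str) -> set[str]:
    """Character bigrams of a word padded with '#' (an assumed word representation)."""
    padded = f"#{word}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bigram_dice(a: str, b: str) -> float:
    """Dice coefficient over character bigrams; equals 1.0 when a == b."""
    x, y = char_bigrams(a), char_bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def soft_cardinality(items: Iterable[str],
                     sim: Similarity = bigram_dice,
                     p: float = 1.0) -> float:
    """Assumed formulation: sum over elements a of 1 / sum_b sim(a, b)**p.

    Similar elements share their "count", so the result lies between 1 and
    the classic cardinality for a non-empty collection.
    """
    elems = set(items)
    return sum(1.0 / sum(sim(a, b) ** p for b in elems) for a in elems)


def soft_jaccard(a: Iterable[str], b: Iterable[str],
                 sim: Similarity = bigram_dice, p: float = 1.0) -> float:
    """Jaccard coefficient with soft cardinality swapped in for the classic one."""
    set_a, set_b = set(a), set(b)
    card_a = soft_cardinality(set_a, sim, p)
    card_b = soft_cardinality(set_b, sim, p)
    card_union = soft_cardinality(set_a | set_b, sim, p)
    # The intersection is not built explicitly; its soft size is inferred
    # from the sizes of the two collections and of their union.
    card_inter = card_a + card_b - card_union
    return card_inter / card_union if card_union else 0.0


if __name__ == "__main__":
    print(soft_cardinality({"color", "colour"}))  # between 1 and 2
    print(soft_jaccard("the color red".split(), "the colour red".split()))
```

With a crisp similarity (1 for identical elements, 0 otherwise) the function reduces to the classic cardinality, which is the sense in which the soft version generalizes it. In this sketch the spelling variants "color" and "colour" count as little more than one element, so the two example sentences score higher under `soft_jaccard` than under the classic Jaccard coefficient.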

Keywords


Similarity measure; soft computing; set cardinality; semantics; natural language processing
