Entity Extraction in Biochemical Text using Multiobjective Optimization

Authors

  • Utpal Kumar Sikdar, Department of Computer Science and Engineering, Indian Institute of Technology
  • Asif Ekbal, Department of Computer Science and Engineering, Indian Institute of Technology
  • Sriparna Saha, Department of Computer Science and Engineering, Indian Institute of Technology

DOI:

https://doi.org/10.13053/cys-18-3-2034

Keywords:

Multiobjective modified differential evolution (MODE), feature selection, ensemble learning, conditional random field (CRF), named entity (NE).

Abstract

In this paper we propose a feature selection and classifier ensemble approach for biochemical entity extraction, based on multiobjective modified differential evolution. The algorithm operates in two layers. The first layer determines an appropriate set of features for the task within the framework of a supervised statistical classifier, namely, Conditional Random Field (CRF). This produces a set of solutions, a subset of which is used to construct an ensemble in the second layer. The proposed approach is evaluated for entity extraction in chemical texts, which involves identifying IUPAC and IUPAC-like names and classifying them into predefined categories. Experiments carried out on a benchmark dataset show recall, precision and F-measure values of 86.15%, 91.29% and 88.64%, respectively.
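As a quick sanity check on the reported scores, the sketch below recomputes the F-measure from the stated precision and recall. It assumes the F-measure is the balanced harmonic mean (F1), which the abstract does not state explicitly. The function name `f_measure` and the `beta` parameter are illustrative, not taken from the paper.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision <= 0 or recall <= 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract, in percent.
precision, recall = 91.29, 86.15

# Assumption: balanced F1 (beta = 1).
f1 = f_measure(precision, recall)
print(f"F-measure: {f1:.2f}%")
```

Under this assumption the recomputed value agrees with the reported 88.64% to within about 0.01, a gap consistent with the precision and recall themselves having been rounded to two decimals.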

Author Biographies

Utpal Kumar Sikdar, Department of Computer Science and Engineering, Indian Institute of Technology

is a Senior Research Fellow in the Department of Computer Science and Engineering, IIT Patna, India. He received his Bachelor's and Master's degrees in Information Technology and Software Engineering from Vidyasagar University and Jadavpur University in 2004 and 2008, respectively. Prior to joining his Ph.D. programme, he served as a specialist at Tata Elxsi Ltd., India. His research interests include anaphora resolution and information extraction.

Asif Ekbal, Department of Computer Science and Engineering, Indian Institute of Technology

is an Assistant Professor in the Department of Computer Science and Engineering, IIT Patna. He received his Master's and PhD degrees in Computer Science and Engineering from Jadavpur University in 2004 and 2009, respectively. His Master's and PhD theses were related to the broad area of Natural Language Processing. Before joining IITP, he served as a postdoctoral research fellow at the University of Trento, Italy, and Heidelberg University, Germany. His broad areas of research include Natural Language Processing (NLP), Information Extraction and Bio-text Mining. He has authored or co-authored more than 80 technical articles in international journals, book chapters, and conference/workshop proceedings. He received the Best Innovative Project Award from the Indian National Academy of Engineering in the year 2000.

Sriparna Saha, Department of Computer Science and Engineering, Indian Institute of Technology

is an Assistant Professor in the Department of Computer Science and Engineering, IIT Patna. She received her Master's and PhD degrees in Computer Science from the Indian Statistical Institute, Kolkata, in 2005 and 2011, respectively. Her current research interests include pattern recognition, multiobjective optimization and biomedical information extraction. She is the recipient of the Lt Rashi Roy Memorial Gold Medal from the Indian Statistical Institute for outstanding performance in MTech (Computer Science). She is the recipient of the Google India Women in Engineering Award, 2008. She received the India4EU fellowship of the European Union to work as a Post-doctoral Research Fellow at the University of Trento, Italy, from September 2010 to January 2011. She is also the recipient of the Erasmus Mundus Mobility with Asia (EMMA) fellowship of the European Union to work as a Post-doctoral Research Fellow at Heidelberg University, Germany, from September 2009 to June 2010.

Published

2014-09-29