SIGMOD 2016 Tutorial: Automatic Entity Recognition and Typing in Massive Text Data

In today's computerized and information-based society, we are constantly exposed to vast amounts of natural language text data, ranging from news articles, product reviews, and advertisements to a wide range of user-generated content from social media. To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of entities and the relationships between them. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in different kinds of text data (especially in massive, domain-specific text data). These methods can automatically identify token spans as entity mentions in text and label their types (e.g., person, product, organization) in a scalable way. We demonstrate on real datasets, including news articles and Yelp reviews, how these typed entities aid in knowledge discovery and management.
University of Illinois at Urbana-Champaign; Rensselaer Polytechnic Institute
- Introduction to entity recognition and typing
- Entity recognition: An overview and phrase-mining approaches
- Supervised and Semi-supervised Entity Mention Detection
- Unsupervised Entity Mention Detection
- Weakly and Distantly Supervised Mention Detection
- Supervised Entity Typing
- Semi-supervised Entity Typing
- Entity linking for typing
- Weakly Supervised Entity Typing
- Unsupervised Entity Typing
- Distantly Supervised Entity Typing
- Xiang Ren*, Wenqi He*, Meng Qu, Heng Ji, Clare R. Voss, and Jiawei Han, Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding, in Proc. 2016 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'16).
- Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare R. Voss, and Jiawei Han, Scalable Topical Phrase Mining from Text Corpora, in Proceedings of the VLDB Endowment, vol. 8, no. 3, 2015.
- Xiang Ren, Ahmed El-Kishky, Chi Wang, Fangbo Tao, Clare R. Voss, Heng Ji, and Jiawei Han, ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering, in Proc. 2015 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'15).
- Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, and Jiawei Han, Mining Quality Phrases from Massive Text Corpora, in Proc. 2015 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'15).
Xiang Ren, Ph.D. candidate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research focuses on knowledge acquisition from text data and mining linked data. In 2016, he received a Google PhD Fellowship for his work in Structured Data and Database Management. He received the C. L. and Jane W.-S. Liu Award and the Yahoo!-DAIS Research Excellence Gold Award in 2015, and a Microsoft Young Fellowship from Microsoft Research Asia in 2012.
Ahmed El-Kishky, Ph.D. candidate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research interests include mining large unstructured data, text mining, and network mining. He is the recipient of both the National Science Foundation Graduate Research Fellowship and the National Defense Science and Engineering Fellowship.
Heng Ji, Edward P. Hamilton Development Chair Associate Professor, Department of Computer Science, Rensselaer Polytechnic Institute. Her research interests focus on Natural Language Processing and its connections with Data Mining and Vision. She received the "AI's 10 to Watch" Award from IEEE Intelligent Systems in 2013 and an NSF CAREER Award in 2009. She coordinated the NIST TAC Knowledge Base Population task in 2010, 2011, 2014, 2015, and 2016.
Jiawei Han, Abel Bliss Professor, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research areas encompass data mining, data warehousing, information network analysis, and database systems, with over 600 conference and journal publications. He is a Fellow of ACM and a Fellow of IEEE, and received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), and the IEEE Computer Society W. Wallace McDowell Award (2009). His co-authored textbook "Data Mining: Concepts and Techniques", 3rd ed. (Morgan Kaufmann, 2011), has been widely adopted worldwide.