WWW 2017 Tutorial: Constructing Structured Information Networks from Massive Text Corpora
Time: 9-10:30am & 11-12:30pm, April 3.
Location: Meeting Room 10.
The success of data mining technology is largely attributed to the efficient and effective analysis of structured data. Constructing a well-structured, machine-actionable database from unstructured or loosely structured data sources is often a prerequisite for downstream applications. Although most of the data generated in our society is unstructured, big data brings big opportunities to uncover the structures of real-world entities, attributes, and relations from massive text corpora. By integrating these semantically rich structures with other inter-related structured data (e.g., product specifications, user transaction logs), one can construct a powerful StructNet as a conceptual abstraction of the original text corpora. The uncovered StructNets facilitate browsing information and inferring knowledge that is otherwise locked in the text corpora. Machines can effectively perform algorithmic analysis at a large scale over these StructNets and apply the resulting insights and knowledge to improve human productivity in various downstream tasks.
University of Illinois at Urbana-Champaign
Outline & Slides
- Introduction [PDF]
- Structured network of factual knowledge
- Text to network to knowledge
- Supervised approaches
- Unsupervised approaches
- Weakly and distantly supervised phrase mining
- Entity recognition and coarse-grained typing
- Fine-grained entity typing
- Joint extraction of typed entities and relations
- Supervised attribute learning
- Pattern-based bootstrapping
Code & Systems
[CoType] [AFET] [PLE] [ClusType] [AutoPhrase] [SegPhrase] [TopMine] [MetaPAD]
- CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases
Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han.
International World-Wide Web Conference (WWW), 2017.
- Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding
Xiang Ren, Wenqi He, Meng Qu, Heng Ji, Clare R. Voss, Jiawei Han.
In Proc. 2016 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), 2016.
- AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding
Xiang Ren, Wenqi He, Meng Qu, Lifu Huang, Heng Ji, Jiawei Han.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
- Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare R Voss, and Jiawei Han, Scalable Topical Phrase Mining from Text Corpora, in Proceedings of the VLDB Endowment, vol. 8, no. 3, VLDB, 2015.
- Xiang Ren, Ahmed El-Kishky, Chi Wang, Fangbo Tao, Clare R. Voss, Heng Ji, and Jiawei Han, ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering, in Proc. 2015 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), 2015.
- Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, and Jiawei Han, Mining Quality Phrases from Massive Text Corpora, in 2015 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD), 2015.
Xiang Ren, Ph.D. candidate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research focuses on creating computational tools for better understanding and exploring massive text data. He has published over 25 papers in major conferences. He received a Google PhD Fellowship in Structured Data and Database Management in 2016, the KDD Rising Star distinction from Microsoft Academic Search in 2016, the C. W. Gear Outstanding Graduate Student Award from CS@Illinois in 2016, and a Yahoo!-DAIS Research Excellence Award in 2015. Mr. Ren has extensive experience delivering tutorials at major conferences, including SIGKDD 2015, SIGMOD 2016 and WWW 2016.
Meng Jiang, Postdoctoral Research Associate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research focuses on behavioral modeling and social media analysis. He received his Ph.D. in Computer Science from Tsinghua University, Beijing, in 2015. His Ph.D. thesis won the Dissertation Award at Tsinghua. His recent research was a SIGKDD 2014 Best Paper Finalist. His ICDM 2015 tutorial received an honorarium.
Jingbo Shang, Ph.D. candidate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research focuses on mining and constructing structured knowledge from massive text corpora. He is a recipient of the Computer Science Excellence Scholarship and the Grand Prize of the Yelp Dataset Challenge in 2015.
Jiawei Han, Abel Bliss Professor, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research areas encompass data mining, data warehousing, information network analysis, and database systems, with over 600 conference and journal publications. He is a Fellow of ACM and a Fellow of IEEE, and received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), and the IEEE Computer Society W. Wallace McDowell Award (2009). His co-authored textbook "Data Mining: Concepts and Techniques", 3rd ed. (Morgan Kaufmann, 2011), has been widely adopted worldwide.