WWW 2017 Tutorial: Constructing Structured Information Networks from Massive Text Corpora

Time: 9-10:30am & 11-12:30pm, April 3.

Location: Meeting Room 10.

The success of data mining technology is largely attributed to the efficient and effective analysis of structured data. Constructing a well-structured, machine-actionable database from unstructured or loosely structured data sources is often a prerequisite for downstream applications. Although the majority of the data generated in our society is unstructured, big data brings big opportunities to uncover structures of real-world entities, attributes, and relations from massive text corpora. By integrating these semantically rich structures with other inter-related structured data (e.g., product specifications, user transaction logs), one can construct a powerful StructNet as a conceptual abstraction of the original text corpora. The uncovered StructNets will facilitate browsing information and inferring knowledge that is otherwise locked in the text corpora. Machines can effectively perform algorithmic analysis at a large scale over these StructNets and apply the new insights and knowledge to improve human productivity in various downstream tasks.

Xiang Ren (xren7@illinois.edu), Meng Jiang, Jingbo Shang, Jiawei Han

University of Illinois at Urbana-Champaign

Outline & Slides

  1. Introduction [PDF]
  2. Part I: Quality phrase mining: An overview and data-driven approaches [PDF]
  3. Part II: Entity and Relation typing: An overview and a joint typing approach [PDF]
  4. Part III: Attribute discovery for network construction [PDF]
  5. Summary and future directions [PDF]
[PDF] for entire tutorial.

Code & Systems

[CoType] [AFET] [PLE] [ClusType] [AutoPhrase] [SegPhrase] [TopMine] [MetaPAD]

Publications


Presenters


Xiang Ren, Ph.D. candidate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research focuses on creating computational tools for better understanding and exploring massive text data. He has published over 25 papers in major conferences. He received a Google PhD Fellowship in Structured Data and Database Management in 2016, KDD Rising Star by Microsoft Academic Search in 2016, the C. W. Gear Outstanding Graduate Student Award by CS@Illinois in 2016, and the Yahoo!-DAIS Research Excellence Award in 2015. Mr. Ren has extensive experience delivering tutorials at major conferences, including SIGKDD 2015, SIGMOD 2016 and WWW 2016.

Meng Jiang, Postdoctoral Research Associate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research focuses on behavioral modeling and social media analysis. He received his Ph.D. in Computer Science from Tsinghua University, Beijing in 2015. His Ph.D. thesis won the Dissertation Award at Tsinghua. His recent research was a SIGKDD 2014 Best Paper Finalist. His ICDM 2015 tutorial received an honorarium.

Jingbo Shang, Ph.D. candidate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research focuses on mining and constructing structured knowledge from massive text corpora. He is the recipient of a Computer Science Excellence Scholarship and the Grand Prize of the Yelp Dataset Challenge in 2015.

Jiawei Han, Abel Bliss Professor, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research areas encompass data mining, data warehousing, information network analysis, and database systems, with over 600 conference and journal publications. He is a Fellow of ACM and a Fellow of IEEE, and received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), and the IEEE Computer Society W. Wallace McDowell Award (2009). His co-authored textbook "Data Mining: Concepts and Techniques", 3rd ed. (Morgan Kaufmann, 2011), has been widely adopted worldwide.