Invited Talks

Effort-Light StructMine: Turning Massive Text Corpora into Structures

Selected Conference Tutorials

Constructing Structured Networks of Factual Knowledge from Massive Text Corpora

The success of data mining technology is largely attributed to the efficient and effective analysis of structured data. Constructing a well-structured, machine-actionable database from unstructured or loosely structured data sources is often a prerequisite for downstream applications. Although the majority of data generated in our society is unstructured, big data brings big opportunities to uncover structures of real-world entities, attributes, and relations from massive text corpora. By integrating these semantically rich structures with other interrelated structured data (e.g., product specifications, user transaction logs), one can construct a powerful StructDB as a conceptual abstraction of the original text corpora. The resulting StructDBs facilitate browsing information and inferring knowledge that is otherwise locked in the text corpora. Machines can perform algorithmic analysis at scale over these StructDBs and apply the resulting insights and knowledge to improve human productivity in various downstream tasks.

Automatic Entity Recognition and Typing

In today's computerized and information-based society, we are constantly exposed to vast amounts of natural language text data, ranging from news articles, product reviews, and advertisements to a wide range of user-generated content from social media. To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of entities and the relationships between them. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in different kinds of text data (especially in massive, domain-specific text data). These methods can automatically identify token spans as entity mentions in text and label their types (e.g., people, products, organizations) in a scalable way. We demonstrate on real datasets, including news articles and Yelp reviews, how these typed entities aid in knowledge discovery and management.