My research spans the areas of data mining, machine learning and natural language processing, focusing on making sense of massive text corpora.

My dissertation research is on constructing structured networks of factual knowledge from unstructured text corpora, to support data exploration ("Find the natural disasters that happened in the Asia-Pacific area in 2016."), power intelligent systems ("Where did the terrorist attacks happen? What organizations were involved?"), and facilitate knowledge discovery ("Is bacterium X or gene Y a potential cause of disease Z?").

[Effort-Light StructMine]  State-of-the-art information extraction (IE) systems rely heavily on large amounts of task-specific labeled data to train supervised models (e.g., deep neural networks). In practice, the scale and efficiency of such manual curation are rather limited, especially when dealing with text corpora of various kinds. A crucial question that runs through my research is: how can we design a generic solution for efficiently constructing customized machine-learning models for a given text corpus, without explicit human labeling effort?

Here are some representative works addressing the above question:

I gave tutorials at KDD, WWW and CIKM on Knowledge Base Construction and Querying. Check out my Research Statement for more details.

[Impact]  The systems and algorithms we developed have been successful across different domains and disciplines: knowledge graphs produced by the Life-iNet system are used for literature search and drug repurposing; our entity extraction system was shipped as part of production systems at Microsoft Bing and the U.S. Army Research Lab; our phrase mining tool won the grand prize of the Yelp Dataset Challenge in 2015 and was adopted by TripAdvisor; our survival prediction algorithm was the Task 1 winner of the Prostate Cancer DREAM Challenge. Check out our event exploration system on protest-related news corpora.