CIKM 2017 Tutorial: Construction and Querying of Large-scale Knowledge Bases

In today's computerized and information-based society, people are inundated with vast amounts of text data, from news articles, social media posts, and scientific publications to textual information from many other domains (corporate reports, advertisements, legal acts, medical reports). How to turn such massive unstructured text data into structured, actionable knowledge, and how to enable effective and user-friendly access to that knowledge, are grand challenges for the research community.

In the first half of the tutorial, we introduce data-driven methods for mining structured facts (i.e., entities and their relations for types of interest) from massive text corpora to construct knowledge bases, with a focus on methods that are minimally supervised, domain-independent, and language-independent, enabling timely knowledge base construction across various application domains (news, social media, biomedical, business). In the second half of the tutorial, we discuss the challenges of querying large-scale knowledge bases and give a systematic discussion of several emerging schema-agnostic querying paradigms for knowledge bases, including keyword query, graph query, natural language query (i.e., question answering), and query by example, which allow users to easily query knowledge bases without writing complex structured queries in languages like SPARQL.

Xiang Ren¹, Yu Su², Xifeng Yan²

¹University of Southern California, ²University of California, Santa Barbara


  1. Overview of Knowledge Base Construction and Querying [slides]
  2. Effort-light Knowledge Base Construction [slides]
  3. Schema-agnostic Knowledge Base Querying [slides]
  4. Trends and Research Problems [slides]
[Full Slides]


  • Multi-tasking sequence labeling [project]
  • Learning with Heterogeneous Supervision [project]
  • Learning with Indirect Supervision [project]

  • Code & Data

  • Sequence Tagging: [LM-LSTM-CRF]
  • Phrase Mining: [AutoPhrase]
  • Entity Typing: [PLE] [AFET]
  • Relation Extraction: [ReHession] [ReQuest]
  • Co-extraction of Entities and Relations: [CoType]
  • Knowledge-based Question Answering: [GraphQuestions]

  • Publications


    Xiang Ren, Assistant Professor, Department of Computer Science, University of Southern California. His research focuses on creating computational tools for better understanding and exploring massive text data. He has published over 25 papers in major conferences. He has received the Google PhD Fellowship, the KDD Rising Star award by Microsoft, the Yahoo! DAIS Research Excellence Award, the C. W. Gear Outstanding Graduate Student Award from UIUC, and the Yelp Dataset Challenge Award. He has extensive experience delivering tutorials at major conferences, including SIGKDD 2015, SIGMOD 2016, and WWW 2017.

    Yu Su, Ph.D. candidate, Department of Computer Science, University of California, Santa Barbara. His research interests lie in data mining and natural language processing, with a focus on understanding the interplay of natural and formal languages to increase the accessibility of structured data (e.g., knowledge bases, web tables, and relational databases) and services (e.g., web APIs). He has published over 10 papers on question answering, graph mining and querying, and crowdsourcing at major conferences including SIGKDD, WWW, EMNLP, and CIKM. He has interned at the IBM T.J. Watson Research Center, Microsoft Research Redmond, and the U.S. Army Research Laboratory.

    Xifeng Yan, Venkatesh Narayanamurti Chair Professor, Department of Computer Science, University of California, Santa Barbara. His research focuses on modeling, managing, and mining graphs in information networks, computer systems, social media, and bioinformatics. His work has been extensively cited, with over 15,000 citations according to Google Scholar, and his software has been downloaded thousands of times. He has received the NSF CAREER Award, the IBM Invention Achievement Award, the ACM-SIGMOD Dissertation Runner-Up Award, and the IEEE ICDM 10-year Highest Impact Paper Award.