
Keynote from Google Research on Building Knowledge Bases at #ICWE2016

I report here some highlights of the keynote speech by Xin Luna Dong at the 16th International Conference on Web Engineering (ICWE 2016). Incidentally, she is now moving to Amazon to start a new project on building an Amazon knowledge base.
Building knowledge bases remains a challenging task.
First, one has to decide how to build the knowledge: automatically or manually?
A survey in 2014 reported the following list of large efforts in knowledge building: the top 4 approaches are manually curated, the bottom 3 are automatic.
Google’s Knowledge Vault and Knowledge Graph are the big winners in terms of volume.
When you move to long-tail content, manual curation does not scale, so automation must be both viable and precise.
This is in line with the research we are starting on Extracting Changing Knowledge (we presented a short paper at a Web Science 2016 workshop last month). Here is a summary of our approach:
On the Quest for Changing Knowledge. Capturing emerging entities from social media. WebScience 2016 DDI
Where can knowledge be extracted from? In Knowledge Vault:

Knowledge Vault is a matrix-based approach to knowledge building, with rows = entities and columns = attributes.
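To make the rows-and-columns picture concrete, here is a minimal Python sketch that arranges a few triples as a sparse entity-by-attribute matrix. The function name `build_matrix`, the sample triples and the attribute names are my own illustrative assumptions, not something shown in the keynote or taken from Knowledge Vault itself.

```python
from collections import defaultdict


def build_matrix(triples):
    """Arrange (entity, attribute, value) triples as a sparse matrix.

    Each entity is a row, each attribute is a column. Missing cells are
    simply absent from the row dictionary. Hypothetical helper, for
    illustration only.
    """
    matrix = defaultdict(dict)
    for entity, attribute, value in triples:
        matrix[entity][attribute] = value
    return dict(matrix)


# Toy input, chosen only to illustrate the layout.
triples = [
    ("Rome", "country", "Italy"),
    ("Rome", "continent", "Europe"),
    ("Milan", "country", "Italy"),
]

matrix = build_matrix(triples)

# The set of columns is the union of the attributes seen in any row.
columns = sorted({attr for row in matrix.values() for attr in row})

for entity, row in matrix.items():
    print(entity, [row.get(col) for col in columns])
```

An empty cell (here, the continent of Milan) is a fact that the knowledge base does not yet contain.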

It assumes the entities are already available (e.g. in Freebase) and builds its training on top of them.
One can build KBs by building buckets of triples, with similar probability of being correct. It’s important to precisely estimate correctness probability.
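The bucketing idea can be sketched as a simple calibration check: group triples by their predicted probability, then compare the mean prediction in each bucket with the fraction of triples that turn out to be correct. This is only a sketch under my own assumptions: `calibration_buckets`, the equal-width buckets and the sample scores are hypothetical, and the keynote did not show this code.

```python
def calibration_buckets(scored_triples, n_buckets=5):
    """Group triples by predicted probability and measure observed accuracy.

    scored_triples: iterable of (predicted_probability, is_correct) pairs,
    where is_correct comes from some gold standard (assumed to exist).
    Returns a list of (bucket_index, size, mean_predicted, observed_accuracy).
    """
    buckets = [[] for _ in range(n_buckets)]
    for prob, is_correct in scored_triples:
        # Equal-width buckets over [0, 1]; a probability of 1.0 goes in the last one.
        index = min(int(prob * n_buckets), n_buckets - 1)
        buckets[index].append((prob, is_correct))

    report = []
    for index, bucket in enumerate(buckets):
        if not bucket:
            continue  # skip empty buckets
        mean_predicted = sum(p for p, _ in bucket) / len(bucket)
        observed = sum(1 for _, ok in bucket if ok) / len(bucket)
        report.append((index, len(bucket), mean_predicted, observed))
    return report


# Made-up scores, for illustration only.
sample = [
    (0.95, True), (0.90, True), (0.85, False),
    (0.15, False), (0.10, False), (0.05, True),
]
report = calibration_buckets(sample, n_buckets=2)
for index, size, mean_predicted, observed in report:
    print(index, size, round(mean_predicted, 2), round(observed, 2))
```

If the probabilities are well estimated, the mean prediction and the observed accuracy should be close in every bucket.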
Errors can include mistakes on:
  • triple identification
  • entity linkage
  • predicate linkage
  • source data

Besides general purpose KBs, Google built lightweight vertical knowledge bases (more than 100 available now).

When extracting knowledge, the ingredients are: the data source, the extraction approach, the data items themselves, the facts, and their probability of truth.

Several models can be used for extracting knowledge. Two extremes of the spectrum are:

  1. Single-truth model. Every fact has only one true value. We trust the value reported by the highest number of data sources.
  2. Multilayer model. It separates source quality from extractor quality, and data errors from extraction errors. One can build a knowledge-based trust model that defines the trustworthiness of web pages. This measure can then be compared with the PageRank of web pages.
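As a rough illustration of the first model only, the single-truth rule amounts to a majority vote across sources. The sketch below is my own simplification: `single_truth_vote`, the claim format and the example sources are hypothetical. It does not cover the multilayer model, which would additionally have to estimate source and extractor quality.

```python
from collections import defaultdict


def single_truth_vote(claims):
    """Pick, for each data item, the value backed by the most distinct sources.

    claims: iterable of (source, data_item, value) tuples.
    Repeated claims from the same source count only once.
    Ties are broken in favour of the value seen first.
    """
    supporters = defaultdict(lambda: defaultdict(set))
    for source, item, value in claims:
        supporters[item][value].add(source)

    return {
        item: max(values, key=lambda v: len(values[v]))
        for item, values in supporters.items()
    }


# Toy claims with placeholder source names.
claims = [
    ("a.example", "capital_of_italy", "Rome"),
    ("b.example", "capital_of_italy", "Rome"),
    ("c.example", "capital_of_italy", "Milan"),
    # One source repeating itself does not outvote two independent sources.
    ("a.example", "capital_of_france", "Lyon"),
    ("a.example", "capital_of_france", "Lyon"),
    ("a.example", "capital_of_france", "Lyon"),
    ("b.example", "capital_of_france", "Paris"),
    ("c.example", "capital_of_france", "Paris"),
]

truth = single_truth_vote(claims)
print(truth)
```

The weakness of this rule is that every source gets the same weight, which is exactly what a model of source and extractor quality tries to address.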

In general, the challenge is to move from individual information and data points, to integrated and connected knowledge. Building the right edges is really hard though.
Overall, many ingredients influence the correctness of knowledge: temporal aspects, data source correctness, the capability of extraction and validation, and so on.

In summary: plenty of research challenges to be addressed, both by the data science and modeling communities!

To keep updated on my activities you can subscribe to the RSS feed of my blog or follow my Twitter account (@MarcoBrambi).
