Wikipedia “Knows” more than it “Tells”
Blog: Strategic Structures
When pointing out the benefits of Linked Data, I’m usually talking about integrating data from heterogeneous sources in a way that’s largely independent of the local schemas and not tied to past integration requirements. But even with a single, very popular data source, Wikipedia, it’s easy to demonstrate what the web of data can offer that the web of documents can’t.
In fact, you can check this yourself in less than two minutes. Go to the Wikipedia page of Ludwig Wittgenstein. At the bottom of the infobox on the right of the page, you’ll find the sections “Influences” and “Influenced”. The first lists (with links to their Wikipedia pages) the people who influenced Wittgenstein, and the second lists those he influenced. Expand the sections and count the people. Depending on when you do this, you might get a different number, but if you are reading this text by the end of 2017, you are likely to find that, according to Wikipedia, Wittgenstein was influenced by 18 people and influenced 32.
Now, look at the same data source, Wikipedia, but viewed as Linked Data through DBpedia-live, and you’ll get a different result. Try it yourself with this link:
At the time of writing, the query returns 19 influencers and 95 influenced people; if you run it now, you may get different numbers.
Note that the query takes its data from the live Wikipedia, not from the periodic dump used by the regular DBpedia. So the same data source, read once as part of the web of documents and once as part of the Semantic Web, gives different results. It turns out that Wikipedia “knows” more than it “tells”, if asked properly.
Of course, Wikipedia could improve the application logic behind its pages, but that would be a local patch. The logic is already in the data, so there is no need to add specific rules that would serve one use case but could not anticipate many others.
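Why would the two views differ? One plausible explanation, assumed here rather than confirmed from the query itself, is that Wittgenstein’s infobox shows only statements made on his own page, while the Linked Data view also picks up statements made on other people’s pages, for example a philosopher whose infobox lists Wittgenstein among his influences. The minimal Python sketch below illustrates that idea on a handful of hypothetical triples. The predicate names loosely mirror the DBpedia ontology properties `dbo:influencedBy` and `dbo:influenced`, but the data and the helpers `page_view` and `graph_view` are made up for illustration. In SPARQL, the same combination would typically be written as a `UNION` of the two triple patterns.

```python
# Hypothetical (subject, predicate, object) triples standing in for DBpedia data.
# Predicate names loosely mirror dbo:influencedBy / dbo:influenced; the edges
# below are illustrative only, not actual Wikipedia content.
TRIPLES = [
    # Statements made on Wittgenstein's own page (what his infobox shows).
    ("Wittgenstein", "influencedBy", "Frege"),
    ("Wittgenstein", "influencedBy", "Russell"),
    ("Wittgenstein", "influenced", "Anscombe"),
    # Statements made on *other* pages that mention Wittgenstein.
    ("Russell", "influenced", "Wittgenstein"),
    ("Schopenhauer", "influenced", "Wittgenstein"),
    ("Kripke", "influencedBy", "Wittgenstein"),
    ("Anscombe", "influencedBy", "Wittgenstein"),
]


def page_view(person, triples):
    """Document view: only triples whose subject is the person's own page."""
    influences = {o for s, p, o in triples if s == person and p == "influencedBy"}
    influenced = {o for s, p, o in triples if s == person and p == "influenced"}
    return influences, influenced


def graph_view(person, triples):
    """Data view: also read statements from other pages in the reverse direction,
    since "A influencedBy B" says the same thing as "B influenced A"."""
    influences, influenced = page_view(person, triples)
    for s, p, o in triples:
        if o == person and p == "influenced":
            influences.add(s)  # another page says s influenced `person`
        elif o == person and p == "influencedBy":
            influenced.add(s)  # another page says `person` influenced s
    return influences, influenced


if __name__ == "__main__":
    for label, view in (("page", page_view), ("graph", graph_view)):
        influences, influenced = view("Wittgenstein", TRIPLES)
        print(label, len(influences), len(influenced))
```

On this toy data the page view reports 2 influences and 1 influenced person, while the graph view reports 3 and 2. No new rule was written for this use case; the sets just merge statements that already exist in the data, and duplicates (Russell, Anscombe) collapse automatically.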
The result you got from DBpedia-live can also be just a starting point for exploring the knowledge graph. Click on one of the results and choose how you want to browse it. If you prefer visual exploration, LodLive offers a nice experience; for faster browsing, use one of the other options.
Or, using relFinder, you can check how two nodes, one from each column, are connected, for example:
Or you might want to rank all of Wittgenstein’s influencers by their influence, counting not only the people they influenced directly but also the people those people influenced, and so on, as far as Wikipedia’s data goes. We might call this the “reach” of Wittgenstein’s influencers. For the top ten, we get this. This is another thing that Wikipedia knows but wouldn’t tell.
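One way to read “reach” is as the number of distinct people reachable from an influencer by following “influenced” links transitively. The Python sketch below assumes that definition, which may differ from how the ranking in this post was actually computed; the graph, the names, and the helpers `reach` and `rank_influencers` are hypothetical. A breadth-first traversal with a visited set counts each person once and does not loop forever on cycles of mutual influence. In SPARQL 1.1, a similar traversal could be expressed with a property path such as `dbo:influenced+`.

```python
from collections import deque

# Hypothetical directed influence graph: an edge A -> B means "A influenced B".
# On real data these edges would come from dbo:influenced plus reversed
# dbo:influencedBy statements; the names and edges here are illustrative only.
INFLUENCED = {
    "Frege": ["Wittgenstein", "Carnap"],
    "Russell": ["Wittgenstein", "Carnap"],
    "Schopenhauer": ["Wittgenstein", "Nietzsche"],
    "Wittgenstein": ["Anscombe", "Kripke"],
    "Carnap": ["Quine"],
    "Quine": ["Kripke"],
    "Nietzsche": [],
    "Anscombe": [],
    "Kripke": [],
}


def reach(start, graph):
    """Count distinct people reachable from `start` via influence edges,
    using breadth-first search; the `seen` set makes cycles harmless."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) - 1  # do not count the starting person


def rank_influencers(person, graph, top=10):
    """Rank the direct influencers of `person` by reach, highest first
    (ties broken alphabetically to keep the output deterministic)."""
    influencers = [a for a, targets in graph.items() if person in targets]
    scored = [(reach(a, graph), a) for a in influencers]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top]


if __name__ == "__main__":
    print(rank_influencers("Wittgenstein", INFLUENCED))
    # [(5, 'Frege'), (5, 'Russell'), (4, 'Schopenhauer')]
```

Note that Kripke is reached from Frege both through Wittgenstein and through Carnap and Quine, yet is counted once; that is the point of using a visited set rather than summing the direct counts at each level.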
Now, if you are concerned with a corporate application landscape, imagine how many things your applications know but wouldn’t tell because of the limitations of application logic built to meet certain historical requirements, and how many of them know only part of the answer. Getting a complete and accurate answer requires investing in interfaces, data warehouses, MDM systems, data lakes and various new and fancy, but in most cases proprietary, methods of data integration and governance.