“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.[…] [It] isn’t inherently complex. The Semantic Web language, at its heart, is very, very simple. It’s just about the relationships between things. [Its] real power will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs.”
—Tim Berners-Lee, “Scientific American”, 2001
Almost two decades ago, in 2001, the concept of the Semantic Web, aka Web 3.0, was first introduced by Tim Berners-Lee, inventor of the World Wide Web. The core idea was to provide machine-processable content and data representations that adhere to strict structural conventions, helping us to process, structure and share online information better. The Semantic Web essentially allows information to be connected in a network that machines can easily read.
I am a strong believer in a content-first approach to web development. Content is key, and good content needs to be shared. Today, that has become best practice in building content-heavy websites. But has the flow of information been streamlined and improved by the evolution of the Semantic Web? Has the Semantic Web succeeded in harvesting, cataloging and sharing the vast amounts of data and content we produce? In short, has the Semantic Web succeeded?
As early as 2006, Berners-Lee and colleagues stated, “This simple idea…remains largely unrealized”. And by 2013, only about four million web domains contained Semantic Web markup out of the roughly 271 million domains registered at the time. That said, many bits and pieces of the core idea behind the Semantic Web are very much alive and well today, and some have become the standard.
A Brief Look Back
At its outset, the Semantic Web was meant to help structure content by describing its exact meaning and its place in the content hierarchy, labeling it consistently and uniformly at the backend, the programming side of a website. The goal was to form a sort of globally linked database, using, for example, attributes like article, sidebar, section, summary, main content and author. Formerly unstructured content wouldn’t just live on its own on your site but could be processed and linked with similar data worldwide.
The Semantic Web was meant to be largely decentralized, inclusive and interoperable, with the proper checks and balances in place. It could have become a huge blessing: a revolutionary overhaul of information and curated content, an example of freedom of knowledge and a bastion of interactivity. Maybe that was too much to ask. You see, I am still pessimistic when it comes to believing in human restraint and maturity in the quest for the “common good” and a universal truth. At least until I am proved wrong.
Back to Reality
What we have now is what Sinclair Target, on his blog 0b10, has described as being “stuck with giant, centralized repositories of information. […] The Semantic Web we were promised has yet to be delivered. […] Today, work on the Semantic Web seems to have petered out.” Others have described it bluntly as being “as dead as last year’s roadkill.”
What the Semantic Web did provide us with, however (and which I, as a content strategist and web developer, much appreciate), are universal tools to describe, and thereby structure, content and its relationships, providing a clear content strategy for its users. It puts emphasis on the importance of content and the metadata behind it, and it gives us a clear content-first roadmap when designing websites. For a content-first designer, that is a huge plus. But for it to work, the Semantic Web needs the collaborative contribution of everyone involved in shaping it. The technological pieces have been assembled over the years and live in various applications, but they don’t yet work universally in tandem.
“Communication can only take place when a common language exists,” writes the data scientist Kurt Cagle. But have we succeeded? We’ve created bits and pieces, but how will they, or should they, work together?
What Tech Can and Can’t Provide
Technically, the road to the Semantic Web—the extension of the current web, adding support for databases in machine-readable form and the ability to merge them—is by now a done deal. With uniform syntaxes compiled by a common set of ontology model repositories like the Resource Description Framework (RDF), Schema.org (started in 2011 by Google and Microsoft to help create rich, SEO-friendly search snippets), Facebook’s Open Graph protocol and the Web Ontology Language (OWL)—paired with the Extensible Markup Language (XML) and later JSON-LD, as well as artificial intelligence, digital signatures and other attendant technologies—the Semantic Web has taken off to become the go-to model for enhanced information distribution and inference (the ability to derive new data from data already known).
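To make the Schema.org/JSON-LD pairing concrete, here is a minimal sketch of how a page might describe an article in machine-readable form. The headline, author name and date are invented for illustration; the dict is built and serialized in Python only to keep the example self-contained.

```python
import json

# A minimal JSON-LD description of an article using Schema.org vocabulary.
# All names, dates and values below are invented for illustration.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Whatever Happened to the Semantic Web?",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "datePublished": "2018-05-27",
}

# Serialized, a block like this would sit inside a
# <script type="application/ld+json"> tag on the page, where search
# engines can read it and build rich snippets from it.
print(json.dumps(article, indent=2))
```

The point of the format is that the same few keys (`@context`, `@type`) let very different sites describe very different things in a way one crawler can merge.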
Nothing can stop new technologies, and what once took decades to build can be changed, rebuilt or overhauled in no time. Berners-Lee stressed that if the past was document sharing, the future will be data sharing, or, in his words, applying “linked data”. How would that work? First, a URL should point to the data; second, anyone accessing the URL should get data back; third, relationships in the data should point to additional URLs with data. Sounds simple enough.
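The three linked-data rules can be sketched with a toy, in-memory “web”. All URLs and facts below are invented for illustration; in a real deployment the lookup would be an HTTP request returning RDF or JSON-LD rather than a dictionary access.

```python
# Rule 1: a URL points to the data. Rule 2: accessing the URL returns data.
# Rule 3: relationships in the data point to further URLs with more data.
# This dict stands in for the web; every URL and fact is invented.
TOY_WEB = {
    "https://example.org/person/tim": {
        "name": "Tim",
        "invented": "https://example.org/thing/www",  # a relationship, as a URL
    },
    "https://example.org/thing/www": {
        "label": "World Wide Web",
        "year": 1989,
    },
}

def get_data(url):
    """Rule 2: dereferencing a URL returns data (here, a dict)."""
    return TOY_WEB[url]

def follow(url, relation):
    """Rule 3: a relationship in the data is itself a URL we can dereference."""
    return get_data(get_data(url)[relation])

print(follow("https://example.org/person/tim", "invented")["label"])
# -> World Wide Web
```

The power of the model is that the second record could live on an entirely different server than the first: the relationship is just a URL, so the data merges across sites.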
But a partial deployment of these technologies has left the majority of older web pages behind, and the technologies needed to build the Semantic Web (its protocols, concepts and syntaxes) would have to become much more user-friendly, intuitive and affordable for everyone to take part in this global interactivity. The basic idea behind the Semantic Web, and the essence of making it work, is that everyone voluntarily uses the new set of standards to annotate their web content. But Uncle Herb and Aunt Charlotte (who might have a lot of information and data on their personal websites) do not (yet?) knowingly utilize semantic ontologies or convert their pages to XHTML, let alone build RDF schemata around semantic metadata. There is a need for a page-markup technology that is easy to grasp, and XML definitely is not.
Enter web design platforms that help DIY users create websites built on the latest technologies, which, however, leave much less room for customization. One size does not fit all when it comes to your structured metadata. And do these platforms alert users to hidden, misleading metadata meant to draw clicks and gather data, and then let them take charge? Not even close.
The Semantic Web: Business as Usual?
There is still little consensus about the direction the Semantic Web should take and which ontology models to use. The technologies are all in place, however, and nothing can stop their implementation and their growth. The Semantic Web has become highly profitable.
Economically, the implementation of new Web 3.0 technologies, including software tools, validators, converters and the like, has given a huge business boost to IT conglomerates and their stockholders. But the Semantic Web should not only be clever but useful: an open, community-driven effort. Old standards should not become obsolete. Since new technologies force average users to become even more dependent on one-size-fits-all corporate web tools, open-source applications, social content-sharing platforms and website editors, those tools must become commoditized, affordable, transparent and secure.
Politically, the Semantic Web could prompt web users to get more involved. Governmental portals, partisan sites, PACs and independent watchdogs, geared to the individual political and local needs of each user, highly customizable, cross-checked on the Semantic Web and supplied with supporting data, could form a useful, rich, machine-processable repository of information. Unbiased, independently verified information and easily accessible tools would help users stay politically in the know. One can hope. But in the wrong hands, it could corrupt and cause real harm.
The Semantic Web: Political Tool?
Increasing the web’s intelligibility is a compelling vision, but in the wrong hands it could turn into a Big Virtual Brother and a data-security nightmare. Computers will extract data and text from different sources, comb them and “draw conclusions”. But they can do that only after humans have embedded their own ideas, connotations, interests, prejudices and issues as metadata and tags in their content. People may attach different meanings to the same content snippet because of their worldview, language and culture.
Suppose you were looking for data on falling birth rates in the U.S., and your search agent, after supplying the requested cross-checked data, alerted you to the home page of a militant pro-life organization showing gruesome pictures of aborted fetuses (because meta tags on the site matched the ontology and were tagged as “birth rates”). The search agent software presumed that birth rates fell because more women chose to abort. Who is to say that non-profits and political and grassroots organizations with their own agendas won’t find a way to guide or manipulate how a search agent or spider combs the web? They already do that with skewed SEO practices. And knowing that we are now surfing the Semantic Web may mislead us into believing that search results are complete, true, unbiased, cross-checked and the best possible match.
The same goes for censorship: Countries like China and Iran will not hesitate to continue screening what their citizens are allowed to see. Maybe the new technologies will make their task easier? Or maybe they will feel coerced into becoming part of this “single global system” and decide to cut off their citizens from the web altogether.
And what about security and privacy? If every piece of information and every data bit is accessible and can easily be retrieved and vetted, who is to say that this new data will not fall into the wrong hands? And can we really rely on the “Web of Trust”, or trust Berners-Lee’s “Oh, yeah?” button?
The Semantic Web = Global Truth?
Relying too heavily on metadata, ontologies, taxonomies, deductive reasoning, text strings, logic and RDF “triples” (which resemble syllogisms: “Humans are mortal, Greeks are human, therefore Greeks are mortal”) to combine and extract information could leave ample room for illogical, and thus useless, assumptions and false conclusions that machines cannot catch (“Tekla lives in New York. New Yorkers have a New York accent. Therefore, Tekla has a New York accent”—which, by the way, is false, but the computer wouldn’t know). Information derived from an illogical assumption would mislead us in our quest “to learn the global truth”. And we wouldn’t even be aware of the flaw, because we assumed the completeness and correctness of all the data. We’d be too complacent to realize that there is no single unambiguous, universally agreed-on truth.
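The pitfall can be sketched with a few invented triples and a deliberately naive chaining rule. The rule below (treat “X lives in P” as making X a “P resident” and inherit everything a resident “has”) is a toy stand-in for real inference engines, but it shows how a formally valid derivation can produce a factually false conclusion:

```python
# Invented (subject, predicate, object) triples for illustration.
triples = [
    ("Tekla", "lives_in", "New York"),
    ("New York resident", "has", "New York accent"),
]

def naive_inference(triples):
    """Blindly chain 'lives_in' facts with what a resident 'has'.

    A toy syllogism-style rule: logically tidy, but it cannot know
    whether the generalization it relies on is actually true.
    """
    derived = []
    for s, p, o in triples:
        if p == "lives_in":
            for s2, p2, o2 in triples:
                if s2 == f"{o} resident" and p2 == "has":
                    derived.append((s, "has", o2))
    return derived

print(naive_inference(triples))
# -> [('Tekla', 'has', 'New York accent')]  -- formally derived, factually false
```

The machine dutifully emits the triple; nothing in the data tells it that “New Yorkers have a New York accent” was an overgeneralization to begin with.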
I’d be suspicious of being force-fed information. I prefer to get some of the facts—as limited and as irrelevant as they may be—then do my own cross-checking, add my own experiences and draw my own conclusions, rather than receive information that someone else wants me to read or that has received the most “thumbs up” from others.
Global Cross-Linking: A Blessing in Disguise?
However, the cross-linking of information I have argued against could eventually turn out to be a blessing. Without automatic electronic cross-checking of the entire Semantic Web, a user might simply have screened out the unpleasant counterarguments of a debate, on a topic he or she would never have thought, or wanted, to look at (in my example above, an anti-abortion stance). We tend to search for validation of preconceived assumptions, reinforcing our views and avoiding cognitive dissonance while surfing the web. So instead of scanning only for what we already believe in and adhere to, metadata cross-linking could expose us to other points of view and to information we didn’t know we didn’t know.
Will the Semantic Web bring us the ultimate information overload? Big data overkill? Even now, we frequently can’t see the forest for the trees. With the Semantic Web firmly in place, will we be forced to know all about each and every forest? And do we really want to become dependent on a “comprehensive meaning system, a universal encyclopedia and world brain” (as prophesied by H.G. Wells in 1938)?
We’re beaming with pride because we perceive ourselves as so much smarter than past generations. We mock the old and the slow and get bored when asked to think or listen for too long. Our attention span is getting shorter by the day. We disguise our conformism with ever-changing fashions and trends. Like a herd of cattle, we follow commercial (and technical) dictates. The so-called “global village” and the search for “one truth” will leave even less space for our individuality, our creativity, our flaws, our inflated egos and our eagerness to reveal what goes on behind the façade, far from, or against, the mainstream.
The Semantic Web Is Alive and Well. Sort Of.
So, is the Semantic Web dead? Not really. It is “an ideal, not a killer technology by itself,” writes Rashif Ray Rahman. “There is no one killer application either, but many past, present and future innovations. It exists through the standards and technologies that are the culmination of various focused initiatives. In other words, in 2018, the Semantic Web is right where we need it to be.”
“Turns out that the engineers and developers have moved on and created their own solutions, bypassing many of the lessons we have learned, because we stubbornly refused to acknowledge the amount of research needed to turn our theories into practice,” write Ruben Verborgh and Miel Vander Sande from Ghent University in Belgium. “Since we seemingly did not want the web, more pragmatic people took over. And if we are honest, can we blame them? Clearly, the world will not wait for us. Let us not wait for the world.”
Part of that process, however, will have to be perusing the past for a few answers, regardless of Berners-Lee’s statement in “One Small Step for the Web…” (Sept. 28, 2018) that “the future is still so much bigger than the past”. In 1930, the German writer Rudolf Arnheim wrote: “Human beings will come to confuse the world perceived by their senses and the world interpreted by thought”. And he predicted: “They will believe that seeing is understanding.”
Arnheim meant us, human beings. What about the machines, the chatbots, blockchains and AIs that “see” our content, based on their “semantic knowledge”? Will they understand?
More on the topic:
The Semantic Web (Tim Berners-Lee, Scientific American, 2001)
A “More Revolutionary Web” (Victoria Shannon, The New York Times, 2006)
Machines Beat Humans on a Reading Test. But Do They Understand? (Quanta Magazine, 2019)
What Web 3.0 Means for Data Collection and Security (David Howell, ITPro, 2018)
The Semantic Web Comes of Age (Kurt Cagle, Forbes, 2018)
Whatever Happened to the Semantic Web? (0b10, 2018)
The Semantic Web: Where Is It Now? (Rashif Ray Rahman, Medium, 2018)
Semantic Web and Semantic Technology in 2019 (Jennifer Zaino, Dataversity, 2019)
The Semantic Web Identity Crisis: In Search of the Trivialities That Never Were (Ruben Verborgh & Miel Vander Sande, Ghent University, Belgium, 2019)