This article was written in May 2003 for a special issue of IEEE Intelligent Systems, edited by Steffen Staab, but wasn't published in that issue due to a confusion over copyright permissions. It is slightly out of date - it refers to DAML and WOL rather than to OWL - but its main point still applies, and it is the most accurate summary of my views on the relevance of description logics to the semantic web.

The 'content language' mentioned herein is now a reality, at least in draft, as the Common Logic proposal being put through the ISO standardization process.


Catching the Dreams

By Pat Hayes

If the semantic web needed a symbol, a good one to use would be a Navajo dream-catcher: a small web, lovingly hand-crafted, charming to look at, and rumored to catch dreams; but really more of a symbol than a reality.

There are many visions of the semantic web, some of them more interesting, some more likely to make money, some more likely to happen in the near future. The excitement of these visions has attracted many people to the concept from a variety of different intellectual backgrounds - databases, logic programming, AI knowledge representation, description logics and programming languages, among others. The result is that there have been many different forces pulling the language designs in different directions.

On the whole, the description logics seem to be winning. OIL - arguably the first proposed web-based standard - and DAML are essentially the same language written in different syntactic forms, and they are both quintessential description logics. Now, description logics - DLs - have some very fine features. They can be seen as a kind of hybrid of industrial-strength data modelling tools with a limited form of conventional logics, located at a particularly nice place on the trade-off curve slung between the extremes of a highly expressive - but computationally intractable - full logic, and a highly efficient - but almost autistic - database notation. DLs have become a standard tool for professional ontology builders in industrial and commercial settings.

But is this kind of strength needed for the semantic web? My own view is that this expressiveness/efficiency tradeoff, which has dominated the professional ontology field's thinking for so long, is far less relevant to the semantic web vision - or at any rate, to the most exciting versions of that vision - than it has been for the traditional tasks that ontologies have been designed and used for; and that the overhead required by DLs, particularly the conceptual overhead, is now a barrier and an impediment to progress.

Considered as content languages, description logics are like logics with safety guards all over them. They come covered with warnings and restrictions: you cannot say things of this form, you cannot write rules like that, you cannot use arbitrary disjunctions, you cannot use negation freely, you cannot speak of classes of literals, and so on. A beginning user might ask, why all the restrictions? It's not as if any of these things are mysterious or meaningless or paradoxical, so why can't I be allowed to write them down on my web page as markup? The answer is quite revealing: if we let you do that, you could write things that our reasoning engines might be unable to handle. As long as you obey our rules, we can guarantee that the inference engines will be able to generate the answers within some predetermined bounds. That is what DLs are for: to ensure that large-scale industrial ontologies can be input to inference machinery while it remains possible to guarantee that answers will be found, that inferential search spaces will not explode, and in general that things will go well. Providing the guarantee is part of the game: DLs can typically be rigorously proven to be at least decidable, and preferably to lie in some tractable complexity class.

There is also enough experience with deployed DL use to give our humble beginner some advice: instead of using negation, you can rephrase your problem in terms of disjointness of classes, and then you can do it this way...; or, instead of saying that a equals b (sorry, we can't let you use "equals", that is far too dangerous), you can say that the class whose members are a and b and nothing else has a cardinality of one... And so on. The result is that users of DAML+OIL need to take a course in how to say things in peculiar and unintuitive ways, because the safety guards prevent them from saying things naturally.

Now, this is not an insurmountable barrier to a determined professional user: it's not harder than learning, say, Applescript. Once you get used to the rather odd way of thinking, writing DAML+OIL can even be kind of fun. But it is a huge barrier to widespread acceptance of a web language for markup; and, more to the point, it is fundamentally unnecessary. The semantic web doesn't need all these DL guards and limitations, because it doesn't need to provide the industrial-quality guarantees of inferential performance. Using DLs as a semantic web content markup standard is a failure of imagination: it presumes that the Web is going to be something like a giant corporation, with the same requirements of predictability and provable performance. In fact (if the SW ever becomes a reality) it will be quite different from current industrial ontology practice in many ways. It will be far 'scruffier', for a start; people will use ingenious tricks to scrape partly-ill-formed content from ill-structured sources, and there is no point in trying to prevent them doing so, or tutting with disapproval. But aside from that, it will be on a scale that will completely defeat any attempt to restrict inference to manageable bounds. If one is dealing with 10^9 assertions, the difference between a polynomial complexity class and something worse is largely irrelevant. And, further, almost all of this content will be extremely simple and shallow, seen from a logical perspective. Worrying about the complexity class of the few intricate ontologies on the web is like being obsessed with the quality of the salt in a supermarket. It is notable that almost all of the DAML so far written uses only a small part of the vocabulary of the language, and is almost entirely concerned with simple class inheritance. Constructs like daml:minCardinalityQ (a restriction on a property defining the class of things which have a minimum number of values of that property in another class... what? Yes, precisely my point) are rarely, if ever, used.

If the entire world were happy using description logics, then carping would be irrelevant. But it is not. The limitations of DAML are already a burden to progress, before the language has even been seriously deployed. The DAML-S effort to express services in DAML is chafing at the expressive limitations the language imposes, and efforts to develop a 'rules' extension for DAML are being stymied by the methodological requirement, imposed by the description logicians, that rules must not increase the expressiveness too far, since doing so would run the risk of allowing people to say too much.

It may be worth making this point in some detail. Like many other academic research fields, description logics have their own 'ground rules'. One of the basic assumptions of work in this field is that full logical expressiveness is to be avoided at all costs. (If one is trying to find the low point of the expressiveness/efficiency curve, then one place to definitely avoid is the far left-hand end, since we know that is as high as it can get.) But this reaction seems ludicrous when it is used to reject what would be otherwise quite reasonable proposals. For example, it is easy to imagine what an RDF rules language would be like; one could just marry together a Prolog-style Horn-clause reasoner with an RDF triples engine. Several people have already written such programs and they are in routine use in research settings. So why the delay? Because these allow one to express arbitrary logical implications. That sounds to me (and to logic programmers) like a plus, but to someone trained in the description logic world, this is a cardinal sin. Some way must be found to limit, constrain, or otherwise box in, such an ability; if we allow this kind of expressiveness to leak out, then there is no telling what our inference engines might do. The proper reaction is to agree, but learn to be happy about it. Indeed, there would be no guarantees that answers will always come back, or that inference engines will never time-out. But one should not expect such global guarantees on the web. If the semantic web becomes real, then the economic pressure on both content providers and content users will be quite sufficient to ensure that practical methods will be found to avoid a state of permanent disaster. We do not need to worry about protecting the integrity of our theoretical guarantees before the business even gets started, particularly when those worries are impeding progress.
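To make the marriage of Horn rules and triples concrete, here is a toy sketch in Python - an illustration of the general idea, not any particular deployed engine, and the ex: names and rules are invented for the example. Rule bodies and heads are triple patterns; strings beginning with "?" are variables; the engine simply applies the rules to a set of (subject, predicate, object) triples until no new triples appear.

```python
def match(pattern, triple, bindings):
    """Try to extend bindings so that pattern matches triple; None on failure."""
    b = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):          # a variable
            if p in b and b[p] != t:
                return None
            b[p] = t
        elif p != t:                   # a constant that must match exactly
            return None
    return b

def match_all(patterns, facts, bindings):
    """Yield every binding that satisfies all the patterns in a rule body."""
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for f in facts:
        b = match(first, f, bindings)
        if b is not None:
            yield from match_all(rest, facts, b)

def substitute(pattern, bindings):
    return tuple(bindings.get(p, p) for p in pattern)

def forward_chain(facts, rules):
    """Apply Horn rules (body-patterns, head-pattern) to a fixpoint."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for b in list(match_all(body, facts, {})):
                new = substitute(head, b)
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts

# Example: subclass transitivity and type propagation, RDFS-style.
facts = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:fido", "rdf:type", "ex:Dog"),
}
rules = [
    ([("?a", "rdfs:subClassOf", "?b"), ("?b", "rdfs:subClassOf", "?c")],
     ("?a", "rdfs:subClassOf", "?c")),
    ([("?x", "rdf:type", "?a"), ("?a", "rdfs:subClassOf", "?b")],
     ("?x", "rdf:type", "?b")),
]
closed = forward_chain(facts, rules)
```

Thirty-odd lines, and it already expresses arbitrary Horn implications over triples - which is exactly the expressiveness the safety guards are designed to forbid.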

I think that what the semantic web needs is two rather different things, put together in a new way. It needs a content language whose sole function is to express, transmit and store propositions in a form that permits easy use by engines of one kind and another. There is no need to place restrictions or guards on this language, and it should be compact, easy to use, expressive and syntactically simple. The W3C basic standard is RDF, which is a good start, but nowhere near expressive enough. The best starting-point for such a content language is something like a simple version of KIF, though with an XML-style syntax instead of KIF's now archaic (though still elegant) LISP-based format. Subsets of this language can be described which are equivalent to DLs, but there really is no need to place elaborate syntactic boundaries on the language itself to prevent users from saying too much. Almost none of them will, in any case.

An aside on logic. There is a widespread misapprehension that logic is 'difficult' - like calculus is supposed to be in American high schools. In fact, basic logic is easier to use and understand than description logics; it has a simpler syntax, it has simpler inference processes, and it is closer to natural language. While there are some subtle aspects of logic, one is not obliged to use them or even to consider them.

The second thing that the semantic web needs is a programming language; or perhaps even a suite of tools in an existing programming language, for manipulating the content. The current DAML/OIL/WOL standards get these two aspects jumbled up with one another: the content is all tangled up with limitations that are in place to protect the code (which is hidden inside the inference engines, but still needs protecting). What we need to do is find a way to give the code to the world as well as the content, so that the planet-wide community of programmers can get started on making ingenious tools to manipulate content. I confess that I do not know how to do this, but I am sure we are going about it the wrong way at present. And if anyone has any good ideas, I'd love to hear about them.

Pat Hayes

Florida IHMC