This is a position paper about the relations among artificial intelligence (AI), mathematical logic and the formalization of common-sense knowledge and reasoning. It also treats other problems of concern to both AI and philosophy. I thank the editor for inviting it. The position advocated is that philosophy can contribute to AI if it treats some of its traditional subject matter in more detail and that this will advance the philosophical goals also. Actual formalisms (mostly first order languages) for expressing common-sense facts are described in the references.
Common-sense knowledge includes the basic facts about events (including actions) and their effects, facts about knowledge and how it is obtained, and facts about beliefs and desires. It also includes the basic facts about material objects and their properties.
One path to human-level AI uses mathematical logic to formalize common-sense knowledge in such a way that common-sense problems can be solved by logical reasoning. This methodology requires understanding the common-sense world well enough to formalize facts about it and ways of achieving goals in it. Basing AI on understanding the common-sense world is different from basing it on understanding human psychology or neurophysiology. This approach to AI, based on logic and computer science, is complementary to approaches that start from the fact that humans exhibit intelligence, and that explore human psychology or human neurophysiology.
This article discusses the problems and difficulties, the results so far, and some improvements in logic and logical languages that may be required to formalize common sense. Fundamental conceptual advances are almost certainly required. The object of the paper is to get more help for AI from philosophical logicians. Some of the requested help will be mostly philosophical and some will be logical. Likewise the concrete AI approach may fertilize philosophical logic as physics has repeatedly fertilized mathematics.
There are three reasons for AI to emphasize common-sense knowledge rather than the knowledge contained in scientific theories.
(1) Scientific theories represent compartmentalized knowledge. In presenting a scientific theory, as well as in developing it, there is a common-sense pre-scientific stage. In this stage, it is decided or just taken for granted what phenomena are to be covered and what is the relation between certain formal terms of the theory and the common-sense world. Thus in classical mechanics it is decided what kinds of bodies and forces are to be used before the differential equations are written down. In probabilistic theories, the sample space is determined. In theories expressed in first order logic, the predicate and function symbols are decided upon. The axiomatic reasoning techniques used in mathematical and logical theories depend on this having been done. However, a robot or computer program with human-level intelligence will have to do this for itself. To use science, common sense is required.
Once developed, a scientific theory remains imbedded in common sense. To apply the theory to a specific problem, common-sense descriptions must be matched to the terms of the theory. For example, $d = \frac{1}{2}gt^2$ does not in itself identify $d$ as the distance a body falls in time $t$ and identify $g$ as the acceleration due to gravity. (McCarthy and Hayes 1969) uses the situation calculus discussed in that paper to imbed the above formula in a formula describing the common-sense situation, for example
$$dropped(x,s) \land height(x,s) = h \land d = \frac{1}{2}gt^2 \land d < h \supset \exists s' (F(s,s') \land time(s') = time(s) + t \land height(x,s') = h - d).$$
Here x is the falling body, and we are presuming a language in which the functions height, time, etc. are formalized in a way that corresponds to what the English words suggest. s and s' denote situations as discussed in that paper, and F(s,s') asserts that the situation s' is in the future of the situation s.
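As a rough illustration of how the bare equation gets tied to a common-sense situation, the following Python sketch treats a situation as a small record and computes a later situation for a dropped body. The class `Situation`, the function `fall` and the value of `G` are assumptions made for this illustration only; they are not part of the situation calculus of (McCarthy and Hayes 1969).

```python
from dataclasses import dataclass
from typing import Optional

G = 9.8  # assumed acceleration due to gravity, in m/s^2


@dataclass(frozen=True)
class Situation:
    time: float    # plays the role of time(s)
    height: float  # plays the role of height(x, s) for the one body x considered
    dropped: bool  # plays the role of dropped(x, s)


def fall(s: Situation, t: float) -> Optional[Situation]:
    """Return a situation in the future of s, or None if the premises fail."""
    d = 0.5 * G * t ** 2  # the bare physical formula
    if not s.dropped or d >= s.height:
        return None  # the body was not dropped, or it would reach the ground first
    return Situation(time=s.time + t, height=s.height - d, dropped=True)


s0 = Situation(time=0.0, height=100.0, dropped=True)
s1 = fall(s0, 2.0)
print(s1)
```

The point of the sketch is that the equation alone does no work until something outside it says which body, which height and which situation the symbols refer to.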
(2) Common-sense reasoning is required for solving problems in the common-sense world. From the problem solving or goal-achieving point of view, the common-sense world is characterized by a different informatic situation than that within any formal scientific theory. In the typical common-sense informatic situation, the reasoner doesn't know what facts are relevant to solving his problem. Unanticipated obstacles may arise that involve using parts of his knowledge not previously thought to be relevant.
(3) Finally, the informal metatheory of any scientific theory has a common-sense informatic character. By this I mean the thinking about the structure of the theory in general and the research problems it presents. Mathematicians invented the concept of a group in order to make previously vague parallels between different domains into a precise notion. The thinking about how to do this had a common-sense character.
It might be supposed that the common-sense world would admit a conventional scientific theory, e.g. a probabilistic theory. But no one has yet developed such a theory, and AI has taken a somewhat different course that involves nonmonotonic extensions to the kind of reasoning used in formal scientific theories. This seems likely to work better.
Aristotle, Leibniz, Boole and Frege all included common-sense knowledge when they discussed formal logic. However, formalizing much of common-sense knowledge and reasoning proved elusive, and the twentieth-century emphasis has been on formalizing mathematics. Some important philosophers, e.g. Wittgenstein, have claimed that common-sense knowledge is unformalizable or that mathematical logic is inappropriate for doing it. Though it is possible to give a kind of plausibility to views of this sort, it is much less easy to make a case for them that is well supported and carefully worked out. If a common-sense reasoning problem is well presented, one is well on the way to formalizing it. The examples that are presented for this negative view borrow much of their plausibility from the inadequacy of the specific collections of predicates and functions they take into consideration. Some of their force comes from not formalizing nonmonotonic reasoning, and some may be due to lack of logical tools still to be discovered. While I acknowledge this opinion, I haven't the time or the scholarship to deal with the full range of such arguments. Instead I will present the positive case, the problems that have arisen, what has been done and the problems that can be foreseen. These problems are often more interesting than the ones suggested by philosophers trying to show the futility of formalizing common sense, and they suggest productive research programs for both AI and philosophy.
In so far as the arguments against the formalizability of common sense attempt to make precise intuitions of their authors, they can be helpful in identifying problems that have to be solved. For example, Hubert Dreyfus (1972) said that computers couldn't have ``ambiguity tolerance'' but didn't offer much explanation of the concept. With the development of nonmonotonic reasoning, it became possible to define some forms of ambiguity tolerance and show how they can and must be incorporated in computer systems. For example, it is possible to make a system that doesn't know about possible de re/de dicto ambiguities and has a default assumption that amounts to saying that a reference holds both de re and de dicto. When this assumption leads to inconsistency, the ambiguity can be discovered and treated, usually by splitting a concept into two or more.
If a computer is to store facts about the world and reason with them, it needs a precise language, and the program has to embody a precise idea of what reasoning is allowed, i.e. of how new formulas may be derived from old. Therefore, it was natural to try to use mathematical logical languages to express what an intelligent computer program knows that is relevant to the problems we want it to solve and to make the program use logical inference in order to decide what to do. (McCarthy 1959) contains the first proposals to use logic in AI for expressing what a program knows and how it should reason. (Proving logical formulas as a domain for AI had already been studied by several authors).
The 1959 paper said:
The advice taker is a proposed program for solving problems by manipulating sentences in formal languages. The main difference between it and other programs or proposed programs for manipulating formal languages (the Logic Theory Machine of Newell, Simon and Shaw and the Geometry Program of Gelernter) is that in the previous programs the formal system was the subject matter but the heuristics were all embodied in the program. In this program the procedures will be described as much as possible in the language itself and, in particular, the heuristics are all so described.
The main advantages we expect the advice taker to have is that its behavior will be improvable merely by making statements to it, telling it about its symbolic environment and what is wanted from it. To make these statements will require little if any knowledge of the program or the previous knowledge of the advice taker. One will be able to assume that the advice taker will have available to it a fairly wide class of immediate logical consequences of anything it is told and its previous knowledge. This property is expected to have much in common with what makes us describe certain humans as having common sense. We shall therefore say that a program has common sense if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows.
The main reasons for using logical sentences extensively in AI are better understood by researchers today than in 1959. Expressing information in declarative sentences is far more modular than expressing it in segments of computer program or in tables. Sentences can be true in much wider contexts than specific programs can be useful. The supplier of a fact does not have to understand much about how the receiver functions, or how or whether the receiver will use it. The same fact can be used for many purposes, because the logical consequences of collections of facts can be available.
The advice taker prospectus was ambitious in 1959, would be considered ambitious today and is still far from being immediately realizable. This is especially true of the goal of expressing the heuristics guiding the search for a way to achieve the goal in the language itself. The rest of this paper is largely concerned with describing what progress has been made, what the obstacles are, and how the prospectus has been modified in the light of what has been discovered.
The formalisms of logic have been used to differing extents in AI. Most of the uses are much less ambitious than the proposals of (McCarthy 1959). We can distinguish four levels of use of logic.
1. A machine may use no logical sentences--all its ``beliefs'' being implicit in its state. Nevertheless, it is often appropriate to ascribe beliefs and goals to the program, i.e. to remove the above sanitary quotes, and to use a principle of rationality--It does what it thinks will achieve its goals. Such ascription is discussed from somewhat different points of view in (Dennett 1971), (McCarthy 1979a) and (Newell 1981). The advantage is that the intent of the machine's designers and the way it can be expected to behave may be more readily described intentionally than by a purely physical description.
The relation between the physical and the intentional descriptions is most readily understood in simple systems that admit readily understood descriptions of both kinds, e.g. thermostats. Some finicky philosophers object to this, contending that unless a system has a full human mind, it shouldn't be regarded as having any mental qualities at all. This is like omitting the numbers 0 and 1 from the number system on the grounds that numbers aren't required to count sets with no elements or one element. Indeed if your main interest is the null set or unit sets, numbers are irrelevant. However, if your interest is the number system you lose clarity and uniformity if you omit 0 and 1. Likewise, when one studies phenomena like belief, e.g. because one wants a machine with beliefs and which reasons about beliefs, it works better not to exclude simple cases from the formalism. One battle has been over whether it should be forbidden to ascribe to a simple thermostat the belief that the room is too cold. (McCarthy 1979a) says much more about ascribing mental qualities to machines, but that's not where the main action is in AI.
2. The next level of use of logic involves computer programs that use sentences in machine memory to represent their beliefs but use other rules than ordinary logical inference to reach conclusions. New sentences are often obtained from the old ones by ad hoc programs. Moreover, the sentences that appear in memory belong to a program-dependent subset of the logical language being used. Adding certain true sentences in the language may even spoil the functioning of the program. The languages used are often rather unexpressive compared to first order logic, for example they may not admit quantified sentences, or they may use a different notation from that used for ordinary facts to represent ``rules'', i.e. certain universally quantified implication sentences. Most often, conditional rules are used in just one direction, i.e. contrapositive reasoning is not used. Usually the program cannot infer new rules; rules must have all been put in by the ``knowledge engineer''. Sometimes programs have this form through mere ignorance, but the usual reason for the restriction is the practical desire to make the program run fast and deduce just the kinds of conclusions its designer anticipates. We believe the need for such specialized inference will turn out to be temporary and will be reduced or eliminated by improved ways of controlling general inference, e.g. by allowing the heuristic rules to be also expressed as sentences as promised in the above extract from the 1959 paper.
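A minimal sketch of a level-two system of this kind, assuming ground facts stored as strings and rules applied only from premises to conclusion. The rule format and the names `RULES` and `forward_chain` are hypothetical and are not taken from any particular system.

```python
# Each rule is (premises, conclusion) over ground atoms; there are no quantifiers.
RULES = [
    ({"in(b1,c1)", "hot(c1)"}, "hot(b1)"),
    ({"bacterium(b1)", "hot(b1)"}, "dead(b1)"),
]


def forward_chain(facts, rules):
    """Apply rules from premises to conclusion until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


derived = forward_chain({"bacterium(b1)", "in(b1,c1)", "hot(c1)"}, RULES)
print(sorted(derived))

# The rules run in one direction only: a fact denying a conclusion
# yields nothing about the premises, so no contrapositive step occurs.
blocked = forward_chain({"bacterium(b1)", "not dead(b1)"}, RULES)
print(sorted(blocked))
```

In the second call the program never concludes `not hot(b1)`, although that follows by ordinary logical inference from the second rule. The rule set is also fixed: nothing in the loop can add a new rule.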
3. The third level uses first order logic and also logical deduction. Typically the sentences are represented as clauses, and the deduction methods are based on J. Alan Robinson's (1965) method of resolution. It is common to use a theorem prover as a problem solver, i.e. to determine an x such that P(x) as a byproduct of a proof of the formula $\exists x\, P(x)$. This level is less used for practical purposes than level two, because techniques for controlling the reasoning are still insufficiently developed, and it is common for the program to generate many useless conclusions before reaching the desired solution. Indeed, unsuccessful experience (Green 1969) with this method led to more restricted uses of logic, e.g. the STRIPS system of (Nilsson and Fikes 1971).
The commercial ``expert system shells'', e.g. ART, KEE and OPS-5, use logical representation of facts, usually ground facts only, and separate facts from rules. They provide elaborate but not always adequate ways of controlling inference.
In this connection it is important to mention logic programming, first introduced in Microplanner (Sussman et al., 1971) and from different points of view by Robert Kowalski (1979) and Alain Colmerauer in the early 1970s. A recent text is (Sterling and Shapiro 1986). Microplanner was a rather unsystematic collection of tools, whereas Prolog relies almost entirely on one kind of logic programming, but the main idea is the same. If one uses a restricted class of sentences, the so-called Horn clauses, then it is possible to use a restricted form of logical deduction. The control problem is then much eased, and it is possible for the programmer to anticipate the course the deduction will take. The price paid is that only certain kinds of facts are conveniently expressed as Horn clauses, and the depth first search built into Prolog is not always appropriate for the problem.
Even when the relevant facts can be expressed as Horn clauses supplemented by negation as failure, the reasoning carried out by a Prolog program may not be appropriate. For example, the fact that a sealed container is sterile if all the bacteria in it are dead and the fact that heating a can kills a bacterium in the can are both expressible as Prolog clauses. However, the resulting program for sterilizing a container will kill each bacterium individually, because it will have to index over the bacteria. It won't reason that heating the can kills all the bacteria at once, because it doesn't do universal generalization.
Here's a Prolog program for testing whether a container is sterile. The predicate symbols have obvious meanings.
not(P) :- P, !, fail.
not(P).
sterile(X) :- not(nonsterile(X)).
nonsterile(X) :- bacterium(Y), in(Y,X), not(dead(Y)).
hot(Y) :- in(Y,X), hot(X).
dead(Y) :- bacterium(Y), hot(Y).
bacterium(b1).
bacterium(b2).
bacterium(b3).
bacterium(b4).
in(b1,c1).
in(b2,c1).
in(b3,c2).
in(b4,c2).
hot(c1).
Giving Prolog the goals sterile(c1) and sterile(c2) gives the answers yes and no respectively. However, Prolog has indexed over the bacteria in the containers.
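To make the indexing visible, here is a rough Python transcription of the same facts. It is only a sketch: the names `BACTERIA`, `IN`, `HOT_CONTAINERS` and the `checked` log are assumptions introduced for illustration, and the loop imitates Prolog's search without reproducing it exactly.

```python
BACTERIA = ["b1", "b2", "b3", "b4"]
IN = {"b1": "c1", "b2": "c1", "b3": "c2", "b4": "c2"}  # bacterium -> container
HOT_CONTAINERS = {"c1"}


def hot(y):
    # Mirrors hot(Y) :- in(Y,X), hot(X).
    return IN.get(y) in HOT_CONTAINERS


def dead(y):
    # Mirrors dead(Y) :- bacterium(Y), hot(Y).
    return y in BACTERIA and hot(y)


def sterile(x, checked):
    """Negation as failure: x is sterile if no live bacterium is found in it."""
    for y in BACTERIA:
        if IN[y] == x:
            checked.append(y)  # record every bacterium that had to be examined
            if not dead(y):
                return False
    return True


log_c1, log_c2 = [], []
print(sterile("c1", log_c1), log_c1)
print(sterile("c2", log_c2), log_c2)
```

The logs show that the answer for each container is reached only by examining its bacteria one at a time. No step concludes anything about all the bacteria in a container at once.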
The following is a Prolog program that can verify whether a sequence of actions, actually just heating it, will sterilize a container. It involves introducing situations analogous to those discussed in (McCarthy and Hayes 1969).
not(P) :- P, !, fail.
not(P).
sterile(X,S) :- not(nonsterile(X,S)).
nonsterile(X,S) :- bacterium(Y), in(Y,X), not(dead(Y,S)).
hot(Y,S) :- in(Y,X), hot(X,S).
dead(Y,S) :- bacterium(Y), hot(Y,S).
bacterium(b1).
bacterium(b2).
bacterium(b3).
bacterium(b4).
in(b1,c1).
in(b2,c1).
in(b3,c2).
in(b4,c2).
hot(C,result(heat(C),S)).
When the program is given the goals sterile(c1,result(heat(c1),s0)) and sterile(c2,result(heat(c1),s0)), it answers yes and no respectively. However, if it is given the goal sterile(c1,s), it will fail because Prolog lacks what logic programmers call ``constructive negation''.
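The verifying behavior can be sketched in Python as follows, with situations represented as nested tuples built by `result(heat(c), s)`, following the program's last clause. The tuple encoding and the helper names are assumptions made for this illustration. Like the Prolog program, the sketch has no axioms for persistence, so a container counts as hot only in the situation immediately after it is heated.

```python
BACTERIA = {"b1": "c1", "b2": "c1", "b3": "c2", "b4": "c2"}  # bacterium -> container


def heat(container):
    return ("heat", container)


def result(action, situation):
    return ("result", action, situation)


def hot_container(c, s):
    # Mirrors hot(C, result(heat(C), S)).
    return isinstance(s, tuple) and s[0] == "result" and s[1] == heat(c)


def dead(y, s):
    return y in BACTERIA and hot_container(BACTERIA[y], s)


def sterile(x, s):
    """Check, in a given situation s, that every bacterium in x is dead."""
    return all(dead(y, s) for y, c in BACTERIA.items() if c == x)


s1 = result(heat("c1"), "s0")
print(sterile("c1", s1), sterile("c2", s1))
print(sterile("c1", "s0"))
```

The function can only check a situation it is handed. Finding a situation in which a container is sterile would require a separate search over action sequences, which is the step that negation as failure does not supply.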
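The same facts as are used in the first Prolog program can be expressed in a first order language as follows.
$$(\forall X)(sterile(X) \equiv (\forall Y)(bacterium(Y) \land in(Y,X) \supset dead(Y))),$$
$$(\forall X\,Y)(hot(X) \land in(Y,X) \supset hot(Y)),$$
$$(\forall Y)(bacterium(Y) \land hot(Y) \supset dead(Y)),$$
and
$$hot(a).$$
However, from them we can prove sterile(a) without having to index over the bacteria.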
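One way such a proof might go is sketched below. It assumes the axioms are the first order counterparts of the Prolog clauses (a definition of sterile, and implications for hot and dead) together with hot(a). The constant b stands for an arbitrary bacterium, which is what licenses the universal generalization in step 5.

```latex
\begin{align*}
&1.\ bacterium(b) \land in(b,a) && \text{assumption, with } b \text{ arbitrary}\\
&2.\ hot(a) && \text{given}\\
&3.\ hot(b) && \text{from 1, 2 and } hot(X) \land in(Y,X) \supset hot(Y)\\
&4.\ dead(b) && \text{from 1, 3 and } bacterium(Y) \land hot(Y) \supset dead(Y)\\
&5.\ (\forall Y)(bacterium(Y) \land in(Y,a) \supset dead(Y)) && \text{discharge 1 and generalize on } b\\
&6.\ sterile(a) && \text{from 5 and the definition of } sterile
\end{align*}
```

The proof has the same length however many bacteria the container holds, and it goes through even when the bacteria are not individually named.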
Expressibility in Horn clauses, whether supplemented by negation as failure or not, is an important property of a set of facts, and logic programming has been successfully used for many applications. However, it seems unlikely to dominate AI programming as some of its advocates hope.
Although third level systems express both facts and rules as logical sentences, they are still rather specialized. The axioms with which the programs begin are not general truths about the world but are sentences whose meaning and truth are limited to the narrow domain in which the program has to act. For this reason, the ``facts'' of one program usually cannot be used in a database for other programs.
4. The fourth level is still a goal. It involves representing general facts about the world as logical sentences. Once put in a database, the facts can be used by any program. The facts would have the neutrality of purpose characteristic of much human information. The supplier of information would not have to understand the goals of the potential user or how his mind works. The present ways of ``teaching'' computer programs by modifying them or directly modifying their databases amount to ``education by brain surgery''.
A key problem for achieving the fourth level is to develop a language for a general common-sense database. This is difficult, because the common-sense informatic situation is complex. Here is a preliminary list of features and considerations.
1. Entities of interest are known only partially, and the information about entities and their relations that may be relevant to achieving goals cannot be permanently separated from irrelevant information. (Contrast this with the situation in gravitational astronomy in which it is stated in the informal introduction to a lecture or textbook that the chemical composition and shape of a body are irrelevant to the theory; all that counts is the body's mass, and its initial position and velocity.)
Even within gravitational astronomy, non-equational theories arise and relevant information may be difficult to determine. For example, it was recently proposed that periodic extinctions discovered in the paleontological record are caused by showers of comets induced by a companion star to the sun that encounters and disrupts the Oort cloud of comets every time it comes to perihelion. This theory is qualitative because neither the orbit of the hypothetical star nor those of the comets are available.
2. The formalism has to be epistemologically adequate, a notion introduced in (McCarthy and Hayes 1969). This means that the formalism must be capable of representing the information that is actually available, not merely capable of representing actual complete states of affairs.
For example, it is insufficient to have a formalism that can represent the positions and velocities of the particles in a gas. We can't obtain that information, our largest computers don't have the memory to store it even if it were available, and our fastest computers couldn't use the information to make predictions even if we could store it.
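A back-of-envelope calculation suggests the scale involved. The sketch assumes one mole of gas, six numbers per particle (three position and three velocity components) and 8 bytes per number. These figures are illustrative assumptions and are not taken from the text.

```python
AVOGADRO = 6.02214076e23   # particles in one mole
NUMBERS_PER_PARTICLE = 6   # three position and three velocity components
BYTES_PER_NUMBER = 8       # assumed 64-bit floating point


def bytes_for_gas_state(moles):
    """Bytes needed to store one complete classical state of the gas."""
    return moles * AVOGADRO * NUMBERS_PER_PARTICLE * BYTES_PER_NUMBER


total = bytes_for_gas_state(1.0)
print(f"{total:.2e} bytes")
```

Under these assumptions a single snapshot of one mole needs on the order of 10^25 bytes, before any computation is done with it. A formalism that talks about pressure, volume and temperature represents what is actually known instead.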
As a second example, suppose we need to be able to predict someone's behavior. The simplest example is a clerk in a store. The clerk is a complex individual about whom a customer may know little. However, the clerk can usually be counted on to accept money for articles brought to the counter, wrap them as appropriate and not protest when the customer then takes the articles from the store. The clerk can also be counted on to object if the customer attempts to take the articles without paying the appropriate price. Describing this requires a formalism capable of representing information about human social institutions. Moreover, the formalism must be capable of representing partial information about the institution, such as a three year old's knowledge of store clerks. For example, a three year old doesn't know the clerk is an employee or even what that means. He doesn't require detailed information about the clerk's psychology, and anyway this information is not ordinarily available.
The following sections deal mainly with the advances we see as required to achieve the fourth level of use of logic in AI.