Sample solutions and discussion Perl Expert Quiz of The Week #18 (20040630) You will implement a simple, interactive inference engine that understands statements about categories, and can answer questions about what it was told. The engine should understand statements of the following forms: All PLURAL-NOUN are PLURAL-NOUN. No PLURAL-NOUN are PLURAL-NOUN. Some PLURAL-NOUN are PLURAL-NOUN. Some PLURAL-NOUN are not PLURAL-NOUN. The engine should be able to answer questions of the following forms: Are all PLURAL-NOUN PLURAL-NOUN? Are no PLURAL-NOUN PLURAL-NOUN? Are any PLURAL-NOUN PLURAL-NOUN? Are any PLURAL-NOUN not PLURAL-NOUN? Describe PLURAL-NOUN. Here's an example dialog, with user input after the ">" prompts: > All mammals are hairy animals. OK. > All dogs are mammals. OK. > All beagles are dogs. OK. > Are all beagles hairy animals? Yes, all beagles are hairy animals. > All cats are mammals. OK. > All cats are hairy animals. I know. > Are all cats dogs? I don't know. > No cats are dogs. OK. > Are all cats dogs? No, not all cats are dogs. > Are no cats dogs? Yes, no cats are dogs. > All mammals are dogs. Sorry, that contradicts what I already know. > Some mammals are brown animals. OK. > Are any mammals dogs? Yes, some mammals are dogs. > Are any dogs brown animals? I don't know. > Some dogs are brown animals. OK. > All brown animals are brown things. OK. > Are any dogs brown things? Yes, some dogs are brown things. > Describe dogs. All dogs are mammals. All dogs are hairy animals. No dogs are cats. Some dogs are beagles. Some dogs are brown animals. Some dogs are brown things. > Are all goldfish mammals? I don't know anything about goldfish. ---------------------------------------------------------------- The inference engine described by this quiz deals with the four kinds of categorical propositions. As can be seen in the example dialog, the engine uses the interpretation of categorical propositions described by Aristotle. This interpretation defines the propositions as follows (shown with their traditional labels): (A) All A's are B's. = Everything that is an A is a B. (E) No A's are B's. = Everything that is an A is not a B. (I) Some A's are B's. = There is at least one A that is a B. (O) Some A's are not B's. = There is at least one A that is not a B. The possible inferences in this model are described by what is known as the "traditional square of opposition," which states: Contradiction: If A, then not E. If E, then not A. Subalternation: If A, then I. If not I, then not A. If E, then O. If not O, then not E. Contrary: If A, then not E. If E, then not A. Subcontrary: If not I, then O. If not O, then I. As can be seen by the definitions of I and O, the Aristotelian interpretation of categorical propositions expects that all statements are made about things that exist. "There is at least one A that is a B" implies "There is at least one A" and "There is at least one B," as does "There is at least one A that is not a B." Given this assumption, it is valid to conclude from the statement "All dogs are mammals" that "Some mammals are dogs," via subalternation. When discussing a category with no existing instances, however, some of the inferences in this model are invalid. "All unicorns are horned animals" would imply that "Some horned animals are unicorns," or "There is at least one horned animal that is a unicorn," which is false. To address the concern over empty terms, a new interpretation was developed in the 19th century that redefined the universal categorical propositions as follows: (A) All A's are B's. = No members of set A are outside of set B. (E) No A's are B's. = No members of set A are inside set B. With the new interpretation, a universal statement makes no claims about the existence of members of a set. As such, the rules of subalternation, contrary and subcontrary do not apply. The only valid inferences from the four kinds of propositions are those of contradiction. These inferences are known as the "modern square of opposition." With the modern interpretation, it is possible for both A and E to be true, and both I and O to be false. "All unicorns are horned animals" and "No unicorns are horned animals" are both true statements, because there are no unicorns. The respective contradictions "Some unicorns are not horned animals" and "Some unicorns are horned animals" are simultaneously false. The Aristotelian interpretation was chosen for this exercise because the resulting inference engine is still interesting despite the limitation that statements ought not to be about empty sets. Limiting such an engine to inferences of contradiction made for an uninteresting project, and a fancier engine that accepted more explicit statements about sets (and whether or not they were empty) seemed out of scope for Perl QOTW. Further concepts of inference are required for the inference engine. Converse statements are statements about which B's are A's that can be directly inferred from statements about which A's are B's: Some A's are B's. -> Some B's are A's. No A's are B's. -> No B's are A's. Direct inferences that deal with negated sets (term complements) are not likely to be useful for this exercise. It is true that, if "No dogs are cats," then "All dogs are non-cats," but the engine is not expected to say or understand statements about non-cats. The engine need not make inferences of obversion or contraposition ("All dogs are mammals," therefore "All non-mammals are non-dogs"). The quiz description leaves one issue with the parser left as an open question, which was noted in discussion. Noun phrases with multiple words are potentially ambiguous in the question forms where two noun phrases are next to each other. One parsing strategy is to assume the first word belongs to the first term and the rest to the second, and if one of the terms is not found in the database, move a word from the second to the first and try again. This strategy has two problems. For one, error messages are potentially unclean: If the two phrases together are "A B C" and both "A" and "A B" are known noun phrases, but neither "B C" nor "C" are known, then it isn't clear which error message to give. More notably, some questions simply cannot be asked. If the two phrases together are "A B C" and "A", "A B", "B C", and "C" are all known nouns, questions of which "A B" are "C" would be parsed as which "A" are "B C". One solution is to prompt the user for clarification of ambiguous input. Rick Measham suggested supporting double-quotes around noun phrases, or using different question forms that clearly deliniated the two phrases, as possible solutions. I thought it would be a fun word puzzle to come up with interesting examples in English that would be ambiguous in this way. Yitzchak Scott-Thoennes suggested "Are all ladies fair ladies?" as potentially ambiguous. Rick Measham suggested "Are all governors general governors?" The example I prepared for this write-up took a similar tactic: "Are all attourneys general anesthetics?" Four people submitted solutions to this quiz. Javier Merino: http://perl.plover.com/~alias/list.cgi?1:msp:1833 Randy W. Sims: http://perl.plover.com/~alias/list.cgi?1:mss:1833 Mark Jason Dominus: http://perl.plover.com/~alias/list.cgi?1:mss:1834 Julien Quint: http://perl.plover.com/~alias/list.cgi?1:mss:1835 MJD gets the extra-special award for tracking whether or not categories are known to have instances that exist, and providing useful, accurate dialog in either case. It also supports adjective phrases ("Some cats are black") by assuming that plural noun phrases that end in "things" are actually adjectives ("Black things are dark"). And it's less than 150 lines. Julien's solution meets the requirements, prompts the user on ambiguous input, and stores all inferences for quick retrieval (via a "flatten" subroutine). The feedback is especially helpful: If all dogs are mammals, the question "Are any dogs mammals?" results in "Yes, all dogs are mammals." In contrast to Julien's, the solution I prepared only stores direct statements and conversions, and make all inferences by recursively traversing the data structure when a question is asked. This method is quite inefficient given the requirements of the quiz description, but an earlier draft of this quiz had loftier goals, including a "save" routine that would save the knowledge base to disk using as few statements as possible, and the ability to make corrections to existing data and remain logically consistent. (Neither of these fancy features are represented in my solution below.) References: * "The Traditional Square of Opposition," The Stanford Encyclopedia of Philosophy. http://plato.stanford.edu/entries/square/ * "In Defense of Bramantip," Kelley L. Ross. http://www.friesian.com/syllog.htm (Thanks Geoff Rommel.) * "The Paradox of the Ravens," Andrew McMillan. http://www.paradoxes.info/TheRavens.html (Thanks David Landgren.) - - - #!/usr/bin/perl -w # Infer, a simple inference engine # By Dan Sanderson # For Perl Quiz of the Week use strict; use Term::ReadLine; my ($OK, $KNOWN, $CONFLICT, $UNKNOWN, $NO, $YES, $NONE, $ALL, $SOME, $SOME_NOT) = (0..9); my %typewords = (all=>[$ALL,$NONE], some=>[$SOME,$SOME_NOT], any=>[$SOME,$SOME_NOT], no=>[$NONE,$ALL]); my $data = {}; my $seenSets = {}; sub state { my ($type, $first, $second) = @_; my $test = ask($type, $first, $second); return $KNOWN if ($test == $YES); return $CONFLICT if ($test == $NO); $data->{$first}->{$second} = $type; if (($type==$ALL || $type==$SOME) && !defined($data->{$second}->{$first})) { state($SOME, $second, $first); } elsif ($type == $NONE) { state($NONE, $second, $first); } else { $data->{$second}->{''} = 0; } return $OK; } sub ask { my ($qtype, $first, $second, $yesonly) = @_; my $result; $seenSets = {} if (!defined($yesonly) || !$yesonly); $seenSets->{$first} = 1; if ($first eq $second) { return $YES if ($qtype == $ALL || $qtype == $SOME); return $NO; } my $immediate = $data->{$first}->{$second}; if ((defined($immediate) && $immediate == $qtype) || ($qtype==$SOME && ask($ALL, $first, $second, 1)==$YES) || ($qtype==$SOME_NOT && ask($NONE, $first, $second, 1)==$YES) ) { $result = $YES; } else { for my $k (keys(%{$data->{$first}})) { next if (!$k || defined($seenSets->{$k})); my $ftok = $data->{$first}->{$k}; if (($qtype==$ALL && $ftok==$ALL && ask($ALL, $k, $second, 1)==$YES) || ($qtype==$SOME && $ftok==$SOME && ask($ALL, $k, $second, 1)==$YES) || ($qtype==$SOME_NOT && $ftok==$SOME && ask($NONE, $k, $second, 1)==$YES) || ($ftok==$ALL && ask($NONE, $k, $second, 1) == $YES) ) { $result = $YES; last; } } } if (!$yesonly && !defined($result)) { $seenSets = {$first => 1}; if (($qtype == $ALL && ask($SOME_NOT, $first, $second, 1) == $YES) || ($qtype == $SOME && ask($NONE, $first, $second, 1) == $YES) || ($qtype == $SOME_NOT && ask($ALL, $first, $second, 1) == $YES) || ($qtype == $NONE && ask($SOME, $first, $second, 1) == $YES) ) { $result = $NO; } } return defined($result) ? $result : $UNKNOWN; } sub cmd_are { my ($t) = @_; $t =~ s/^\s*(\S+)\s+//; my $types = $typewords{lc($1)}; my ($type, $first, $second, $found, $last_unknown); if ($t =~ /^(.*)\s+not\s+(.*)$/i) { ($type, $first, $second) = ($types->[1], $1, $2); } else { my @words = split(' ', $t); for (my $i=0; $i < ($#words); $i++) { $first = join(' ', @words[0..$i]); $second = join(' ', @words[($i+1)..$#words]); if (exists($data->{$first}) && exists($data->{$second})) { $found = 1; last; } if (exists($data->{$first})) { $last_unknown = $second; } elsif (exists($data->{$second})) { $last_unknown = $first; } } if (!$found) { if ($last_unknown) { print "I don't know anything about $last_unknown.\n"; } else { print "I don't know anything about either of those things, or I don't understand the question.\n"; } return; } $type = $types->[0]; } my $answer = ask($type, $first, $second); if ($answer == $YES) { print 'Yes, ', ($type == $ALL ? 'all ' : $type == $SOME || $type == $SOME_NOT ? 'some ' : 'no '), $first, ' are ', $type == $SOME_NOT ? 'not ' : '', $second, ".\n"; } elsif ($answer == $NO) { print 'No, ', ($type == $ALL ? 'not all ' : $type == $SOME ? 'no ' : $type == $SOME_NOT ? 'all ' : 'some '), $first, ' are ', $second, ".\n"; } else { print "I don't know.\n"; } } sub print_statement { print ''.(($_[0] == $ALL ? 'All' : $_[0] == $SOME || $_[0] == $SOME_NOT ? 'Some' : 'No') . ' ' . $_[1] . ' are ' . ($_[0] == $SOME_NOT ? 'not ' : '') . $_[2] . ".\n"); } sub cmd_describe { my ($thing) = @_; (print "I don't know anything about $thing.\n") and return if (!exists($data->{$thing})); my ($k, @alls, @somes); my %seen_supersets = ($thing => 1); my $superset_href = $data->{$thing}; for $k (keys(%$superset_href)) { next if (!$k); print_statement($superset_href->{$k}, $thing, $k); if ($superset_href->{$k} == $ALL) { push @alls, $k; } elsif ($superset_href->{$k} == $SOME) { push @somes, $k; } } for my $t ([$ALL, \@alls], [$SOME, \@somes]) { my @queue = (@{$t->[1]}); while ($k = shift(@queue)) { if (!exists($seen_supersets{$k})) { $seen_supersets{$k} = 1; for my $sk (keys(%{$data->{$k}})) { if (!exists($seen_supersets{$sk}) && $data->{$k}->{$sk} == $ALL) { print_statement($t->[0], $thing, $sk); push @queue, $sk; } } } } } } sub cmd_state { my $res; ($_[1] =~ /^(.*)\s+are\s+not\s+(.*)$/i) and $res = state($_[0][1], $1, $2); ($_[1] =~ /^(.*)\s+are\s+(.*)$/i) and $res = state($_[0][0], $1, $2); print "I don't understand.\n" and return if (!defined($res)); print "OK.\n" and return if ($res == $OK); print "I know.\n" and return if ($res == $KNOWN); print "Sorry, that conflicts with what I already know.\n"; } my %cmds = (quit => sub { exit(0); }, are => \&cmd_are, describe => \&cmd_describe, all => sub { cmd_state($typewords{all}, $_[0]) }, some => sub { cmd_state($typewords{some}, $_[0]) }, no => sub { cmd_state($typewords{no}, $_[0]) } ); print "Infer, a simple inference engine.\nType 'quit' to quit.\n"; my $term = Term::ReadLine->new('Infer'); while (1) { my $cmdline = $term->readline('> '); next if (!$cmdline); $cmdline =~ s/^\s*(\S+)\s*//; my $firstword = lc($1); $cmdline =~ s/[\.\?\!]*\s*$//; $cmdline =~ s/\s+/ /g; eval { die if (!exists($cmds{$firstword})); $cmds{$firstword}->($cmdline); }; print "I don't understand.\n" if ($@); } __END__