Analytical Learning
Content

Introduction




Difference between Inductive and Analytical Learning
Learning with Perfect Domain Theories: PROLOG-EBG
Remarks on Explanation-Based Learning
Summary
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Introduction




Explanation is used to distinguish the relevant features of the
training examples from the irrelevant ones, so that the examples
can be generalised
Prior knowledge and deductive reasoning is used to augment the
information provided by the training examples
Prior knowledge is used to reduce the complexity of hypothesis
space
Assumption: learner's prior knowledge is correct and complete
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Introduction 2
Example: Learn to recognise important classes of games




Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Goal: Recognise chessboard positions in
which black will lose its queen within two
moves
Induction can be employed <=>
Problem: thousands of training examples
similar to this one are needed
Suggested target hypothesis:
board position in which the black king
and queen are simultaneously attacked
Not suggested: board position in which
four white pawns are still in their original
location
Introduction 3




Explanations of human beings provide the information needed to
rationally generalise from details
Prior knowledge: e.g. knowledge about the rules of chess: legal
moves, how is the game won, ...
Given just this prior knowledge it is possible in principle to
calculate the optimal chess move for any board position <=>
in practice it will be frustratingly complex
Goal: learning algorithm that automatically constructs and learns a
move from such explanations
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Difference between Inductive and
Analytical Learning


Analytical learning methods seek a hypothesis that fits the learner's prior
knowledge and covers the training examples
Explanation based learning is a form of analytical learning in which the learner
processes each new training example by



Explaining the observed target value for this example in terms of the domain theory
Analysing this explanation to determine the general conditions under which the
explanation holds
Refining its hypothesis to incorporate these general conditions
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Difference between Inductive and
Analytical Learning (2)

Difference: They assume two different formulations of the learning
problem:


Inductive learning:
input: hypothesis space H + set of training examples D = x1,f  x1  ,..., x n ,f  x n 



output: hypothesis h, that is consistent with these training examples
Analytical learning:
input: hypothesis space H + set of training examples D = x1,f  x1  ,..., x n ,f  x n 
+ domain theory B consisting of background knowledge (used to explain
the training examples)
output: hypothesis h, that is consistent with both the training examples D and
the domain theory B
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2

Difference between Inductive and
Analytical Learning (3)

Illustration:
f  xi  is True if x i is a situation in which black will lose its queen
within two moves and False otherwise
H: set of Horn-clauses where predicates used by the rules refer to the
position or relative position of specific pieces
B: formalisation of the rules of chess
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2

Given:


New Example
Instance space X: Each instance describes a pair of objects represented by the
predicates Type, Color, Volume, Owner, Material, Density and On.
Hypothesis space H: Each hypothesis is a set of Horn clauses.
The head of each clause is a literal of the target predicate SafeToStack. The body of
each Horn clause is a conjunction of literals based on the same predicates used to
describe
the instances + LessThan|Equal|GreaterThan + function: plus|minus|times


Target concept:
SafeToStack  x,y  Volume  x,vx   Volume  y,vy  LessThan  vx,vy
Training examples:
On(Obj1, Obj2)
Type(Obj1, Box)
Type(Obj2, Endtable)
Color(Obj1, red)
Color(Obj2, Blue)
Volume(Obj1, 2)
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Owner (Obj1, Fred)
Owner (Obj2, Louise)
Density(Obj1, 0.3)
Material (Obj1, Carboard)
Material (Obj1, Wood)
New Example 2

Domain Theory B:
SafeToStack  x, y  ¬Fragile  y
SafeToStack  x, y  Lighter  x, y 
Lighter  x, y  Weight  x, wx   Weight  y, wy  LessThan  wx, wy
Weight  x, w   Volume  x, v   Density  x, d   Equal  w,   v, d  
Weight  x,5  Type  x,Endtable
Fragile  x   Material  x,Glass 

Determine:

A hypothesis from H consistent with the training examples and the
domain theory
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Content

➔
Introduction
Learning with Perfect Domain Theories: PROLOG-EBG




An Illustrative Trace
Remarks on Explanation-Based Learning
Explanation-Based Learning of Search Control Knowledge
Summary
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Learning with Perfect Domain Theories:
PROLOG-EBG



A domain theory is said to be correct if each of its assertions is a
truthful statement about the world
A domain theory is complete with respect to a given target concept
and instance space, if the domain theory covers every positive
example in the instance space.
It is not required that the domain theory is able to prove that negative
examples do not satisfy the target concept.
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Learning with Perfect Domain Theories:
PROLOG-EBG (2)
Question:
The learner had a perfect domain theory, why would it need to learn?
Answer:




There are cases in which it is feasible to provide a perfect domain theory
It is unreasonable to assure that a perfect domain theory is available.
A realistic assumption is that plausible explanations based on imperfect
domain theories must be used, rather than exact proofs based on perfect
knowledge.
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Learning with Perfect Domain Theories:
PROLOG-EBG (3)

PROLOG-EBG (Kedar-Cabelli and McCarthy 1987)



Sequential covering algorithm
When given a complete and correct domain theory, the method is
guaranteed to output a hypothesis (set of rules) that is correct and
that covers the observed positive training examples
Output: set of logically sufficient conditions for the target concept,
according the domain theory
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Learning with Perfect Domain Theories:
PROLOG-EBG (4)


Repeatedly: Domain theory is correct and complete this explanation constitutes a proof
that the training examples satisfy the target concept
PROLOG-EBG(TargetConcept, TrainingExamples, Domain Theory)

LearnedRules   

Pos  the positive examples from TrainingExamples

for each PositiveExample in Pos that is not covered by LearnedRules do
Explain
Explanation  an explanation (proof) in terms of the DomainTheory that
PositiveExample satisfies the TargetConcept
 Analyse
Sufficient Condition  the most general set of features of PositiveExample
sufficient to satisfy the TargetConcept according to the Explanation
 Refine
LearnedRules  LearnedRules + NewHornClause,
where NewHornClause is of the form TargetConcept
SufficientConditions
In general there may be multiple possible explanations Any or all of the explanations may be
used.Gabriella
Explanation
is generated
Kókai: Maschine
Learningusing backward chaining search as performed by PROLOG.


Lehrstuhl für Informatik 2
An Illustrative Trace (2)


The imprtant question in the generalising-process:
Of the many features that happen to be true of the current training
example, which ones are generally relevant to the target concept?
Explanation constructs the answer:
Precisely the features mentioned in the explanation
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
An Illustrative Trace (3)

General rule justified by the domain theory:
SafeToStack  x,y  Volume  x,2  Density  x,0.3  Type  y,Endtable
leaf node in the proof tree expects Equal(0.6,times(2,03)
and LessThan(0.6,5)
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
An Illustrative Trace (4)






Explanation of the training example forms the proof for the correctness of this rule
PROLOG-EBG computes the most general rule that can be justified by the
explanation, by computing the weakest preimage of the explanation
Definition: The weakest preimage of a conclusion C with respect to a proof P is the
most general set of initial assertions A, such that A entails C according to P.
Example:
SafeToStack  x, y   Volume  x, vx   Density  x,dx   Equal  wx, times  vx,dx  
LessThan  wx,5  Type  y,Endtable
PROLOG_EBG computes the weakest preimage of the target concept with respect to
the explanation, using a general procedure called regression
Regression: go iteratively backward through the explanation,


first computing the weakest preimage of the target concept with respect to the final proof
step in the explanation
Computing the weakest preimage of the resulting expressions with respect to the
proceeding step and so on
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
An Illustrative Trace (5)
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
An Illustrative Trace 5
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Content


➔
Introduction
Learning with Perfect Domain Theories: PROLOG-EBG
Remarks on Explanation-Based Learning


Discovering new features
Summary
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Remarks on Explanation-Based Learning

Key properties:






PROLOG-EBG produces justified general hypotheses by using prior
knowledge to analyse individual examples
The explanation about the way how an example satisfies the target concept
determines which example attributes are relevant: the ones mentioned by the
explanation
Regressing the target concept to determine its weakest preimage with
respect to the explanation allows deriving more general constraints on the
values of relevant features
Each learned Horn clause corresponds to a sufficient condition for satisfying
the target concept
The generality of the learned Horn clauses will depend on the formulation of
the domain theory and on the sequence in which the training examples are
considered
Implicitly assumes that the domain theory is correct and complete
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Remarks Explanation-Based Learning 2

Related perspectives to help to understand its capabilities and
limitations:


EBL as theory guided generalisation of examples:
Rational generalisation from examples allows to avoid the
bounds on sample complexity that occured in pure inductive
learning
EBL as example guided reformulation of theories:
Method for reformulating the domain theory into more
operational form: Creating rules that:


Deductively follow the domain theory
Classify the observed training examples in a single inference step
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Remarks Explanation-Based Learning 3

Related perspectives to help to understand its capabilities and
limitations:

EBL is „just“ restating what the learner already „knows“:
In what sense does this quality help to learn then?


Knowledge reformulation: In many tasks the difference between what one knows in
principle and what one can efficiently compute in practice may be great
Situation: Complete perfect domain theory is already known to the (human) learner,
and further learning is „simple“! So it's a matter of reformulating this knowledge
into a form in which it can be used more effectively to select appropriate moves.
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Remarks Explanation-Based Learning 4

Knowledge Compilation: EBL involves reformulating the domain
theory to produce general rules that classify examples in a single
inference step
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Discovering new features




Interesting capability: Ability to formulate new features that are
not explicitly in the description of the training examples but that
are needed to describe the general rule underlying the training
examples
This „feature“ is similarly represented by the hidden units of neural
networks
Like the BACKPROPAGATION algorithm, PROLOG_EBG
automatically formulates such features in its attempt to fit the
training data
BUT:



In neural networks it's developed in a statistical process
PROLOG-EBG it's derived in an analytical process
Example: derives the feature Volume  Density > 5
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Content



➔
Introduction
Learning with Perfect Domain Theories: PROLOG-EBG
Remarks on Explanation-Based Learning
Summary
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2
Summary

PROLOG-EBG







Uses first order Horn clauses in its domain theory and in its learned hypotheses
The explanation is a PROLOG proof
The hypothesis extracted from the explanation is the weakest preimage of this proof
Analytical learning methods construct useful intermediate features as
a side effect of analysing individual training examples.
Other deductive learning procedures can extend the deductive closure
of their domain.
PRODIGY and SOAR have demonstrated the utility of explanation
based learning methods for automatically acquiring effective search
control knowledge that speeds up problem solving
Disadvantage: purely deductive implementations such as PROLOGEBG produce a correct output if the domain theory is also correct
Gabriella Kókai: Maschine Learning
Lehrstuhl für Informatik 2

Analytical Learning - Lehrstuhl für Informatik 2 (Programmiersysteme)