
Observation language

Hendrik Blockeel
Katholieke Universiteit Leuven, Belgium
Leiden Institute of Advanced Computer Science, The Netherlands

Synonyms
Instance language.

Definition
The observation language used by a machine learning system is the language in which the observations it learns from are described.

Motivation, background
Most machine learning algorithms can be seen as a procedure for deriving one or more hypotheses from a set of observations. Both the input (the observations) and the output (the hypotheses) need to be described in some particular language, called the observation language and the hypothesis language, respectively. These terms are mostly used in the context of symbolic learning, where these languages are often more complex than in subsymbolic or statistical learning. The following sections describe some of the key observation languages.

Attribute-value learning
Probably the most widely used setting in machine learning is the attribute-value setting. Here, an example (observation) is described by a fixed set of attributes, each of which is given a value from the domain of that attribute. Such an observation is often called a vector or, in relational database terminology, a tuple. The attributes are usually atomic (i.e., not decomposable into component values) and single-valued (i.e., an attribute has only one value, not a set of values). So we have an instance space (or space of observations)

O = A_1 × ... × A_n,

elements of which are denoted using an observation language that typically has the same structure:

L_O = L_A1 × ... × L_An

(the language contains tuples of objects that represent the attribute values).

The attribute-value framework easily allows for both supervised and unsupervised learning: in the supervised setting, the label of an instance is simply included as an attribute in the tuple, while in the unsupervised setting it is excluded. The attribute-value setting assumes that all instances can be represented using the same fixed set of attributes. When instances can be of different types or are variable-sized (e.g., when an instance is set-valued), this assumption may not hold, and more powerful languages may have to be used instead.
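For illustration, attribute-value observations can be written down as ground facts in Prolog syntax, one fact per tuple; the weather/5 predicate and its attributes below are hypothetical, chosen only as a sketch of the setting.

% Each observation is a fixed-arity tuple of atomic values:
% weather(Outlook, Temperature, Humidity, Wind, PlayTennis).
% In supervised learning the last argument is the label;
% in unsupervised learning it would simply be omitted.
weather(sunny,    hot,  high,   weak,   no).
weather(overcast, hot,  high,   weak,   yes).
weather(rain,     mild, normal, strong, no).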
Learning from graphs, trees or sequences
We here consider the case where a single instance is a graph, or a node in a graph. Note that trees and sequences are special cases of graphs. A graph is defined as a pair (V, E), where V is a set of vertices and E a set of edges, each edge being a pair of vertices. If the pair is ordered, the graph is directed; otherwise it is undirected. For simplicity, we here restrict ourselves to undirected graphs.

In practice, a graph cannot be encoded in attribute-value format without loss of information. That is, one could use a number of properties of graphs as attributes in the encoding, but several graphs may then still map onto the same representation, which implies loss of information. In theory, one could imagine defining a total order on (certain classes of) graphs and representing each graph by its rank in that order (which is a single numerical attribute), thus representing graphs as numbers without loss of information; but then it is not obvious how to map patterns in this numerical representation to patterns in the original representation. No such approach has been proposed up to now.

Describing the instance space is more difficult here than in the attribute-value case. Consider a task of graph classification, where observations are of the form (G, y), with G a graph and y a value for a target attribute Y. Then we can define the instance space as

O = {(V, E) | V ⊆ ℕ, E ⊆ V²} × Y,

where ℕ is the set of all natural numbers. (For each graph, there exists a graph defined over ℕ that is isomorphic with it, so O contains all possible graphs up to isomorphism.) A straightforward observation language for graph classification is then

{(G, y) | G = (V, E), V ⊆ L_V, E ⊆ V², y ∈ Y},

where L_V is some alphabet for representing nodes.

In learning from graphs, there are essentially two settings: those where a prediction is made for entire graphs, and those where a prediction is made for single nodes in a graph. While in the first case observations are of the form (G, y), in the second case they are of the form (G, v, y), where G = (V, E) and v ∈ V. That is, a node is given together with the graph in which it occurs (its environment), and a prediction is to be made for this specific node, using the information about its environment. In many cases, the set of observations one learns from is of the form (G, v_i, y_i), where each instance is a different node of one and the same graph G. This is the case when, for instance, classifying web pages, if we take the whole web as their environment.

In a labelled graph, labels are associated with each node or edge. Often these labels are assumed atomic, being elements of a finite alphabet or real numbers, but they can also be vectors of reals.
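A labelled graph, together with the node for which a prediction is to be made, can likewise be written as ground facts. The predicates node/2, edge/2 and target/2 below are hypothetical, chosen only to illustrate the (G, v, y) setting.

% G = (V, E): node(Id, Label) lists V with its labels,
% edge(Id1, Id2) lists E (undirected: each edge is stored
% once and can be looked up in both directions).
node(1, page_a).
node(2, page_b).
node(3, page_c).
edge(1, 2).
edge(2, 3).
edge(1, 3).
% (G, v, y): the class y to be predicted for node v = 2.
target(2, relevant).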
Relational learning
In relational learning, it is assumed that relationships may exist between different instances of the instance space, or that an instance may internally consist of multiple objects among which relationships exist. This essentially corresponds to learning from graphs, except that in a graph only one binary relation exists (the edges E), whereas here there may be multiple relations and they may be non-binary. The expressiveness of the two settings is the same, however, as any relation can be represented using only binary relations.

In the attribute-value setting, one typically uses a single table in which each tuple represents all the relevant information for one observation. In the relational setting, there may be multiple tables, and the information on a single instance is contained in multiple tuples, possibly belonging to multiple relations.

Example 1 Assume we have a database about students, courses and professors (see Figure 1). We can define a single observation as all the information relevant to one student, that is: the name, year of entrance, etc. of the student, but also the courses they take and the professors teaching these courses.

[Figure 1: A small database of students, relating students (Anne, Bernard, Daniel, Elisa, Fabian), their years of entrance and grades, the courses they take (Algebra, Biology) and the professors teaching them (Adams, Baeck, Cools).]

The most obvious link to the graph representation is as follows: create one node for each tuple, labelled with that tuple, and create a link between two nodes if the corresponding tuples are connected by a foreign key relationship. Defining a single observation as a set of tuples that are connected through foreign keys in the database then corresponds to representing each observation (G, v, y) as (G', v, y), where G' is the connected component of G that contains v. The actual links are usually not explicitly written in this representation, as they are implicit: there is an edge between two tuples if they have the same value for a foreign key attribute.
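Continuing Example 1, such an observation could be written as a set of ground facts spread over several relations. The predicates and tuples below are a hypothetical reconstruction in the spirit of Figure 1.

% Three relations; the student name and the course name act
% as foreign keys linking the tuples of one observation.
student(anne, 1998).         % student(Name, YearOfEntrance)
takes(anne, algebra, a).     % takes(Student, Course, Grade)
takes(anne, biology, b).
course(algebra, adams).      % course(Course, Professor)
course(biology, baeck).

The observation for the student anne is then the set of all tuples reachable from student(anne, 1998) by following these foreign-key links.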
Inductive logic programming
In inductive logic programming, a language based on first-order logic is used to represent the observations. Typically, an observation is then represented by a ground fact, which basically corresponds to a single tuple in a relational database. In some settings, an observation is represented by an interpretation, a set of ground facts, which corresponds to the set of tuples mentioned in the previous section. While the target variable can always be represented as an additional attribute, ILP systems often learn from examples and counterexamples of a concept. The target variable is then implicit: it is true or false depending on whether the example is in the positive or the negative set, but it is not explicitly included in the fact.

Typical for the inductive logic programming setting is that the input of a system may contain, besides the observations, background knowledge about the application domain. An advantage of the ILP setting is that no separate language is needed for such background knowledge: the same language based on first-order logic can be used for representing the observations as well as the background knowledge.

Example 2 Take the following small dataset, containing positive and negative (preceded by :-) examples of the sibling relation:

sibling(bart, lisa).
sibling(lisa, bart).
:- sibling(bart, bart).
:- sibling(lisa, lisa).
father(homer, bart).
mother(marge, bart).
father(homer, lisa).
mother(marge, lisa).

The following hypothesis might be learned:

sibling(X, Y) :- father(Z, X), father(Z, Y), X \= Y.
sibling(X, Y) :- mother(Z, X), mother(Z, Y), X \= Y.

If the following clauses are included as background knowledge:

parent(X, Y) :- father(X, Y).
parent(X, Y) :- mother(X, Y).

then the same ILP system might learn the following, more compact definition:

sibling(X, Y) :- parent(Z, X), parent(Z, Y), X \= Y.
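Because observations, background knowledge and hypotheses all share the same language, the learned definition can be tested directly in a Prolog system. The session below is a hypothetical sketch, assuming the family facts, the background clauses and the compact definition have been loaded:

?- sibling(bart, X).
X = lisa ;    % derived via parent/2 through father(homer, _)
X = lisa.     % derived again through mother(marge, _)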
Recommended Reading
Most of the literature on hypothesis and observation languages is found in the area of inductive logic programming. Excellent starting points for becoming familiar with this field are Relational Data Mining by Džeroski and Lavrač [3] and Logical and Relational Learning by De Raedt [2]. De Raedt [1] compares a number of different observation and hypothesis languages with respect to their expressiveness, and indicates relationships between them.

References
[1] L. De Raedt. Attribute-value learning versus inductive logic programming: the missing links (extended abstract). In D. Page, editor, Proceedings of the Eighth International Conference on Inductive Logic Programming, volume 1446 of Lecture Notes in Artificial Intelligence, pages 1-8. Springer-Verlag, 1998.
[2] L. De Raedt. Logical and Relational Learning. Springer, 2008.
[3] S. Džeroski and N. Lavrač, editors. Relational Data Mining. Springer-Verlag, 2001.

See also: hypothesis language, inductive logic programming, relational learning

Definitions
Instance space: Given a learning task, the instance space is the set of all possible instances that can be part of the input of the learner. In practice, it is determined by the observation language used by the learner.