How to Add Your Own Learning System

Conventions

The benchmarking framework expects each tool to have its own folder in the learningsystems directory, named after the tool's identifier (ideally all lowercase and without whitespace). The currently available learning systems can be listed as follows:

$ ls learningsystems/
aleph  dllearner  funclog  golem  progol  progolem  README.md  toplog

To add a new tool, create a new directory under learningsystems named after your tool identifier, e.g. mytool. The mytool directory should contain at least two executable files named run and validate. In addition, a file named system.ini should be provided that specifies the language of the knowledge base and input examples.
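The layout requirements above can be checked with a small helper before a new tool is registered. The following is only a sketch: the function name check_tool_dir and the wording of the messages are illustrative and not part of the framework.

```python
import os
from pathlib import Path

# File names taken from the conventions described above.
REQUIRED_EXECUTABLES = ("run", "validate")
REQUIRED_FILES = ("system.ini",)


def check_tool_dir(tool_dir):
    """Return a list of problems found in a learning system directory.

    Hypothetical helper, not part of the benchmarking framework.
    An empty list means the directory follows the conventions.
    """
    tool_dir = Path(tool_dir)
    if not tool_dir.is_dir():
        return [f"{tool_dir} is not a directory"]

    problems = []
    for name in REQUIRED_EXECUTABLES:
        path = tool_dir / name
        if not path.is_file():
            problems.append(f"missing executable: {name}")
        elif not os.access(path, os.X_OK):
            # The file exists but lacks the executable permission bit.
            problems.append(f"not executable: {name}")
    for name in REQUIRED_FILES:
        if not (tool_dir / name).is_file():
            problems.append(f"missing file: {name}")
    return problems
```

Calling check_tool_dir("learningsystems/mytool") returns an empty list once run, validate and system.ini are all in place. The content of system.ini is not inspected here.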

Run the Learning System

The purpose of the run executable is to run your inductive learning tool and write the learned hypotheses to a file. Currently, it is expected to be called as follows:

$ ./run <config_file>

The config file specifies where the tool should store its output and which example files it should read.

An example of the content of the output file generated by the Golem ILP tool:

active(A) :- carbon_5_aromatic_ring(A,[B,C,D,E,F]).
active(A) :- hetero_aromatic_5_ring(A,[B,C,D,E,F]), nitro(A,[F,G,H,I]).
active(A) :- nitro(A,[B,C,D,E]), phenanthrene(A,[[F,G,H,I,J,K],[L,M,N,O,P,Q],[R,S,T,U,V,W]]), bond(A,I,B,7), bond(A,X,I,7).

Currently there is no fixed specification for how to store the learned results. However, the content of this file must be processable in the validation step.
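As a rough illustration, the skeleton of a run executable could look like the sketch below. It rests on assumptions that are not confirmed by this document: the config file is taken to be INI-style, and the section name main and the key output_file are hypothetical placeholders. The learn function is a stub standing in for the call to the actual learning tool.

```python
import configparser
import sys


def learn(config):
    """Stub for the actual learner; returns the hypotheses as strings.

    A real tool would read the example files named in the config here.
    """
    return ["active(A) :- carbon_5_aromatic_ring(A,[B,C,D,E,F])."]


def main(argv):
    """Entry point for: ./run <config_file>"""
    if len(argv) != 1:
        print("usage: run <config_file>", file=sys.stderr)
        return 1

    # ASSUMPTION: the config file is INI-style. Interpolation is disabled
    # so that '%' characters in file paths are read literally.
    config = configparser.ConfigParser(interpolation=None)
    with open(argv[0]) as handle:
        config.read_file(handle)

    # ASSUMPTION: hypothetical section and key names.
    output_path = config["main"]["output_file"]

    # Write one learned hypothesis per line.
    with open(output_path, "w") as out:
        for clause in learn(config):
            out.write(clause + "\n")
    return 0
```

To use this as the run file, it would end with a call such as sys.exit(main(sys.argv[1:])) and be marked as executable. Writing one hypothesis per line is merely one possible choice; any layout works as long as validate can read it.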

Validation

The validation step is performed by the second executable, validate, which should be called as follows:

$ ./validate <config_file>

The config file specifies where the tool should store its output and which results file and example files it should read.

This executable reads the results, loads the background knowledge of the learning task, and checks how many of the positive and negative examples of the learning problem are covered. When learning on OWL knowledge bases, this means using an OWL reasoner to run instance checks on the learned DL concepts. For Prolog-based background knowledge, a Prolog interpreter has to be executed to check how many of the positive and negative examples are covered.

The validate executable should write exactly four lines to the <validation_output_file>: one line each for the number of true positives, false positives, true negatives, and false negatives. An example of the content of <validation_output_file>:

tp: 10
fp: 3
tn: 29
fn: 0
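Once the coverage check has determined which examples are covered by the learned hypotheses, the four counts follow directly. The sketch below shows one way to compute and format them. The function names are illustrative, and the coverage test itself (OWL reasoner or Prolog interpreter) is assumed to have been performed already.

```python
def confusion_counts(positives, negatives, covered):
    """Count tp/fp/tn/fn from example sets.

    positives, negatives: the examples of the learning problem
    covered: the examples entailed by the learned hypotheses
    """
    positives, negatives, covered = set(positives), set(negatives), set(covered)
    return {
        "tp": len(positives & covered),   # positive examples that are covered
        "fp": len(negatives & covered),   # negative examples that are covered
        "tn": len(negatives - covered),   # negative examples that are not covered
        "fn": len(positives - covered),   # positive examples that are not covered
    }


def format_validation_output(counts):
    """Render the counts in the four-line format shown above."""
    return "".join(f"{key}: {counts[key]}\n" for key in ("tp", "fp", "tn", "fn"))
```

The string returned by format_validation_output can be written unchanged to the <validation_output_file>.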

Tool Configuration Files

Tool-specific configuration settings are defined per learning problem and should be kept in a file named after the tool identifier with the file suffix .conf, e.g. aleph.conf. This configuration file should be placed inside the directory of the corresponding learning problem. For example, the tool-specific configuration files of the Prolog-based tools for learning problem 42 of the Mutagenesis learning task can be found here:

$ ls -1 learningtasks/mutagenesis/prolog/lp/42/*.conf
learningtasks/mutagenesis/prolog/lp/42/aleph.conf
learningtasks/mutagenesis/prolog/lp/42/funclog.conf
learningtasks/mutagenesis/prolog/lp/42/golem.conf
learningtasks/mutagenesis/prolog/lp/42/progol.conf
learningtasks/mutagenesis/prolog/lp/42/progolem.conf
learningtasks/mutagenesis/prolog/lp/42/toplog.conf

The framework combines this file with the config file passed to the tool. The settings made in such a configuration file have to be processed by the run executable itself.
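The location of a tool-specific configuration file can be derived from the naming convention above. The helper below is a sketch that generalizes the pattern seen in the Mutagenesis listing. The function name and its parameters are hypothetical, and the assumption that every learning task follows the same <task>/<language>/lp/<id> layout is inferred from that single example.

```python
from pathlib import PurePosixPath


def tool_conf_path(task, language, lp_id, tool, root="learningtasks"):
    """Build the expected path of a tool-specific .conf file.

    ASSUMPTION: all learning tasks use the layout
    <root>/<task>/<language>/lp/<lp_id>/<tool>.conf
    as seen in the Mutagenesis example.
    """
    return PurePosixPath(root) / task / language / "lp" / str(lp_id) / f"{tool}.conf"
```

For instance, tool_conf_path("mutagenesis", "prolog", 42, "aleph") yields learningtasks/mutagenesis/prolog/lp/42/aleph.conf, which matches the first entry of the listing.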

Tool-Specific Data

If a learning task requires tool-specific data, e.g. specific mode declarations, it can be put into a directory named after the tool identifier inside the data directory of the corresponding learning task. An example of Aleph-specific data for the Mutagenesis task can be found here:

$ ls learningtasks/mutagenesis/prolog/data/aleph/
mode.pl

For Prolog-based learning tools, such data files must have the file suffix .pl. For OWL-based learning tools, the suffix should be one of the standard file suffixes of the common serialization formats (.owl, .rdf, .xml, .nt, .ttl, ...).