Examples

KGP comes with a set of example problems that can be used as guidelines for how to implement your own problems as well as to test the implementation.

This section will provide some information about each example problem and its implementation.

For each example the data used to train the model is described. U[a, b, c] is c uniform random samples drawn from a to b inclusive. E[a, b, c] is a set of points evenly spaced with an interval of c from a to b inclusive.

The examples are included in the JAR file which can be downloaded from here and can be run at the command-line using the following command:

java -cp KGP-1.0-SNAPSHOT.jar kgp.examples.<PROBLEM>

where <PROBLEM> is one of the problems given below:

  • BasicRegressionProblem
  • QuarticPolynomial
  • Keijzer6
  • Korns12
  • Nguyen7
  • Pagie1
  • Vladislavleva4

Basic Regression Problem

BasicRegressionProblem is a simple example that is used in the Usage section to demonstrate how to use KGP. As results for this example problem have been previously detailed, we will omit them in this section.

The target function is given in Equation 1. The problem’s inputs are constrained to the range \([-1, 1]\).

\(x_0^2 - x_1^2 + x_1 - 1 \tag{1}\)

The table below details the parameters that are used for this problem.

Parameter Value
Training Data U[-1, 1, 100]
Tree Depth 5
Generation Method Half and Half
Number of Features 2 (\(x_0, x_1\))
Constants \([-1, 0, 1]\)
Population Size 1000
Generations 500
Tournament Size 20
Crossover Rate 70%
Subtree Mutation Rate 10%
Hoist Mutation Rate 5%
Point Mutation Rate 10%
Point Replacement Rate 5%
Number of Offspring 10
Terminals \(\{x_0, x_1\} \cup constants\)
Functions \(\{+, -, \times, \div\}\)
Fitness Function MAE
Stopping Threshold 0.01

Quartic Polynomial

The Quartic Polynomial function is defined by Equation 2.

\(x^4 + x^3 + x^2 + x \tag{2}\)

A small amount of training data is given for this function but a reasonably sufficient function set leads to good results. Also, the inclusion of 1 and 2 as constants works well (it seems they are combined to form the powers of each term).

Parameter Value
Training Data E[-1, 1, 20]
Tree Depth 5
Generation Method Half and Half
Number of Features 1 (\(x\))
Constants \([1, 2]\)
Population Size 300
Generations 100
Tournament Size 3
Crossover Rate 70%
Subtree Mutation Rate 10%
Hoist Mutation Rate 5%
Point Mutation Rate 10%
Point Replacement Rate 5%
Number of Offspring 10
Terminals \(\{x\} \cup constants\)
Functions \(\{+, -, \times, \div, -x, sin, cos\}\)
Fitness Function MSE
Stopping Threshold 0.01

Keijzer 6

This problem and the subsequent problems are taken from White et al. (2013) [1]. The target function for this problem is given by Equation 3.

\(\sum_{i=1}^{x} \frac{1}{i} \tag{3}\)

As the value of the \(x\) variable increases so too does the problem difficulty.

Parameter Value
Training Data E[1, 50, 1]
Tree Depth 5
Generation Method Half and Half
Number of Features 1 (\(x\))
Constants \([0, 1]\)
Population Size 1000
Generations 100
Tournament Size 20
Crossover Rate 70%
Subtree Mutation Rate 10%
Hoist Mutation Rate 5%
Point Mutation Rate 10%
Point Replacement Rate 5%
Number of Offspring 10
Terminals \(\{x\} \cup constants\)
Functions \(\{+, -, \times, \div, \frac{1}{x},\) \(\sqrt{}\}\)
Fitness Function MSE
Stopping Threshold 0.01

Korns 12

Korns 12 has the target function given in Equation 4.

\(2 - 2.1 \times cos(9.8 \times x_0) \times sin(1.3 \times x_4) \tag{4}\)

Interestingly, this function has 5 input variables but only 2 influence the output of the function: the aim here is for the system to be able to discard unimportant variables. A large amount of training data is used for this problem, to test that the system does not use unimportant variables to over fit the data.

The use of a fitness function which applies parsimony pressure seems to improve the accuracy of the solutions, due to longer solutions with higher fitness being promoted rather than longer but worse solutions.

Although the average fitness is lower with a parsimony aware fitness function, the solutions that are found seem to consist of a large amount of nodes making them hard to interpret.

Parameter Value
Training Data U[-50, 50, 10000]
Tree Depth 5
Generation Method Half and Half
Number of Features 5 (\(x_0, x_1, x_2, x_3, x_4\))
Constants Randomly generated value
Population Size 100
Generations 50
Tournament Size 20
Crossover Rate 70%
Subtree Mutation Rate 10%
Hoist Mutation Rate 5%
Point Mutation Rate 10%
Point Replacement Rate 5%
Number of Offspring 10
Terminals \(\{x_0, x_1, x_2, x_3, x_4\} \cup\) \(constants\)
Functions \(\{+, \times, \frac{1}{x}, -x,\) \(\sqrt{}\}\)
Fitness Function MSE
Stopping Threshold 0.01

Vladislavleva 4

Also known as the UBall5D (Five-dimensional unwrapped ball) problem, this function has the equation as given in Equation 5.

\(\frac{10}{5 + \sum_{i=1}^{5} (x_i - 3)^2} \tag{5}\)

Alternatively, the contour plot below shows the target function in its testing region with \(x0\) and \(x1\) being uniformly distributed in the test space and \(x2 = x3 = x4 = 0\).

_images/UBall5D.png
Parameter Value
Training Data U[0.05, 6.05, 1024]
Tree Depth 5
Generation Method Half and Half
Number of Features 5 (\(x_0, x_1, x_2, x_3, x_4\))
Constants Randomly generated value
Population Size 500
Generations 100
Tournament Size 20
Crossover Rate 70%
Subtree Mutation Rate 10%
Hoist Mutation Rate 5%
Point Mutation Rate 10%
Point Replacement Rate 5%
Number of Offspring 10
Terminals \(\{x_0, x_1, x_2, x_3, x_4\} \cup\) \(constants\)
Functions \(\{+, -, \times, \div, square\}\)
Fitness Function MSE
Stopping Threshold 0.01

Nguyen 7

The Nguyen 7 problem is given in Equation 6. The problem features the natural log function which no other problems include.

\(ln(x + 1) + ln(x^2 + 1) \tag{6}\)

KGP seems to be able to find good solutions for this problem and out of interest a few example solutions have been plotted below along with the program predicted for that solution. The programs have been converted to an imperative format for easier interpretation:

_images/Nguyen1.png
ln(ln((2.0 ^ x)) + (((x * 2.0) * (x ^ 1.0)) + ((1.0 * 1.0) + (x * x))))
_images/Nguyen2.png
ln(((ln(x) / (1.0 / 1.0)) / ln(x)) + (x * ((x ^ x) + (x * 2.0))))
Parameter Value
Training Data U[0, 2, 20]
Tree Depth 5
Generation Method Half and Half
Number of Features 1 (\(x\))
Constants \([1, 2]\)
Population Size 500
Generations 100
Tournament Size 20
Crossover Rate 70%
Subtree Mutation Rate 10%
Hoist Mutation Rate 5%
Point Mutation Rate 10%
Point Replacement Rate 5%
Number of Offspring 10
Terminals \(\{x\} \cup constants\)
Functions \(\{+, -, \times, \div, sin, cos, exp,\) \(ln\)
Fitness Function MSE
Stopping Threshold 0.01

Pagie 1

The Pagie 1 problem is given by Equation 7.

\(\frac{1}{1 + x^{-4}} + \frac{1}{1 + y^{-4}} \tag{7}\)

This problem can be easily scaled to make it more difficulty by adding higher dimensions to the equation.

Parameter Value
Training Data E[-5, 5, 0.4]
Tree Depth 5
Generation Method Half and Half
Number of Features 2 (\(x, y\))
Constants \(1, 4\)
Population Size 500
Generations 100
Tournament Size 20
Crossover Rate 70%
Subtree Mutation Rate 10%
Hoist Mutation Rate 5%
Point Mutation Rate 10%
Point Replacement Rate 5%
Number of Offspring 10
Terminals \(\{x, y\} \cup constants\)
Functions \(\{+, -, \times, \div, exp,\) \(\frac{1}{x}, exp\}\)
Fitness Function MSE
Stopping Threshold 0.01
[1]White, D. R., McDermott, J., Castelli, M., Manzoni, L., Goldman, B. W., Kronberger, G., … Luke, S. (2013). Better GP benchmarks: Community survey results and proposals. Genetic Programming and Evolvable Machines. https://doi.org/10.1007/s10710-012-9177-2