
Kernel Derivatives #46

Open · 7 of 11 tasks
trthatcher opened this issue May 20, 2017 · 4 comments
trthatcher (Owner) commented May 20, 2017

There are two components to this enhancement.

Optimization

Define a theta function and an eta (inverse theta) function to transform parameters from an open bounded interval to a closed bounded interval, or to eliminate the bounds entirely, for use in optimization methods. This is similar to how link functions work in logistic regression - unconstrained optimization sets a parameter value in the interval (0,1) via the logit link function. A sketch follows the list below.

  • theta - given an interval and a value, applies a transformation that eliminates finite open bounds
  • eta - given an interval and a value, maps the transformed value back to the original parameter space
  • gettheta - returns the theta-transformed variable when applied to a HyperParameter, and a vector of theta-transformed variables when applied to a Kernel
  • settheta! - updates a HyperParameter or Kernel given a vector of theta-transformed variables
  • checktheta - checks whether the provided vector (or scalar, when working with a HyperParameter) is a valid update
  • upperboundtheta - returns the theta-transformed upper bound. For example, if a parameter is restricted to (0,1], the transformed upper bound is log(1) = 0
  • lowerboundtheta - returns the theta-transformed lower bound. For example, if a parameter is restricted to (0,1], the transformed lower bound is -Inf
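
As a rough illustration of the theta/eta pairing, here is a minimal Julia sketch. The function names mirror the list above, but the interval representation and signatures are hypothetical, not the package's actual API:

    # Hypothetical sketch: map a value from the open interval (a, b) onto the
    # whole real line and back, logit-style. For a half-open interval such as
    # (0, Inf), a plain log/exp pair plays the same role.
    theta(a, b, x) = log((x - a) / (b - x))        # (a, b) -> (-Inf, Inf)
    eta(a, b, t)   = a + (b - a) / (1 + exp(-t))   # inverse of theta

    t = theta(0.0, 1.0, 0.25)   # unconstrained representation of 0.25
    x = eta(0.0, 1.0, t)        # maps back, x ≈ 0.25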

Derivatives

Derivatives will be taken with respect to theta, as described above; see the chain-rule sketch after this list.

  • gradeta - derivative of the eta function. Via the chain rule, this is applied to gradkappa to obtain the derivative with respect to theta. Not exported.
  • gradkappa - derivative of the scalar part of a Kernel. This must be defined manually for each kernel, so the derivative will be analytical or a hand-coded numerical derivative. It is defined only for parameters of the kernel. Not exported. Ex. gradkappa(k, Val{:alpha}, z)
  • gradkernel - derivative of a kernel. The second argument is the variable the derivative is taken with respect to; a value type with the field name as a parameter is used. Ex. gradkernel(k, Val{:alpha}, x, y)
  • gradkernelmatrix - derivative of the kernel matrix.
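
To make the chain-rule step concrete, here is a hypothetical Julia sketch for a Gaussian-style kernel with scalar part kappa(z) = exp(-alpha*z), where alpha > 0 is reparameterized as theta = log(alpha), so eta(theta) = exp(theta). The names mirror the list above, but the signatures are illustrative only:

    kappa(alpha, z)     = exp(-alpha * z)        # scalar part of the kernel
    gradkappa(alpha, z) = -z * exp(-alpha * z)   # d(kappa)/d(alpha), analytic
    gradeta(t)          = exp(t)                 # d(eta)/d(theta) when eta = exp

    # Chain rule: d(kappa)/d(theta) = d(kappa)/d(alpha) * d(alpha)/d(theta)
    gradtheta(t, z) = gradkappa(exp(t), z) * gradeta(t)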
@trthatcher trthatcher self-assigned this May 20, 2017
kskyten commented Jun 1, 2017

Sounds great! How can I help?
Can you also explain the relation between this enhancement and the derivatives branch?

trthatcher (Owner, Author)

Hello!

Very early on there was an attempt at adding derivatives - that's the derivatives branch. However, it added a great deal of complexity, and I didn't feel the base Kernel type and calculation method were carefully planned out before all that complexity was built on top. For example, there wasn't really any consideration of the parameter constraints and how they would impact the optimization routines (this can be an issue with open intervals, such as the alpha parameter in a Gaussian Kernel - not all kernels can use an unconstrained optimization method).

I've since reworked much of the package and explored how other libraries approach derivatives. Rather than having the Kernel type be a collection of floats, I've now made it a collection of HyperParameter instances. This new HyperParameter type contains a pointer to a value that can be altered, as well as an Interval type that can be used to transform the parameter to a domain more amenable to optimization and to enforce constraints/invariants (roughly as sketched below).
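
As a rough Julia sketch of that arrangement; the type and field definitions here are hypothetical simplifications, not the package's actual code:

    struct Interval
        a::Float64    # lower bound
        b::Float64    # upper bound
    end

    mutable struct HyperParameter
        value::Float64      # current parameter value, mutable in place
        bounds::Interval    # constraint used to transform and validate the value
    end

    struct GaussianKernel
        alpha::HyperParameter    # e.g. constrained to (0, Inf)
    end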

I'm almost done with the changes I've outlined in the "Optimization" section. Unfortunately, I need to finish that first, since the derivatives have a few dependencies on those changes. Once that is complete, it will just be a matter of defining analytic derivatives for the parameters and a kernel/kernel-matrix derivative. I can provide some more direction as soon as that's done if you'd like to help. It will be a couple more days, though.

kskyten commented Jun 3, 2017

Excellent! I would like to help with defining the analytical derivatives. It seems that some of them have already been done in the derivatives branch.

Should #2 be closed?

trthatcher (Owner, Author) commented Jun 9, 2017

The optimization section is basically complete save for a few tests, so it's good enough to start on the derivatives. I've updated the original comment with more detail. I've also expanded the documentation here:

http://mlkernels.readthedocs.io/en/dev/interface.html

The Hyper Parameters section may be helpful.

If you'd like to add some derivative definitions and open a PR, feel free. You can probably grab a number of them from the derivatives branch (hopefully some reusable tests, too). If you're planning on working on this over the next couple of days, I won't be working on anything, but I'll try to answer any questions you have.

@trthatcher trthatcher added this to the 0.3 milestone Jun 29, 2017
@trthatcher trthatcher removed this from the 0.3 milestone Jan 4, 2019