Mle #299
base: main
Conversation
…t is the best way of extracting the kernel
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB.
One issue with this implementation is how the kernel is calculated. Currently, the kernel is calculated between the output means from the GP. Could this be done more efficiently by extracting the kernel from the PyTorch model?
Overall, this PR is well-structured and demonstrates the MLE functionality well — nice job! I’ve left some suggestions and questions in the review.
In summary:
- Clarifying input shapes: Adding explicit shape checks and improving docstring descriptions to ensure consistency and avoid unexpected behavior.
- Ensuring consistency between torch and numpy operations: This includes maintaining a clear distinction between frameworks to avoid breaking the computational graph, especially when using automatic differentiation (see the short illustration below).
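On the second point, here is a tiny illustration (not from the PR) of why mixing frameworks matters: once a tensor is converted with `.detach().numpy()`, autograd can no longer trace computations back through it.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Staying in torch keeps the computational graph intact.
y = (x ** 2).sum()
y.backward()
print(x.grad)  # tensor([2., 4.])

# By contrast, anything computed from x.detach().numpy() lives outside autograd,
# so no gradient can flow back to x from that point onward.
x_np = x.detach().numpy()
```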
from utils import select_kernel


def negative_log_likelihood(Sigma, obs_mean, model_mean):
Mathematically, this looks good! Some potential improvements:
- The log probability for multivariate normal distributions is already implemented in both scipy and torch; using those might be preferable (see the sketch after this list).
- Right now, the function assumes a single mean and covariance. Should we extend it to handle a batch of means with shape `(n, d)` and covariances of shape `(n, d, d)`? Also, adding shape checks might help avoid downstream errors.
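As a minimal sketch of the first suggestion, keeping the PR's argument order (`Sigma`, `obs_mean`, `model_mean`) and the 1e-5 jitter used later in the diff, but otherwise making assumptions about shapes, a torch-based version could look like this:

```python
import torch
from torch.distributions import MultivariateNormal


def negative_log_likelihood_torch(Sigma, obs_mean, model_mean, jitter=1e-5):
    """NLL of obs_mean under N(model_mean, Sigma), via torch's built-in log_prob."""
    d = Sigma.shape[-1]
    # Small diagonal jitter for numerical stability, mirroring the 1e-5 noise term in the PR.
    Sigma = Sigma + jitter * torch.eye(d, dtype=Sigma.dtype)
    dist = MultivariateNormal(loc=model_mean, covariance_matrix=Sigma)
    return -dist.log_prob(obs_mean)  # a tensor, so the result stays differentiable
```

A nice side effect is that `MultivariateNormal` already handles batched means and covariances, which would cover the second bullet as well.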
float: Negative log-likelihood value.
"""
# Add numerical stability to Sigma
noise_term = 1e-5 * np.eye(Sigma.shape[0])
If we allow batched inputs, we can use `Sigma.shape[-1]` (the last dimension) here, which will always be `d` for an `(n, d, d)` array or a `(d, d)` array.
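A small illustration of that, using a hypothetical `add_jitter` helper; `np.eye(d)` broadcasts against the trailing `(d, d)` axes of a batched array:

```python
import numpy as np


def add_jitter(Sigma, eps=1e-5):
    """Add a small diagonal term to Sigma, for a (d, d) matrix or a batched (n, d, d) array."""
    d = Sigma.shape[-1]
    # np.eye(d) has shape (d, d) and broadcasts over any leading batch dimensions.
    return Sigma + eps * np.eye(d)
```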
# Negative log-likelihood
nll = 0.5 * (quad_term + log_det_Sigma + len(diff) * np.log(2 * np.pi))

return nll.item()
If we were using torch here, calling `item()` stops the computation graph. To preserve differentiability, we can return the (torch) tensor. If this is what we want, then the rest of the function needs to be converted to torch.
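For reference, a rough sketch of what an all-torch version of this computation might look like, keeping the names from the numpy version but assuming a single mean of shape `(d,)` and covariance of shape `(d, d)`:

```python
import math

import torch


def nll_torch(Sigma, obs_mean, model_mean, jitter=1e-5):
    """Explicit multivariate-normal NLL in torch, keeping the computation graph intact."""
    d = Sigma.shape[-1]
    Sigma = Sigma + jitter * torch.eye(d, dtype=Sigma.dtype)
    diff = obs_mean - model_mean
    # Solve Sigma @ x = diff rather than forming the inverse explicitly.
    quad_term = diff @ torch.linalg.solve(Sigma, diff)
    log_det_Sigma = torch.logdet(Sigma)
    nll = 0.5 * (quad_term + log_det_Sigma + d * math.log(2 * math.pi))
    return nll  # no .item(): the caller can still backpropagate through the result
```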
""" | ||
Maximize the likelihood by optimizing the model parameters to fit the observed data. | ||
|
||
Args: |
Need to update the docstrings.
expectations (tuple): A tuple containing two elements:
- pred_mean (Tensor): The predicted mean values (could be 1D or 2D tensor).
- pred_var (Tensor): The predicted variance values (could be 1D or 2D tensor).
obs (list or tuple): A list or tuple containing:
Should we split this, so that the required arguments become `obs_means` of shape `(n, d)` or `(d,)`, and `obs_covs` of shape `(n, d, d)` or `(d, d)`, where both are torch tensors? If so, then we'd also need to check that both are either given as a batch with an `n`-sized leading axis, or as a single example.
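A rough sketch of that consistency check, using the `obs_means`/`obs_covs` names from this comment (everything else is an assumption):

```python
import torch


def validate_obs(obs_means: torch.Tensor, obs_covs: torch.Tensor) -> bool:
    """Return True if the inputs are batched ((n, d) and (n, d, d)), False if single ((d,) and (d, d))."""
    if obs_means.ndim == 1 and obs_covs.ndim == 2:
        batched = False
    elif obs_means.ndim == 2 and obs_covs.ndim == 3:
        batched = True
        if obs_means.shape[0] != obs_covs.shape[0]:
            raise ValueError("obs_means and obs_covs must have the same batch size n")
    else:
        raise ValueError("obs_means and obs_covs must both be single examples or both be batched")
    if obs_covs.shape[-1] != obs_covs.shape[-2] or obs_means.shape[-1] != obs_covs.shape[-1]:
        raise ValueError("each covariance must be a square (d, d) matrix matching the mean dimension d")
    return batched
```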
mean = model.predict(optimizable_params.detach().numpy())
K = kernel(mean.reshape(-1, 1))  # X is the input data
nll = torch.tensor(
negative_log_likelihood(K, obs_mean, mean),
What is `obs_mean` supposed to be here? A batch of means (with shape `(n, d)`) or a single mean (with shape `(d,)`)?
return nll.item()


def max_likelihood(parameters, model, obs, lr=0.01, epochs=1000, kernel_name="RBF"):
Just to clarify my understanding, is the following correct? (A rough code sketch of this flow is included after the list.)

Inputs and Outputs:
- `parameters` are the candidate inputs to the simulation model.
- The simulation outputs are summarized by their mean (`obs_mean`) and variance (`obs_var`). For a deterministic simulation, you’d typically have only `obs_mean`.

Emulator Predictions:
- The emulator (or surrogate model) takes the `parameters` as inputs and returns a predicted mean (`pred_mean`).
- Instead of relying on a separately predicted variance (`pred_var`), the kernel is used within the negative log-likelihood (NLL) computation to generate the corresponding covariance.

NLL Computation and Optimization:
- The NLL is computed for each candidate by comparing the observed outputs (`obs_mean`) against the predictive density defined by `pred_mean` and the kernel-derived covariance.
- The candidate input with the lowest NLL (i.e., the one that is most plausible under the emulator’s predictive distribution) is selected.
- That selected candidate is then further refined using gradient descent to minimize the NLL, effectively fine-tuning the input so that the emulator's output distribution aligns closely with the observation.
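If that reading is right, a minimal sketch of the outer loop might look like the following; `select_and_refine` and `nll_of` are hypothetical names, and `nll_of` is assumed to be an end-to-end torch-differentiable map from a candidate input to its NLL:

```python
import torch


def select_and_refine(candidates, nll_of, lr=0.01, epochs=1000):
    """Pick the candidate input with the lowest NLL, then refine it by gradient descent.

    `candidates` is a list of torch tensors; `nll_of(x)` returns a scalar NLL tensor
    and is assumed to be differentiable with respect to x.
    """
    # 1. Score every candidate under the emulator's predictive density.
    with torch.no_grad():
        scores = torch.stack([nll_of(x) for x in candidates])

    # 2. Select the most plausible candidate (lowest NLL).
    best = candidates[int(scores.argmin())].clone().requires_grad_(True)

    # 3. Fine-tune that candidate by minimizing the NLL directly.
    optimizer = torch.optim.Adam([best], lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = nll_of(best)
        loss.backward()
        optimizer.step()
    return best.detach()
```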
result = max_likelihood(parameters=X, model=best_model, obs=[0, 10])

print(f"Indices of plausible regions: {result['optimized_params']}")
Is "indicies" correct? Shouldn't it be a point in the input space of the simulation function?
gp_final = em.refit(gp)

seed = 42
It looks like the seed is set multiple times. What do you think about collecting these global operations into one section?
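Something along these lines, perhaps, with a hypothetical `set_seed` helper (which libraries actually need seeding depends on what the notebook uses):

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed every random number generator used in the notebook in one place."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


# Called once, at the top of the script/notebook.
set_seed(42)
```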
@@ -0,0 +1,127 @@
import matplotlib.pyplot as plt
Nice job on demonstrating the usage. For clarity and modularity, what do you think about encapsulating each demonstration into its own function, e.g. `run_epidemic_experiment()`?
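For example, something like the following, where `run_epidemic_experiment` is the name floated above and the body is only a placeholder for the existing demo code:

```python
import matplotlib.pyplot as plt


def run_epidemic_experiment(seed: int = 42):
    """Run one self-contained demonstration: set the seed, fit the emulator, run MLE, and plot."""
    # set_seed(seed)  # e.g. the hypothetical seeding helper sketched in the earlier comment
    # ... existing demo code: simulate data, fit/refit the emulator, call max_likelihood(...) ...
    fig, ax = plt.subplots()
    ax.set_title("Epidemic experiment (placeholder)")
    return fig


if __name__ == "__main__":
    run_epidemic_experiment()
```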
In this PR I have so far implemented the max_likelihood function; the outer loop in this figure shows how we hope the output of this function can be used for active learning in AutoEmulate.