
Mle #299

Open
wants to merge 2 commits into base: main
Conversation

@marjanfamili (Collaborator) commented Feb 13, 2025

In this PR I have so far implemented the maximum_likelihood function. The outer loop in the figure below shows how we hope the output of this function can be used for active learning in AutoEmulate.

[Figure: the active-learning loop in AutoEmulate (Screenshot 2025-02-13 at 15:22:44)]


@marjanfamili (Collaborator, Author)
One issue with this implementation is how the kernel is calculated: currently the kernel is computed between the output means from the GP.

Could this be done more efficiently by extracting the kernel from the PyTorch model? Are all our models going to be PyTorch?
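If the underlying emulator wraps a GPyTorch-style GP, one option could be to evaluate the fitted kernel on the inputs rather than recomputing it on the predicted means. A minimal sketch under that assumption (covariance_from_inputs and gp_model are illustrative names, not part of this PR):

import torch

def covariance_from_inputs(gp_model, X: torch.Tensor) -> torch.Tensor:
    # covar_module is the kernel attribute on GPyTorch models; calling it on the
    # inputs returns a lazy tensor that to_dense() materialises as an (n, n) matrix
    return gp_model.covar_module(X).to_dense()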

@cisprague (Collaborator) left a comment


Overall, this PR is well-structured and demonstrates the MLE functionality well — nice job! I’ve left some suggestions and questions in the review.

In summary:

  • Clarifying input shapes: Adding explicit shape checks and improving docstring descriptions to ensure consistency and avoid unexpected behavior.
  • Ensuring consistency between torch and numpy operations: This includes maintaining a clear distinction between frameworks to avoid breaking the computational graph, especially when using automatic differentiation.

from utils import select_kernel


def negative_log_likelihood(Sigma, obs_mean, model_mean):

Mathematically, this looks good! Some potential improvements:

  1. The log probability for multivariate normal distributions is already implemented in scipy and torch. Using those might be preferable (a sketch follows this list).
  2. Right now, the function assumes a single mean and covariance. Should we extend it to handle a batch of means with shape (n, d) and covariances of shape (n, d, d)? Also, adding shape checks might help downstream errors.
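To make point 1 concrete, here is a minimal sketch that reuses the library log-densities. Argument names follow the signature above, assuming Sigma has shape (d, d) and obs_mean / model_mean have shape (d,):

import torch
from scipy.stats import multivariate_normal

def nll_scipy(Sigma, obs_mean, model_mean):
    # scipy handles the Cholesky factorisation, log-determinant and quadratic form
    return -multivariate_normal.logpdf(obs_mean, mean=model_mean, cov=Sigma)

def nll_torch(Sigma, obs_mean, model_mean):
    # the torch version keeps the computation graph intact when the inputs are tensors
    dist = torch.distributions.MultivariateNormal(loc=model_mean, covariance_matrix=Sigma)
    return -dist.log_prob(obs_mean)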

float: Negative log-likelihood value.
"""
# Add numerical stability to Sigma
noise_term = 1e-5 * np.eye(Sigma.shape[0])

If we allow batched inputs, we can do Sigma.shape[-1] (last dimension) here, which will always be d for a (n, d, d) array or (d, d) array.
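For example, a shape-agnostic jitter term could look like this (a sketch; add_jitter is an illustrative helper name, assuming Sigma is either (d, d) or (n, d, d)):

import numpy as np

def add_jitter(Sigma, eps=1e-5):
    d = Sigma.shape[-1]             # the last axis is always d, batched or not
    return Sigma + eps * np.eye(d)  # np.eye(d) broadcasts over a leading batch axis

# add_jitter(np.stack([np.eye(3)] * 4)).shape  ->  (4, 3, 3)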

# Negative log-likelihood
nll = 0.5 * (quad_term + log_det_Sigma + len(diff) * np.log(2 * np.pi))

return nll.item()

If we were using torch here, calling item() stops the computation graph. To preserve differentiability, we can return the (torch) tensor. If this is what we want, then the rest of the function needs to be converted to torch.
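A quick illustration of the difference (not from the PR, just to show why item() blocks gradients):

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
loss_tensor = (x ** 2).sum()         # still part of the graph, so backward() works
loss_float = (x ** 2).sum().item()   # a plain Python float; gradients cannot flow from it
loss_tensor.backward()
print(x.grad)                        # tensor([2., 4.])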

"""
Maximize the likelihood by optimizing the model parameters to fit the observed data.

Args:

Need to update the docstrings.

expectations (tuple): A tuple containing two elements:
- pred_mean (Tensor): The predicted mean values (could be 1D or 2D tensor).
- pred_var (Tensor): The predicted variance values (could be 1D or 2D tensor).
obs (list or tuple): A list or tuple containing:

Should we split this, so that the required arguments become obs_means of shape (n, d) or shape (d,), and obs_covs of shape (n, d, d) or (d, d), where both are torch arrays? If so, then we'd also need to check that both are either given as a batch with n-size leading axis, or as a single example.
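Something along these lines, for example (a sketch; obs_means, obs_covs and check_obs_shapes are hypothetical names, not part of the PR):

import torch

def check_obs_shapes(obs_means: torch.Tensor, obs_covs: torch.Tensor) -> bool:
    # Returns True for batched inputs (n, d) / (n, d, d), False for a single (d,) / (d, d) pair
    if obs_means.ndim == 1 and obs_covs.ndim == 2:
        batched = False
    elif obs_means.ndim == 2 and obs_covs.ndim == 3:
        batched = True
        if obs_means.shape[0] != obs_covs.shape[0]:
            raise ValueError("obs_means and obs_covs must share the leading batch axis n")
    else:
        raise ValueError("expected shapes (d,)/(d, d) or (n, d)/(n, d, d)")
    if obs_covs.shape[-1] != obs_covs.shape[-2] or obs_covs.shape[-1] != obs_means.shape[-1]:
        raise ValueError("covariances must be square with trailing size d matching the means")
    return batched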

mean = model.predict(optimizable_params.detach().numpy())
K = kernel(mean.reshape(-1, 1))  # X is the input data
nll = torch.tensor(
    negative_log_likelihood(K, obs_mean, mean),

What is obs_mean supposed to be here? A batch of means (with shape (n, d) or a single mean (with shape (d,))?

return nll.item()


def max_likelihood(parameters, model, obs, lr=0.01, epochs=1000, kernel_name="RBF"):

Just to clarify my understanding, is the following correct?

Inputs and Outputs:

  • parameters are the candidate inputs to the simulation model.
  • The simulation outputs are summarized by their mean (obs_mean) and variance (obs_var). For a deterministic simulation, you’d typically have only obs_mean.

Emulator Predictions:

  • The emulator (or surrogate model) takes the parameters as inputs and returns a predicted mean (pred_mean).
  • Instead of relying on a separately predicted variance (pred_var), the kernel is used within the negative log-likelihood (NLL) computation to generate the corresponding covariance.

NLL Computation and Optimization:

  • The NLL is computed for each candidate by comparing the observed outputs (obs_mean) against the predictive density defined by pred_mean and the kernel-derived covariance.
  • The candidate input with the lowest NLL (i.e., the one that is most plausible under the emulator’s predictive distribution) is selected.
  • That selected candidate is then further refined using gradient descent to minimize the NLL, effectively fine-tuning the input so that the emulator's output distribution aligns closely with the observation (roughly as in the sketch below).
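If that reading is right, the loop might look roughly like this (a sketch only; emulator, kernel, candidates and obs_mean are illustrative placeholders, not the PR's API):

import torch

def select_and_refine(emulator, kernel, candidates, obs_mean, lr=0.01, epochs=100):
    def nll(x):
        pred_mean = emulator(x)  # emulator prediction for one candidate input
        K = kernel(pred_mean)    # kernel-derived covariance
        dist = torch.distributions.MultivariateNormal(pred_mean, covariance_matrix=K)
        return -dist.log_prob(obs_mean)

    # 1) score every candidate and keep the one with the lowest NLL
    with torch.no_grad():
        scores = torch.stack([nll(x) for x in candidates])
    best = candidates[int(scores.argmin())]

    # 2) refine the selected candidate by gradient descent on the NLL
    x = best.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = nll(x)
        loss.backward()
        optimizer.step()
    return x.detach()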


result = max_likelihood(parameters=X, model=best_model, obs=[0, 10])

print(f"Indices of plausible regions: {result['optimized_params']}")

Is "indicies" correct? Shouldn't it be a point in the input space of the simulation function?


gp_final = em.refit(gp)

seed = 42

It looks like the seed is set multiple times. What do you think about collecting these global operations into one section?

@@ -0,0 +1,127 @@
import matplotlib.pyplot as plt

Nice job on demonstrating the usage. For clarity and modularity, what do you think about encapsulating each demonstration into its own function, e.g. run_epidemic_experiment()?
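For example (a sketch of the suggested structure; the body is a placeholder, not the notebook's code):

def run_epidemic_experiment(seed: int = 42):
    # set seeds, fit the emulator, call max_likelihood, plot the results
    ...

if __name__ == "__main__":
    run_epidemic_experiment()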
