
Conversation

@Perceptronium (Contributor)

Context of the PR

Add Graphical Lasso and Adaptive Graphical Lasso estimators

Contributions of the PR

  • A new GraphicalLasso estimator that can:
  1. Handle weighted regularization.
  2. Use two different algorithms to solve the GLasso problem:
    2.1. The "Banerjee" approach as in Banerjee et al., 2008 (the original GLasso algorithm)
    2.2. The "Mazumder" approach as in Mazumder et al., 2012 (the P-GLasso algorithm)
  • A new AdaptiveGraphicalLasso estimator that solves the Reweighted Graphical Lasso problem, where the weights are updated following the procedure of Candès et al., 2008

  • Test functions for the two estimators

  • An illustrative example of the two estimators that plots their performance (in terms of NMSE and F1 score) as a function of the regularization hyperparameter.
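Below is a minimal usage sketch of the two estimators described above. The module path, the algo keyword, and its values are assumptions based on this description; fitting on the empirical covariance S matches the review snippets later in this thread.

import numpy as np
from skglm.covariance import GraphicalLasso, AdaptiveGraphicalLasso  # assumed path

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
S = np.cov(X, rowvar=False)  # empirical covariance

# algo="banerjee" or "mazumder" mirrors the two approaches listed above
# (hypothetical keyword and values)
glasso = GraphicalLasso(alpha=0.1, algo="banerjee")
glasso.fit(S)

ada = AdaptiveGraphicalLasso(alpha=0.1)
ada.fit(S)
print(glasso.precision_.shape, ada.precision_.shape)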

@Perceptronium (Contributor, Author)

Hello @Badr-MOUFAD,
Hope everything is going well for you! Could you take a look at this when you have the time?
Thanks a lot!

@Badr-MOUFAD (Collaborator)

Definitely! Thnx for the PR 🚀

@mathurinm (Collaborator)

@Badr-MOUFAD no need to review yet; we realized that a barebones GramCD solver works much better than the Anderson working-set solver here, and we're still in the benchmarking phase. We'll ping you when the code is ready! 🙏

@mathurinm mathurinm marked this pull request as draft February 14, 2025 08:04
@mathurinm mathurinm changed the title Add support for GLasso and Adaptive (reweighted) GLasso WIP Add support for GLasso and Adaptive (reweighted) GLasso Feb 14, 2025
@mathurinm (Collaborator)

@Perceptronium the unit tests are failing:

@Perceptronium Perceptronium marked this pull request as ready for review April 3, 2025 08:03
@mathurinm mathurinm changed the title WIP Add support for GLasso and Adaptive (reweighted) GLasso ENH Add support for GLasso and Adaptive (reweighted) GLasso Apr 7, 2025
elif strategy == "mcp":
    gamma = 3.
    Weights = np.zeros_like(Theta)
    Weights[np.abs(Theta)
Collaborator:

I find this indentation quite hard to read

return self


def update_weights(Theta, alpha, strategy="log"):
Collaborator:

This method should be private (start with an underscore)



def update_weights(Theta, alpha, strategy="log"):
    if strategy == "log":
Collaborator:

Can you try to directly use penalty.derivative as is done in IterativeReweightedL1?
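A hedged sketch of that suggestion, reusing update_weights from the diff above; that penalty.derivative applies elementwise to a matrix is an assumption based on how IterativeReweightedL1 uses it on vectors:

import numpy as np
from skglm.penalties.separable import L0_5

def _update_weights(Theta, penalty):
    # Reweight each entry by the penalty's derivative at |Theta_ij|,
    # mirroring the reweighting step of IterativeReweightedL1
    # (assumption: derivative broadcasts over arrays).
    return penalty.derivative(np.abs(Theta))

# e.g. Weights = _update_weights(Theta, L0_5(alpha))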

@floriankozikowski (Contributor), Apr 22, 2025:

This one is changed now (see the class above, AdaptiveGraphicalLassoPenalty); it is ready for review and to be discussed with Can. I have made tests below which we can delete later (see results in the attached screenshot).
To-dos left: verify that it's correct, and check that it works with all penalties.
For now, both options yield the same results in the comparison.
[attachment: Comparison_Penalty vs Strategy code]

Merge Perceptronium/skglm's graphical_lasso into perceptroniumglasso/graphical_lasso: sync with upstream for collaborative PR.
@mathurinm (Collaborator)

@Perceptronium you can pull upstream main to fix the tests

@mathurinm mathurinm requested a review from Badr-MOUFAD June 27, 2025 14:11
@Badr-MOUFAD (Collaborator) left a comment:

Great work, ping me when you are done for another review.



class GraphicalLasso():
"""A first-order BCD Graphical Lasso solver.
Collaborator:

Elaborate more on the docstring (attributes, refs, ...); see for instance the docs of AndersonCD.

from skglm.penalties.separable import L0_5


class GraphicalLasso():
Collaborator:

Is there a reason why we don't inherit from a sklearn estimator, or at least from BaseGraphicalLasso?

That would be a good way to stay consistent with the sklearn API.
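A minimal sketch of the suggested change, with hypothetical constructor arguments; inheriting from sklearn's BaseEstimator provides get_params/set_params and a proper repr for free:

from sklearn.base import BaseEstimator

class GraphicalLasso(BaseEstimator):
    # Hypothetical signature; per sklearn's contract, __init__ only
    # stores the parameters, unmodified.
    def __init__(self, alpha=1., weights=None, algo="banerjee",
                 max_iter=100, tol=1e-4):
        self.alpha = alpha
        self.weights = weights
        self.algo = algo
        self.max_iter = max_iter
        self.tol = tol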

    W = self.covariance_
else:
    W = S.copy()
    W *= 0.95
Collaborator:

better be more explicit about why we scale W

Contributor:

As far as I understood, we need it to get the inverse right after, so W needs to be SPD. I changed the approach to not rely on hardcoded scaling: instead, I compute the smallest eigenvalue of S and, if it is negative or too close to zero, add a ridge to make all eigenvalues at least eps.
Let me know what you think.
@Perceptronium, please also check if there was any other reason for that scaling.
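A sketch of the described fix; the helper name and the eps default are illustrative, not the PR's actual code:

import numpy as np

def _ensure_spd(S, eps=1e-8):
    # Lift the spectrum of S so every eigenvalue is at least eps,
    # making S safely invertible (SPD) before computing its inverse.
    min_eig = np.linalg.eigvalsh(S)[0]  # eigvalsh returns ascending eigenvalues
    if min_eig < eps:
        S = S + (eps - min_eig) * np.eye(S.shape[0])
    return S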

return self


class AdaptiveGraphicalLasso():
Collaborator:

Better to inherit from a sklearn estimator here as well.

glasso.weights = Weights
glasso.fit(S)

Theta_sym = (glasso.precision_ + glasso.precision_.T) / 2
Collaborator:

Is there a reason for enforcing symmetry? G-Lasso should normally output symmetric matrices.

Contributor:

It was rather meant as a guardrail in case small numerical errors make it slightly asymmetric, but I can remove it.



@njit
def barebones_cd_gram(H, q, x, alpha, weights, max_iter=100, tol=1e-4):
Collaborator:

This function (almost) reimplements the routine in the GramCD solver:

def _gram_cd_epoch(scaled_gram, w, grad, penalty, greedy_cd):
    all_features = np.arange(len(w))
    for cd_iter in all_features:
        # select feature j
        if greedy_cd:
            opt = penalty.subdiff_distance(w, grad, all_features)
            j = np.argmax(opt)
        else:  # cyclic
            j = cd_iter
        # update w_j
        old_w_j = w[j]
        step = 1 / scaled_gram[j, j]  # 1 / lipschitz_j
        w[j] = penalty.prox_1d(old_w_j - step * grad[j], step, j)
        # gradient update with Gram matrix
        if w[j] != old_w_j:
            grad += (w[j] - old_w_j) * scaled_gram[:, j]
    return penalty.subdiff_distance(w, grad, all_features)

We'd better wrap it than re-implement it.

Contributor:

Generally a good suggestion to avoid code duplication. However, I am a bit cautious about making changes there (see Mathurin's comment: #280 (comment)).
As I understood it, Can and Mathurin made the separate barebones_cd_gram function for performance (a sketch of such a specialized routine follows this comment):

  • The graphical lasso inner loop is extremely performance-sensitive and is called many times.
  • The general GramCD/_gram_cd_epoch routine is more flexible, but that flexibility adds overhead (penalty abstraction, selection logic, etc.).
  • The specialized function is Numba-compiled and hardcoded for L1/soft-thresholding, which probably makes it much faster for this use case.

If you think we can refactor to share more code without losing performance, I'm open to suggestions. I didn't manage to wrap it without getting into issues with the Numba compilation.

@mathurinm @Perceptronium, it seems you did some benchmarking on this back in February; maybe you can help us decide what's best here?
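For reference, a hedged sketch of such a specialized routine, matching the signature shown in the diff above; the body is illustrative, not the PR's actual implementation:

import numpy as np
from numba import njit

@njit
def barebones_cd_gram_sketch(H, q, x, alpha, weights, max_iter=100, tol=1e-4):
    # Cyclic CD on 0.5 * x @ H @ x - q @ x + alpha * sum(weights * |x|),
    # with L1 soft-thresholding inlined instead of a penalty object.
    grad = H @ x - q
    for _ in range(max_iter):
        max_delta = 0.
        for j in range(x.shape[0]):
            old_xj = x[j]
            step = 1. / H[j, j]  # 1 / coordinate-wise Lipschitz constant
            z = old_xj - step * grad[j]
            level = step * alpha * weights[j]
            x[j] = np.sign(z) * max(abs(z) - level, 0.)
            if x[j] != old_xj:
                grad += (x[j] - old_xj) * H[:, j]  # rank-one gradient update
                max_delta = max(max_delta, abs(x[j] - old_xj))
        if max_delta < tol:
            break
    return x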

Collaborator:

I think it is doable without compromising performance (but maybe I'm wrong).
I will try to push code to see.

Collaborator:

@Badr-MOUFAD yes, I am the one who wrote this; when using our existing code we were too slow. I'd be happy if you managed to improve it, but otherwise (with the addition of a comment explaining why we need this) I'm OK with this solution, as it only concerns a small snippet.

@floriankozikowski (Contributor), Jul 7, 2025:

@Badr-MOUFAD I tried to make a fair comparison with the standard gram_cd and the epoch solver, but I couldn't figure out a way to make them faster: in the comparison, barebones is still the fastest. Let me know what you think. Just run the debug_compare_solvers script and you will see it.

(Note: for the comparison I had to let GramCD take the Gram matrix as an input instead of the design matrix, so I added this. It was marked as a TODO anyway, so please verify that I implemented it correctly.)

Collaborator:

@Badr-MOUFAD If you're OK with this, I'm in favor of incorporating these changes. If another version is faster, it can be replaced in a subsequent PR. OK?

Collaborator:

Sorry, didn't have the time to push.
Go ahead, we can make a PR later if needed

Collaborator:

What explains the time difference is probably the full pass over w when returning subdiff distance in GramCD:

return penalty.subdiff_distance(w, grad, all_features)

scaled_gram = X
scaled_Xty = y
n_features = X.shape[0]
scaled_y_norm2 = 0
Collaborator:

this is not correct if we do not pass y, right?

@mathurinm mathurinm merged commit 7e4802b into scikit-learn-contrib:main Jul 17, 2025
4 checks passed