ENH Add support for GLasso and Adaptive (reweighted) GLasso #280
Conversation
Hello @Badr-MOUFAD,

Definitely! Thanks for the PR 🚀

@Badr-MOUFAD no need to review yet, we realized that a barebones GramCD solver works much better than the Anderson working-set solver here; we're still in the benchmarking phase. We'll ping you when the code is ready! 🙏

@Perceptronium the unit tests are failing:
skglm/covariance.py (Outdated)
```python
elif strategy == "mcp":
    gamma = 3.
    Weights = np.zeros_like(Theta)
    Weights[np.abs(Theta)
```
I find this indentation quite hard to read
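For reference, a more readable shape for this branch might look like the sketch below. It assumes the weights are the MCP derivative at |Theta| (alpha - |t|/gamma inside the clip region, 0 beyond), which matches the thresholding on np.abs(Theta) in the diff; the helper name is illustrative and should be checked against the actual PR code:
```python
import numpy as np

def _mcp_weights(Theta, alpha, gamma=3.):
    # MCP derivative at |t|: alpha - |t| / gamma for |t| <= alpha * gamma,
    # and 0 beyond (large entries are left unpenalized)
    abs_Theta = np.abs(Theta)
    Weights = np.zeros_like(Theta)
    inside = abs_Theta <= alpha * gamma
    Weights[inside] = alpha - abs_Theta[inside] / gamma
    return Weights
```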
skglm/covariance.py (Outdated)
```python
        return self


def update_weights(Theta, alpha, strategy="log"):
```
This method should be private (start with an underscore)
skglm/covariance.py (Outdated)
```python
def update_weights(Theta, alpha, strategy="log"):
    if strategy == "log":
```
Can you try to directly use penalty.derivative, as is done in IterativeReweightedL1?
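A minimal sketch of that suggestion, assuming the penalty objects (e.g. L0_5) expose the `derivative` method that IterativeReweightedL1 relies on; the helper name is illustrative:
```python
import numpy as np
from skglm.penalties.separable import L0_5

def _update_weights(Theta, alpha):
    # delegate the reweighting rule to the penalty object, instead of
    # hardcoding one branch per strategy string
    penalty = L0_5(alpha)
    return penalty.derivative(np.abs(Theta))
```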
This one is changed now (see the class above, AdaptiveGraphicalLassoPenalty); it is ready for review and to be discussed with Can. I have added tests below, which we can delete later (see results in the screenshot attached).
To-dos left: verify that it's correct, and check that it works with all penalties.
For now, both options yield the same results in the comparison.

@Perceptronium you can pull upstream main to fix the tests
…dos: review math, check if it works with all other penalties and clean up code to one version if everything is correct
Badr-MOUFAD left a comment
Great work, ping me when you are done for another review.
skglm/covariance.py (Outdated)
```python
class GraphicalLasso():
    """A first-order BCD Graphical Lasso solver.
```
Elaborate more on the docstring (attributes, references, ...); see for instance the docs of AndersonCD.
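Something along these lines (a numpydoc-style skeleton; parameter names beyond alpha are placeholders to be adapted to the actual signature):
```python
class GraphicalLasso():
    """A first-order BCD Graphical Lasso solver.

    Parameters
    ----------
    alpha : float, default=1.
        Regularization strength of the (weighted) L1 penalty.

    Attributes
    ----------
    precision_ : array, shape (n_features, n_features)
        Estimated precision (inverse covariance) matrix.
    covariance_ : array, shape (n_features, n_features)
        Estimated covariance matrix.

    References
    ----------
    .. [1] Banerjee et al., "Model Selection Through Sparse Maximum
           Likelihood Estimation for Multivariate Gaussian or Binary
           Data", 2008.
    """
```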
skglm/covariance.py (Outdated)
```python
from skglm.penalties.separable import L0_5


class GraphicalLasso():
```
Is there a reason why we don't inherit from a sklearn estimator, or at least BaseGraphicalLasso?
That would be a good way to stay consistent with the sklearn API.
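A minimal sketch of what that could look like, assuming we derive from sklearn's public EmpiricalCovariance base (BaseGraphicalLasso lives in a private sklearn module, so the public base is the safer choice):
```python
from sklearn.covariance import EmpiricalCovariance

class GraphicalLasso(EmpiricalCovariance):
    """G-Lasso estimator following the sklearn covariance API."""

    def __init__(self, alpha=1., max_iter=100, tol=1e-4):
        super().__init__()
        self.alpha = alpha
        self.max_iter = max_iter
        self.tol = tol

    def fit(self, X, y=None):
        # run the BCD solver from this PR here, then store the results
        # in the attributes sklearn's covariance estimators expose:
        # self.precision_ = ...
        # self.covariance_ = ...
        return self
```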
skglm/covariance.py (Outdated)
```python
    W = self.covariance_
else:
    W = S.copy()
    W *= 0.95
```
Better to be more explicit about why we scale W.
As far as I understood, we need it so that the inverse computed right after is well defined, and thus W needs to be SPD. I changed the approach to not rely on hardcoded scaling; instead, I compute the smallest eigenvalue of S and, if it is negative or too close to zero, add a ridge to make all eigenvalues at least eps.
Let me know what you think.
@Perceptronium also check if there was any other reason for that scaling.
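For reference, a sketch of the initialization described above (the helper name and eps value are illustrative):
```python
import numpy as np

def _init_spd(S, eps=1e-8):
    # smallest eigenvalue of the symmetric empirical covariance S
    lambda_min = np.linalg.eigvalsh(S)[0]
    W = S.copy()
    if lambda_min < eps:
        # add a ridge so that all eigenvalues are at least eps,
        # guaranteeing W is SPD before inverting it
        W += (eps - lambda_min) * np.eye(S.shape[0])
    return W
```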
skglm/covariance.py (Outdated)
```python
        return self


class AdaptiveGraphicalLasso():
```
Better to inherit from a sklearn estimator here as well.
skglm/covariance.py (Outdated)
```python
glasso.weights = Weights
glasso.fit(S)

Theta_sym = (glasso.precision_ + glasso.precision_.T) / 2
```
Is there a reason for enforcing the symmetry? The G-Lasso should normally output symmetric matrices?
It was rather meant as a guardrail in case small numerical errors make the matrix slightly asymmetric, but I can remove it.
skglm/solvers/gram_cd.py (Outdated)
```python
@njit
def barebones_cd_gram(H, q, x, alpha, weights, max_iter=100, tol=1e-4):
```
This function (almost) reimplements the routine in the GramCD solver:

skglm/skglm/solvers/gram_cd.py, lines 148 to 167 in 4b2344c
```python
def _gram_cd_epoch(scaled_gram, w, grad, penalty, greedy_cd):
    all_features = np.arange(len(w))
    for cd_iter in all_features:
        # select feature j
        if greedy_cd:
            opt = penalty.subdiff_distance(w, grad, all_features)
            j = np.argmax(opt)
        else:  # cyclic
            j = cd_iter

        # update w_j
        old_w_j = w[j]
        step = 1 / scaled_gram[j, j]  # 1 / lipschitz_j
        w[j] = penalty.prox_1d(old_w_j - step * grad[j], step, j)

        # gradient update with Gram matrix
        if w[j] != old_w_j:
            grad += (w[j] - old_w_j) * scaled_gram[:, j]

    return penalty.subdiff_distance(w, grad, all_features)
```
We'd better wrap it than re-implement it.
Generally a good suggestion to avoid code duplication. However, I am a bit cautious about making changes there (see this comment from Mathurin: #280 (comment)).
As I understood it, Can and Mathurin made the separate barebones_cd_gram function for performance:
- The graphical lasso inner loop is extremely performance-sensitive and is called many times.
- The general GramCD/_gram_cd_epoch routine is more flexible, but that flexibility adds overhead (penalty abstraction, selection logic, etc.).
- The specialized function is Numba-compiled and hardcoded for L1/soft-thresholding, which probably makes it much faster for this use case.

If you think we can refactor to share more code without losing performance, I'm open to suggestions. I didn't manage to wrap it without running into issues with the Numba compilation.
@mathurinm @Perceptronium, it seems you did some benchmarking on this back in February; maybe you can help us decide what to do here?
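For readers following along, here is an illustrative sketch of what such a specialized, Numba-compiled loop looks like, assuming the objective 0.5 * x.T @ H @ x + q @ x plus a weighted L1 term (not necessarily the exact PR code):
```python
import numpy as np
from numba import njit

@njit
def barebones_cd_gram(H, q, x, alpha, weights, max_iter=100, tol=1e-4):
    # cyclic CD on 0.5 x^T H x + q^T x + alpha * sum_j weights[j] * |x_j|,
    # with soft-thresholding hardcoded so numba can compile a tight loop
    grad = H @ x + q  # gradient of the smooth quadratic part
    for _ in range(max_iter):
        max_delta = 0.
        for j in range(x.shape[0]):
            old_x_j = x[j]
            step = 1. / H[j, j]  # 1 / lipschitz_j
            z = old_x_j - step * grad[j]
            thresh = step * alpha * weights[j]
            # prox of the weighted L1 penalty (soft-thresholding)
            x[j] = np.sign(z) * max(abs(z) - thresh, 0.)
            if x[j] != old_x_j:
                # gradient update with the Gram matrix column
                grad += (x[j] - old_x_j) * H[:, j]
                max_delta = max(max_delta, abs(x[j] - old_x_j))
        if max_delta < tol:
            break
    return x
```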
I think it is doable without compromising performance (but maybe I'm wrong).
I will try to push code to see.
@Badr-MOUFAD yes, I am the one who wrote this; when using our existing code we were too slow. I'd be happy if you managed to improve it, but otherwise (with the addition of a comment to explain why we need this) I'm OK with this solution, as it only concerns a small snippet.
@Badr-MOUFAD I tried to make a fair comparison with the standard gram_cd and the epoch solver, but I couldn't figure out a way to make them faster; in the comparison, barebones is still the fastest. Let me know what you think. Just run the debug_compare_solvers script and you will see it.
(Note: for the comparison I had to let GramCD take the Gram matrix as an input instead of the design matrix, so I added this. It was marked as a TODO anyway, so please verify that I implemented it correctly.)
@Badr-MOUFAD If you're OK with this, I'm in favor of incorporating these changes. If another version is faster, it can be replaced in a subsequent PR. OK?
Sorry, didn't have the time to push.
Go ahead, we can make a PR later if needed
What explains the time difference is probably the full pass over w when returning the subdiff distance in GramCD:

skglm/skglm/solvers/gram_cd.py, line 167 in 4b2344c
```python
    return penalty.subdiff_distance(w, grad, all_features)
```
…, remove symmetry enforcing
skglm/solvers/gram_cd.py (Outdated)
```python
scaled_gram = X
scaled_Xty = y
n_features = X.shape[0]
scaled_y_norm2 = 0
```
this is not correct if we do not pass y, right?
Context of the PR
Add Graphical Lasso and Adaptive Graphical Lasso estimators.

Contributions of the PR
- A GraphicalLasso estimator implementing two approaches:
  - The "Banerjee" approach, as in Banerjee et al., 2008 (the original GLasso algorithm)
  - The "Mazumder" approach, as in Mazumder et al., 2012 (the P-GLasso algorithm)
- A new AdaptiveGraphicalLasso estimator that solves the reweighted Graphical Lasso problem, where the weights are updated following the procedure of Candès et al., 2008 (a sketch of the rule follows below)
- Test functions for the two estimators
- An illustrative example of the two estimators that plots their performance (in terms of NMSE and F1 score) as a function of the regularization hyperparameter
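For context, a minimal sketch of the Candès et al. (2008) reweighting rule used by the adaptive estimator (the helper name and eps value are illustrative, not the exact PR code):
```python
import numpy as np

def _candes_weights(Theta, eps=1e-10):
    # entries with small current magnitude receive large weights, so they
    # are penalized more heavily at the next Graphical Lasso solve
    return 1.0 / (np.abs(Theta) + eps)
```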