chore: address comments for monitoring component #1520

Merged (8 commits) on Jan 21, 2025

Conversation

zdtsw
Member

@zdtsw zdtsw commented Jan 19, 2025

  • move if to switch...case: Update Component rules in Prometheus #1503 (comment)
  • add .status.condition.MonitoringReady type: Update Component rules in Prometheus #1503 (comment)
    • change Monitoring CR .status.condition Reason, Message, and Type name
    • remove unused predicate var from DSCI
    • change the readiness check on the Prometheus deployment
    • update: use Apply rather than Create
    • update: add or remove prom rules
    • add field manager for monitoring CR to DSCI
    • add isComponentReady()
    • update predicate for monitoring on DSC change on both .spec.components and .status.condition

Description

How Has This Been Tested?

Screenshot or short clip

Merge criteria

  • You have read the contributors guide.
  • Commit messages are meaningful - have a clear and concise summary and detailed explanation of what was changed and why.
  • Pull Request contains a description of the solution, a link to the JIRA issue, and to any dependent or related Pull Request.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@zdtsw zdtsw requested review from lburgazzoli and VaishnaviHire and removed request for jackdelahunt and biswassri January 19, 2025 16:47
@zdtsw zdtsw changed the title from "chore: address comments from pervious review" to "chore: address comments for monitoring component" Jan 19, 2025
@zdtsw
Member Author

zdtsw commented Jan 19, 2025

/test opendatahub-operator-e2e

codecov bot commented Jan 19, 2025

Codecov Report

Attention: Patch coverage is 0% with 57 lines in your changes missing coverage. Please review.

Project coverage is 19.63%. Comparing base (3e897ce) to head (971099e).
Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rvices/monitoring/monitoring_controller_actions.go | 0.00% | 55 Missing ⚠️ |
| .../dscinitialization/dscinitialization_controller.go | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1520      +/-   ##
==========================================
- Coverage   19.69%   19.63%   -0.07%     
==========================================
  Files         161      161              
  Lines       11102    11137      +35     
==========================================
  Hits         2187     2187              
- Misses       8683     8718      +35     
  Partials      232      232              

@biswassri
Contributor

@zdtsw Can you help also link the PR comment here? So that anyone trying to understand the changes can go back and read the context?

@zdtsw
Member Author

zdtsw commented Jan 20, 2025

> @zdtsw Can you help also link the PR comment here? So that anyone trying to understand the changes can go back and read the context?

updated in description.

odhdeploy "github.com/opendatahub-io/opendatahub-operator/v2/pkg/deploy"
)

var (
ComponentName = serviceApi.MonitoringServiceName
prometheusConfigPath = filepath.Join(odhdeploy.DefaultManifestPath, ComponentName, "prometheus", "apps", "prometheus-configs.yaml")
ReadyConditionType = conditionsv1.ConditionType(serviceApi.MonitoringKind + status.ReadySuffix)
Contributor

For a service/component, the ready condition should just be Ready; the suffix is only needed if the service/component-specific readiness condition is exposed in a higher-level API (i.e. DSC/DSCI).
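
The naming rule described in this comment could be sketched as below. This is only an illustration: `conditionType`, its `exposedInParent` flag, and the `readySuffix` constant are hypothetical, and the value "Ready" for the suffix is an assumption based on the `MonitoringReady` type named in the PR description.

```go
package main

import "fmt"

// readySuffix is an assumed stand-in for status.ReadySuffix.
const readySuffix = "Ready"

// conditionType is a hypothetical helper. On the service/component's own
// resource the condition is plain "Ready"; when the same readiness is
// surfaced on a higher-level API (DSC/DSCI), it is prefixed with the kind.
func conditionType(kind string, exposedInParent bool) string {
	if exposedInParent {
		return kind + readySuffix
	}
	return readySuffix
}

func main() {
	fmt.Println(conditionType("Monitoring", false)) // on the Monitoring CR itself
	fmt.Println(conditionType("Monitoring", true))  // on DSC/DSCI
}
```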

Member Author

updated

if len(promDeployment.Items) == 1 && ready == 1 {
// TODO: deprecate phase
Contributor

The preceding if should either be rewritten as len(promDeployment.Items) == ready, or the case where there are more deployments than expected should be reported as part of the failure condition.
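
One way to read this suggestion is sketched below. It is not the operator's code: `checkDeployments`, its parameters, and the message strings are hypothetical, and it assumes the caller has already counted the listed deployments and how many of them are ready.

```go
package main

import "fmt"

// checkDeployments is a hypothetical helper. total is the number of
// Prometheus deployments found, ready the number of those that are ready,
// and expected the number the controller expects to exist.
func checkDeployments(total, ready, expected int) (bool, string) {
	switch {
	case total != expected:
		// An unexpected number of deployments is reported as a failure
		// instead of being silently treated as "not ready yet".
		return false, fmt.Sprintf("expected %d deployment(s), found %d", expected, total)
	case ready != total:
		return false, fmt.Sprintf("%d of %d deployment(s) ready", ready, total)
	default:
		return true, "all deployments ready"
	}
}

func main() {
	for _, c := range [][2]int{{1, 1}, {1, 0}, {2, 2}} {
		ok, msg := checkDeployments(c[0], c[1], 1)
		fmt.Println(ok, msg)
	}
}
```

The returned message could then feed the Message field of the failure condition, so that an unexpected deployment count is visible in the status.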

Member Author

updated

nc := metav1.Condition{
Type: string(ReadyConditionType),
Status: metav1.ConditionFalse,
Reason: status.ReconcileInit,
Contributor

I'm not sure ReconcileInit makes sense here, to be honest: if the deployment never gets up and running, the reason would stay ReconcileInit forever.

Member Author

Updated; not sure if this is good to use:

	Type:    string(ReadyConditionType),
	Status:  metav1.ConditionFalse,
	Reason:  status.PhaseNotReady,
	Message: "Prometheus deployment is not ready",


zdtsw added 2 commits January 20, 2025 12:29
- move if to switch...case
- add .status.condition.MonitoringReady type

Signed-off-by: Wen Zhou <[email protected]>
- change Monitoring CR .status.condition Reason and Message and Type name
- remove unused predicate var
- change check on Prometheus deployment ready

Signed-off-by: Wen Zhou <[email protected]>
@@ -501,7 +501,10 @@ func (r *DSCInitializationReconciler) configureMonitoring(ctx context.Context, d
 )

 if dsci.Spec.Monitoring.ManagementState == operatorv1.Managed {
-err := r.Create(ctx, defaultMonitoring)
+// for generic case if we need to support configurable monitoring namespace
+_, err := controllerutil.CreateOrUpdate(ctx, r.Client, defaultMonitoring, func() error {
Contributor

you can probably replace it with server side apply r.Apply(..)

-err := r.Create(ctx, defaultMonitoring)
-if err != nil && !k8serr.IsAlreadyExists(err) {
+// for generic case if we need to support configurable monitoring namespace
+if err := r.Apply(ctx, defaultMonitoring); err != nil && !k8serr.IsAlreadyExists(err) {
Contributor

@lburgazzoli lburgazzoli Jan 20, 2025

setting field owner is required, something like

err := ctrl.SetControllerReference(instance, componentCR, r.Scheme)
if err != nil {
return nil, err
}
err = r.Client.Apply(ctx, componentCR, client.FieldOwner(fieldOwner), client.ForceOwnership)
if err != nil {
return nil, err
}

}
} else {
falseVal := false
updated = &falseVal
Contributor

you can use pointer.Bool
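
`pointer.Bool` from `k8s.io/utils/pointer` returns a pointer to the bool it is given, which removes the need for a temporary such as `falseVal`. To stay free of external dependencies, the sketch below uses a local generic helper, `ptrTo` (a hypothetical name), that behaves the same way for any type.

```go
package main

import "fmt"

// ptrTo returns a pointer to a copy of v. It is a hypothetical local
// stand-in for pointer.Bool, used here so the example needs only the
// standard library.
func ptrTo[T any](v T) *T {
	return &v
}

func main() {
	// Replaces the two-step form:
	//   falseVal := false
	//   updated = &falseVal
	updated := ptrTo(false)
	fmt.Println(*updated)
}
```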

Contributor

the meta.IsStatusConditionTrue(ci.GetStatus().Conditions, status.ConditionTypeReady) == true can also be moved to a case
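
Folding the readiness check into the switch might look roughly like the following. The function `decide`, the `ruleAction` type, and the exact mapping from state to action are assumptions made for illustration; the real controller derives `managed` and `ready` from the DSC and the component's status conditions.

```go
package main

import "fmt"

// ruleAction is a hypothetical result type for this sketch.
type ruleAction string

const (
	addRules    ruleAction = "add"
	removeRules ruleAction = "remove"
	skip        ruleAction = "skip"
)

// decide uses a tagless switch so that each condition, including the
// readiness check, is its own case instead of a nested if.
func decide(managed, ready bool) ruleAction {
	switch {
	case !managed:
		return removeRules
	case ready:
		return addRules
	default:
		// Managed but not ready yet: leave the rules untouched.
		return skip
	}
}

func main() {
	fmt.Println(decide(true, true), decide(true, false), decide(false, true))
}
```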

}

// Check for shared components
if ch.GetName() == componentApi.KserveComponentName || ch.GetName() == componentApi.ModelMeshServingComponentName {
-if err := UpdatePrometheusConfig(ctx, enabled, componentRules[componentApi.ModelControllerComponentName]); err != nil {
+if err := UpdatePrometheusConfig(ctx, *updated, componentRules[componentApi.ModelControllerComponentName]); err != nil {
Contributor

can this result in a panic due to a nil ptr ?

return err
}
}

-if err := UpdatePrometheusConfig(ctx, enabled, componentRules[ch.GetName()]); err != nil {
+if err := UpdatePrometheusConfig(ctx, *updated, componentRules[ch.GetName()]); err != nil {
Contributor

can this result in a panic due to a nil ptr ?

}

// Check for shared components
if ch.GetName() == componentApi.KserveComponentName || ch.GetName() == componentApi.ModelMeshServingComponentName {
-if err := UpdatePrometheusConfig(ctx, enabled, componentRules[componentApi.ModelControllerComponentName]); err != nil {
+if err := UpdatePrometheusConfig(ctx, *updated, componentRules[componentApi.ModelControllerComponentName]); err != nil {
Contributor

I guess here you have to check also for model controller readiness ?

trueVal := true
updated = &trueVal

if err := UpdatePrometheusConfig(ctx, *updated, componentRules[ch.GetName()]); err != nil {
Contributor

looks like you don't need the updated var since it is always true

if !dsc.Status.InstalledComponents["model-mesh"] && !dsc.Status.InstalledComponents["kserve"] {
falseVal := false
updated = &falseVal
if err := UpdatePrometheusConfig(ctx, *updated, componentRules[ch.GetName()]); err != nil {
Contributor

looks like you don't need the updated var since it is always false

zdtsw added 2 commits January 21, 2025 08:32
- update on modelcontroller
- add field manager for monitoring CR to DSCI

Signed-off-by: Wen Zhou <[email protected]>
- remove duplicated function from pkg/service/monitoring
- add isComponentReady()

Signed-off-by: Wen Zhou <[email protected]>
zdtsw added 3 commits January 21, 2025 09:27
- remove duplicated monitoring predicates from DSCI controller
- update predicate for monitoring on DSC change on both .spec.components and .status.components

Signed-off-by: Wen Zhou <[email protected]>
}

// compare type one by one with their status if not equal return true
for _, nc := range newConditions {
Contributor

This can probably be made a little more efficient, but it is OK for now.
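
One possible way to make the comparison cheaper is to index the old conditions by type once and then make a single pass over the new ones. The sketch below is an assumption about what "more efficient" could mean here, not the code under review: `condition` is a simplified stand-in for the real condition type, `conditionsChanged` is a hypothetical name, and condition types are assumed to be unique within each list.

```go
package main

import "fmt"

// condition is a simplified stand-in for a status condition; only the two
// fields relevant to the comparison are modelled.
type condition struct {
	Type   string
	Status string
}

// conditionsChanged reports whether any condition type has a different
// status in newConds than in oldConds, or appears in only one of them.
// It assumes condition types are unique within each slice.
func conditionsChanged(oldConds, newConds []condition) bool {
	if len(oldConds) != len(newConds) {
		return true
	}
	// Index old statuses by type so each new condition is one map lookup.
	oldStatus := make(map[string]string, len(oldConds))
	for _, c := range oldConds {
		oldStatus[c.Type] = c.Status
	}
	for _, c := range newConds {
		if s, ok := oldStatus[c.Type]; !ok || s != c.Status {
			return true
		}
	}
	return false
}

func main() {
	oldConds := []condition{{"Ready", "False"}, {"ProvisioningSucceeded", "True"}}
	newConds := []condition{{"ProvisioningSucceeded", "True"}, {"Ready", "True"}}
	fmt.Println(conditionsChanged(oldConds, newConds))
}
```

For the handful of conditions a resource typically carries, the nested-loop version is likely fine in practice, which matches the "ok for now" in the comment.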

-if err != nil && !k8serr.IsAlreadyExists(err) {
+// for generic case if we need to support configurable monitoring namespace
+// set field manager to DSCI
+if err := r.Apply(ctx, defaultMonitoring, client.FieldOwner("dscinitialization.opendatahub.io"), client.ForceOwnership); err != nil && !k8serr.IsAlreadyExists(err) {
Contributor

@lburgazzoli lburgazzoli Jan 21, 2025

nit: there is a finalizerName const

openshift-ci bot commented Jan 21, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lburgazzoli

@zdtsw zdtsw merged commit 05c7947 into opendatahub-io:main Jan 21, 2025
6 of 8 checks passed
3 participants