Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implemented label in cluster estimation #231

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

Aleksa-M
Copy link
Contributor

No description provided.

Copy link
Member

@maxlou05 maxlou05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed

Copy link
Member

@maxlou05 maxlou05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed

@Aleksa-M Aleksa-M force-pushed the cluster-estimation-labels branch from c43b88f to 3134b2a Compare January 7, 2025 02:39
Copy link
Member

@maxlou05 maxlou05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed

Copy link
Collaborator

@Xierumeng Xierumeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed the code and it looks very complect. My suggestion is to use composition: Leave the original class alone, and create a new class (e.g. ClusterEstimationByLabel or something). This new class contains a dictionary of label to cluster estimation object, and is responsible for sorting the incoming points into the correct object in the dictionary and collecting the individual return values, which it then returns to the worker.

Copy link
Member

@maxlou05 maxlou05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed

@@ -277,7 +288,7 @@ def __sort_by_weights(
@staticmethod
def __convert_detections_to_point(
detections: "list[detection_in_world.DetectionInWorld]",
) -> "list[tuple[float, float]]":
) -> "list[tuple[float, float, int]]":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we change this function signature? You never changed the function so it still returns a (float, float)

if min_activation_threshold < 1:
return False, None

return True, ClusterEstimationByLabel(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check ClusterEstimation's restrictions. Either apply them again here or by invoking creating cluster estimation

labels_to_object_clusters: dict[int, list[object_in_world.ObjectInWorld] or None.
Dictionary where the key is a label and the value is a list of all cluster detections with that label
"""
label_to_detections: dict[int, list[detection_in_world.DetectionInWorld]] = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment saying sorting detections by label

self, cluster_model_by_label: cluster_estimation_by_label.ClusterEstimationByLabel
) -> None:
"""
Five clusters with small standard devition that have different labels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit doc string

Comment on lines 626 to 656
1: [100, 100, 100],
2: [100, 100, 100],
3: [100, 100, 100],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make each point different weights and different number of clusters, so it's a little more random?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do weight and confidence mean the same thing?

Five clusters with small standard devition that all have the same label
"""
# Setup
labels_to_n_samples_per_cluster = {1: [100, 100, 100, 100, 100]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you test this with different number of points per cluster?

@@ -488,3 +566,97 @@ def test_position_regular_data(
break

assert is_match


class TestCorrectClusterEstimationByLabel:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe test some error cases, like what happens if things violate your input conditions? For example, what happens when only 1 new point is passed to your Cluster Estimation By Label, it still runs. However, Cluster Estimation has a higher min_new_points, so nothing ever gets run and that point is just lost?

@Aleksa-M Aleksa-M force-pushed the cluster-estimation-labels branch from d8b63f7 to 2106bbe Compare February 6, 2025 01:08
Copy link
Collaborator

@Xierumeng Xierumeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed.

Copy link
Collaborator

@Xierumeng Xierumeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed.

Copy link
Collaborator

@Xierumeng Xierumeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed.

"""
Checks if a valid cluster estimation object can be constructed.

See `ClusterEstimation` for parameter descriptions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this to:

See `create()` for parameter descriptions.

Return: Whether the arguments are valid.

Comment on lines +14 to +35

ATTRIBUTES
----------
min_activation_threshold: int
Minimum total data points before model runs. Must be at least max_num_components.

min_new_points_to_run: int
Minimum number of new data points that must be collected before running model.

max_num_components: int
Max number of real landing pads. Must be at least 1.

random_state: int
Seed for randomizer, to get consistent results.

local_logger: Logger
For logging error and debug messages.

METHODS
-------
run()
Cluster estimation filtered by label.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this.

Comment on lines +108 to +115
RETURNS
-------
model_ran: bool
True if ClusterEstimation object successfully ran its estimation model, False otherwise.

labels_to_objects: dict[int, list[object_in_world.ObjectInWorld] or None.
Dictionary where the key is a label and the value is a list of all cluster detections with that label.
ObjectInWorld objects don't have a label property, but they are sorted into label categories in the dictionary.
Copy link
Collaborator

@Xierumeng Xierumeng Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify:

Return: Success, labels and their associated objects.


if not label in labels_to_objects:
labels_to_objects[label] = []
labels_to_objects[label] += clusters
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add empty line above this line.


if not label in labels_to_objects:
labels_to_objects[label] = []
labels_to_objects[label] += clusters
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the desired behaviour? The cluster estimation objects already hold a record of all points, so they will always generate an updated version of the cluster centres. I think this should be an unconditional assignment instead.

"""

import random
import numpy as np
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add empty line between system and 3rd party imports.


from modules.cluster_estimation import cluster_estimation_by_label
from modules.common.modules.logger import logger
from modules import detection_in_world
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this above (shorter is higher in alphabetical order).

from modules.common.modules.logger import logger
from modules import detection_in_world

MIN_TOTAL_POINTS_THRESHOLD = 100
Copy link
Collaborator

@Xierumeng Xierumeng Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 empty lines total between imports and global constants.

RNG_SEED = 0
CENTRE_BOX_SIZE = 500

# Test functions use test fixture signature names and access class privates
Copy link
Collaborator

@Xierumeng Xierumeng Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 empty lines total above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this testing is unnecessary and brittle (it will break if the implementation of ClusterEstimation is changed). The goal of these tests is ensuring that the labelled points go to the correct cluster estimation objects. The tests themselves are for various conditions. For example:

  • Never before seen labelled point (is the cluster estimation object created correctly?)
  • Existing labelled point goes to the correct cluster estimation object
  • Multiple points go to the correct places
  • Non consecutive labels (i.e. label values that skip numbers (e.g. {0, 1, 2, 5} )

Verifying that the points go to the correct objects can be done by accessing the ClusterEstimationByLabel and ClusterEstimation members.

Verifying the outputs is a little trickier, but can be done by making the objects return a number of cluster centres corresponding to their label (e.g. the object at 3 returns 3 centres when run() is called). When constructing them, the objects can be directly modified by creating a huge number of points exactly at the same location at each of the 3 centres, no need for any fancy cluster generation.

There is also no need to check whether the object actually ran with the thresholds either (basically, this test should still pass if someone removes __decide_to_run() from cluster estimation).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants