
Simplify caching #600

@janosg

Description


The new design of data valuation methods avoids repeated computations of the utility function without relying on caching. We could therefore get rid of our current caching implementation based on memcached, which now seems like overkill. This would close several issues related to caching (e.g. #517, #475, #464 and #459). Moreover, it could solve the problems that arise from the many files the current caching solution creates.

The only situation where caching is still really important is benchmarking: when comparing multiple algorithms, caching keeps randomness as constant as possible across algorithms and saves runtime in the benchmark. We should therefore create an entry point for benchmarking frameworks to enable caching. I see two possible solutions:

  1. Use a simple shared-memory cache to store all utility evaluations and return them as part of the ValuationResult. A benchmarking library could then use these evaluations to build up a cache. All logic to wrap Utility with a cached version would be in the benchmarking library.
  2. We could keep the cache_backend abstraction in the Utility but only implement a much simpler shared-memory backend in pydvl. Users with advanced caching needs could then build their own backends (a rough sketch of both options follows below).
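
For illustration, here is a rough sketch of what these could look like. Everything below is hypothetical: the `InMemoryCacheBackend` class, its `get`/`set`/`clear` interface, and the `cached_utility` wrapper are made-up names for this issue, not pydvl's current API.

```python
from __future__ import annotations

from typing import Callable, Dict, Optional, Sequence


class InMemoryCacheBackend:
    """Sketch of the simpler backend from option 2.

    Assumes a hypothetical cache_backend interface with
    get/set/clear methods; not pydvl's actual API.
    """

    def __init__(self) -> None:
        # A plain dict is enough within one process. For sharing
        # across worker processes, multiprocessing.Manager().dict()
        # could be dropped in here instead.
        self._cache: Dict[str, float] = {}

    def get(self, key: str) -> Optional[float]:
        # Returns the cached utility value, or None on a miss.
        return self._cache.get(key)

    def set(self, key: str, value: float) -> None:
        self._cache[key] = value

    def clear(self) -> None:
        self._cache.clear()


def cached_utility(
    u: Callable[[Sequence[int]], float],
    backend: InMemoryCacheBackend,
) -> Callable[[Sequence[int]], float]:
    """Wraps a utility callable so repeated evaluations of the same
    coalition hit the cache, roughly what a benchmarking library
    would do under option 1 with the evaluations returned as part
    of the ValuationResult."""

    def wrapper(indices: Sequence[int]) -> float:
        # Key on the sorted coalition so index order does not matter.
        key = ",".join(map(str, sorted(indices)))
        value = backend.get(key)
        if value is None:
            value = u(indices)
            backend.set(key, value)
        return value

    return wrapper


if __name__ == "__main__":
    backend = InMemoryCacheBackend()
    u = cached_utility(lambda s: float(len(s)), backend)  # toy utility
    assert u([2, 1]) == u([1, 2])  # second call is a cache hit
```

The point of option 2 is just that such a backend would live behind the existing cache_backend abstraction, so anyone with memcached-style needs could still plug in their own implementation.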


Labels

cleanup: when code is ugly or unreadable and needs restyling
design-problem: Problems with internal architecture
