Open
Description
Problem Description
As a user, I'd like to have a baseline synthesizer against which to compare different multi-table synthesizers.
In the single table case we have the UniformSynthesizer. We also use this synthesizer as the fallback when other synthesizers time out or fail. We need something like this in the multi-table case.
Our plan is to add a MultiTableUniformSynthesizer. All this synthesizer will do is use the single table UniformSynthesizer on each table. It does not have to handle connecting child to parent rows. We expect this synthesizer will have poor referential integrity for this reason.
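To illustrate why referential integrity is expected to be poor, here is a minimal sketch. It assumes that key columns end up being sampled independently for each table; the key range, the table sizes, and the `orphan_fraction` helper are all hypothetical and are not part of SDGym.

```python
import random


def orphan_fraction(parent_keys, child_foreign_keys):
    """Return the fraction of child foreign keys with no matching parent key."""
    if not child_foreign_keys:
        return 0.0
    parents = set(parent_keys)
    orphans = sum(1 for key in child_foreign_keys if key not in parents)
    return orphans / len(child_foreign_keys)


rng = random.Random(0)
# Hypothetical key columns, each drawn independently from the same range,
# the way two unrelated single-table samplers would produce them.
parent_ids = [rng.randrange(10_000) for _ in range(100)]
child_parent_ids = [rng.randrange(10_000) for _ in range(500)]
print(orphan_fraction(parent_ids, child_parent_ids))
```

A value close to 1 means that almost no child row points at an existing parent row, which is acceptable for a baseline.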
Expected behavior
Add a new class
class MultiTableUniformSynthesizer(BaselineSynthesizer):

    def get_trained_synthesizer(self, data, metadata):
        """
        This function should train single table UniformSynthesizers on each table in the data.

        Args:
            data (dict): A dict mapping table name to table data.
            metadata (sdv.metadata.Metadata): The metadata

        Returns:
            A synthesizer object.
        """
        pass

    def sample_from_synthesizer(self, synthesizer, scale=1.0):
        """
        Args:
            synthesizer (sdgym.synthesizers.BaselineSynthesizer): The trained synthesizer instance.
            scale (float): The scale of data to sample. Should default to 1.

        Returns:
            dict: A dict mapping table name to the sampled data.
        """
        pass

Additional context
- The base class to use may change based on "SDGym should be able to automatically discover SDV Enterprise synthesizers" (#481)
- Don't worry about referential integrity
- Don't worry about getting this class to run with a benchmark. That will be handled in "Add benchmark_multi_table function" (#486) and "Add benchmark_multi_table_aws" (#487)
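As a rough illustration of the per-table delegation described in this issue, here is a minimal, self-contained sketch. It is an assumption-laden stand-in rather than the intended implementation: `UniformTableModel` is a hypothetical replacement for the single table UniformSynthesizer, tables are plain dicts of column lists instead of real table data, the class does not inherit from `BaselineSynthesizer`, and `metadata` is accepted but unused.

```python
import random


class UniformTableModel:
    """Hypothetical stand-in for a trained single-table UniformSynthesizer."""

    def __init__(self, table):
        # table: dict mapping column name to a list of values
        self.num_rows = len(next(iter(table.values()), []))
        self.columns = {}
        for name, values in table.items():
            if values and all(isinstance(v, (int, float)) for v in values):
                self.columns[name] = ('numeric', (min(values), max(values)))
            else:
                self.columns[name] = ('categorical', sorted(set(values), key=str))

    def sample(self, num_rows, rng):
        """Draw every column independently and uniformly."""
        sampled = {}
        for name, (kind, info) in self.columns.items():
            if kind == 'numeric':
                low, high = info
                sampled[name] = [rng.uniform(low, high) for _ in range(num_rows)]
            else:
                sampled[name] = [rng.choice(info) for _ in range(num_rows)]
        return sampled


class MultiTableUniformSketch:
    """Sketch only: fits one independent uniform model per table."""

    def get_trained_synthesizer(self, data, metadata):
        # metadata is ignored in this sketch; no relationships are modeled
        return {name: UniformTableModel(table) for name, table in data.items()}

    def sample_from_synthesizer(self, synthesizer, scale=1.0, seed=0):
        rng = random.Random(seed)
        return {
            name: model.sample(round(model.num_rows * scale), rng)
            for name, model in synthesizer.items()
        }


# Hypothetical two-table dataset
data = {
    'users': {'user_id': [1, 2, 3, 4], 'age': [20, 30, 40, 50]},
    'orders': {'order_id': [1, 2], 'status': ['open', 'closed']},
}
sketch = MultiTableUniformSketch()
trained = sketch.get_trained_synthesizer(data, metadata=None)
sampled = sketch.sample_from_synthesizer(trained, scale=2.0)
```

Because each table is fit and sampled on its own, nothing links child rows to parent rows, which matches the note above about not worrying about referential integrity.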
Labels
feature request