Feature Request: Using `emulator.learn()` with Simulation Uncertainty Data

**Is your feature request related to a problem? If so, please describe.**

I am trying to run `emulator.learn()` with a simulation that returns two datasets: the simulation results and the standard deviation of those values. When I try this, I get the error: **ValueError: The specified estimator type, fixed_noise_gp, is not currently supported. Please check the ``EstimatorParams`` documentation for more available estimator types.** When I try a different estimator type, I get the error: **ValueError: Must pass 2-d input. shape=(2, 1, 10).**
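A possible explanation for the second error, not confirmed against the twinLab source, is that the `(results, std)` pair returned by the simulation is treated as a single array before being turned into a DataFrame. The sketch below uses only pandas and NumPy (no twinLab calls) and stacks two single-row frames shaped like the simulation output to show how a 3-D array of shape `(2, 1, 10)` triggers the same kind of message.

```python
import numpy as np
import pandas as pd

# Two single-row frames, shaped like the (results, std) pair a simulation returns
output_columns = [f"y_{i}" for i in range(10)]
output_df = pd.DataFrame([np.zeros(10)], columns=output_columns)
std_df = pd.DataFrame([np.ones(10)], columns=output_columns)

# Stacking the pair gives a 3-D array: (2 frames, 1 row, 10 columns)
stacked = np.stack([output_df.to_numpy(), std_df.to_numpy()])

try:
    # The DataFrame constructor rejects array input with more than 2 dimensions
    pd.DataFrame(stacked)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(stacked.shape)
print(error_message)
```

If this is the cause, `emulator.learn()` would need to unpack the two frames and route the standard deviations to the dataset given as `dataset_std`, rather than handling the return value as one table.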

**Describe the solution you'd like**

I’d like to be able to use `emulator.learn()` with a simulation that also returns noise, and to train the emulator on that noise, so that the noise dataset is updated alongside the results dataset inside the `emulator.learn()` function. Below, I've included an example of code I would like to be able to run using `emulator.learn()`.

**Describe alternatives you've considered**

`emulator.learn()` works when I return only the simulation results and don’t train the emulator on the simulation noise.

**Possible workarounds**

I’ve been able to use `emulator.recommend()` in a function that performs active learning on my dataset with a fixed noise GP. The function recommends a new data point, uploads the simulation results and noise for that point, and retrains the emulator.
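The shape of that workaround loop can be sketched as follows. This is a minimal, library-agnostic sketch and not the actual workaround code. `recommend` and `retrain` are hypothetical callables standing in for the calls to `emulator.recommend()`, the dataset uploads, and `emulator.train()`, and `manual_learn_loop` is an illustrative name.

```python
from typing import Callable, Tuple

import pandas as pd

Simulation = Callable[[pd.DataFrame], Tuple[pd.DataFrame, pd.DataFrame]]


def manual_learn_loop(
    recommend: Callable[[], pd.DataFrame],  # hypothetical hook: returns one row of inputs
    simulation: Simulation,  # returns (results, std) for those inputs
    results: pd.DataFrame,  # existing inputs + outputs table
    std: pd.DataFrame,  # existing standard deviations table
    retrain: Callable[[pd.DataFrame, pd.DataFrame], None],  # hypothetical hook: upload + train
    num_loops: int,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Grow the results and noise tables together, retraining after each new point."""
    for _ in range(num_loops):
        new_inputs = recommend()
        new_outputs, new_std = simulation(new_inputs)

        # Join inputs and outputs side by side to match the results table layout
        new_row = pd.concat(
            [new_inputs.reset_index(drop=True), new_outputs.reset_index(drop=True)],
            axis=1,
        )

        # Append to both tables so their rows stay aligned
        results = pd.concat([results, new_row], ignore_index=True)
        std = pd.concat([std, new_std], ignore_index=True)

        retrain(results, std)
    return results, std
```

The key point is that the results and noise tables receive one new row each per iteration, which is the bookkeeping this request asks `emulator.learn()` to handle internally.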

**Here is an example of a simulation function that generates a results dataset and a noise dataset that I would like to run with a fixed noise GP inside `emulator.learn()`:**

```
[tool.poetry.dependencies]
python = ">=3.10,<3.13"
twinlab = "2.11.0"
scikit-learn = "^1.5.1"
```
Code making example initial datasets of simulation results and corresponding uncertainty:
```
import pandas as pd
import twinlab as tl
import numpy as np

num_inputs = 3
num_outputs = 10
num_rows = 10

# Create a simple dataset with 3 input columns and 10 output columns
input_columns = [f"x_{i}" for i in range(num_inputs)]
output_columns = [f'y_{i}' for i in range(num_outputs)]

# Create random input data (10 samples)
inputs = np.random.rand(num_rows, len(input_columns)) * 10  # Random values between 0 and 10

# Create correlated output data with some noise
weights = np.random.rand(num_outputs, num_inputs)  # Random weights for linear combination
outputs = np.dot(inputs, weights.T) + np.random.randn(num_rows, num_outputs) * 0.5  # Linear relationship with noise

# Create corresponding standard deviations (can be a function of the outputs, or fixed)
std_devs = np.abs(np.random.randn(num_rows, len(output_columns)) * 0.2)  # Smaller standard deviations

# Convert to DataFrames
input_df = pd.DataFrame(inputs, columns=input_columns)
output_df = pd.DataFrame(outputs, columns=output_columns)
std_df = pd.DataFrame(std_devs, columns=output_columns)

# Make sim_results_df
simulation_results = pd.concat([input_df, output_df], axis=1)

# Display the results
simulation_results
```
```
# Display the noise dataframe
std_df
```
Make an example function that returns simulation outputs and simulation uncertainty for a set of inputs:
```
def run_simulation(input_params):
    num_outputs = 10  # Number of output columns

    # If input_params is a DataFrame, convert it to a NumPy array
    if isinstance(input_params, pd.DataFrame):
        input_params = input_params.values.flatten()  # Flatten to 1D array

    # Ensure input_params is now a 1D NumPy array
    input_params = np.asarray(input_params).flatten()

    # Example simple model: outputs are linear combinations of inputs with some noise
    outputs = np.dot(input_params, np.random.rand(len(input_params), num_outputs)) + np.random.randn(num_outputs)

    # Example standard deviations (simulated here as random noise)
    std_devs = np.abs(np.random.randn(num_outputs) * 0.2)  # Smaller standard deviations

    # Convert outputs and standard deviations to DataFrames
    output_columns = [f'y_{i}' for i in range(num_outputs)]
    output_df = pd.DataFrame([outputs], columns=output_columns)
    std_df = pd.DataFrame([std_devs], columns=output_columns)

    return output_df, std_df
```
Uploading the initial datasets:
```
dataset = tl.Dataset('feature_request_dataset')

# Upload the dataset, passing in the dataframe
dataset.upload(simulation_results)
```
```
uncertainty_dataset = tl.Dataset('feature_request_uncertainty_dataset')

# Upload the dataset, passing in the uncertainty dataframe
uncertainty_dataset.upload(std_df)
```
Initialising and training the emulator:
```
# Initialise emulator
emulator_id = "feature_request_emulator"

emulator = tl.Emulator(id=emulator_id)
```
```
estimator_params = tl.EstimatorParams(
    estimator_type="fixed_noise_gp"
)
```
```
train_params = tl.TrainParams(
    estimator='gaussian_process_regression',
    dataset_std=uncertainty_dataset,
    estimator_params=estimator_params,
)
```
```
# Train the emulator using the train method
emulator.train(
    dataset=dataset,
    inputs=input_columns,
    outputs=output_columns,
    params=train_params,
)
```
Using `emulator.learn()`:
```
emulator.learn(
    dataset=dataset, 
    inputs=input_columns, 
    outputs=output_columns, 
    num_loops=1, 
    num_points_per_loop=1, 
    acq_func="LogEI", 
    simulation=run_simulation, 
    train_params=train_params, 
)
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-112-74429a22c6ec> in <cell line: 1>()
----> 1 emulator.learn(
      2     dataset=dataset,
      3     inputs=input_columns,
      4     outputs=output_columns,
      5     num_loops=1,

/usr/local/lib/python3.10/dist-packages/twinlab/emulator.py in learn(self, dataset, inputs, outputs, num_loops, num_points_per_loop, acq_func, simulation, train_params, recommend_params, verbose)
   1319         ]
   1320         if train_params.estimator_params.estimator_type in invalid_GP_estimators:
-> 1321             raise ValueError(
   1322                 f"The specified estimator type, {train_params.estimator_params.estimator_type}, is not currently supported. Please check the ``EstimatorParams`` documentation for more available estimator types."
   1323             )

ValueError: The specified estimator type, fixed_noise_gp, is not currently supported. Please check the ``EstimatorParams`` documentation for more available estimator types.




