How to use it for multivariate forecasting? #13

Open
harshitv804 opened this issue Mar 17, 2024 · 11 comments
Labels
FAQ Frequently asked question

Comments

@harshitv804

No description provided.

@abdulfatir
Contributor

@harshitv804 as we discussed in the paper, Chronos currently focuses on univariate forecasting. For multivariate time series, you might want to use Chronos on the individual dimensions independently. If you have specific multivariate use cases or datasets to share with us, please do. It will be helpful for us to understand the types of practical multivariate problems.
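
A minimal sketch of the "individual dimensions independently" idea, not the maintainers' implementation. The names `forecast_each_dimension` and `naive_last_value` are hypothetical, and the naive forecaster is only a placeholder for whatever univariate model call you use (for example a Chronos pipeline); the point is that each dimension is forecast from its own history only.

```python
from typing import Callable, Dict, List, Sequence

# A univariate forecaster takes one series and a horizon and returns point forecasts.
Forecaster = Callable[[Sequence[float], int], List[float]]


def naive_last_value(context: Sequence[float], prediction_length: int) -> List[float]:
    """Placeholder forecaster (NOT Chronos): repeats the last observed value."""
    return [context[-1]] * prediction_length


def forecast_each_dimension(
    series_by_name: Dict[str, Sequence[float]],
    prediction_length: int,
    forecaster: Forecaster = naive_last_value,
) -> Dict[str, List[float]]:
    """Forecast every dimension on its own; nothing is shared across dimensions."""
    return {
        name: forecaster(values, prediction_length)
        for name, values in series_by_name.items()
    }


# Illustrative data only: two dimensions of one multivariate series.
multivariate = {
    "temperature": [20.1, 20.4, 20.9],
    "humidity": [0.41, 0.40, 0.38],
}
forecasts = forecast_each_dimension(multivariate, prediction_length=2)
```

Because the forecaster is passed in as an argument, the same loop works with any univariate model, but any dependence between dimensions is ignored by construction.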

@lostella
Contributor

Keeping this open for visibility, since others may have the same question.

@lostella lostella reopened this Mar 18, 2024
@lostella lostella added the FAQ Frequently asked question label Mar 18, 2024
@lostella lostella pinned this issue Mar 18, 2024
@lostella lostella changed the title How to use it for multivariate forecasting ? How to use it for multivariate forecasting? Mar 18, 2024
@abdulfatir abdulfatir unpinned this issue Mar 26, 2024
@ozanbarism

Can Chronos take multiple inputs (channels) but make predictions on a single one of them?

I passed data of shape (n_features, samples), and it makes predictions on one of them. However, it seems I cannot choose which feature it makes predictions on. Is there a way to choose it?

Thanks.

@lostella
Contributor

@ozanbarism if I understand your question right, you want to provide covariates: this is not possible, see #22.

I passed data of shape (n_features, samples), and it makes predictions on one of them.

I'm not sure what you mean here: don't you get predictions for all of them? That's what should happen.

@ozanbarism

ozanbarism commented Jun 23, 2024

I do not get predictions for all of them. It seems I get predictions for only one of them.
Also, there is a number of samples term, is this the length of the context data we provide?

This is what it looks like for univariate data:
[image]
And this is the case where I pass multivariate data. As you can see, it still returns a single prediction column:
[image]

This is my code:

model = ChronosModel(name="amazon/chronos-t5-small", device="cpu")
duration = 20  # in hours
pred_hrz = 2
sampling_rates = [300]
for i, sr in enumerate(sampling_rates):
    Parameter = ParameterGenerator('OfficeSmall', 'Hot_Dry', 'Tucson', max_power=max_power, time_reso=control_rate)  # Description of ParameterGenerator in bldg_utils.py
    data, gt = building_simulate(Parameter, room_id, duration, pred_hrz, control_rate,
                                 sr, T_cool, T_heat, mode, hysteresis_margin, single_variate=False, make_plot=False, show_outdoor=False)

    pred_len = int(pred_hrz * 3600 / sr)
    low, forecast, high = model(data, prediction_length=pred_len, num_samples=1)
    plot_pred(data, forecast, gt, forecast_index=None)
    print('MSE {:.4f}'.format(np.mean((forecast - gt[:, 0]) ** 2)))

And this is how I defined the ChronosModel class:

class ChronosModel:

    def __init__(self, name, device="cuda"):
        from chronos import ChronosPipeline
        self.model = ChronosPipeline.from_pretrained(
            name,
            device_map=device,  # use "cpu" for CPU inference and "mps" for Apple Silicon
            torch_dtype=torch.bfloat16,
        )

    def __call__(self, data, prediction_length, num_samples=1):
        if not torch.is_tensor(data):
            _data = torch.tensor(data)
        else:
            _data = data
        forecast = self.model.predict(
            context=_data,
            prediction_length=prediction_length,
            num_samples=num_samples,
        )

        low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)
        return low, median, high  # 80% interval

@iamchrisearle

Also, there is a number of samples term, is this the length of the context data we provide?

From the documentation:

Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context

So you will get more than one output (multiple future trajectories), equal in number to num_samples. If you only want one prediction you can set it to 1, but most users will take the median of the default 20, as your code does with np.quantile().
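
To make the role of num_samples concrete, here is a small sketch under stated assumptions: `sample_trajectories` is a hypothetical random-walk stand-in for the model's sampler (it is not Chronos), and `quantile_band` reduces the sampled paths to 10%, 50% and 90% quantiles per time step, using linear interpolation between order statistics.

```python
import random


def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def sample_trajectories(context, prediction_length, num_samples, seed=0):
    """Hypothetical stand-in sampler: random walks starting at the last context value."""
    rng = random.Random(seed)
    paths = []
    for _ in range(num_samples):
        value, path = context[-1], []
        for _ in range(prediction_length):
            value += rng.gauss(0.0, 1.0)
            path.append(value)
        paths.append(path)
    return paths  # layout: [num_samples][prediction_length]


def quantile_band(paths, qs=(0.1, 0.5, 0.9)):
    """Per-time-step quantiles across the sampled paths."""
    return [[quantile(sorted(step), q) for step in zip(*paths)] for q in qs]


# The context has 3 points, yet we draw 20 trajectories of length 4:
# num_samples is independent of the context length.
paths = sample_trajectories(context=[1.0, 2.0, 3.0], prediction_length=4, num_samples=20)
low, median, high = quantile_band(paths)

# With a single sample there is nothing to spread over, so the band collapses.
single = quantile_band(sample_trajectories([1.0, 2.0, 3.0], 4, num_samples=1))
```

With num_samples=1, as in the code above, the low, median and high curves are identical, so the 80% interval carries no information.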

The code example is a bit difficult to follow (why add a model wrapper here?). I suspect that you're only getting one prediction because you always output the 0th forecast with forecast[0].
From the plots, though, each blue line can be thought of as an independent univariate time series, such as temperature data collected by a set of independent weather stations. This model can't take in traditional model "features", but it can predict each univariate series in parallel based only on that series' own history. So when you add several "features" like that (T_cool, T_heat, etc.), it could predict the next values of each "feature" but not use them to inform a target variable, as @lostella mentioned.
To get each prediction, you would need to loop over the univariate time series that you have, with something like:

for i in range(num_of_series):
    median_i = forecast[i].median(dim=0).values  # median over the sample dimension
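
As a hedged illustration of that loop, the sketch below assumes the forecast is laid out as [num_series][num_samples][prediction_length]; that layout is an assumption here and worth checking against your installed version. Plain nested lists stand in for the tensor, and `per_series_median` is a hypothetical helper name.

```python
from statistics import median


def per_series_median(forecast):
    """Median over the sample axis for every series.

    forecast: nested lists laid out as [num_series][num_samples][prediction_length].
    Returns nested lists laid out as [num_series][prediction_length].
    """
    return [
        # zip(*samples) regroups the values by time step across all samples.
        [median(step_values) for step_values in zip(*samples)]
        for samples in forecast
    ]


# Illustrative numbers only: 2 series, 3 samples each, horizon of 2 steps.
forecast = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]],
]
medians = per_series_median(forecast)
```

Indexing with forecast[0] before reducing, as in the wrapper above, keeps only the first series, which would explain seeing a single prediction column.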

Hope this helps.

@ikvision

@harshitv804 I am working on extending Chronos to support covariates, using an LGBM regression head on top of the univariate embeddings.
If you would like to help me make progress on this solution, I would really appreciate it:
autogluon/autogluon#4278

@hsm207

hsm207 commented Aug 9, 2024

If you have specific multivariate use cases or datasets to share with us, please do. It will be helpful for us to understand the types of practical multivariate problems.

@abdulfatir The multivariate use case I have is forecasting the open, high, low, and close of an asset in the financial markets, i.e. candlestick charts. In this case, I don't think forecasting the individual dimensions independently is ideal, since at any given timestep there is a dependent relationship between the dimensions.
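
One way to see that dependency: within a candlestick the high can never be below the open or close, and the low can never be above them. The sketch below checks this constraint on forecasts produced per dimension. The helper names are hypothetical and the numbers are made up for illustration; they are not model output.

```python
def is_valid_candle(open_, high, low, close):
    """A candle is consistent when high >= max(open, close) and low <= min(open, close)."""
    return high >= max(open_, close) and low <= min(open_, close)


def inconsistent_steps(forecast):
    """Return the time-step indices whose forecast candle violates the constraint.

    forecast: dict with 'open', 'high', 'low', 'close' lists of equal length.
    """
    steps = zip(forecast["open"], forecast["high"], forecast["low"], forecast["close"])
    return [t for t, (o, h, l, c) in enumerate(steps) if not is_valid_candle(o, h, l, c)]


# Made-up per-dimension forecasts for two future time steps.
independent_forecast = {
    "open": [100.0, 101.0],
    "high": [102.0, 100.5],  # second value sits below that step's open and close
    "low": [99.0, 99.5],
    "close": [101.0, 101.5],
}
bad_steps = inconsistent_steps(independent_forecast)
```

Nothing stops four independent univariate forecasts from producing such an impossible candle, which is the concern raised here.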

@ikvision

@hsm207 Do you want to forecast the future values of all 3 variables based on the past values of all 3 variables?
Did you try https://huggingface.co/Salesforce/moirai-1.0-R-large?

@hsm207

hsm207 commented Aug 12, 2024

Do you want to forecast the future values of all 3 variables based on the past values of all 3 variables?

@ikvision yes, I want to forecast all 4 variables (open, high, low, close) based on the past values of those 4 variables.

Did you try https://huggingface.co/Salesforce/moirai-1.0-R-large?

I have not. Thanks for sharing! I was not aware of this paper before. From the abstract, it looks like it will help.

@Aisuko

Aisuko commented Sep 27, 2024

Hi, guys. Thanks for your discussion; I got some useful info from it. In my case, I have medical data of different vital signs for multiple patients. These are multivariate time series. The multivariate part comes from the different measurement items, like pH, SpO2, urine output, etc., 12 items in total.

For example, for 5000 samples, the data (an ndarray) has shape (5000, 12, 200): 12 features over 200 time steps. For the dataset, please check the output of this notebook: https://www.kaggle.com/code/wangyuweikiwi/mimi-iii-time-series-data-preprocessing
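
If each feature were forecast independently, as suggested earlier in the thread, an array of this shape could be flattened into 5000 × 12 univariate series of length 200. The sketch below shows that bookkeeping on a tiny stand-in array built from nested lists; `flatten_to_univariate` is a hypothetical helper, and the index it returns maps every flattened series back to its (patient, feature) pair.

```python
def flatten_to_univariate(data):
    """Flatten [patients][features][time] into a list of univariate series.

    Also returns an index so that series[k] belongs to (patient, feature) == index[k].
    """
    series, index = [], []
    for p, patient in enumerate(data):
        for f, feature_series in enumerate(patient):
            series.append(list(feature_series))
            index.append((p, f))
    return series, index


# Tiny stand-in for the (5000, 12, 200) array: 2 patients, 3 features, 4 time steps.
data = [
    [[p * 100 + f * 10 + t for t in range(4)] for f in range(3)]
    for p in range(2)
]
series, index = flatten_to_univariate(data)
```

With a NumPy array, `data.reshape(-1, data.shape[-1])` gives the same row order, with row k belonging to patient k // 12 and feature k % 12 in the 12-feature case.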


8 participants