ForkingPickler Error When Reading Arrow File #149

Open
AlvinLok opened this issue Jul 10, 2024 · 12 comments
Labels
bug Something isn't working windows Concerns running code on Windows

Comments

@AlvinLok

Describe the bug
A TypeError is raised while ForkingPickler pickles the DataLoader worker process, followed by an EOFError in the spawned child process:

2024-07-10 10:46:29,886 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - TF32 format is only available on devices with compute capability >= 8. Setting tf32 to False.
2024-07-10 10:46:29,893 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Using SEED: 1360904892
2024-07-10 10:46:29,958 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Logging dir: output\run-1
2024-07-10 10:46:29,961 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Loading and filtering 1 datasets for training: ['data.arrow']
2024-07-10 10:46:29,962 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Mixing probabilities: [1]
2024-07-10 10:46:30,642 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Initializing model
2024-07-10 10:46:30,642 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Using pretrained initialization from amazon/chronos-t5-small
The speedups for torchdynamo mostly come wih GPU Ampere or higher and which is not detected here.
max_steps is given, it will override any value given in num_train_epochs
2024-07-10 10:46:45,054 - C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py - INFO - Training
0%| | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py", line 692, in <module>
    app()
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\typer\main.py", line 326, in __call__
    raise e
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\typer\main.py", line 309, in __call__
    return get_command(self)(*args, **kwargs)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\typer\core.py", line 661, in main
    return _main(
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\typer\core.py", line 193, in _main
    rv = self.invoke(ctx)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\typer\main.py", line 692, in wrapper
    return callback(**use_params)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\typer_config\decorators.py", line 92, in wrapped
    return cmd(*args, **kwargs)
  File "C:\Users\alvin\OneDrive\Coding\Python\chronos\chronos-forecasting\scripts\training\train.py", line 679, in main
    trainer.train()
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\transformers\trainer.py", line 1932, in train
    return inner_training_loop(
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\transformers\trainer.py", line 2230, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\accelerate\data_loader.py", line 671, in __iter__
    main_iterator = super().__iter__()
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\torch\utils\data\dataloader.py", line 439, in __iter__
    return self._get_iterator()
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\torch\utils\data\dataloader.py", line 387, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\site-packages\torch\utils\data\dataloader.py", line 1040, in __init__
    w.start()
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "<stringsource>", line 2, in pyarrow.lib._RecordBatchFileReader.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__
0%|

(chronos) C:\Users\alvin\OneDrive\Coding\Python\chronos>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\alvin\anaconda3\envs\chronos\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
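
A note on the mechanism, as a minimal sketch rather than the actual Chronos, GluonTS, or PyTorch code: on Windows, multiprocessing starts DataLoader workers with the spawn method, which pickles the worker's state (including the dataset) and sends it to the child process. The TypeError above shows that pickling hit a pyarrow reader object, and the EOFError is the child finding nothing to unpickle after the parent aborted. The helper name unpicklable_fields and the example state are hypothetical, and threading.Lock merely stands in for any handle that cannot be pickled.

```python
import pickle
import threading
from typing import Any, Dict, List


def unpicklable_fields(state: Dict[str, Any]) -> List[str]:
    """Return the names in `state` whose values cannot be pickled."""
    bad = []
    for name, value in state.items():
        try:
            pickle.dumps(value)
        except Exception:
            # Any failure here would also stop a spawned worker from starting.
            bad.append(name)
    return bad


if __name__ == "__main__":
    # Hypothetical dataset state: plain data plus a live, unpicklable handle.
    state = {"values": [84.7, 85.3, 85.4], "reader": threading.Lock()}
    print(unpicklable_fields(state))  # prints ['reader']
```

Calling a helper like this on vars(dataset) for the real training dataset would be one way to see which attribute blocks worker start-up. When the number of DataLoader workers is 0, batches are loaded in the main process and this pickling step does not happen at all.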

This occurs when attempting to fine-tune the model:
python chronos-forecasting/scripts/training/train.py --config chronos-forecasting/scripts/training/configs/chronos-t5-small.yaml --model-id amazon/chronos-t5-small --no-random-init --max-steps 1000 --learning-rate 0.001

Steps taken:

  1. Spun up new conda environment
  2. pip install "chronos[training] @ git+https://github.com/amazon-science/chronos-forecasting.git"
  3. Cloned repo into working directory
  4. Converted pandas df into arrow with provided function: convert_to_arrow('data.arrow', df.VALUE, df.REF_DATE)
  5. Edited config file to point to arrow file
  6. set CUDA_VISIBLE_DEVICES=0
  7. python chronos-forecasting/scripts/training/train.py --config chronos-forecasting/scripts/training/configs/chronos-t5-small.yaml --model-id amazon/chronos-t5-small --no-random-init --max-steps 1000 --learning-rate 0.001
  8. Encountered RuntimeError: fused=True requires all the params to be floating point Tensors of supported devices: ['cuda', 'xpu', 'privateuseone'], so I did pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  9. Ran training again: python chronos-forecasting/scripts/training/train.py --config chronos-forecasting/scripts/training/configs/chronos-t5-small.yaml --model-id amazon/chronos-t5-small --no-random-init --max-steps 1000 --learning-rate 0.001

Environment description
Operating system: Windows
Python version: 3.10.14
CUDA version: 12.2
PyTorch version: 2.3.1+cu121
HuggingFace transformers version: 4.42.3
HuggingFace accelerate version: 0.32.1

Any help is appreciated. I have tried this with multiple fresh conda environments on different machines.

@AlvinLok AlvinLok added the bug Something isn't working label Jul 10, 2024
@abdulfatir
Contributor

It looks like you're using Windows. We haven't really tested this codebase on Windows. Could you try the following?

  • Set dataloader_num_workers to 0.
  • Use another optimizer instead of adamw_torch_fused (try adamw_torch).
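
The two suggestions amount to overriding two training settings. Here is a minimal sketch, assuming the training config has been loaded into a plain dictionary and that the optimizer is selected by a key named optim. That key name is an assumption; only dataloader_num_workers and the optimizer names adamw_torch_fused and adamw_torch appear in this thread. The helper name is hypothetical.

```python
from typing import Any, Dict


def apply_windows_workarounds(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with the two suggested overrides applied."""
    patched = dict(config)
    # Load batches in the main process so no worker process has to be spawned.
    patched["dataloader_num_workers"] = 0
    # Swap the fused optimizer for the non-fused AdamW variant.
    # NOTE: the key name "optim" is an assumption, not confirmed in this thread.
    if patched.get("optim") == "adamw_torch_fused":
        patched["optim"] = "adamw_torch"
    return patched


if __name__ == "__main__":
    original = {"optim": "adamw_torch_fused", "dataloader_num_workers": 1}
    print(apply_windows_workarounds(original))
```

The same two values can of course be edited directly in the config file; the function form only makes the change explicit and leaves the original dictionary untouched.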

@lostella
Contributor

@AlvinLok any update on this?

@lostella lostella added the windows Concerns running code on Windows label Jul 17, 2024
@AlvinLok
Author

Yes, I'm on Windows. I've made the changes, and now it's a new error:

Traceback (most recent call last):
  File "C:\Users\alvinlok\xxx\03 Code\chronos-forecasting\scripts\training\train.py", line 694, in <module>
    app()
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\typer\main.py", line 326, in __call__
    raise e
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\typer\main.py", line 309, in __call__
    return get_command(self)(*args, **kwargs)
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\typer\core.py", line 661, in main
    return _main(
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\typer\core.py", line 193, in _main
    rv = self.invoke(ctx)
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\typer\main.py", line 692, in wrapper
    return callback(**use_params)
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\typer_config\decorators.py", line 92, in wrapped
    return cmd(*args, **kwargs)
  File "C:\Users\alvinlok\xxx\03 Code\chronos-forecasting\scripts\training\train.py", line 681, in main
    trainer.train()
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\transformers\trainer.py", line 1932, in train
    return inner_training_loop(
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\transformers\trainer.py", line 2230, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\accelerate\data_loader.py", line 677, in __iter__
    next_batch, next_batch_info = self._fetch_batches(main_iterator)
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\accelerate\data_loader.py", line 631, in _fetch_batches
    batches.append(next(iterator))
  File "C:\Users\alvinlok\AppData\Local\anaconda3\envs\chronos-2\lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
    data = self._next_data()
  File "C:\Users\alvinlok\AppData\Local\anaconda3\envs\chronos-2\lib\site-packages\torch\utils\data\dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "C:\Users\alvinlok\AppData\Local\anaconda3\envs\chronos-2\lib\site-packages\torch\utils\data\_utils\fetch.py", line 32, in fetch
    data.append(next(self.dataset_iter))
  File "C:\Users\alvinlok\xxx\03 Code\chronos-forecasting\scripts\training\train.py", line 241, in __iter__
    for element in self.base_dataset:
  File "C:\Users\alvinlok\xxx\03 Code\chronos-forecasting\scripts\training\train.py", line 491, in __iter__
    yield self.to_hf_format(next(iterators
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\gluonts\transform\_base.py", line 111, in __iter__
    yield from self.transformation(
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\gluonts\transform\_base.py", line 186, in __call__
    for data_entry in data_it:
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\gluonts\transform\_base.py", line 186, in __call__
    for data_entry in data_it:
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\gluonts\itertools.py", line 85, in __iter__
    for el in self.iterable:
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\gluonts\dataset\common.py", line 424, in __call__
    data = t(data)
  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\gluonts\dataset\common.py", line 345, in __call__
    raise GluonTSDataError(
gluonts.exceptions.GluonTSDataError: Array 'target' has bad shape - expected 1 dimensions, got 0.
0%| | 0/1000 [00:00<?, ?it/s]

@abdulfatir
Contributor

For this call, convert_to_arrow('data.arrow', df.VALUE, df.REF_DATE), can you share what your dataframe looks like?

@AlvinLok
Author

This is what my df looks like:

   REF_DATE    VALUE
0  2010-01-01  84.7
1  2010-02-01  85.3
2  2010-03-01  85.4
3  2010-04-01  85.8
4  2010-05-01  86.8

convert_to_arrow(
    path="arrow_files/p32_df_train.arrow",
    time_series=p32_df_train.VALUE,
    start_times=p32_df_train.REF_DATE,
)

@lostella
Contributor

@AlvinLok could you check if the fix proposed in #156 makes it work for you?

@AlvinLok
Author

No, adding freeze_support() did not have any effect. I am getting the same error: Array 'target' has bad shape - expected 1 dimensions, got 0.

@abdulfatir
Contributor

@lostella this one is unrelated. @AlvinLok you're transforming the data incorrectly. Please check the type signature of the function that you're using to transform. convert_to_arrow expects

...
time_series: Union[List[np.ndarray], np.ndarray],
start_times: Optional[Union[List[np.datetime64], np.ndarray]] = None,
...

The first one is a list of 1-D numpy arrays (i.e., a list of time series). The second one is a list of np.datetime64 values, i.e., a list of start times, one for each time series in the first list. Since only the start times are used, each time series is expected to be uniformly spaced.
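
A minimal sketch of a pre-flight check for the inputs described above, assuming only NumPy. The helper name validate_series is hypothetical and is not part of Chronos or GluonTS; it rejects anything that is not a collection of 1-D series with one start time each, before any Arrow file is written.

```python
from typing import Optional, Sequence

import numpy as np


def validate_series(
    time_series: Sequence[np.ndarray],
    start_times: Optional[Sequence[np.datetime64]] = None,
) -> None:
    """Raise ValueError unless every entry is a 1-D series with a start time."""
    for i, ts in enumerate(time_series):
        ndim = np.ndim(ts)
        if ndim != 1:
            raise ValueError(
                f"entry {i} has {ndim} dimensions, expected 1; "
                "pass a list of 1-D arrays, one per time series"
            )
    if start_times is not None and len(start_times) != len(time_series):
        raise ValueError("need exactly one start time per time series")


if __name__ == "__main__":
    values = np.array([84.7, 85.3, 85.4, 85.8, 86.8])
    start = np.datetime64("2010-01-01")

    # One series and one start time: accepted silently.
    validate_series([values], [start])

    try:
        # A bare 1-D array is iterated value by value, so each entry is a scalar.
        validate_series(values)
    except ValueError as err:
        print(err)
```

Running a check like this just before convert_to_arrow would surface the shape problem at conversion time instead of during training.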

@abdulfatir
Contributor

@AvisP: can you also check that you're transforming the data correctly?

@AvisP

AvisP commented Jul 23, 2024

@abdulfatir That is highly unlikely, as I am not using any custom data, only data generated with the provided script and the example code. Are there any known issues with generating data using python kernel-synth.py --num-series 20 --max-kernels 5 and the following script? Here are the data files that I am using, for you to download and verify.

from pathlib import Path
from typing import List, Optional, Union

import numpy as np
from gluonts.dataset.arrow import ArrowWriter


def convert_to_arrow(
    path: Union[str, Path],
    time_series: Union[List[np.ndarray], np.ndarray],
    start_times: Optional[Union[List[np.datetime64], np.ndarray]] = None,
    compression: str = "lz4",
):
    if start_times is None:
        # Set an arbitrary start time
        start_times = [np.datetime64("2000-01-01 00:00", "s")] * len(time_series)

    assert len(time_series) == len(start_times)

    dataset = [
        {"start": start, "target": ts} for ts, start in zip(time_series, start_times)
    ]
    ArrowWriter(compression=compression).write_to_file(
        dataset,
        path=path,
    )


if __name__ == "__main__":
    # Generate 20 random time series of length 1024
    time_series = [np.random.randn(1024) for i in range(20)]

    # Convert to GluonTS arrow format
    convert_to_arrow("./noise-data.arrow", time_series=time_series)

@AlvinLok
Author

AlvinLok commented Aug 1, 2024

Alright, I converted it to a numpy array and removed the start_times argument, but I received the same error:

time_series_data = p32_df_train.VALUE.to_numpy()
path = "arrow_files/p32_df_train.arrow"

convert_to_arrow(
    path=path,
    time_series=time_series_data,
)

Error:

  File "C:\Users\alvinlok\AppData\Roaming\Python\Python310\site-packages\gluonts\dataset\common.py", line 345, in __call__
    raise GluonTSDataError(
gluonts.exceptions.GluonTSDataError: Array 'target' has bad shape - expected 1 dimensions, got 0.
0%| | 0/1000 [00:00<?, ?it/s]

@abdulfatir
Contributor

@AlvinLok It looks like you're passing a single series to the function. You need to pass a list of time series. If you only have a single series, pass it as [time_series_data].
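
To see why the wrapping matters, here is a small sketch that mirrors the zip over time_series and start_times in the convert_to_arrow script posted earlier in this thread. It only builds the entries and does not write an Arrow file. build_entries is a hypothetical helper, and NumPy is the only dependency.

```python
import numpy as np


def build_entries(time_series, start_times=None):
    """Build the {"start", "target"} dicts the same way convert_to_arrow does."""
    if start_times is None:
        # Same arbitrary default start time as the script in this thread.
        start_times = [np.datetime64("2000-01-01 00:00", "s")] * len(time_series)
    assert len(time_series) == len(start_times)
    return [{"start": s, "target": ts} for ts, s in zip(time_series, start_times)]


if __name__ == "__main__":
    values = np.array([84.7, 85.3, 85.4, 85.8, 86.8])

    # Bare array: zip walks over the five values, so each target is a scalar.
    wrong = build_entries(values)
    print(len(wrong), np.ndim(wrong[0]["target"]))  # prints: 5 0

    # Wrapped in a list: a single entry whose target is the whole 1-D series.
    right = build_entries([values])
    print(len(right), np.ndim(right[0]["target"]))  # prints: 1 1
```

The 0-dimensional targets in the first case correspond to the "expected 1 dimensions, got 0" message in the error.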
