
Can LLMs Classify Time Series Data?

📑 Report

Abstract

The increasing prevalence of time-series data across domains such as healthcare, finance, and IoT has driven the need for flexible and generalizable modeling approaches. Traditional time-series analysis relies heavily on supervised models, which suffer from data scarcity and high deployment costs. This project explores the potential of Large Language Models (LLMs) as general-purpose few-shot learners for time-series classification tasks. Specifically, we investigate how different input representations affect LLM performance in two representative tasks: Dual-Tone Multi-Frequency (DTMF) signal decoding and Human Activity Recognition (HAR). By comparing text-based, visual, and multimodal inputs across models including GPT-4o, GPT-o1, and DeepSeek-R1, we assess their reasoning capabilities and robustness. Our findings show that while LLMs demonstrate potential, performance varies significantly depending on input representation and model type. In DTMF, GPT-o1 and DeepSeek-R1 consistently outperform GPT-4o, particularly in tasks requiring text-based numerical reasoning. In HAR, visualization aids interpretation, and few-shot learning significantly boosts performance. However, challenges remain, especially in precise plot reading, domain-knowledge retrieval (GPT-4o), and multimodal integration. Current LLMs still require domain-specific enhancements and more robust representation strategies.

DTMF

1. Run the experiments:

python dtmf_run.py <subcommand> -m <model> -n <noise-type> [optional arguments]

Required Arguments

  1. -m, --model:
    Choose the model (LLM) for this task. Supported models: GPT-4o, GPT-o1, DeepSeek-R1
    • Options: gpt-4o, o1-2024-12-17, deepseek-reasoner
  2. -n, --noise-type:
    Choose the input data noise type. 'clean' uses data generated with exact DTMF frequencies; 'noise' uses data generated with added noise.
    • Options: noise, clean

Optional Arguments

  1. -r, --result-save-filename:
    The file name to save results. (default: results_{model}_{noise-type}_{subcommand}[_{optional arguments}].csv)

Subcommands (Input Types)

  1. freq_text: Raw frequency-magnitude text input
    python dtmf_run.py freq_text -m <model> -n <noise-type> [-g]
    Optional Arguments:
    1. -g, --guide:
      Add step-by-step guidance to the prompt if set; otherwise no guidance is added.
  2. freq_plot: Plot frequency-magnitude pairs as a line plot for input
    python dtmf_run.py freq_plot -m <model> -n <noise-type> [-g -gr]
    Note that freq_plot only supports the GPT-4o and GPT-o1 models.
    Optional Arguments:
    1. -g, --guide:
      Add step-by-step guidance to the prompt if set; otherwise no guidance is added.
    2. -gr, --grid:
      Add grid lines to the input plots if set; otherwise input plots will not contain any grid lines.
  3. freq_pair: Input low/high frequency pair directly
    python dtmf_run.py freq_pair -m <model> -n <noise-type> [-map]
    Optional Arguments:
    1. -map, --map:
      Provide the true DTMF frequency-to-key mapping in the prompt if set. (See the example invocations after this list.)
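For example, the following invocations exercise each input type. The flags and model options are the documented ones above; the exact result filenames follow the default pattern stated earlier and are therefore not repeated here.

python dtmf_run.py freq_text -m gpt-4o -n clean -g
python dtmf_run.py freq_plot -m o1-2024-12-17 -n noise -g -gr
python dtmf_run.py freq_pair -m deepseek-reasoner -n clean -map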

2. Evaluation:

python dtmf_eval.py -r <result-save-filename> [optional arguments]
This script calculates the overall accuracy and, unless --no-detail-acc is set, three breakdown accuracies. It also plots their confusion matrices. All evaluation results are saved under ./results/dtmf/ by default.

  1. Accuracy of the recognized key compared to the true key (overall accuracy). Its confusion matrix filename defaults to conf_matrix_{result-save-filename}_overall.png.
  2. Accuracy of the detected low frequency compared to the true low frequency. Its confusion matrix filename defaults to conf_matrix_{result-save-filename}_low_freq_{err-tolerance}Hz.png.
  3. Accuracy of the detected high frequency compared to the true high frequency. Its confusion matrix filename defaults to conf_matrix_{result-save-filename}_high_freq_{err-tolerance}Hz.png. The distribution plot of frequency detection errors is saved as freq_error_dist_{result-save-filename}.png.
  4. Accuracy of the recognized key compared to the detected frequencies. Its confusion matrix filename defaults to conf_matrix_{result-save-filename}_freq2key_{err-tolerance}Hz.png.

Required Argument:

  1. -r, --result-save-filename:
    Results file name used in dtmf_run.py. A prefix of "result_" and a file extension of ".csv" will be automatically added.

Optional Arguments:

  1. -e, --err-tolerance:
    An integer of error tolerance range for frequency detection (default: 15 for freq_plot results, 5 for results of all other input types)
  2. --no-detail-acc:
    Disable step-by-step accuracy calculation. If guidance is included, step-by-step accuracies are calculated by default; set --no-detail-acc to turn this off. Without guidance, only the overall accuracy is calculated regardless of this flag.
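For example, to evaluate runs saved under the default filenames, pass the base results file name with the automatically added prefix and .csv extension omitted (the base names below are illustrative):

python dtmf_eval.py -r o1-2024-12-17_noise_freq_plot -e 15
python dtmf_eval.py -r gpt-4o_clean_freq_text --no-detail-acc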

Human Activity Recognition (HAR)

1. Run the experiments:

python shl_run.py -m <model> -i <input_type> [optional arguments]

Required Arguments

  1. -m, --model:
    Choose the model (LLM) for this task. Supported models: GPT-4o, GPT-o1, DeepSeek-R1
    • Options: gpt-4o, o1-2024-12-17, deepseek-reasoner
  2. -i, --input:
    Select the input representation format.
    • Options:
      • time_text: IMU time-series as raw text
      • time_text_fewshot: Raw text + one raw text example for each class
      • time_text_description: Raw text + textual summary
        (Note: the following input types only support the GPT-4o and GPT-o1 models.)
      • time_plot: Time-series line plot
      • time_plot_fewshot: Line plot + one line plot example for each class
      • time_plot_env: Line plot + environment photo taken while the activity is happening
      • env_only: Environment photo only

Optional Arguments

  1. -dn, --data-num:
    Number of test samples per class (default: 30)
  2. -df, --data-folder:
    Path to the dataset folder (default: ./datasets/SHL_processed/User1/220617/Torso_video/)
  3. -l, --location:
    The body location where the smartphone used for IMU data collection is placed (default: Torso)
  4. -f, --frequency:
    Sampling frequency in Hz (default: 10 for time_text_fewshot and time_text_description, otherwise 100)
  5. -r, --result-save-filename:
    The file name to save results. Defaults to results_{model}_User1_220617_{4*data-num}_{location}_{input}_4class.csv if not set. (See the example invocations after this list.)
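For example, the following illustrative invocations use the documented flags; recall that the plot-based input types require GPT-4o or GPT-o1:

python shl_run.py -m gpt-4o -i time_plot_fewshot -dn 30
python shl_run.py -m deepseek-reasoner -i time_text_description -dn 20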

2. Evaluation:

python shl_eval.py -r <result-save-filename>

This script calculates the recognition accuracy and plots the confusion matrix. The confusion matrix is saved as ./results/HAR/conf_matrix_{result-save-filename}.png by default.

Required Argument:

  1. -r, --result-save-filename:
    Results file name used in shl_run.py. A prefix of "result_" and a file extension of ".csv" will be automatically added. This filename will also be used to generate the confusion matrix file name.
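For example, to evaluate the GPT-4o run shown earlier (with the default data-num of 30, the default result file name encodes 4*30 = 120 samples), pass the base name with the automatically added prefix and .csv extension omitted; the name below is illustrative:

python shl_eval.py -r gpt-4o_User1_220617_120_Torso_time_plot_fewshot_4class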
