Seisyun Information Entropy

This repository contains the implementation and data for quantifying the concept of "Seisyun" (青春), a term widely used in Japan to describe youth, through a novel measure called Seisyun Information Entropy. The method combines advanced natural language processing techniques and large language models (LLMs) to establish a quantitative framework for analyzing and understanding this abstract concept.

Overview

The study introduces a mathematical and computational framework to measure the "youthfulness" of a text by considering three key factors:

Unusualness: Using the probability of word predictions from pre-trained BERT models to calculate unexpectedness in text.
Positivity: Sentiment analysis to evaluate the positive nature of the text.
Fluency: Evaluating the grammatical correctness of the text.

These metrics are integrated into a single measure, Seisyun Information Entropy, using information theory principles.

Formula

The Seisyun Information Entropy for a given text (S) is defined as:

$$I_{adolescence}(S) = -\log_2 P_{unusual}(S) - \log_2(1 - P_{positive}(S)) - \log_2(1 - P_{fluency}(S))$$

Where:

$P_{unusual}(S)$: The average word prediction probability.
$P_{positive}(S)$: The probability that the text is classified as positive.
$P_{fluency}(S)$: The probability that the text is grammatically correct.

Methodology

The framework leverages multiple pre-trained BERT models:

Unusualness:
- Tohoku BERT (Japanese)
- Predicts word probabilities to assess unexpectedness.
Positivity:
- Sentiment-Enhanced BERT
- Analyzes sentiment polarity.
Fluency:
- Fluency-Scoring BERT
- Evaluates grammatical correctness.

Repository Structure

module/: Contains the core modules for calculating Seisyun Information Entropy.
main.py: The main script to execute the entropy calculations.
seisyun.csv: Dataset containing text samples related to "Seisyun".
goodness.csv: Dataset with human-annotated "youthfulness" scores.
comparison.csv: Results comparing calculated entropy with human scores.
plot.png: Visualization of the comparison results.
requirement.txt: List of dependencies required to run the project.
.gitignore: Specifies files and directories to be ignored by git.
LICENSE: License information for the repository.

Getting Started

Prerequisites

Python 3.8+
PyTorch
Hugging Face Transformers

Install dependencies:

pip install -r requirement.txt

Usage

For testing the result of thesis, run:

python main.py

Results

Experimental results demonstrated a weak positive correlation ((r = 0.333)) between Seisyun Information Entropy and manually generated "youthfulness" rankings of texts. The significance test ((p = 0.035)) confirmed the statistical validity of the measure at a 5% significance level.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Document Availability

The complete manuscript is accessible online at the following address:

https://web.cshe.nagoya-u.ac.jp/support/student/contest/img/2024_kotama.pdf

References

If you use this work, please cite:

@unpublished{kotama:seisyun_entropy_llm,
  author        = {Takanori Kotama},
  title         = {A Quantitative Definition of Youth Using Large Language Models},
  year          = {2025},
  note          = {Award-winning paper of the Nagoya University Student Paper Contest (Encouragement Prize), presented in 2025},
  url           = {https://web.cshe.nagoya-u.ac.jp/support/student/contest/img/2024_kotama.pdf}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Seisyun Information Entropy

Overview

Formula

Methodology

Repository Structure

Getting Started

Prerequisites

Usage

Results

License

Document Availability

References

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
module		module
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
comparison.csv		comparison.csv
goodness.csv		goodness.csv
main.py		main.py
plot.png		plot.png
requirement.txt		requirement.txt
seisyun.csv		seisyun.csv

License

kotama7/seisyun_information_entropy

Folders and files

Latest commit

History

Repository files navigation

Seisyun Information Entropy

Overview

Formula

Methodology

Repository Structure

Getting Started

Prerequisites

Usage

Results

License

Document Availability

References

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages