- AIHub Korean Dataset
  - JSON Parsing
  - Data Cleaning
  - Regular Expression
  - TXT Merge
  - Multiprocessing
- Exploratory Data Analysis
  - Length of Source List
  - Number of Characters
  - Capacity
- Preprocessing Runtime Calculator
- Preprocessing Memory, Process, and Thread Usage
- Future Work
  - DataFrame (pandas or Polars) and Dictionary Optimization
  - Source JSON Searcher
    - Compare each JSON file with the TXT extracted from it
    - File Naming & Storage System (match file names before and after processing in Excel)
    - Source TXT File Name for Each Line in the Preprocessed TXT File (DataFrame)
  - Remove the kss 3.7.3 warning message: "[Korean Sentence Splitter]: Too long text! turn off quotes calibration!"
  - Cython Multithreading
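The JSON parsing, regex cleaning, TXT merge, and multiprocessing steps listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: the `text` key, the character whitelist in `NOISE`, and every function name are hypothetical, because the AIHub JSON schema differs between datasets. It uses the standard-library `re` module rather than the third-party `regex` package.

```python
import json
import re
import time
from multiprocessing import Pool

# Hypothetical cleaning rule: keep Hangul syllables, Latin letters, digits,
# whitespace, and basic punctuation; replace everything else with a space.
NOISE = re.compile(r"[^0-9A-Za-z가-힣\s.,!?'\"()\-]")
SPACES = re.compile(r"\s+")


def extract_texts(node, text_key="text"):
    """Recursively yield every string stored under `text_key` (hypothetical key)."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == text_key and isinstance(value, str):
                yield value
            else:
                yield from extract_texts(value, text_key)
    elif isinstance(node, list):
        for item in node:
            yield from extract_texts(item, text_key)


def clean_line(text):
    """Replace noise characters with spaces and collapse runs of whitespace."""
    return SPACES.sub(" ", NOISE.sub(" ", text)).strip()


def json_to_lines(path):
    """Parse one JSON file and return its non-empty cleaned lines."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    lines = (clean_line(t) for t in extract_texts(data))
    return [line for line in lines if line]


def merge_to_txt(json_paths, out_path, workers=1):
    """Clean every JSON file (in parallel if workers > 1) and merge into one TXT.

    Returns (line_count, char_count) as quick EDA numbers.
    """
    start = time.perf_counter()
    if workers > 1:
        with Pool(workers) as pool:
            # Pool.map keeps the input order, so the merged file is deterministic.
            results = pool.map(json_to_lines, json_paths)
    else:
        results = [json_to_lines(p) for p in json_paths]
    n_lines = n_chars = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for lines in results:
            for line in lines:
                out.write(line + "\n")
                n_lines += 1
                n_chars += len(line)
    elapsed = time.perf_counter() - start
    print(f"{n_lines} lines, {n_chars} characters, {elapsed:.2f}s")
    return n_lines, n_chars
```

A call such as `merge_to_txt(paths, "merged.txt", workers=4)` should sit under an `if __name__ == "__main__":` guard so that it also works on platforms where `multiprocessing` spawns fresh worker processes.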
kss
regex
pandas
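For the future-work question of which source file each line of the preprocessed TXT came from, one possible approach is to record the origin while merging instead of searching for it afterwards. The sketch below assumes the cleaned lines are kept in an ordered dict keyed by source file name; the function and column names are hypothetical. The CSV is written as `utf-8-sig` so that Excel recognizes the Korean text as UTF-8.

```python
import csv


def build_line_index(sources):
    """Return one record per merged line, pointing back to its source file.

    `sources` maps a source file name to its cleaned lines, in the same
    order the lines were written to the merged TXT file (an assumption).
    """
    records = []
    for source_name, lines in sources.items():
        for source_line, text in enumerate(lines, start=1):
            records.append({
                "merged_line": len(records) + 1,  # 1-based line number in the merged TXT
                "source_file": source_name,
                "source_line": source_line,       # 1-based line number within the source
                "text": text,
            })
    return records


def source_of(records, merged_line):
    """Look up which source file produced a given (1-based) merged line."""
    return records[merged_line - 1]["source_file"]


def write_index_csv(records, path):
    """Save the index as CSV; the utf-8-sig BOM helps Excel display Korean."""
    fields = ["merged_line", "source_file", "source_line", "text"]
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)
```

The same list of records can be passed to `pandas.DataFrame(records)` when DataFrame-style filtering is more convenient than the plain list.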