Convert text and label data stored in files into a single data structure that serves as a common input for training any NLP model.
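As a rough illustration of the idea, the sketch below reads tab-separated `text<TAB>label` rows into a list of records. The file layout, the `Example` dataclass, and the helpers `read_examples` and `label_vocabulary` are all assumptions made for this example, not names taken from the project.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List, TextIO


@dataclass(frozen=True)
class Example:
    """One training example: raw text plus its label (hypothetical structure)."""
    text: str
    label: str


def read_examples(handle: TextIO, delimiter: str = "\t") -> List[Example]:
    """Parse 'text<delimiter>label' rows into Example records.

    Assumes one example per line; rows without exactly two fields are skipped.
    """
    examples = []
    # QUOTE_NONE keeps quotation marks inside the text untouched.
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # blank or malformed row
        text, label = row
        examples.append(Example(text=text.strip(), label=label.strip()))
    return examples


def label_vocabulary(examples: List[Example]) -> Dict[str, int]:
    """Map each distinct label to an integer id, in sorted label order."""
    return {label: i for i, label in enumerate(sorted({e.label for e in examples}))}


# Usage with an in-memory file; a real path would be opened with open(path, encoding="utf-8").
sample = io.StringIO("great movie\tpos\nterrible plot\tneg\n\n")
examples = read_examples(sample)
vocab = label_vocabulary(examples)
```

Keeping text and label together in one record type means a tokenizer, a classifier, or a sequence model can all consume the same list and apply their own preprocessing afterwards.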