Simple tool to convert your CSV file into a format ready for uploading to Dataverse or Borealis data repositories.
No coding experience needed! Just follow the simple steps below.
✅ Takes your dataset information from a CSV file
✅ Automatically creates a JSON file that Dataverse/Borealis can understand
✅ Fills in missing required information automatically
✅ Ready to upload directly to your data repository
-
Open your terminal/command prompt
-
Go to the folder with this tool:
cd /workspaces/CSV_to_Dataverse_JSON -
Install the required software:
pip install -r requirements.txt
Create a CSV file with your dataset information. See examples below in the CSV Format section.
Option 1: Super Simple (Uses Default Values)
python csv_to_dataverse_json.pyThis reads: Csv_to_json - Citation.csv
Creates: output_metadata.json
Option 2: Add Your Own Author, Email & Description
python csv_to_dataverse_json.py "Csv_to_json - Citation.csv" "my_dataset.json" \
--default-author "Your Name" \
--default-email "your.email@organization.edu" \
--default-description "What is this dataset about?"What to change:
"Your Name"→ Your name"your.email@organization.edu"→ Your email"What is this dataset about?"→ Brief description of your dataset
Example:
python csv_to_dataverse_json.py "Csv_to_json - Citation.csv" "my_dataset.json" \
--default-author "Dr. Jane Smith" \
--default-email "jane.smith@university.edu" \
--default-description "Survey data from 2024 research project"Option 3: Use Environmental Variables (Save Your Defaults)
export DATAVERSE_DEFAULT_AUTHOR="Your Name"
export DATAVERSE_DEFAULT_EMAIL="your.email@organization.edu"
export DATAVERSE_DEFAULT_DESCRIPTION="Your description"
python csv_to_dataverse_json.py input.csv output.jsonOption 4: Use Custom CSV File
python csv_to_dataverse_json.py your_file.csv your_output.jsonYour new file will be created (e.g., my_dataset.json or output_metadata.json).
To view it:
cat my_dataset.json| What to Do | Command |
|---|---|
| Simple conversion with defaults | python csv_to_dataverse_json.py |
| Add your info (Recommended) | python csv_to_dataverse_json.py input.csv output.json --default-author "Name" --default-email "email@org.edu" --default-description "Description" |
| Custom input/output files | python csv_to_dataverse_json.py my_data.csv result.json |
When your CSV is missing required fields like author name, contact email, or description:
- First: Uses values from your CSV (if you provided them)
- Second: Uses the values you gave with
--default-author,--default-email, etc. - Third: Uses environment variables you set earlier
- Last: Uses generic placeholders like "Unknown Author" or "No description provided"
These fields define the dataset at the system level. If not provided, the script generates sensible defaults:
id- Dataset ID (auto-generated if missing: 1000+row_index)identifier- Persistent identifier (auto-generated if missing: FK2/random)protocol- Protocol for identifier, typicallydoiorhdl(default:doi)authority- Authority prefix, e.g.,10.70122(default:10.70122)publisher- Publisher name (default:Dataverse)publicationDate- Publication date in YYYY-MM-DD format (default: today)datasetType- Type of dataset (default:dataset)versionId- Version ID (auto-generated if missing: 2000+row_index)versionNumber- Version number (default:1)versionMinorNumber- Minor version (default:0)versionState- Version state:DRAFT,RELEASED, etc. (default:DRAFT)latestVersionPublishingState- Publishing state (default:DRAFT)storageIdentifier- Storage location (auto-generated if missing)UNF- Universal Numeric Fingerprint (optional)lastUpdateTime- Last update timestamp (auto-generated)createTime- Creation timestamp (auto-generated)citationDate- Citation date (default: today)termsOfUse- Terms of use text (optional)citationRequirements- Citation requirements (optional)conditions- Access conditions (optional)termsOfAccess- Terms of access (optional)fileAccessRequest- Allow file access requests:trueorfalse(default:true)
licenseName- License name (default:CC0 1.0)licenseUri- License URI (default: CC0 URI)licenseIconUri- License icon URL (default: CC0 icon)rightsIdentifier- Rights identifier (default:CC0-1.0)rightsIdentifierScheme- Scheme name (default:SPDX)schemeUri- Scheme URI (default: SPDX URL)languageCode- Language code (default:en)
Your CSV should have the following column headers for citation metadata:
title- Dataset titlesubtitle- Dataset subtitlealternativeTitle- Alternative title (use|for multiple values)notesText- Additional noteslanguage- Language (use|for multiple; controlled vocabulary)
Use semicolon (;) to separate subfields and pipe (|) for multiple entries:
-
author: authorName; authorAffiliation- Example:
John Smith; Harvard University | Jane Doe; MIT
- Example:
-
datasetContact: datasetContactName; datasetContactAffiliation; datasetContactEmail- Example:
Contact Person; University; contact@email.com
- Example:
-
producer: producerName; producerAffiliation; producerAbbreviation- Example:
Statistics Canada; Statistics; StatCan
- Example:
-
contributor: contributorType; contributorName- Example:
Researcher; John Smith
- Example:
dsDescription: dsDescriptionValue; dsDescriptionDate- Example:
This is a dataset description; 2025-01-01
- Example:
-
subject- Subject category (use|for multiple; controlled vocabulary)- Example:
Social Sciences | Medicine, Health and Life Sciences
- Example:
-
keyword: keywordValue; keywordVocabulary- Example:
sample; | test; | data
- Example:
-
topicClassification: topicClassValue; topicClassVocab- Example:
survey; | statistics;
- Example:
-
productionDate- Year produced -
distributionDate- Year distributed -
dateOfDeposit- Deposit date -
timePeriodCovered: timePeriodCoveredStart; timePeriodCoveredEnd- Example:
2025-01-01; 2025-12-31
- Example:
-
dateOfCollection: dateOfCollectionStart; dateOfCollectionEnd- Example:
2024-01-01; 2024-12-31
- Example:
-
kindOfData- Type of data (use|for multiple)- Example:
Quantitative | Qualitative
- Example:
-
software: softwareName; softwareVersion- Example:
R; 4.0.0
- Example:
-
series: seriesName; seriesInformation- Example:
Annual Statistics; Volume 5
- Example:
relatedMaterial- Related materials (use|for multiple)relatedDatasets- Related datasets (use|for multiple)otherReferences- Other references (use|for multiple)dataSources- Data sources (use|for multiple)
originOfSources- Origin of sourcescharacteristicOfSources- Characteristics of sourcesaccessToSources- Access to sources
publication: publicationRelationType; publicationCitation; publicationIDType; publicationIDNumber- Example:
IsCitedBy; Smith et al. 2024; DOI; 10.1234/example
- Example:
-
grantNumber: grantNumberAgency; grantNumberValue- Example:
NSERC; GRANT-12345
- Example:
-
distributor: distributorName; distributorAffiliation; distributorAbbreviation; distributorURL
depositor- Person depositing the datasetdoi- Digital Object Identifier (optional, for reference)citation- Full citation (optional, for reference)
title,author: authorName; authorAffiliation,subject,keyword: keywordValue; keywordVocabulary,dsDescription: dsDescriptionValue; dsDescriptionDate
"My Research Dataset","John Smith; University of Toronto | Jane Doe; MIT","Social Sciences | Computer Science","research; | data; | survey;","This dataset contains survey responses; 2025"The script generates complete Dataverse-compatible JSON with all system fields, version information, and metadata blocks:
{
"id": 1000,
"identifier": "FK2/DBEC899D",
"persistentUrl": "https://doi.org/10.70122/FK2/DBEC899D",
"protocol": "doi",
"authority": "10.70122",
"separator": "/",
"publisher": "Dataverse",
"publicationDate": "2025-12-11",
"storageIdentifier": "s3://10.70122/FK2/DBEC899D",
"datasetType": "dataset",
"datasetVersion": {
"id": 2000,
"datasetId": 1000,
"datasetPersistentId": "doi:10.70122/FK2/DBEC899D",
"datasetType": "dataset",
"storageIdentifier": "s3://10.70122:...",
"versionNumber": 1,
"internalVersionNumber": 1,
"versionMinorNumber": 0,
"versionState": "DRAFT",
"latestVersionPublishingState": "DRAFT",
"UNF": "",
"lastUpdateTime": "2025-12-11T16:49:19Z",
"releaseTime": "",
"createTime": "2025-12-11T16:49:19Z",
"publicationDate": "2025-12-11",
"citationDate": "2025-12-11",
"termsOfUse": "",
"citationRequirements": "",
"conditions": "",
"termsOfAccess": "",
"license": {
"name": "CC0 1.0",
"uri": "http://creativecommons.org/publicdomain/zero/1.0",
"iconUri": "https://licensebuttons.net/p/zero/1.0/88x31.png",
"rightsIdentifier": "CC0-1.0",
"rightsIdentifierScheme": "SPDX",
"schemeUri": "https://spdx.org/licenses/",
"languageCode": "en"
},
"fileAccessRequest": true,
"metadataBlocks": {
"citation": {
"displayName": "Citation Metadata",
"name": "citation",
"fields": [
{
"typeName": "title",
"multiple": false,
"typeClass": "primitive",
"value": "My Research Dataset"
},
{
"typeName": "author",
"multiple": true,
"typeClass": "compound",
"value": [
{
"authorName": {
"typeName": "authorName",
"multiple": false,
"typeClass": "primitive",
"value": "John Smith"
},
"authorAffiliation": {
"typeName": "authorAffiliation",
"multiple": false,
"typeClass": "primitive",
"value": "Harvard University"
}
}
]
}
]
},
"geospatial": {...},
"socialscience": {...}
},
"files": [...]
},
"citation": "Full citation string (optional)"
}-
In Dataverse:
- Go to your dataset → Edit → Metadata
- Use the JSON API endpoint or direct metadata upload
- POST the generated JSON to:
/api/datasets/{id}/metadata
-
In Borealis:
- Follow similar import procedures for Dataverse JSON formats
Solution 1: Make sure you're in the right folder:
cd /workspaces/csv_to_dataverse_jsonSolution 2: Check that Csv_to_json - Citation.csv exists in the folder
Solution 3: Make sure you installed the required software:
pip install -r requirements.txt- Check that your CSV file has the same column names as shown in the examples
- Make sure column names are spelled exactly right
- Empty cells in your CSV are automatically skipped (this is fine)
Use this command:
python csv_to_dataverse_json.py your_file.csv output.jsonChange your_file.csv to your actual filename.
- Dates should be in this format:
YYYY-MM-DD(example:2024-12-11) - If you just have the year, that's fine too:
2024
csv_to_dataverse_json.py- Main tool (do not edit)Csv_to_json - Citation.csv- Example CSV filerequirements.txt- List of required softwareREADME.md- This file (instructions)TEMPLATE_CSV_WITH_ALL_COLUMNS.csv- Full template with all possible fields
Please create an issue for questions or issues.