Our complete 2025 CSV is 16,902,955 rows, with a few rows for each cluster (cluster + treatment).
We should split the CSV into multiple CSVs and then recombine them after processing. I think splitting by county would be clean -- some counties probably have a high share of the biomass, but if that turns out to be a problem we can reassess.
Alternatively, if it's easier to split by cluster number into equally sized buckets, that's fine too.
The trick is that each mini CSV will need its own header row, and when you recombine them you need to end up with only one header at the top. I think this can be done in bash without needing JavaScript code, but give it a shot; a sketch is below.
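A minimal bash sketch of the split/recombine, under some assumptions: the source file is named `complete_2025.csv` (hypothetical), the county sits in column 3, fields are plain unquoted CSV, and parts are written to a `parts/` directory. The real file name and column position would need to be adjusted.

```bash
#!/usr/bin/env bash
set -euo pipefail

SRC=complete_2025.csv   # hypothetical source file name
OUTDIR=parts
mkdir -p "$OUTDIR"

# Split: write the header into each per-county file the first time that
# county is seen, then append every data row to its county's file.
header=$(head -n 1 "$SRC")
tail -n +2 "$SRC" | awk -v outdir="$OUTDIR" -v header="$header" -F',' '
  {
    out = outdir "/" $3 ".csv"                        # column 3 = county (assumed)
    if (!(out in seen)) { print header > out; seen[out] = 1 }
    print >> out
  }'

# Recombine: take the header once from the first part, then append every
# part with its first line (the header) stripped.
first=$(ls "$OUTDIR"/*.csv | head -n 1)
head -n 1 "$first" > recombined.csv
for f in "$OUTDIR"/*.csv; do
  tail -n +2 "$f" >> recombined.csv
done
```

The `head -n 1` / `tail -n +2` pair is what keeps exactly one header in the recombined file regardless of how many parts there are.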
And finally, once this code is written and tested, we'll need some scripts to actually run it. Perhaps it could go through every mini CSV, move one into an 'inprogress' folder while working on it, and then into a 'done' folder when finished. The key is to be repeatable and resilient so we don't accidentally double-process the same cluster/treatment.
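A sketch of such a runner, assuming the split files live in `parts/` and that a hypothetical `node process-batch.js <file>` command does the per-file work. Claiming a file by moving it into `inprogress/` before touching it is what makes re-runs safe: an already-claimed or already-done file is simply no longer in the queue.

```bash
#!/usr/bin/env bash
set -euo pipefail

QUEUE=parts
INPROGRESS=inprogress
DONE=done
mkdir -p "$QUEUE" "$INPROGRESS" "$DONE"

for f in "$QUEUE"/*.csv; do
  [ -e "$f" ] || break            # queue is empty (or glob didn't match)
  name=$(basename "$f")

  # Claim the batch by moving it; rename is atomic on the same filesystem,
  # so two concurrent runners can't both claim the same file.
  mv "$f" "$INPROGRESS/$name" || continue

  if node process-batch.js "$INPROGRESS/$name"; then   # hypothetical processor
    mv "$INPROGRESS/$name" "$DONE/$name"
  else
    # Leave failed batches in inprogress/ so they're visible and are never
    # silently re-queued -- no accidental double processing.
    echo "FAILED: $name" >&2
  fi
done
```

Re-running the script just picks up whatever is still in `parts/`; anything in `inprogress/` or `done/` is skipped by construction.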