Replies: 6 comments
-
I thought it would be useful to provide a basic example of some plausible starting data:
- pfts.csv: contains PFT attributes
- community.csv: contains cohort data for two locations ('cell_id').

And when I do that, I belatedly realise that what we have is a relational database of two tables, with the community table keyed to the PFT table.
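To make that concrete, here's a minimal sketch of reading and joining the two tables with pandas. I'm assuming the cohort rows carry a pft_name column that keys into pfts.csv; the real column names may well differ.

```python
import pandas as pd

# Load the two example tables (file layout and column names are assumptions).
pfts = pd.read_csv("pfts.csv")            # one row per plant functional type
community = pd.read_csv("community.csv")  # one row per cohort, with a cell_id

# Treat pft_name as the foreign key linking cohorts to their PFT attributes.
cohorts = community.merge(pfts, on="pft_name", how="left")

# Cohorts grouped by location, ready for per-community calculations.
for cell_id, cell_cohorts in cohorts.groupby("cell_id"):
    print(cell_id, len(cell_cohorts))
```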
-
My initial (fairly uninformed!) thoughts are that the object-oriented approach is likely to be better in terms of readability, so I wonder if this becomes a trade-off between readability and performance. It might be worth investigating, if possible, what the performance difference is like. I also wonder if there's a way to maintain an object-oriented approach but then do some flatmapping of some kind to potentially get a performance benefit (although I'm not sure if this is possible without seeing the context in detail). I think pandas also has some known performance issues to watch out for, particularly the iterrows function. From a quick skim, this article covers it quite well: https://towardsdatascience.com/efficiently-iterating-over-rows-in-a-pandas-dataframe-7dd5f9992c01
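For what it's worth, the usual advice in that article boils down to preferring whole-column (vectorised) operations over row-wise loops. A quick sketch of the two styles on made-up cohort data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame(
    {
        "dbh": rng.uniform(0.1, 1.0, 10_000),
        "n_individuals": rng.integers(1, 100, 10_000),
    }
)

# Row-wise iteration: readable, but every row becomes a Python-level object,
# which is the slow path that iterrows is known for.
total_slow = 0.0
for _, row in df.iterrows():
    total_slow += row["dbh"] * row["n_individuals"]

# Vectorised equivalent: a single operation over whole columns.
total_fast = (df["dbh"] * df["n_individuals"]).sum()
```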
-
I think we can reduce the number of classes down to basically Flora (which is just a fancied-up lookup table for plant functional type traits) and Community (which contains an inventory of cohort data, possibly only one cohort). We don't really need a separate Cohort class, I think. Then basically all data is one of:

All of that comes from those CSV inputs (apart from the things I've forgotten). I'm thinking I've got some prototype stuff to play with - will put it up as a WIP once I've cleared off the rougher edges...
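In the meantime, here's a very rough sketch of the Flora side of that idea - a lookup table of PFT traits built from pfts.csv. The trait fields and the 'name' column are placeholders of my own, not the real file layout:

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class PlantFunctionalType:
    """Illustrative trait set for one PFT - fields are placeholders."""

    name: str
    max_height: float
    crown_area_ratio: float


class Flora(dict):
    """A dictionary of PFTs keyed by name: a fancied-up trait lookup table."""

    @classmethod
    def from_csv(cls, path: str) -> "Flora":
        table = pd.read_csv(path)
        return cls(
            (row["name"], PlantFunctionalType(**row))
            for row in table.to_dict(orient="records")
        )
```

A Community would then only need to store a PFT name (or index) per cohort and look the traits up here when needed.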
-
I've just added a WIP PR with some code, docs and those files: #227
-
I think your draft looks overall very sensible; the performance I cannot judge :-) I have a few general random questions/comments that might be addressed along the way.
-
I'm not too worried about performance to begin with - I'm just keen to see if there are obviously better ways to factor this.
Yes - each individual in each cohort has a distribution of leaf area through the canopy and the canopy structure provides a light profile. So we should know: i) what the environment is within each layer (temp, vapour pressure etc) and ii) what the light flux reaching that layer is.
It's the number of individuals - so yes, name change needed 😄
I did not. Happy to change names.
The cumulative canopy area for the community gives the total amount of canopy. If the area of the location were unlimited then the PPA model would say there is just one layer. But with limited area, we basically move across that cumulative area curve until we get to the area limit. That is the closure height of the first canopy layer (see the sketch at the end of this comment).
I think it was just to make the sums easier - but we certainly could alter this to make it more compatible.
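On the cumulative canopy area point above: here's a rough numpy sketch of walking that curve against a fixed ground area. The cohort-level numbers and the exact closure rule are my assumptions rather than the actual PPA implementation:

```python
import numpy as np

# Per-cohort canopy area (n_individuals * crown area), sorted from the
# tallest cohort downwards - all values here are made up.
heights = np.array([30.0, 25.0, 20.0, 15.0, 10.0])      # m
canopy_area = np.array([40.0, 35.0, 30.0, 25.0, 20.0])  # m2

cell_area = 100.0  # available ground area for the community (m2)

# Move down the cumulative canopy area curve until the available area is
# used up: that point sets the closure height of the first canopy layer.
# (Assumes the total canopy area does exceed the cell area.)
cumulative_area = np.cumsum(canopy_area)
first_closed = np.argmax(cumulative_area >= cell_area)
closure_height = heights[first_closed]  # 20.0 m in this toy example
```

Later layers would presumably close at each further multiple of the cell area as the walk continues down the curve.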
-
This discussion is to provide an overview of the structures that we want to work on in the April-September 2024 ICCS support session. That work is currently grouped under a single milestone: Demography and allocation model, which currently has 5 open issues.
The data that we're going to be working with has multiple axes, which are going to lead to some pretty ragged data structures, and it would be great to get a good start on how best to approach these. We do have the beginnings of some of these structures in the virtual_ecosystem project (the new name for virtual_rainforest): these are placeholders that currently just set the data structures and do no real science, and I think most of this will move into pyrealm.

The theory here is:
One key design point in what we have so far is that the virtual_ecosystem has to model multiple communities (one in each grid cell in a spatial structure). This is probably what we want in pyrealm too.

Structures
Plant Functional Types (PFT)
Each plant has a PFT, consisting of a set of parameters that govern the shape of the growing plant and the way in which it allocates GPP. The basic structures we have already are:
- the pyrealm TModelTraits dataclass: this is a simple dataclass holding traits for a single PFT.
- the virtual_ecosystem Flora class: this is basically just a dictionary of PFTs with a factory method from config. The dictionary entries are currently a hugely simplified placeholder for the TModelTraits data.

Cohorts
A cohort is basically just a number of individuals of a single PFT in a single community that share the same size class, defined using the diameter at breast height (DBH). All members of a cohort follow the same growth trajectory and experience the same vertical light environment and conditions. We have:
- the virtual_ecosystem PlantCohort dataclass.
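For reference, a cohort record along those lines can be very small - something like the sketch below, where the field names are my guesses rather than the actual PlantCohort definition:

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """A group of identical individuals of one PFT in one community."""

    pft_name: str       # key into the Flora trait lookup
    dbh: float          # diameter at breast height shared by the cohort
    n_individuals: int  # cohort size; the cohort is dead when this hits zero
```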
Communities

A community is just a collection of cohorts in a single location that will grow together and form the canopy model for that location. We have:
- the virtual_ecosystem PlantCommunities class, which collects cohorts into multiple communities across multiple locations.

The number of cohorts in a community changes at each time step as new cohorts are recruited and existing cohorts die off (number of individuals in cohort == 0).
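In code, each time step would then end with something like a cull of empty cohorts plus an append of new recruits - sketched here against the hypothetical Cohort record above:

```python
def update_cohorts(cohorts: list[Cohort], recruits: list[Cohort]) -> list[Cohort]:
    """Drop cohorts with no surviving individuals and add newly recruited ones."""
    survivors = [cohort for cohort in cohorts if cohort.n_individuals > 0]
    return survivors + recruits
```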
Canopy model
This is a relatively standalone part - it takes a single community and returns the vertical canopy structure. The complexity here is that the number of canopy layers is not fixed and emerges from the canopy model.
I have a draft notebook showing an example of the code, but it isn't in the repo yet. I will add it.
Design structure
At the moment, we have a mix of object-oriented and array-based approaches. One advantage of the object-oriented approach is that the structures can just be lists of objects of differing sizes. Each community object has an isolated set of cohorts, and cohorts can be removed and added. Each canopy model could have a list of canopy layer objects, which would only need to hold the actual number of canopy layers.
However, most (if not all) of the calculations we need to make on these data can be implemented using array-based calculations, which would likely be much faster than iterating scalar calculations over objects. So, for example, a community could consist of a set of parallel arrays of cohort data, one entry per cohort.
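Here's a sketch of that array-backed layout with numpy; the attribute names are placeholders of my own:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Community:
    """Cohort data for one location, stored as parallel arrays.

    Each cohort is a shared index across the arrays, so per-cohort
    calculations become whole-array operations rather than loops.
    """

    cell_id: int
    pft_index: np.ndarray      # integer index into the Flora trait table
    dbh: np.ndarray            # diameter at breast height per cohort
    n_individuals: np.ndarray  # number of individuals per cohort

    def drop_dead_cohorts(self) -> None:
        """Remove cohorts whose individual count has fallen to zero."""
        alive = self.n_individuals > 0
        self.pft_index = self.pft_index[alive]
        self.dbh = self.dbh[alive]
        self.n_individuals = self.n_individuals[alive]
```

Recruitment would then be an append (e.g. np.concatenate) on each of the arrays, and separate locations could stay as a plain list of Community objects, as suggested below.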
Tricky bits:
I'm leaning towards having a Community class with array attributes as above - each cohort is then just a particular index along those arrays - and keeping separate locations as a list of Community objects rather than trying to calculate things across all communities at once.

Thoughts?