Replies: 6 comments
-
I thought it would be useful to provide a basic example of some plausible starting data:
- pfts.csv: contains PFT attributes
- community.csv: contains cohort data for two locations ('cell_id').

And when I do that, I belatedly realise that what we have is a relational database of two tables, with the community table keyed to the PFT table.
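To make that concrete, here's a minimal sketch of reading and joining the two tables with pandas. I'm assuming the cohort rows carry a pft_name column that keys into pfts.csv; the real column names may well differ.

```python
import pandas as pd

# Load the two example tables (file layout and column names are assumptions).
pfts = pd.read_csv("pfts.csv")            # one row per plant functional type
community = pd.read_csv("community.csv")  # one row per cohort, with a cell_id

# Treat pft_name as the foreign key linking cohorts to their PFT attributes.
cohorts = community.merge(pfts, on="pft_name", how="left")

# Cohorts grouped by location, ready for per-community calculations.
for cell_id, cell_cohorts in cohorts.groupby("cell_id"):
    print(cell_id, len(cell_cohorts))
```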
-
My initial (fairly uninformed!) thoughts are that the object-oriented approach is likely to be better in terms of readability, so I wonder if this becomes a trade-off between readability and performance. It might be worth investigating, if possible, what the performance difference is like. I also wonder if there's a way to maintain an object-oriented approach but then do some flatmapping of some kind to potentially get a performance benefit (although I'm not sure if this is possible without seeing the context in detail). I think pandas also has some known performance issues to watch out for, particularly the iterrows function. From a quick skim, this article covers it quite well: https://towardsdatascience.com/efficiently-iterating-over-rows-in-a-pandas-dataframe-7dd5f9992c01
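For what it's worth, the usual advice in that article boils down to preferring whole-column (vectorised) operations over row-wise loops. A quick sketch of the two styles on made-up cohort data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame(
    {
        "dbh": rng.uniform(0.1, 1.0, 10_000),
        "n_individuals": rng.integers(1, 100, 10_000),
    }
)

# Row-wise iteration: readable, but every row becomes a Python-level object,
# which is the slow path that iterrows is known for.
total_slow = 0.0
for _, row in df.iterrows():
    total_slow += row["dbh"] * row["n_individuals"]

# Vectorised equivalent: a single operation over whole columns.
total_fast = (df["dbh"] * df["n_individuals"]).sum()
```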
-
I think we can reduce the number of classes down to basically Flora (which is just a fancied-up lookup table for plant functional type traits) and Community (which contains an inventory of cohort data, possibly only one cohort). We don't really need a separate Cohort class, I think. Then basically all data is one of:

All of that comes from those CSV inputs (apart from the things I've forgotten). I'm thinking I've got some prototype stuff to play with - will put it up as a WIP once I've cleared off the rougher edges...
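In the meantime, here's a very rough sketch of the Flora side of that idea - a lookup table of PFT traits built from pfts.csv. The trait fields and the 'name' column are placeholders of my own, not the real file layout:

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class PlantFunctionalType:
    """Illustrative trait set for one PFT - fields are placeholders."""

    name: str
    max_height: float
    crown_area_ratio: float


class Flora(dict):
    """A dictionary of PFTs keyed by name: a fancied-up trait lookup table."""

    @classmethod
    def from_csv(cls, path: str) -> "Flora":
        table = pd.read_csv(path)
        return cls(
            (row["name"], PlantFunctionalType(**row))
            for row in table.to_dict(orient="records")
        )
```

A Community would then only need to store a PFT name (or index) per cohort and look the traits up here when needed.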
-
I've just added a WIP PR with some code, docs and those files: #227
-
I think your draft looks overall very sensible; the performance I cannot judge :-) I have a few general random questions/comments that might be addressed along the way.
-
I'm not too worried about performance to begin with - I'm just keen to see if there are obviously better ways to factor this.
Yes - each individual in each cohort has a distribution of leaf area through the canopy and the canopy structure provides a light profile. So we should know: i) what the environment is within each layer (temp, vapour pressure etc) and ii) what the light flux reaching that layer is.
It's the number of individuals - so yes, name change needed 😄
I did not. Happy to change names.
The cumulative canopy area for the community gives the total amount of canopy. If the area of the location were unlimited then the PPA model would say there is just one layer. But with limited area, we basically move across that cumulative area curve until we get to the area limit. That is the closure height of the first canopy layer (see the sketch at the end of this comment).
I think it was just to make the sums easier - but we certainly could alter this to make it more compatible.
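On the cumulative canopy area point above: here's a rough numpy sketch of walking that curve against a fixed ground area. The cohort-level numbers and the exact closure rule are my assumptions rather than the actual PPA implementation:

```python
import numpy as np

# Per-cohort canopy area (n_individuals * crown area), sorted from the
# tallest cohort downwards - all values here are made up.
heights = np.array([30.0, 25.0, 20.0, 15.0, 10.0])      # m
canopy_area = np.array([40.0, 35.0, 30.0, 25.0, 20.0])  # m2

cell_area = 100.0  # available ground area for the community (m2)

# Move down the cumulative canopy area curve until the available area is
# used up: that point sets the closure height of the first canopy layer.
# (Assumes the total canopy area does exceed the cell area.)
cumulative_area = np.cumsum(canopy_area)
first_closed = np.argmax(cumulative_area >= cell_area)
closure_height = heights[first_closed]  # 20.0 m in this toy example
```

Later layers would presumably close at each further multiple of the cell area as the walk continues down the curve.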
-
This discussion is to provide an overview of the structures that we want to work on in the April-September 2024 ICCS support session. That work is currently grouped under a single milestone: Demography and allocation model, which currently has 5 open issues.
The data that we're going to be working with has multiple axes, which are going to lead to some pretty ragged data structures, and it would be great to get a good start on how best to approach these. We do have the beginnings of some of these structures in the virtual_ecosystem project (the new name for virtual_rainforest): these are placeholders that currently just set the data structures and do no real science, and I think most of this will move into pyrealm.

The theory here is:
One key design point in what we have so far is that the virtual_ecosystem has to model multiple communities (one in each grid cell in a spatial structure). This is probably what we want in pyrealm too.

Structures
Plant Functional Types (PFT)
Each plant has a PFT, consisting of a set of parameters that govern the shape of the growing plant and the way in which it allocates GPP. The basic structures we have already are:
- the pyrealm TModelTraits dataclass: this is a simple dataclass holding traits for a single PFT.
- the virtual_ecosystem Flora class: this is basically just a dictionary of PFTs with a factory method from config. The dictionary entries are currently a hugely simplified placeholder for the TModelTraits data.

Cohorts
A cohort is basically just a number of individuals of a single PFT in a single community that share the same size class, defined using the diameter at breast height (DBH). All members of a cohort follow the same growth trajectory and experience the same vertical light environment and conditions. We have:
- the virtual_ecosystem PlantCohort dataclass.
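For reference, a cohort record along those lines can be very small - something like the sketch below, where the field names are my guesses rather than the actual PlantCohort definition:

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """A group of identical individuals of one PFT in one community."""

    pft_name: str       # key into the Flora trait lookup
    dbh: float          # diameter at breast height shared by the cohort
    n_individuals: int  # cohort size; the cohort is dead when this hits zero
```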
Communities

A community is just a collection of cohorts in a single location that will grow together and form the canopy model for that location. We have:
- the virtual_ecosystem PlantCommunities class, which collects cohorts into multiple communities across multiple locations.

The number of cohorts in a community changes at each time step as new cohorts are recruited and existing cohorts die off (number of individuals in cohort == 0).
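In code, each time step would then end with something like a cull of empty cohorts plus an append of new recruits - sketched here against the hypothetical Cohort record above:

```python
def update_cohorts(cohorts: list[Cohort], recruits: list[Cohort]) -> list[Cohort]:
    """Drop cohorts with no surviving individuals and add newly recruited ones."""
    survivors = [cohort for cohort in cohorts if cohort.n_individuals > 0]
    return survivors + recruits
```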
Canopy model
This is a relatively standalone part - it takes a single community and returns the vertical canopy structure. The complexity here is that the number of canopy layers is not fixed and emerges from the canopy model.
I have a draft notebook showing an example of the code, but it isn't in the repo yet. I will add it.
Design structure
At the moment, we have a mix of object-oriented and array-based approaches. One advantage of the object-oriented approach is that the structures can just be lists of objects of differing sizes. Each community object has an isolated set of cohorts, and cohorts can be removed and added. Each canopy model could have a list of canopy layer objects, which would only need to hold the actual number of canopy layers.
However, most (if not all) of the calculations we need to make on these data can be implemented using array-based calculations, which would likely be much faster than iterating scalar calculations over objects. So, for example, a community could consist of a set of parallel arrays of cohort data, one entry per cohort.
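Here's a sketch of that array-backed layout with numpy; the attribute names are placeholders of my own:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Community:
    """Cohort data for one location, stored as parallel arrays.

    Each cohort is a shared index across the arrays, so per-cohort
    calculations become whole-array operations rather than loops.
    """

    cell_id: int
    pft_index: np.ndarray      # integer index into the Flora trait table
    dbh: np.ndarray            # diameter at breast height per cohort
    n_individuals: np.ndarray  # number of individuals per cohort

    def drop_dead_cohorts(self) -> None:
        """Remove cohorts whose individual count has fallen to zero."""
        alive = self.n_individuals > 0
        self.pft_index = self.pft_index[alive]
        self.dbh = self.dbh[alive]
        self.n_individuals = self.n_individuals[alive]
```

Recruitment would then be an append (e.g. np.concatenate) on each of the arrays, and separate locations could stay as a plain list of Community objects, as suggested below.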
Tricky bits:
I'm leaning towards having a Community class with array attributes as above - each cohort is then just a particular index along those arrays - and keeping separate locations as a list of Community objects rather than trying to calculate things across all communities at once.

Thoughts?