Conversation
nikola-rados
left a comment
There was a problem hiding this comment.
Just a few styling comments, otherwise looks good @eyvorchuk!
dp/generate_climos.py
Outdated
| logger.info("Converting {} variable to units mm/day".format(flux_var)) | ||
| else: | ||
| logger.info("Converting {} variable to units cm/day".format(flux_var)) |
There was a problem hiding this comment.
The repo has recently been updated to drop support for python <3.6 so that we have access to f-strings. While it may not make a huge difference here, it can make things easier to read overall.
logger.info(f"Converting {flux_var} variable to units mm/day")
dp/generate_prsn.py
Outdated
| valid_units = [ | ||
| Unit('kg / m**2 / s'), | ||
| Unit('mm / s'), | ||
| Unit('mm / d'), | ||
| Unit('kg / d / m**2'), | ||
| Unit('kg / m**2 / d') | ||
| Unit('kg / m**2 / d'), | ||
| Unit('g / cm**2 / s'), | ||
| Unit('cm / s'), | ||
| Unit('cm / d'), | ||
| Unit('g / d / cm**2'), | ||
| Unit('g / cm**2 / d') | ||
| ] |
There was a problem hiding this comment.
It may be helpful to order these in some manner (maybe grouping by mm, cm, g, kg) just to make it easier to follow. You may want to double check with @corviday to ensure we aren't missing any cases here.
dp/generate_prsn.py
Outdated
| prsn_data = prsn_data * Q_(1.0, prsn_units_from).to(prsn_units_to).magnitude | ||
|
|
||
| output_dataset.variables['prsn'][chunk] = prsn_data |
There was a problem hiding this comment.
Have you considered skipping the extra variable assignment and assigning directly?
output_dataset.variables['prsn'][chunk] = prsn_data * Q_(1.0, prsn_units_from).to(prsn_units_to).magnitude
jameshiebert
left a comment
There was a problem hiding this comment.
Excellent, way to dive into this code base and tackle the problem! I think that you have essentially added the logic necessary to solve the problem, so you clearly understood what needed to be done. Good work!
I have a couple comments on making the code a little better, and some comments for giving us confidence in the results. And the units are confusing me a bit, so maybe you can answer my question there.
You can push more commits to this branch, and it will update the PR.
dp/generate_prsn.py
Outdated
| valid_units = [ | ||
| Unit('kg / m**2 / s'), | ||
| Unit('mm / s'), | ||
| Unit('mm / d'), |
There was a problem hiding this comment.
It looks like you're mixing whitespace here... the existing file uses spaces and you've inserted lines with tabs. You'll should conform to what's there already and make it consistent.
dp/generate_prsn.py
Outdated
| Unit('cm / s'), | ||
| Unit('cm / d'), | ||
| Unit('g / d / cm**2'), | ||
| Unit('g / cm**2 / d') |
There was a problem hiding this comment.
With all of the valid units that you've added to this list, it's probably worth while to eliminate some redundancy by instantiate the Unit objects in a loop. E.g. define the list as strings first and then loop over them.
valid_units = ['kg / m**2 / s' ...]
valid_units = [Unit(x) for x in valid_units]
dp/generate_climos.py
Outdated
|
|
||
| if units in [Unit('kg / m**2 / s'), Unit('mm / s')]: | ||
| logger.info("Converting {} variable to units mm/day".format(flux_var)) | ||
| if units in [Unit('kg / m**2 / s'), Unit('mm / s'), Unit('g / cm**2 / s'), Unit('cm / s')]: |
There was a problem hiding this comment.
An option that would make this more readable is to define each set of units as a variable to use in your conditional. For example:
precip_units = {Unit('kg / m**2 / s'), Unit('mm / s'), Unit('g / cm**2 / s'), Unit('cm / s')}
snow_units = {Unit('g / m**2 / s'), Unit('cm / s')}
if units in precip_units:
...
dp/generate_climos.py
Outdated
| if units in [Unit('kg / m**2 / s'), Unit('mm / s')]: | ||
| logger.info("Converting {} variable to units mm/day".format(flux_var)) | ||
| if units in [Unit('kg / m**2 / s'), Unit('mm / s'), Unit('g / cm**2 / s'), Unit('cm / s')]: | ||
| if units in [Unit('kg / m**2 / s'), Unit('mm / s')]: |
There was a problem hiding this comment.
Do you need this conditional? It only changes one thing... the output unit in the log message. But you already have the string for the output unit string below: (units * Unit('s / day')).to_udunits_str(). Save that here, then use it both for your log message and for your attributes. Then the conditional can go away.
| prsn_data = np.where(means < freezing, chunk_data['pr'], 0) | ||
|
|
||
| prsn_units_from = ureg.parse_units('kg/m^2/s') | ||
| prsn_units_to = ureg.parse_units('g/cm^2/s') |
There was a problem hiding this comment.
Can you tell me more about using this as the target unit? You've changed both the mass (kg -> g) and the area (per square meter to per square centimeter). Is that definitely what we want? And is it definitely correct?
There was a problem hiding this comment.
Converting from kg -> g multiplies the data by 1000 and converting from /m^2 -> /cm^2 divides the data by 10000, giving the net effect of dividing the data by 10. Since converting from mm to cm also divides by 10, I think this will work. I can check with Lee when he's back.
| assert fake_dataset.variables['prsn'].standard_name == 'snowfall_flux' | ||
| assert fake_dataset.variables['prsn'].long_name == 'Precipitation as Snow' | ||
|
|
||
| assert fake_dataset.variables['prsn'].units == 'g cm-2 s-1' |
There was a problem hiding this comment.
There's not actually any check on the data values in this test, so it's hard to know whether it's doing the conversion correctly. Could you work out what the sum snowfall in this data should be an add that to the test?
There was a problem hiding this comment.
I believe the dataset used for test_process_to_prsn results in a prsn array of all zeros since these tests pass.
result = fake_dataset.variables['prsn'][:]
result = np.where(result != 0)
for array in result:
assert len(array) == 0Is there possibly another dataset I could use to test the conversion?
There was a problem hiding this comment.
Oh right, you had mentioned that. Well that's not a great testing dataset, then is it?! :)
Maybe you could throw a few strategic low temperature values into the test data set... it might throw some other tests out of whack, but you could adjust as necessary.
There was a problem hiding this comment.
Since we're only concerned about the conversion, would it be ok if I store the pr dataset in two variables, perform the conversion on one of them, and check to make sure that each value in the two datasets differs by a factor of 10? Or would you prefer I modify the data set and tests where needed?
|
I have a question: Why convert units? P2A, at least, converts units on the frontend; that is a presentation issue, which is the responsibility of the UI, not of the data backend. Possibly the Marmot may need this, possibly lacking units conversions? (That can be remedied, particularly given the utilities now in P2A.) |
I'm not sure. @corviday wrote the original issue. |
|
I forgot I'd written that issue. At the time I wrote it, we did not yet have p2a, just PCEX, so it didn't occur to me that we might do unit conversion in the front end instead. |
The
generate_climos.pyandgenerate_prsn.pyfiles have been modified to convert the prsn data frommmtocm(equivalently,kg/m2tog/cm2). Thetest_generate_prsn.pyfile has an additional assert statement to ensure the correct units.