Description
I tried something extreme, and the results were extreme too: I generated weather data for many solar orientations, tilts, wind directions, ..., about 1600 variables in total, which resulted in this formula:
```
Value ~ HDD_13 + GlobalIrradianceO270T90 + HDD_3 + windComponentSquared180 + GlobalIrradianceO265T80 + precipIntensity + windComponent95 + GlobalIrradianceO265T75 + CDD_22 + GlobalIrradianceO275T20 + GlobalIrradianceO260T50 + GlobalIrradianceO40T60 + windComponentCubed145 + GlobalIrradianceO0T0 + GlobalIrradianceO35T90 + GlobalIrradianceO100T55 + GlobalIrradianceO0T85
```
And got a miraculous RSquared of 1!
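For context, one likely way to end up with an R² of exactly 1: ordinary least squares fits the data perfectly whenever the number of fitted coefficients (here 17 terms plus an intercept) reaches the number of observations, even if every predictor is pure noise. The issue doesn't say how many observations were used, so the 18 data points below are a hypothetical choice, and the sketch uses plain NumPy rather than the Analysis' own fitting code.

```python
import numpy as np


def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return the in-sample R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    tss = ((y - y.mean()) ** 2).sum()
    return 1.0 - (resid @ resid) / tss


rng = np.random.default_rng(0)
n_obs = 18                        # hypothetical: 17 predictors + intercept = 18 coefficients
y = rng.normal(size=n_obs)        # the target is pure noise
X = rng.normal(size=(n_obs, 17))  # 17 predictors unrelated to y

print(r_squared(X, y))            # within rounding error of 1.0: a "perfect" fit to noise
print(r_squared(X[:, :3], y))     # clearly below 1 with only 3 of those predictors
```

With 1600 candidates to choose from, a selection procedure can keep adding terms until it reaches that point, which is why limiting the pool helps.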
I could obviously fix it by reducing the number of variables. But another approach might also work: define certain "families" of variables (for instance, the heating degree days) and make sure the Analysis uses only one variable from each family when building its model.
This could simply be a list of lists, like:
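```python
var_structure = [
    [f"HDD_{t}" for t in range(10, 25)],  # HDD_10, HDD_11, ..., HDD_24
    ["CDD_10"],  # ... plus the remaining CDD_* variables
    ["GlobalIrradianceO0T0", "GlobalIrradianceO10T10"],  # ... plus the remaining irradiance variables
    # ... further families
]
```

@saroele, thoughts?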
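A minimal sketch of how such a family constraint could be enforced, assuming the variables are chosen by greedy forward selection on in-sample R². The issue doesn't say which procedure the Analysis actually uses, so this swaps in a plain one written with NumPy. The names `forward_select`, `family_of`, `max_terms` and `min_gain` are hypothetical, not part of any existing API, and variables missing from `var_structure` are treated as families of their own.

```python
import numpy as np


def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return the in-sample R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    tss = ((y - y.mean()) ** 2).sum()
    return 1.0 - (resid @ resid) / tss


def family_of(var_structure):
    """Map each variable name to the index of the family (inner list) it belongs to."""
    return {name: i for i, family in enumerate(var_structure) for name in family}


def forward_select(data, y, var_structure, max_terms=5, min_gain=0.01):
    """Hypothetical greedy forward selection that adds at most one variable per family.

    data: dict of variable name -> 1-D array of observations (assumed layout).
    Variables not listed in var_structure form a family of their own.
    """
    fam = family_of(var_structure)
    selected, used_families, best_r2 = [], set(), 0.0
    while len(selected) < max_terms:
        best_name, best_score = None, -np.inf
        for name in data:
            family = fam.get(name, name)
            if name in selected or family in used_families:
                continue  # already in the model, or a sibling from its family is
            X = np.column_stack([data[v] for v in selected + [name]])
            score = r_squared(X, y)
            if score > best_score:
                best_name, best_score = name, score
        # Stop when no candidate is left or the best one barely improves the fit
        if best_name is None or best_score - best_r2 < min_gain:
            break
        selected.append(best_name)
        used_families.add(fam.get(best_name, best_name))
        best_r2 = best_score
    return selected, best_r2


# Toy data, made up for illustration: HDD_16 is a noisy copy of HDD_15
rng = np.random.default_rng(1)
n = 60
hdd = rng.uniform(0, 20, n)
data = {
    "HDD_15": hdd,
    "HDD_16": hdd + rng.normal(0, 0.5, n),
    "CDD_22": rng.uniform(0, 5, n),
    "precipIntensity": rng.normal(size=n),
}
y = 3 * hdd + 2 * data["CDD_22"] + rng.normal(0, 0.5, n)

var_structure = [["HDD_15", "HDD_16"], ["CDD_22"]]
selected, r2 = forward_select(data, y, var_structure, max_terms=3)
print(selected, round(r2, 3))
```

Once one HDD variable is in the model, its sibling is never considered again, so near-duplicates cannot pile up the way the irradiance terms do in the formula above. Scoring on in-sample R² still rewards extra terms, so an adjusted R², an information criterion or cross-validation would likely be a safer stopping rule than a fixed `min_gain`.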