1e-to commented on Feb 19, 2020
sdc/functions/numpy_like.py
```
@@ -73,6 +74,10 @@ def nansum(self):
    pass


def corr(self):
```
Duplicate function definition. Other definition in line 507.
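In plain Python, a second module-level `def` with the same name rebinds that name when the module runs, so the earlier body silently becomes dead code (pyflakes/flake8 report this as F811). A minimal sketch with hypothetical bodies; it ignores any registration decorators the real module may apply:

```python
# Hypothetical module with two functions that share a name.
def corr(self):
    return "first definition"


def corr(self):  # noqa: F811 (redefinition): silently replaces the one above
    return "second definition"


# Only the later definition is reachable through the module-level name.
print(corr(None))  # prints: second definition
```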
`corr(x, y) = cov(x, y) / sqrt(var(x) * var(y))`

Please note that cov here is unbiased, while Series.cov calculates the biased one.
Biased cov calculation is here:
Suggestion for var implementation is here:

Considering all of this, the corr calculation would be something like this:

```python
import numpy
from numba import prange


def hpat_pandas_series_corr_impl(self, other, min_periods=None):
    if min_periods is None or min_periods < 1:
        min_periods = 1

    min_len = min(len(self._data), len(other._data))
    if min_len == 0:
        return numpy.nan

    sum_y = 0.
    sum_x = 0.
    sum_xy = 0.
    sum_xx = 0.
    sum_yy = 0.
    total_count = 0
    # Single pass over the common prefix; pairs with a NaN on either side
    # are skipped. The += updates act as parallel reductions under prange.
    for i in prange(min_len):
        x = self._data[i]
        y = other._data[i]
        if not (numpy.isnan(x) or numpy.isnan(y)):
            sum_x += x
            sum_y += y
            sum_xy += x*y
            sum_xx += x*x
            sum_yy += y*y
            total_count += 1

    if total_count < min_periods:
        return numpy.nan

    # Centered sums of products: the 1/(total_count - 1) factor shared by
    # the unbiased cov and var cancels in the ratio, so it is omitted.
    cov_xy = sum_xy - sum_x*sum_y/total_count
    var_x = sum_xx - sum_x*sum_x/total_count
    var_y = sum_yy - sum_y*sum_y/total_count
    if var_x*var_y <= 0:  # constant input: avoid dividing by zero
        return numpy.nan
    corr_xy = cov_xy/numpy.sqrt(var_x*var_y)

    return corr_xy
```

I haven't verified it, so please double-check it and fix any mistakes in the calculation. Also, if we had loop fusion, we could simply call three functions (cov(x, y), var(x) and var(y)).
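One way to double-check the arithmetic outside numba is a plain-Python sketch (the function names `corr_single_pass`, `corr_composed` and `_unbiased_cov` are hypothetical, not SDC APIs). The first mirrors the single-pass sums above; the second builds corr from three unbiased cov/var calls, as loop fusion would allow, using var(x) = cov(x, x). Because both cov and var divide by n - 1, that factor cancels, so the two should agree to rounding error:

```python
import math


def corr_single_pass(xs, ys, min_periods=None):
    """Pearson correlation from running sums, skipping pairs containing NaN."""
    if min_periods is None or min_periods < 1:
        min_periods = 1
    sum_x = sum_y = sum_xy = sum_xx = sum_yy = 0.0
    count = 0
    for x, y in zip(xs, ys):  # zip stops at the shorter input, like min_len
        if math.isnan(x) or math.isnan(y):
            continue
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_xx += x * x
        sum_yy += y * y
        count += 1
    if count < min_periods:
        return math.nan
    # Centered sums; the shared 1/(count - 1) factor cancels in the ratio.
    cov_xy = sum_xy - sum_x * sum_y / count
    var_x = sum_xx - sum_x * sum_x / count
    var_y = sum_yy - sum_y * sum_y / count
    if var_x * var_y <= 0.0:  # constant input (or rounding noise): undefined
        return math.nan
    return cov_xy / math.sqrt(var_x * var_y)


def _unbiased_cov(pairs):
    """Two-pass unbiased covariance (divides by n - 1) of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)


def corr_composed(xs, ys):
    """corr = cov(x, y) / sqrt(var(x) * var(y)), with var(x) = cov(x, x)."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not (math.isnan(x) or math.isnan(y))]
    var_x = _unbiased_cov([(x, x) for x, _ in pairs])
    var_y = _unbiased_cov([(y, y) for _, y in pairs])
    return _unbiased_cov(pairs) / math.sqrt(var_x * var_y)


xs = [1.0, 2.0, float("nan"), 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 2.0, 5.0]
print(abs(corr_single_pass(xs, ys) - corr_composed(xs, ys)) < 1e-12)  # prints True
```

The two-pass helper is only a cross-check; it is also less prone to cancellation than the single-pass form when the data share a large common offset.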
I tried using our var and cov algorithms to test loop fusion with Todd’s fix, but the unit tests stopped passing: the results do not have the required accuracy. Does this mean that we need to rewrite var and cov?
It doesn't match because Series.cov calculates the biased cov, and you need the unbiased one. Have you tried the provided solution?
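Assuming, as stated in this thread, that Series.cov divides by n (biased) while var divides by n - 1 (unbiased), combining them scales the correlation by exactly (n - 1)/n, which would be consistent with the accuracy failures described. A small plain-Python sketch (hypothetical helper names, not SDC code):

```python
import math


def centered_sums(xs, ys):
    """Return n and the centered sums Sxy, Sxx, Syy for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return n, sxy, sxx, syy


def corr_consistent(xs, ys):
    # Unbiased cov and unbiased var: every term divides by n - 1, so it cancels.
    n, sxy, sxx, syy = centered_sums(xs, ys)
    return (sxy / (n - 1)) / math.sqrt((sxx / (n - 1)) * (syy / (n - 1)))


def corr_mixed(xs, ys):
    # Biased cov (divides by n) paired with unbiased var (divides by n - 1).
    n, sxy, sxx, syy = centered_sums(xs, ys)
    return (sxy / n) / math.sqrt((sxx / (n - 1)) * (syy / (n - 1)))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(round(corr_consistent(xs, ys), 12), round(corr_mixed(xs, ys), 12))  # prints 1.0 0.75
```

For perfectly correlated data of length 4 the mixed version reports 0.75 instead of 1.0, i.e. the (n - 1)/n factor.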