Correlations on Binary Data
Brian Wylie edited this page Aug 16, 2023
SageWorks computes correlations on numeric data, including binary (0,1) data. The toolkit applies Pearson's correlation to the numeric fields, and for binary data this calculation gives you the equivalent of the Phi coefficient (https://en.wikipedia.org/wiki/Phi_coefficient).
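The equivalence between Pearson's correlation and the Phi coefficient on 0/1 data can be checked with a small standalone sketch. This is not SageWorks code: the helper names `pearson` and `phi` and the sample lists `a` and `b` are made up for illustration, and only the Python standard library is used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def phi(x, y):
    """Phi coefficient from the 2x2 contingency table of two 0/1 sequences."""
    pairs = list(zip(x, y))
    n11 = pairs.count((1, 1))  # both columns are 1
    n10 = pairs.count((1, 0))  # first is 1, second is 0
    n01 = pairs.count((0, 1))  # first is 0, second is 1
    n00 = pairs.count((0, 0))  # both columns are 0
    numerator = n11 * n00 - n10 * n01
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return numerator / denominator


# Hypothetical binary columns, chosen only for illustration
a = [1, 0, 0, 1, 1, 0, 1, 1]
b = [1, 1, 0, 0, 1, 0, 1, 0]

# The two values agree up to floating-point rounding
print(pearson(a, b), phi(a, b))
```

Both functions are undefined when a column is constant (all 0s or all 1s), because the denominator is then zero.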
If we have binary data that is quite similar:
'binary1': [0, 1, 0, 1, 1, 0, 1, 0, 1],
'binary2': [0, 1, 0, 0, 1, 0, 1, 0, 1]
Then the correlations will look like this:
ds = DataSource("my_data")
ds.correlations()
         binary1  binary2
binary1      1.0      0.8
binary2      0.8      1.0
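The 0.8 off-diagonal value can be verified by hand from the 2x2 contingency table of the two example columns. The count symbols below are notation introduced here for illustration (first index for binary1, second for binary2); they are not part of SageWorks.

```latex
% Counts over the 9 rows of the example data
n_{11} = 4, \quad n_{10} = 1, \quad n_{01} = 0, \quad n_{00} = 4

% Phi coefficient: cross-product difference over the root of the marginal totals
\phi = \frac{n_{11} n_{00} - n_{10} n_{01}}
            {\sqrt{(n_{11} + n_{10})(n_{01} + n_{00})(n_{11} + n_{01})(n_{10} + n_{00})}}
     = \frac{4 \cdot 4 - 1 \cdot 0}{\sqrt{5 \cdot 4 \cdot 4 \cdot 5}}
     = \frac{16}{20}
     = 0.8
```

The two columns differ in a single row (the fourth, where binary1 is 1 and binary2 is 0), which is why the correlation is high but not 1.0.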