Correlations on Binary Data

Brian Wylie edited this page Aug 16, 2023 · 2 revisions

SageWorks computes correlations on numeric data, including binary (0/1) data. The toolkit uses Pearson's correlation on numeric fields; when both fields are binary, this calculation is equivalent to the Phi coefficient (https://en.wikipedia.org/wiki/Phi_coefficient).

If we have two binary columns that are quite similar (here they differ in only one of nine positions):

'binary1': [0, 1, 0, 1, 1, 0, 1, 0, 1],
'binary2': [0, 1, 0, 0, 1, 0, 1, 0, 1]

Then the correlations will look like this:

ds = DataSource("my_data")
ds.correlations()

         binary1  binary2
binary1      1.0      0.8
binary2      0.8      1.0
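
The 0.8 above can be checked outside of SageWorks. The following is a minimal sketch that uses plain pandas rather than the SageWorks API; the variable names (`pearson`, `phi`, `n11`, and so on) are illustrative assumptions, not part of the toolkit. It computes the Pearson correlation of the two columns and, separately, the Phi coefficient from the 2x2 contingency counts, to show that the two agree.

```python
import math

import pandas as pd

# The same two binary columns used in the example above
df = pd.DataFrame({
    "binary1": [0, 1, 0, 1, 1, 0, 1, 0, 1],
    "binary2": [0, 1, 0, 0, 1, 0, 1, 0, 1],
})

# Pearson correlation (pandas uses Pearson by default)
pearson = df["binary1"].corr(df["binary2"])

# 2x2 contingency counts: n<binary1 value><binary2 value>
n11 = int(((df["binary1"] == 1) & (df["binary2"] == 1)).sum())
n10 = int(((df["binary1"] == 1) & (df["binary2"] == 0)).sum())
n01 = int(((df["binary1"] == 0) & (df["binary2"] == 1)).sum())
n00 = int(((df["binary1"] == 0) & (df["binary2"] == 0)).sum())

# Phi coefficient: (n11*n00 - n10*n01) / sqrt(product of the row and column totals)
phi = (n11 * n00 - n10 * n01) / math.sqrt(
    (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
)

print(round(pearson, 3), round(phi, 3))
```

Both values round to 0.8, matching the off-diagonal entries in the correlation matrix shown above.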