Hallie Stanton

2022-11-04

I have two binary matrices of the same size (e.g. 5000x5000). Both matrices represent the same area, divided into cells of the same size. Each cell can be true or false, indicating whether a given property is present in that cell. One matrix represents the presence of a property A, and the other the presence of a property B.
Therefore, I can easily build a 2x2 contingency table with the presence/absence of A and B as variables:
         A=1   A=0
B=1       a     b
B=0       c     d
a = number of cells where both A and B are present
b = number of cells where only B is present
c = number of cells where only A is present
d = number of cells where neither A nor B is present
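As a minimal sketch, the four counts can be collected in a single pass over both matrices. This assumes the matrices are given as nested lists of booleans; the function name `contingency_counts` is a hypothetical illustration, not something from the question.

```python
def contingency_counts(mat_a, mat_b):
    """Return (a, b, c, d) for two same-shaped boolean matrices.

    Hypothetical helper; inputs are assumed to be nested lists of bools.
    a: A and B present, b: only B, c: only A, d: neither
    (rows B=1/B=0, columns A=1/A=0, as in the table in the question).
    """
    if len(mat_a) != len(mat_b):
        raise ValueError("matrices must have the same number of rows")
    a = b = c = d = 0
    for row_a, row_b in zip(mat_a, mat_b):
        if len(row_a) != len(row_b):
            raise ValueError("matrices must have the same number of columns")
        for x, y in zip(row_a, row_b):
            if x and y:
                a += 1          # both properties present
            elif y:
                b += 1          # only B present
            elif x:
                c += 1          # only A present
            else:
                d += 1          # neither present
    return a, b, c, d


A = [[True, True, False],
     [False, True, False]]
B = [[True, False, False],
     [True, True, False]]
print(contingency_counts(A, B))  # → (2, 1, 1, 2)
```

For 5000x5000 grids a pure-Python loop is slow; with NumPy boolean arrays the same counts can be obtained with vectorized calls such as `np.count_nonzero(A & B)` for a and `np.count_nonzero(~A & B)` for b.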
I can then apply a chi-square test to this table, building an "expected" contingency table, to assess whether the two properties are independent.
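A short sketch of that test, assuming the table layout from the question and Pearson's statistic without a continuity correction (SciPy's `scipy.stats.chi2_contingency` applies Yates' correction by default for 2x2 tables, so its numbers may differ slightly). The function name is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the 2x2 table

             A=1  A=0
        B=1   a    b
        B=0   c    d

    Returns (expected_table, statistic, p_value) with 1 degree of freedom
    and no continuity correction. Hypothetical helper for illustration.
    """
    n = a + b + c + d
    row = (a + b, c + d)          # totals for B=1, B=0
    col = (a + c, b + d)          # totals for A=1, A=0
    if 0 in row or 0 in col:
        raise ValueError("every row and column total must be positive")
    observed = ((a, b), (c, d))
    # Expected count under independence: row total * column total / n.
    expected = tuple(
        tuple(row[i] * col[j] / n for j in range(2)) for i in range(2)
    )
    stat = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2) for j in range(2)
    )
    # For 1 degree of freedom, P(chi2 >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return expected, stat, p_value


expected, stat, p = chi_square_2x2(30, 10, 20, 40)
print(expected)        # → ((20.0, 20.0), (30.0, 30.0))
print(round(stat, 2))  # → 16.67
```

Note that this p-value is two-sided in effect: it says the table departs from independence, not whether the overlap a is above or below its expected value.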
But I also need to assess whether the number of cells that "overlap" (cells that are true in both matrices, i.e. where both A and B are present) is higher or lower than expected if the two properties were independent. Of course I can compare the observed and expected values of a in the two contingency tables, but what I need is something like a probability, or a measure of how much higher or lower than expected the overlap is. In a way, could this also be seen as a measure of the "correlation" between the two properties? I know that with a smaller number of cells I could use Fisher's exact test, where a one-sided p-value would indicate the "direction" of the relationship between A and B. But since Fisher's exact test involves factorials, computing it directly is not feasible with this many cells.
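One way to get a signed measure of the overlap is sketched below. The phi coefficient is the Pearson correlation between the two binary variables, so it answers the "correlation" part directly; the adjusted standardized residual of cell a gives a z-score whose sign says whether the overlap is above or below independence. The one-sided p-values use a normal approximation, which is an assumption that is reasonable for very large counts but not exact. The function name is hypothetical.

```python
import math


def overlap_direction(a, b, c, d):
    """Signed measures of whether the overlap a is above or below independence.

    Table layout: rows B=1/B=0, columns A=1/A=0, as in the question.
    Returns (phi, z, p_more, p_less):
      phi    : phi coefficient (correlation of the two binary variables)
      z      : adjusted standardized residual of cell a
      p_more : one-sided normal-approximation p-value, "more overlap than expected"
      p_less : one-sided normal-approximation p-value, "less overlap than expected"
    Assumes every row and column total is positive. Hypothetical helper.
    """
    n = a + b + c + d
    r1, r0 = a + b, c + d      # cells with B present / absent
    c1, c0 = a + c, b + d      # cells with A present / absent
    phi = (a * d - b * c) / math.sqrt(r1 * r0 * c1 * c0)
    expected_a = r1 * c1 / n   # overlap expected under independence
    z = (a - expected_a) / math.sqrt(expected_a * (1 - r1 / n) * (1 - c1 / n))
    p_more = 0.5 * math.erfc(z / math.sqrt(2))    # P(Z >= z)
    p_less = 0.5 * math.erfc(-z / math.sqrt(2))   # P(Z <= z)
    return phi, z, p_more, p_less


phi, z, p_more, p_less = overlap_direction(30, 10, 20, 40)
print(round(phi, 3), round(z, 3))  # → 0.408 4.082
```

For a 2x2 table z equals phi·√n and z² equals the (uncorrected) chi-square statistic, so this adds direction without changing the strength of evidence.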

Answer & Explanation

motylowceyvy

2022-11-05

Well, in case it helps: I was finally able to use Fisher's exact test, which can provide information about the "direction" of the association between the variables. The problem was applying the test to very large counts, but that can be more or less worked around.
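The answer does not say how the large-count problem was worked around. One common approach, assumed here and not necessarily what the answerer did, is to evaluate the hypergeometric probabilities behind Fisher's exact test in log space with `math.lgamma` instead of computing factorials. With all margins fixed, a follows a hypergeometric distribution, and the two one-sided p-values are its upper and lower tails. The function name and the `tol` truncation parameter are hypothetical; SciPy users can instead call `scipy.stats.fisher_exact` with `alternative="greater"` or `"less"`.

```python
import math


def _log_comb(n, k):
    # log C(n, k) via log-gamma, avoiding huge factorials.
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_one_sided(a, b, c, d, tol=1e-17):
    """One-sided Fisher exact p-values (P(X >= a), P(X <= a)) for the overlap a.

    Layout as in the question: a = A and B, b = only B, c = only A, d = neither.
    Hypothetical helper; lgamma loses some precision for very large n, so treat
    tiny p-values as approximate (0.0 means "below double-precision range").
    """
    n = a + b + c + d
    r1 = a + b                     # cells with B present
    c1, c0 = a + c, b + d          # cells with A present / absent
    lo, hi = max(0, r1 - c0), min(r1, c1)
    log_denom = _log_comb(n, r1)

    def pmf(k):
        # Hypergeometric P(X = k) with the margins held fixed.
        return math.exp(_log_comb(c1, k) + _log_comb(c0, r1 - k) - log_denom)

    def tail_sum(ks):
        # ks moves away from the mode, so terms are non-increasing;
        # stop once a term is negligible relative to the sum or underflows.
        total = 0.0
        for k in ks:
            term = pmf(k)
            total += term
            if term == 0.0 or term < tol * total:
                break
        return total

    mode = (r1 + 1) * (c1 + 1) // (n + 2)
    if a >= mode:
        p_greater = tail_sum(range(a, hi + 1))
        p_less = 1.0 - p_greater + pmf(a)
    else:
        p_less = tail_sum(range(a, lo - 1, -1))
        p_greater = 1.0 - p_less + pmf(a)
    return max(0.0, min(p_greater, 1.0)), max(0.0, min(p_less, 1.0))


print(fisher_one_sided(3, 1, 1, 3))  # ≈ (0.2429, 0.9857), i.e. 17/70 and 69/70
print(fisher_one_sided(900_500, 4_099_500, 3_599_500, 16_400_500))
```

Only the tail on the far side of the mode is summed term by term, so even with 25 million cells the loop typically runs over a few thousand values of k rather than millions.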
Besides this, I found other coefficients that can help determine the association between the variables in a contingency table like this one, such as Yule's Q, the odds ratio, and so on.
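A minimal sketch of those two measures, using the standard formulas OR = ad/(bc) and Q = (ad - bc)/(ad + bc), plus a Wald confidence interval for the log odds ratio. It assumes all four cells are positive (no zero-cell correction is applied); the function name and the default `z` for a 95% interval are illustrative choices.

```python
import math


def odds_ratio_yule_q(a, b, c, d, z=1.959963984540054):
    """Odds ratio, Yule's Q and a Wald confidence interval for the odds ratio.

    Layout as in the question: a = A and B, b = only B, c = only A, d = neither.
    Assumes all four cells are positive. z = 1.96 gives a ~95% interval.
    Hypothetical helper for illustration.
    """
    odds_ratio = (a * d) / (b * c)
    yule_q = (a * d - b * c) / (a * d + b * c)   # equals (OR - 1) / (OR + 1)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or))
    return odds_ratio, yule_q, ci


or_, q, ci = odds_ratio_yule_q(30, 10, 20, 40)
print(or_, round(q, 3))  # → 6.0 0.714
```

Yule's Q ranges from -1 to 1: positive values mean A and B co-occur more often than under independence, negative values less often, and an odds-ratio interval that excludes 1 points the same way.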
