In English, people say two things are “independent” to mean they “have nothing to do with one another.” The statistical usage captures the same intuition. But how do statisticians formalize this idea?
Recall that 19% (= Pr(C)) of EMBAs own cats, and 48.5% (= Pr(D)) of EMBAs own dogs. If cat ownership and dog ownership are independent of each other, then we would expect 19% of dog owners to also own cats (or 48.5% of cat owners to also own dogs). In other words, the frequency of one random event (cat ownership) is unaffected by knowledge of the other (dog ownership). This implies that the percentage of people who own both dogs and cats (= Pr(C∩D)) should be .19 × .485 (= Pr(C) × Pr(D)) = .092 = 9.2%. In other words, independence in statistics means a particular mathematical condition must hold. This notion is formalized in the following definition.
Definition. We say two events C and D are independent if and only if Pr(C∩D) = Pr(D) × Pr(C).
This is not a formula but rather a condition that we must check if we want to claim two events are independent. If the condition holds, the events are independent; if not, they are dependent. For example, are dog ownership and cat ownership independent in our previous example? Let’s check. Pr(C∩D) = 7.5%, and Pr(C) × Pr(D) = .19 × .485 = 9.2%. These percentages are not equal, so dog ownership and cat ownership are not independent.
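The check above can be sketched in a few lines of Python. The function name and tolerance are illustrative choices, not part of the definition; the probabilities are the ones from the EMBA pet-ownership example.

```python
def are_independent(p_joint, p_a, p_b, tol=1e-9):
    """Check the independence condition Pr(A∩B) = Pr(A) * Pr(B),
    comparing within a small tolerance to absorb floating-point error."""
    return abs(p_joint - p_a * p_b) < tol

p_cat = 0.19     # Pr(C): EMBAs who own cats
p_dog = 0.485    # Pr(D): EMBAs who own dogs
p_both = 0.075   # Pr(C∩D): EMBAs who own both

# What the joint probability would be under independence (about 9.2%)
print(p_cat * p_dog)

# The observed 7.5% does not match, so the events are dependent
print(are_independent(p_both, p_cat, p_dog))
```

Note that the comparison uses a tolerance rather than `==`, since in practice probabilities are stored as floating-point numbers and exact equality can fail for purely numerical reasons.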