## Statistical Analysis CA 3 Data Management & Analytics


Q1: Lift Analysis

Please calculate the following lift values for the table correlating Burgers & Chips below (^X denotes transactions that do not contain X):

Lift(Burgers, Chips)

Lift(Burgers, ^Chips)

Lift(^Burgers, Chips)

Lift(^Burgers, ^Chips)

| | Chips | ^Chips | Total Row |
|---|---|---|---|
| Burgers | 600 | 400 | 1000 |
| ^Burgers | 200 | 200 | 400 |
| Total Column | 800 | 600 | 1400 |

Solution 1

P(Burgers ∪ Chips) = 600/1400 = 3/7 ≈ 0.43 (the proportion of transactions containing both items)

P(Burgers) = 1000/1400 = 5/7 ≈ 0.71

P(Chips) = 800/1400 = 4/7 ≈ 0.57

Lift(Burgers, Chips) = (3/7) / ((5/7) × (4/7)) = 21/20 = 1.05

*Because the lift is greater than 1, Burgers & Chips are positively correlated.

Solution 2

P(Burgers ∪ ^Chips) = 400/1400 = 2/7 ≈ 0.29

P(Burgers) = 1000/1400 = 5/7 ≈ 0.71

P(^Chips) = 600/1400 = 3/7 ≈ 0.43

Lift(Burgers, ^Chips) = (2/7) / ((5/7) × (3/7)) = 14/15 ≈ 0.93

*Because the lift is less than 1, Burgers & ^Chips are negatively correlated.

Solution 3

P(^Burgers ∪ Chips) = 200/1400 = 1/7 ≈ 0.14

P(^Burgers) = 400/1400 = 2/7 ≈ 0.29

P(Chips) = 800/1400 = 4/7 ≈ 0.57

Lift(^Burgers, Chips) = (1/7) / ((2/7) × (4/7)) = 7/8 = 0.875

*Because the lift is less than 1, ^Burgers & Chips are negatively correlated.

Solution 4

P(^Burgers ∪ ^Chips) = 200/1400 = 1/7 ≈ 0.14

P(^Burgers) = 400/1400 = 2/7 ≈ 0.29

P(^Chips) = 600/1400 = 3/7 ≈ 0.43

Lift(^Burgers, ^Chips) = (1/7) / ((2/7) × (3/7)) = 7/6 ≈ 1.17

*Because the lift is greater than 1, ^Burgers & ^Chips are positively correlated.
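The four lift calculations for the Burgers & Chips table can be cross-checked with a short script. This is only an illustrative sketch: the function name `lift_table` and its argument names are assumptions made here, not part of the assignment. It works in exact fractions so that rounding of intermediate values cannot shift the result.

```python
from fractions import Fraction


def lift_table(n11, n10, n01, n00):
    """Return the four lift values for a 2x2 table of counts.

    n11: A and B, n10: A without B, n01: B without A, n00: neither.
    Keys of the result are (A present?, B present?).
    """
    n = n11 + n10 + n01 + n00
    row = {True: n11 + n10, False: n01 + n00}  # totals for A / ^A
    col = {True: n11 + n01, False: n10 + n00}  # totals for B / ^B
    cell = {
        (True, True): n11,
        (True, False): n10,
        (False, True): n01,
        (False, False): n00,
    }
    # lift = P(A and B) / (P(A) * P(B)) = count * N / (row total * column total)
    return {k: Fraction(c * n, row[k[0]] * col[k[1]]) for k, c in cell.items()}


# A = Burgers, B = Chips, counts taken from the table in Q1
lifts = lift_table(600, 400, 200, 200)
for (burgers, chips), value in lifts.items():
    print(f"Burgers={burgers}, Chips={chips}: lift = {value} = {float(value):.3f}")
```

The exact values are 21/20, 14/15, 7/8 and 7/6, that is approximately 1.05, 0.93, 0.875 and 1.17.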

Q2: Lift Analysis

Please calculate the following lift values for the table correlating Ketchup & Shampoo below:

Lift(Ketchup, Shampoo)

Lift(Ketchup, ^Shampoo)

Lift(^Ketchup, Shampoo)

Lift(^Ketchup, ^Shampoo)

| | Shampoo | ^Shampoo | Total Row |
|---|---|---|---|
| Ketchup | 100 | 200 | 300 |
| ^Ketchup | 200 | 400 | 600 |
| Total Column | 300 | 600 | 900 |

Solution 1

P(Ketchup ∪ Shampoo) = 100/900 = 1/9 ≈ 0.11

P(Ketchup) = 300/900 = 1/3 ≈ 0.33

P(Shampoo) = 300/900 = 1/3 ≈ 0.33

Lift(Ketchup, Shampoo) = (1/9) / ((1/3) × (1/3)) = 1

*Because the lift equals 1, Ketchup & Shampoo are independent.

Solution 2

P(Ketchup ∪ ^Shampoo) = 200/900 = 2/9 ≈ 0.22

P(Ketchup) = 300/900 = 1/3 ≈ 0.33

P(^Shampoo) = 600/900 = 2/3 ≈ 0.67

Lift(Ketchup, ^Shampoo) = (2/9) / ((1/3) × (2/3)) = 1

*Because the lift equals 1, Ketchup & ^Shampoo are independent.

Solution 3

P(^Ketchup ∪ Shampoo) = 200/900 = 2/9 ≈ 0.22

P(^Ketchup) = 600/900 = 2/3 ≈ 0.67

P(Shampoo) = 300/900 = 1/3 ≈ 0.33

Lift(^Ketchup, Shampoo) = (2/9) / ((2/3) × (1/3)) = 1

*Because the lift equals 1, ^Ketchup & Shampoo are independent.

Solution 4

P(^Ketchup ∪ ^Shampoo) = 400/900 = 4/9 ≈ 0.44

P(^Ketchup) = 600/900 = 2/3 ≈ 0.67

P(^Shampoo) = 600/900 = 2/3 ≈ 0.67

Lift(^Ketchup, ^Shampoo) = (4/9) / ((2/3) × (2/3)) = 1

*Because the lift equals 1, ^Ketchup & ^Shampoo are independent.
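One way to see why every lift in the Ketchup & Shampoo table equals 1 is to write lift directly in terms of the counts. The symbols below ($n_{11}$ for the count containing both items, $n_{1\cdot}$ and $n_{\cdot 1}$ for the row and column totals, $N$ for the grand total) are notation introduced here for illustration, not taken from the assignment.

$$
\text{Lift}(\text{Ketchup}, \text{Shampoo}) = \frac{n_{11}/N}{(n_{1\cdot}/N)\,(n_{\cdot 1}/N)} = \frac{n_{11}\,N}{n_{1\cdot}\,n_{\cdot 1}} = \frac{100 \times 900}{300 \times 300} = 1
$$

The other three cells behave the same way because each observed count equals (row total × column total) / N: 200 = 300 × 600 / 900, 200 = 600 × 300 / 900, and 400 = 600 × 600 / 900.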

Q3: Chi Squared Analysis

Please calculate the following chi squared values for the table correlating Burgers & Chips below (expected values in brackets).

Burgers & Chips

Burgers & Not Chips

Chips & Not Burgers

Not Burgers and Not Chips

For each of the above options, please also indicate whether your answer suggests independence, positive correlation, or negative correlation.

| | Chips | ^Chips | Total Row |
|---|---|---|---|
| Burgers | 900 (800) | 100 (200) | 1000 |
| ^Burgers | 300 (400) | 200 (100) | 500 |
| Total Column | 1200 | 300 | 1500 |

Solution

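$$
\begin{aligned}
\chi^2 &= \frac{(900-800)^2}{800} + \frac{(100-200)^2}{200} + \frac{(300-400)^2}{400} + \frac{(200-100)^2}{100} \\
&= \frac{100^2}{800} + \frac{(-100)^2}{200} + \frac{(-100)^2}{400} + \frac{100^2}{100} \\
&= \frac{10000}{800} + \frac{10000}{200} + \frac{10000}{400} + \frac{10000}{100} \\
&= 12.5 + 50 + 25 + 100 = 187.5
\end{aligned}
$$

Burgers & Chips are correlated because χ² = 187.5 is far greater than 0.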

*Burgers & Chips: the observed value (900) is greater than the expected value (800), so they are positively correlated.

*Burgers & ^Chips: the observed value (100) is less than the expected value (200), so they are negatively correlated.

*^Burgers & Chips: the observed value (300) is less than the expected value (400), so they are negatively correlated.

*^Burgers & ^Chips: the observed value (200) is greater than the expected value (100), so they are positively correlated.
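The chi squared calculation and the cell-by-cell comparison of observed against expected counts can be sketched in code. The function name `chi_squared` and the shape of its return value are assumptions made for this illustration. Expected counts are derived from the row and column totals (row total × column total / N) rather than typed in, which also confirms the bracketed values in the table.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic for a table of observed counts.

    observed: list of rows, e.g. [[900, 100], [300, 200]].
    Returns (statistic, details); details holds one
    (observed, expected, direction) tuple per cell, row by row.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    details = []
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence
            exp = row_totals[i] * col_totals[j] / n
            statistic += (obs - exp) ** 2 / exp
            if obs > exp:
                direction = "positive"
            elif obs < exp:
                direction = "negative"
            else:
                direction = "independent"
            details.append((obs, exp, direction))
    return statistic, details


# Rows: Burgers, ^Burgers; columns: Chips, ^Chips
statistic, details = chi_squared([[900, 100], [300, 200]])
print(f"chi-squared = {statistic}")
for obs, exp, direction in details:
    print(f"observed {obs}, expected {exp:.0f}: {direction}")
```

A cell whose observed count is above its expected count is reported as positively correlated, and one below it as negatively correlated.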

Q4: Chi Squared Analysis

Please calculate the following chi squared values for the table correlating Burgers & Sausages below (expected values in brackets).

Burgers & Sausages

Burgers & Not Sausages

Sausages & Not Burgers

Not Burgers and Not Sausages

For each of the above options, please also indicate whether your answer suggests independence, positive correlation, or negative correlation.

| | Sausages | ^Sausages | Total Row |
|---|---|---|---|
| Burgers | 800 (800) | 200 (200) | 1000 |
| ^Burgers | 400 (400) | 100 (100) | 500 |
| Total Column | 1200 | 300 | 1500 |

Solution

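$$
\begin{aligned}
\chi^2 &= \frac{(800-800)^2}{800} + \frac{(200-200)^2}{200} + \frac{(400-400)^2}{400} + \frac{(100-100)^2}{100} \\
&= \frac{0^2}{800} + \frac{0^2}{200} + \frac{0^2}{400} + \frac{0^2}{100} = 0
\end{aligned}
$$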

*Burgers & Sausages are independent because the chi squared value is 0. For Burgers & Sausages, the observed and expected values are the same (800): independent.

*Burgers & ^Sausages: the observed and expected values are the same (200): independent.

*^Burgers & Sausages: the observed and expected values are the same (400): independent.

*^Burgers & ^Sausages: the observed and expected values are the same (100): independent.

Q5: Lift/Chi Squared Analysis

A: Under what conditions would Lift & Chi Squared analysis prove to be poor algorithms for evaluating correlation/dependency between two events?

Solution

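If the data contain too many null transactions (transactions that contain neither of the two items), Lift & Chi Squared analysis prove to be poor algorithms for analysing the data, because both values change with the total number of transactions.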

B: Please suggest another algorithm that could be used to rectify the flaw in the Lift & Chi Squared analysis.

Solution

Other algorithms that could be used include AllConf, Jaccard, MaxConf & Kulczynski. These are null-invariant: their values do not depend on the number of null transactions.
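The effect of null transactions can be illustrated with a small experiment. The sketch below reuses the Burgers & Chips counts from Q1 (600 with both, 1000 with Burgers, 800 with Chips, 1400 in total) and then pads the data with 100,000 extra null transactions. That padding figure is a hypothetical number chosen for illustration, and the function name `measures` is likewise an assumption. The formulas used are the usual definitions: AllConf and MaxConf are the minimum and maximum of the two conditional probabilities, Kulczynski is their average, and Jaccard is the count with both items divided by the count with at least one.

```python
def measures(n_ab, n_a, n_b, n_total):
    """Correlation measures computed from support counts.

    n_ab: transactions containing both A and B
    n_a, n_b: transactions containing A, containing B
    n_total: all transactions, including null transactions
    """
    p_b_given_a = n_ab / n_a
    p_a_given_b = n_ab / n_b
    return {
        "lift": n_ab * n_total / (n_a * n_b),
        "all_conf": min(p_b_given_a, p_a_given_b),
        "max_conf": max(p_b_given_a, p_a_given_b),
        "kulczynski": 0.5 * (p_b_given_a + p_a_given_b),
        "jaccard": n_ab / (n_a + n_b - n_ab),
    }


# Counts from the Q1 table, then the same data with hypothetical
# extra transactions that contain neither Burgers nor Chips.
base = measures(600, 1000, 800, 1400)
padded = measures(600, 1000, 800, 1400 + 100_000)

for name in base:
    print(f"{name:>10}: {base[name]:.3f} -> {padded[name]:.3f}")
```

Only lift depends on the total number of transactions, so it changes sharply when null transactions are added, while the other four measures stay exactly the same.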