Data Management & Analytics

Sinead McConn- 10013026

CA 3 Statistical Analysis

_______________________________________________________________________

*Q1: Lift Analysis*

Please calculate the following lift values for the table correlating Burger & Chips below:

Lift(Burger, Chips)

Lift(Burgers, ^Chips)

Lift(^Burgers, Chips)

Lift(^Burgers, ^Chips)

| | Chips | ^Chips | Row Total |
|---|---|---|---|
| Burgers | 600 | 400 | 1000 |
| ^Burgers | 200 | 200 | 400 |
| Column Total | 800 | 600 | 1400 |

__Solution 1__

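Using Lift(A,B)=P(A ∪ B)/(P(A)*P(B)), where P(A ∪ B) is the fraction of transactions containing both A and B:

P(Burgers ∪ Chips)=600/1400=3/7=0.43

P(Burgers)=1000/1400=5/7=0.71

P(Chips)=800/1400=4/7=0.57

LIFT(Burgers,Chips)=(3/7)/((5/7)*(4/7))=21/20=1.05

*Since the lift is greater than 1, Burgers & Chips are positively correlated.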

__Solution 2__

P(Burgers ∪ ^Chips)=400/1400=2/7=0.29

P(Burgers)=1000/1400=5/7=0.71

P(^Chips)=600/1400=3/7=0.43

LIFT(Burgers,^Chips)=(2/7)/((5/7)*(3/7))=14/15=0.93

*Since the lift is less than 1, Burgers & ^Chips are negatively correlated.

__Solution 3__

P(^Burgers ∪ Chips)=200/1400=1/7=0.14

P(^Burgers)=400/1400=2/7=0.29

P(Chips)=800/1400=4/7=0.57

LIFT(^Burgers,Chips)=(1/7)/((2/7)*(4/7))=7/8=0.875

*Since the lift is less than 1, ^Burgers & Chips are negatively correlated.

__Solution 4__

P(^Burgers ∪ ^Chips)=200/1400=1/7=0.14

P(^Burgers)=400/1400=2/7=0.29

P(^Chips)=600/1400=3/7=0.43

LIFT(^Burgers,^Chips)=(1/7)/((2/7)*(3/7))=7/6=1.17

*Since the lift is greater than 1, ^Burgers & ^Chips are positively correlated.
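The four lift values can be cross-checked with a short Python sketch. It works from the raw counts using exact fractions, which avoids the drift that creeps in when probabilities are rounded to two decimals before dividing. The function names `lift_table` and `interpret` and the generic labels A/B are illustrative choices, not part of the original exercise.

```python
from fractions import Fraction


def lift_table(n_ab, n_a_notb, n_nota_b, n_nota_notb):
    """Exact lift for every cell of a 2x2 contingency table.

    Lift(X, Y) = P(X and Y) / (P(X) * P(Y)), which simplifies to
    count(X and Y) * total / (count(X) * count(Y)).
    """
    total = n_ab + n_a_notb + n_nota_b + n_nota_notb
    row = {"A": n_ab + n_a_notb, "^A": n_nota_b + n_nota_notb}  # row totals
    col = {"B": n_ab + n_nota_b, "^B": n_a_notb + n_nota_notb}  # column totals
    cells = {("A", "B"): n_ab, ("A", "^B"): n_a_notb,
             ("^A", "B"): n_nota_b, ("^A", "^B"): n_nota_notb}
    return {(r, c): Fraction(n * total, row[r] * col[c])
            for (r, c), n in cells.items()}


def interpret(lift):
    """Lift > 1: positive correlation, < 1: negative, == 1: independent."""
    if lift > 1:
        return "positive correlation"
    if lift < 1:
        return "negative correlation"
    return "independent"


# Burgers (A) vs Chips (B), counts taken from the Q1 table.
for cell, value in lift_table(600, 400, 200, 200).items():
    print(cell, value, round(float(value), 3), interpret(value))
```

For the Burgers & Chips table, `lift_table(600, 400, 200, 200)` gives 21/20, 14/15, 7/8 and 7/6, i.e. about 1.05, 0.933, 0.875 and 1.17.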

*Q2: Lift Analysis*

Please calculate the following lift values for the table correlating Ketchup & Shampoo below:

Lift(Ketchup, Shampoo)

Lift(Ketchup, ^Shampoo)

Lift(^Ketchup, Shampoo)

Lift(^Ketchup, ^Shampoo)

| | Shampoo | ^Shampoo | Row Total |
|---|---|---|---|
| Ketchup | 100 | 200 | 300 |
| ^Ketchup | 200 | 400 | 600 |
| Column Total | 300 | 600 | 900 |

__Solution 1__

P(Ketchup ∪ Shampoo)=100/900=1/9=0.11

P(Ketchup)=300/900=1/3=0.33

P(Shampoo)=300/900=1/3=0.33

LIFT(Ketchup,Shampoo)=(1/9)/((1/3)*(1/3))=1

*Since the lift equals 1, Ketchup & Shampoo are independent.

__Solution 2__

P(Ketchup ∪ ^Shampoo)=200/900=2/9=0.22

P(Ketchup)=300/900=1/3=0.33

P(^Shampoo)=600/900=2/3=0.67

LIFT(Ketchup,^Shampoo)=(2/9)/((1/3)*(2/3))=1

*Since the lift equals 1, Ketchup & ^Shampoo are independent.

__Solution 3__

P(^Ketchup ∪ Shampoo)=200/900=2/9=0.22

P(^Ketchup)=600/900=2/3=0.67

P(Shampoo)=300/900=1/3=0.33

LIFT(^Ketchup,Shampoo)=(2/9)/((2/3)*(1/3))=1

*Since the lift equals 1, ^Ketchup & Shampoo are independent.

__Solution 4__

P(^Ketchup ∪ ^Shampoo)=400/900=4/9=0.44

P(^Ketchup)=600/900=2/3=0.67

P(^Shampoo)=600/900=2/3=0.67

LIFT(^Ketchup,^Shampoo)=(4/9)/((2/3)*(2/3))=1

*Since the lift equals 1, ^Ketchup & ^Shampoo are independent.
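Another way to see why all four Ketchup & Shampoo lifts equal 1: the lift of a cell is its observed count divided by the count expected under independence (row total × column total / grand total). The Python sketch below, with hypothetical helper names `expected_counts` and `lifts`, computes the expected counts for the Q2 table and shows that each one matches the observed count.

```python
from fractions import Fraction


def expected_counts(observed):
    """Counts expected under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[Fraction(r * c, total) for c in col_totals] for r in row_totals]


def lifts(observed):
    """Lift of each cell = observed / expected (assumes no zero row or column totals)."""
    expected = expected_counts(observed)
    return [[obs / exp for obs, exp in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(observed, expected)]


# Rows: Ketchup, ^Ketchup; columns: Shampoo, ^Shampoo (Q2 table).
ketchup_shampoo = [[100, 200], [200, 400]]
print(expected_counts(ketchup_shampoo))
print(lifts(ketchup_shampoo))
```

Because every row of this table is proportional to the column totals (100:200 and 200:400 both match 300:600), each expected count equals the observed count and every lift is exactly 1.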

*Q3: Chi Squared Analysis*

Please calculate the following chi squared values for the table correlating Burgers & Chips below (expected values in brackets).

Burgers & Chips

Burgers & Not Chips

Chips & Not Burgers

Not Burgers and Not Chips

For the above options, please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

| | Chips | ^Chips | Row Total |
|---|---|---|---|
| Burgers | 900 (800) | 100 (200) | 1000 |
| ^Burgers | 300 (400) | 200 (100) | 500 |
| Column Total | 1200 | 300 | 1500 |

__Solution__

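$$
\begin{aligned}
\chi^2 &= \frac{(900-800)^2}{800} + \frac{(100-200)^2}{200} + \frac{(300-400)^2}{400} + \frac{(200-100)^2}{100} \\
&= \frac{100^2}{800} + \frac{(-100)^2}{200} + \frac{(-100)^2}{400} + \frac{100^2}{100} \\
&= \frac{10000}{800} + \frac{10000}{200} + \frac{10000}{400} + \frac{10000}{100} \\
&= 12.5 + 50 + 25 + 100 = 187.5
\end{aligned}
$$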

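Burgers & Chips are correlated because χ² = 187.5 is well above 0; it also far exceeds 3.84, the 5% critical value for a 2×2 table (one degree of freedom), so the correlation is statistically significant.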

*Expected value is 800 & observed value is 900, so Burgers & Chips are positively correlated.

*Expected value is 200 & observed value is 100, so Burgers & ^Chips are negatively correlated.

*Expected value is 400 & observed value is 300, so ^Burgers & Chips are negatively correlated.

*Expected value is 100 & observed value is 200, so ^Burgers & ^Chips are positively correlated.
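The same arithmetic can be automated. This Python sketch (the function name `chi_squared` is illustrative) derives each expected count from the row and column totals rather than reading it from the brackets, sums (observed - expected)² / expected over all cells, and labels each cell by whether its observed count is above, below or equal to its expected count, which is the reading used in the bullets above.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic for a contingency table, plus the
    direction of each cell (observed above, below or equal to expected)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    directions = []
    for i, row in enumerate(observed):
        labels = []
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total  # expected under independence
            chi2 += (obs - exp) ** 2 / exp
            if obs > exp:
                labels.append("positive")
            elif obs < exp:
                labels.append("negative")
            else:
                labels.append("independent")
        directions.append(labels)
    return chi2, directions


# Rows: Burgers, ^Burgers; columns: Chips, ^Chips (Q3 table).
chi2, directions = chi_squared([[900, 100], [300, 200]])
print(chi2)
print(directions)
```

The per-cell labels are a heuristic reading; the χ² value itself only measures overall dependence between the two items.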

*Q4: Chi Squared Analysis*

Please calculate the following Chi Squared values for the table correlating Burgers & Sausages below (expected values in brackets).

Burgers & Sausages

Burgers & Not Sausages

Sausages & Not Burgers

Not Burgers and Not Sausages

For the above options, please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

| | Sausages | ^Sausages | Row Total |
|---|---|---|---|
| Burgers | 800 (800) | 200 (200) | 1000 |
| ^Burgers | 400 (400) | 100 (100) | 500 |
| Column Total | 1200 | 300 | 1500 |

__Solution__

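$$
\begin{aligned}
\chi^2 &= \frac{(800-800)^2}{800} + \frac{(200-200)^2}{200} + \frac{(400-400)^2}{400} + \frac{(100-100)^2}{100} \\
&= \frac{0^2}{800} + \frac{0^2}{200} + \frac{0^2}{400} + \frac{0^2}{100} = 0
\end{aligned}
$$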

*Because χ² = 0, Burgers & Sausages are independent. Burgers & Sausages: observed and expected values are both 800, so independent.

*Burgers & ^Sausages: observed and expected values are both 200, so independent.

*^Burgers & Sausages: observed and expected values are both 400, so independent.

*^Burgers & ^Sausages: observed and expected values are both 100, so independent.
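Whether a χ² value counts as "large" depends on the degrees of freedom; a 2×2 table has (2-1)(2-1) = 1. For one degree of freedom the upper-tail probability has a closed form, P(χ² > x) = erfc(√(x/2)), so Python's standard `math.erfc` is enough to get a p-value without extra libraries. A minimal sketch, applied to the χ² values of Q3 and Q4 (the function name `p_value_1df` and the 0.05 threshold are assumptions for illustration):

```python
import math


def p_value_1df(chi2):
    """Upper-tail p-value of a chi-squared statistic with one degree of freedom.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)) for X ~ chi-squared(1).
    """
    return math.erfc(math.sqrt(chi2 / 2))


for label, chi2 in [("Q3 Burgers & Chips", 187.5), ("Q4 Burgers & Sausages", 0.0)]:
    p = p_value_1df(chi2)
    verdict = "dependent" if p < 0.05 else "no evidence of dependence"
    print(f"{label}: chi2={chi2}, p={p:.3g}, {verdict}")
```

At the 5% level the cut-off is χ² ≈ 3.84, so Q3's 187.5 lies far beyond it, while Q4's χ² of 0 gives p = 1.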

*Q5: Lift/Chi Squared Analysis*

A: Under what conditions would Lift & Chi Squared analysis prove to be poor measures for evaluating correlation/dependency between two events?

__Solution__

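If the data contain a large number of null transactions (transactions that contain neither of the two items being examined), Lift & Chi Squared analysis prove to be poor measures. Both values depend on the total number of transactions, so they are strongly influenced by the null-transaction count; in other words, they are not null-invariant.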
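To illustrate the problem, the sketch below reuses the Burgers & Chips counts from Q1 and then adds 10,000 hypothetical null transactions. The counts involving Burgers or Chips are unchanged, yet both lift and χ² jump. The figure of 10,000 is an arbitrary assumption chosen only for illustration, and the χ² function uses the standard 2×2 shortcut formula without continuity correction.

```python
def lift(n_ab, n_a_notb, n_nota_b, n_null):
    """Lift(A, B) = count(A and B) * total / (count(A) * count(B))."""
    total = n_ab + n_a_notb + n_nota_b + n_null
    return n_ab * total / ((n_ab + n_a_notb) * (n_ab + n_nota_b))


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]].

    Shortcut form: N * (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)).
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Q1 counts: Burgers & Chips = 600, Burgers only = 400, Chips only = 200, neither = 200.
for nulls in (200, 200 + 10_000):  # second pass adds 10,000 hypothetical null transactions
    print(f"nulls={nulls}: lift={lift(600, 400, 200, nulls):.2f}, "
          f"chi2={chi_squared_2x2(600, 400, 200, nulls):.1f}")
```

Lift rises from 1.05 to 8.55 and χ² from about 11.7 to over 4,700, even though nothing about how often Burgers and Chips are bought together has changed.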

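B: Please suggest another measure that could be used to rectify this flaw in Lift & Chi Squared analysis.

__Solution__

Null-invariant measures, whose values are not affected by the number of null transactions, can be used instead. Examples include AllConf (all-confidence), MaxConf (max-confidence), Kulczynski and Jaccard.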
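The measures named above can be computed from the same counts. In this sketch the number of null transactions is deliberately not a parameter: each measure uses only the count of transactions containing both items and the count for each item, which is what makes them null-invariant. The formulas follow the usual definitions of these measures; the function name `null_invariant_measures` is illustrative.

```python
def null_invariant_measures(n_ab, n_a_notb, n_nota_b):
    """Null-invariant association measures for items A and B.

    n_ab: transactions with both A and B; n_a_notb: A only; n_nota_b: B only.
    Transactions containing neither item are not needed (and not accepted).
    """
    sup_a = n_ab + n_a_notb  # transactions containing A
    sup_b = n_ab + n_nota_b  # transactions containing B
    conf_a_to_b = n_ab / sup_a  # P(B | A)
    conf_b_to_a = n_ab / sup_b  # P(A | B)
    return {
        "all_conf": n_ab / max(sup_a, sup_b),
        "max_conf": max(conf_a_to_b, conf_b_to_a),
        "kulczynski": (conf_a_to_b + conf_b_to_a) / 2,
        "jaccard": n_ab / (sup_a + sup_b - n_ab),
    }


# Burgers & Chips counts from Q1; adding null transactions cannot change these values.
for name, value in null_invariant_measures(600, 400, 200).items():
    print(f"{name}: {value:.3f}")
```

For these counts the sketch gives all_conf 0.6, max_conf 0.75, Kulczynski 0.675 and Jaccard 0.5, and the values are the same whether there are 200 or 10,200 null transactions.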
