CA3 tested our comprehension of association analysis. The CA had two components:

- **Complete 15 SWIRL ‘R’ Programming questions**
- **Complete 5 questions on the concepts of Lift, Chi Squared and other association analysis algorithms.**

**SWIRL ‘R’ Programming questions**

**What is SWIRL?**

Swirl is a software package for the R programming language that turns the R console into an interactive learning environment. Users are provided with real-time feedback as they are guided through self-paced practicals in the fields of data science and R programming.

**Who is swirl aimed at?**

Swirl is aimed at beginners in R programming.

**What’s needed to use swirl?**

You will need a computer, an Internet connection and a recent version of R installed on your machine and you’re good to go.

If you are interested in trying R programming using swirl, see the swirl student page listed under References.

**Five questions on the concepts of Lift, Chi Squared and other association analysis algorithms.**

**Q1. LIFT Analysis**

Please calculate the following lift values for the table correlating Burger & Chips below:

- LIFT (Burger, Chips)
- LIFT (Burger, ^Chips)
- LIFT (^Burger, Chips)
- LIFT (^Burger, ^Chips)

Please also indicate if each of your answers would suggest independent, positive correlation, or negative correlation.

| | Chips | ^Chips | Total Row |
|---|---|---|---|
| Burgers | 600 | 400 | 1000 |
| ^Burgers | 200 | 200 | 400 |
| Total Column | 800 | 600 | 1400 |

**Answer Q1.**

LIFT (Burgers, Chips)

s(Burgers ∪ Chips) = 600/1400 = 0.429

s(Burgers) = 1000/1400 = 0.714

s(Chips) = 800/1400 = 0.571

LIFT (Burgers, Chips) = 0.429/(0.714*0.571) ≈ 1.05

LIFT (Burgers, Chips) > 1

**My answer suggests that Burgers and Chips are positively correlated.**

LIFT (Burgers, ^Chips)

s(Burgers ∪ ^Chips) = 400/1400 = 0.286

s(Burgers) = 1000/1400 = 0.714

s(^Chips) = 600/1400 = 0.429

LIFT (Burgers, ^Chips) = 0.286/(0.714*0.429) ≈ 0.93

LIFT (Burgers, ^Chips) < 1

**My answer suggests that Burgers and ^Chips are negatively correlated.**

LIFT (^Burgers, Chips)

s(^Burgers ∪ Chips) = 200/1400 = 0.143

s(^Burgers) = 400/1400 = 0.286

s(Chips) = 800/1400 = 0.571

LIFT (^Burgers, Chips) = 0.143/(0.286*0.571) ≈ 0.88

LIFT (^Burgers, Chips) < 1

**My answer suggests that ^Burgers and Chips are negatively correlated.**

LIFT (^Burgers, ^Chips)

s(^Burgers ∪ ^Chips) = 200/1400 = 0.143

s(^Burgers) = 400/1400 = 0.286

s(^Chips) = 600/1400 = 0.429

LIFT (^Burgers, ^Chips) = 0.143/(0.286*0.429) ≈ 1.17

LIFT (^Burgers, ^Chips) > 1

**My answer suggests that ^Burgers and ^Chips are positively correlated.**
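The four lift calculations above can be cross-checked with a few lines of code. The following is a minimal Python sketch, not part of the original assignment; the function names `lift_table` and `interpret` and the tolerance `tol` are my own illustrative choices. It works from the raw counts, so it avoids the small rounding differences that come from using three-decimal supports.

```python
def lift_table(both, a_only, b_only, neither):
    """Return the lift of each cell of a 2x2 table of transaction counts.

    both    = transactions with A and B
    a_only  = transactions with A but not B
    b_only  = transactions with B but not A
    neither = transactions with neither A nor B
    """
    n = both + a_only + b_only + neither
    # Marginal supports for A, ^A, B and ^B.
    s_a = (both + a_only) / n
    s_not_a = (b_only + neither) / n
    s_b = (both + b_only) / n
    s_not_b = (a_only + neither) / n
    return {
        ("A", "B"): (both / n) / (s_a * s_b),
        ("A", "^B"): (a_only / n) / (s_a * s_not_b),
        ("^A", "B"): (b_only / n) / (s_not_a * s_b),
        ("^A", "^B"): (neither / n) / (s_not_a * s_not_b),
    }


def interpret(lift, tol=1e-9):
    """Map a lift value to the wording used in the answers (tol is an assumption)."""
    if abs(lift - 1.0) <= tol:
        return "independent"
    return "positive correlation" if lift > 1.0 else "negative correlation"


# Burgers (A) and Chips (B) counts taken from the Q1 table.
q1 = lift_table(both=600, a_only=400, b_only=200, neither=200)
for cell, value in q1.items():
    print(cell, round(value, 3), interpret(value))
```

The same function can be reused on any other 2x2 table of counts by passing its four cell values.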

**Q2. Please calculate the following lift values for the table correlating Ketchup & Shampoo below:**

- LIFT (Ketchup, Shampoo)
- LIFT (Ketchup, ^Shampoo)
- LIFT (^Ketchup, Shampoo)
- LIFT (^Ketchup, ^Shampoo)

Please also indicate if each of your answers would suggest independent, positive correlation, or negative correlation.

| | Shampoo | ^Shampoo | Total Row |
|---|---|---|---|
| Ketchup | 100 | 200 | 300 |
| ^Ketchup | 200 | 400 | 600 |
| Total Column | 300 | 600 | 900 |

**Answer Q2.**

LIFT (Ketchup, Shampoo)

s(Ketchup ∪ Shampoo) = 100/900 = 0.111

s(Ketchup) = 300/900 = 0.333

s(Shampoo) = 300/900 = 0.333

LIFT (Ketchup, Shampoo) = 0.111/(0.333*0.333) ≈ 1.00

LIFT (Ketchup, Shampoo) = 1

**My answer suggests that Ketchup and Shampoo are independent.**

LIFT (Ketchup, ^Shampoo)

s(Ketchup ∪ ^Shampoo) = 200/900 = 0.222

s(Ketchup) = 300/900 = 0.333

s(^Shampoo) = 600/900 = 0.667

LIFT (Ketchup, ^Shampoo) = 0.222/(0.333*0.667) ≈ 1.00

LIFT (Ketchup, ^Shampoo) = 1

**My answer suggests that Ketchup and ^Shampoo are independent.**

LIFT (^Ketchup, Shampoo)

s(^Ketchup ∪ Shampoo) = 200/900 = 0.222

s(^Ketchup) = 600/900 = 0.667

s(Shampoo) = 300/900 = 0.333

LIFT (^Ketchup, Shampoo) = 0.222/(0.667*0.333) ≈ 1.00

LIFT (^Ketchup, Shampoo) = 1

**My answer suggests that ^Ketchup and Shampoo are independent.**

LIFT (^Ketchup, ^Shampoo)

s(^Ketchup ∪ ^Shampoo) = 400/900 = 0.444

s(^Ketchup) = 600/900 = 0.667

s(^Shampoo) = 600/900 = 0.667

LIFT (^Ketchup, ^Shampoo) = 0.444/(0.667*0.667) ≈ 1.00

LIFT (^Ketchup, ^Shampoo) = 1

**My answer suggests that ^Ketchup and ^Shampoo are independent.**
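Because the supports above are rounded decimals, the quotients only come out approximately equal to 1. As a check, the worked equations below use the raw counts instead. Writing n for the total number of transactions (900 here), the lift simplifies to n times the joint count divided by the product of the two marginal counts, and every cell comes out to exactly 1. The ¬ symbol stands for the ^ prefix used in the tables, and count(·) is shorthand I am introducing for the table entries.

```latex
\begin{aligned}
\mathrm{lift}(A, B) &= \frac{s(A \cup B)}{s(A)\, s(B)}
  = \frac{n \cdot \mathrm{count}(A, B)}{\mathrm{count}(A) \cdot \mathrm{count}(B)} \\[4pt]
\mathrm{lift}(\text{Ketchup}, \text{Shampoo}) &= \frac{900 \cdot 100}{300 \cdot 300} = 1 \\
\mathrm{lift}(\text{Ketchup}, \lnot\text{Shampoo}) &= \frac{900 \cdot 200}{300 \cdot 600} = 1 \\
\mathrm{lift}(\lnot\text{Ketchup}, \text{Shampoo}) &= \frac{900 \cdot 200}{600 \cdot 300} = 1 \\
\mathrm{lift}(\lnot\text{Ketchup}, \lnot\text{Shampoo}) &= \frac{900 \cdot 400}{600 \cdot 600} = 1
\end{aligned}
```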

**Q3. Chi Squared Analysis**

Please calculate the following chi Squared values for the table correlating Burger and Chips below (Expected values in brackets).

- Burgers & Chips
- Burgers & Not Chips
- Not Burgers & Chips
- Not Burgers & Not Chips

For the above options, please also indicate if each of your answers would suggest independent, positive or negative correlation.

| | Chips | ^Chips | Total Row |
|---|---|---|---|
| Burgers | 900 (800) | 100 (200) | 1000 |
| ^Burgers | 300 (400) | 200 (100) | 500 |
| Total Column | 1200 | 300 | 1500 |

**Answer Q3.**

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

$$\chi^2 = \frac{(900-800)^2}{800} + \frac{(100-200)^2}{200} + \frac{(300-400)^2}{400} + \frac{(200-100)^2}{100}$$

$$= \frac{100^2}{800} + \frac{(-100)^2}{200} + \frac{(-100)^2}{400} + \frac{100^2}{100}$$

$$= \frac{10000}{800} + \frac{10000}{200} + \frac{10000}{400} + \frac{10000}{100} = 12.5 + 50 + 25 + 100 = 187.5$$

χ² = 187.5 > 0, so Burgers & Chips are correlated (not independent).

**Expected 800, Observed 900, Burgers & Chips – Positively Correlated.**

**Expected 200, Observed 100, Burgers & ^Chips – Negatively Correlated.**

**Expected 400, Observed 300, ^Burgers & Chips – Negatively Correlated.**

**Expected 100, Observed 200, ^Burgers & ^Chips – Positively Correlated.**
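The chi-squared calculation can also be reproduced in code. The sketch below is a plain-Python illustration under my own naming (`chi_squared`, `q3_observed`, `directions` are not from the assignment). It derives the expected counts from the row and column totals (row total × column total / grand total), which should match the bracketed values in the table, and then sums (observed − expected)² / expected over the four cells.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic for a table of observed counts.

    `observed` is a list of rows. Expected counts are computed as
    row_total * column_total / grand_total for each cell.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return stat, expected


# Rows: Burgers, ^Burgers. Columns: Chips, ^Chips (observed counts from Q3).
q3_observed = [[900, 100], [300, 200]]
stat, expected = chi_squared(q3_observed)

# Direction of each cell: observed above expected reads as positive,
# below expected as negative, equal as independent.
directions = [
    ["positive" if o > e else "negative" if o < e else "independent"
     for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(q3_observed, expected)
]

print(stat)
print(expected)
print(directions)
```

Passing a table whose observed counts equal the expected counts, for example `[[800, 200], [400, 100]]`, gives a statistic of 0.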

**Q4: Chi Squared Analysis**

Please calculate the following chi squared values for the table correlating burger and sausages below (Expected values in brackets).

- Burgers & Sausages
- Burgers & Not Sausages
- Sausages & Not Burgers
- Not Burgers and Not Sausages

For the above options, please also indicate if each of your answers would suggest independent, positive correlation, or negative correlation.

| | Chips | ^Chips | Total Row |
|---|---|---|---|
| Burgers | 800 (800) | 200 (200) | 1000 |
| ^Burgers | 400 (400) | 100 (100) | 500 |
| Total Column | 1200 | 300 | 1500 |

**Answer Q4.**

$$\chi^2 = \frac{(800-800)^2}{800} + \frac{(200-200)^2}{200} + \frac{(400-400)^2}{400} + \frac{(100-100)^2}{100}$$

$$= \frac{0^2}{800} + \frac{0^2}{200} + \frac{0^2}{400} + \frac{0^2}{100} = 0$$

χ² = 0, so Burgers & Chips are independent.

**Burgers & Chips – Observed & Expected, 800 – Independent**

**Burgers & ^Chips – Observed & Expected, 200 – Independent**

**^Burgers & Chips – Observed & Expected, 400 – Independent**

**^Burgers & ^Chips – Observed & Expected, 100 – Independent**

**Q5:**

Under what conditions would Lift and Chi Squared analysis prove to be a poor algorithm to evaluate correlation/dependency between two events?

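**Lift and Chi Squared are poor measures of correlation / dependency between two events when too many null transactions are observed, i.e. transactions that contain neither of the two items. Both measures depend on the total number of transactions, so a large number of null transactions distorts the result.**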

Please suggest another algorithm that could be used to rectify the flaw in Lift and Chi Squared?

**Null-invariant measures that could be used to rectify this flaw in Lift & Chi Squared include AllConf, Cosine, Jaccard, MaxConf and Kulczynski.**
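To illustrate why null transactions matter, here is a small Python sketch that computes lift alongside the five measures named above, using their usual textbook definitions. The counts are made up purely for illustration (they are not from the assignment), and the function name `measures` is my own. The only thing that changes between the two calls is the number of null transactions.

```python
from math import sqrt


def measures(both, a_only, b_only, null):
    """Lift plus five null-invariant measures for items A and B.

    `null` is the number of transactions containing neither A nor B.
    Only lift uses the total n, so only lift is affected by `null`.
    """
    n = both + a_only + b_only + null
    sup_a = both + a_only  # transactions containing A
    sup_b = both + b_only  # transactions containing B
    return {
        "lift": (both / n) / ((sup_a / n) * (sup_b / n)),
        "all_conf": both / max(sup_a, sup_b),
        "max_conf": both / min(sup_a, sup_b),
        "kulczynski": 0.5 * (both / sup_a + both / sup_b),
        "cosine": both / sqrt(sup_a * sup_b),
        "jaccard": both / (sup_a + sup_b - both),
    }


# Hypothetical counts: identical A/B behaviour, different numbers of nulls.
few_nulls = measures(both=100, a_only=50, b_only=50, null=0)
many_nulls = measures(both=100, a_only=50, b_only=50, null=10_000)

for name in few_nulls:
    print(name, round(few_nulls[name], 3), round(many_nulls[name], 3))
```

With these made-up counts, lift moves from about 0.89 (suggesting negative correlation) to about 45.3 (suggesting strong positive correlation) once the 10,000 null transactions are added, while the other five measures are unchanged.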

Aside: Regarding the gif at the beginning of this blog post, **the word burgers is mentioned 55 times** throughout the post, so I found its use appropriate.

**References**

Swirl (2017). Available at: http://swirlstats.com/students.html (Accessed Mar. 2017).

*Royal with cheese* (2014). [Online image]. Available at: http://metro.co.uk/2014/09/18/national-cheeseburger-day-7-of-the-best-movie-quotes-featuring-cheeseburgers-4871752/ (Accessed Mar. 2017).