- Published: October 31, 2021
- Updated: October 31, 2021
- University / College: Arizona State University
- Language: English
In the problem identified, the predicted variable is categorical and has more than two possible outcomes. We therefore used multinomial logistic regression, which generalizes logistic regression to more than two outcome categories. The model gives a probability for each possible outcome of the dependent variable, depending on the values of the independent variables.
Assumptions:
i) Data is case specific, i.e. each case has a single set of values for the independent variables
ii) The dependent variable cannot be predicted exactly from the independent variables; the model only gives probabilities of selection
iii) The independent variables need not be statistically independent of each other
iv) Correlation between the independent variables (collinearity) should nevertheless be low
Reference variable:
Since the dependent variable has multiple possible outcomes, one of them is taken as the reference variable (the baseline outcome), and the results are interpreted in comparison with it.
Mathematical Equation
ln [ P(Y = Y1) / P(Y = Y0) ] = B0 + B1x1 + B2x2 + ….
Y1 -> One possible outcome of dependent variable
Y0 -> Reference variable
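The equation above gives log-odds against the reference variable; turning them into probabilities requires one extra step. A minimal sketch of that step follows, assuming the log-odds for each non-reference outcome have already been computed from the fitted coefficients. The function name and the example values are hypothetical and are not taken from the fitted models.

```python
import math

def category_probabilities(log_odds):
    """Convert log-odds versus the reference outcome into probabilities.

    log_odds maps each non-reference outcome to
    ln[P(Y = outcome) / P(Y = reference)] = B0 + B1*x1 + B2*x2 + ...
    """
    # The reference outcome has log-odds 0 against itself, so it contributes exp(0) = 1.
    denominator = 1.0 + sum(math.exp(v) for v in log_odds.values())
    probs = {name: math.exp(v) / denominator for name, v in log_odds.items()}
    probs["reference"] = 1.0 / denominator
    return probs

# Hypothetical log-odds for one consumer (illustrative values only).
example = category_probabilities({"Y1": 0.4, "Y2": -1.1})
```

The probabilities always sum to 1, and an outcome with positive log-odds is more likely than the reference outcome.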
Outcome:
For the analysis we selected Model 3, despite its higher residual deviance compared with Model 1, because it gave better sensitivity values.
Basic interpretation:
Compared to a female consumer, a male consumer will choose
An Energy Drink more often than a Dairy Drink
A Carbonated Drink less often than a Dairy Drink
A Packaged Fruit Juice less often than a Dairy Drink
Compared to a consumer with high health consciousness, a consumer with low health consciousness will choose
An Energy Drink more often than a Dairy Drink
A Carbonated Drink more often than a Dairy Drink
A Packaged Fruit Juice more often than a Dairy Drink
Compared to a consumer from Central India, a consumer from North India will choose
An Energy Drink more often than a Dairy Drink
A Carbonated Drink less often than a Dairy Drink
A Packaged Fruit Juice more often than a Dairy Drink
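Statements of this kind are usually read off the sign of each coefficient: a positive coefficient means the odds of that drink relative to the reference outcome (Dairy Drink here, judging by the comparisons above) are higher for the group than for its baseline group, and a negative one means they are lower. A minimal sketch of that reading follows. The coefficient values are hypothetical and are not the fitted values from Model 3.

```python
import math

def describe(coefficient, outcome, reference="Dairy Drink"):
    """Turn a coefficient into a 'more often / less often' statement."""
    # exp(B) is the odds ratio of the outcome versus the reference outcome.
    odds_ratio = math.exp(coefficient)
    direction = "more often" if odds_ratio > 1 else "less often"
    return f"{outcome} {direction} than {reference} (odds ratio {odds_ratio:.2f})"

# Hypothetical coefficients for 'male' versus the 'female' baseline.
hypothetical_male_coefficients = {"Energy Drink": 0.35, "Carbonated Drink": -0.20}
statements = [describe(b, name) for name, b in hypothetical_male_coefficients.items()]
```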
Probability of selecting a particular category
These values are the predicted probabilities for each individual consumer, based on that consumer's observed characteristics.
A marketer can target the two most probable categories for each individual and make related content visible to them.
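Picking the top two categories for a consumer can be sketched as a simple ranking of the predicted probabilities. The probabilities below are hypothetical and only illustrate the selection step.

```python
def top_two(probabilities):
    """Return the two categories with the highest predicted probability."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return ranked[:2]

# Hypothetical predicted probabilities for one consumer.
hypothetical = {
    "Dairy Drink": 0.15,
    "Energy Drink": 0.40,
    "Carbonated Drink": 0.10,
    "Packaged Fruit Juice": 0.35,
}
targets = top_two(hypothetical)
```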
Scope for improvement:
i) Identify factors which have a higher sensitivity in the model
ii) Lower the number of factors
iii) A better representative sample
Currently there are 6 factors, and the number of possible unique combinations = 23!
If this number can be brought down, marketers can more easily classify consumers according to the beverage they are most likely to buy.
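To illustrate why fewer factors help, the number of distinct consumer profiles is the product of the number of levels of each factor. The sketch below assumes made-up level counts for six unnamed factors; it does not reproduce the figure quoted above, and the real counts depend on the survey design.

```python
import math

def profile_count(levels_per_factor):
    """Number of unique combinations of factor levels."""
    return math.prod(levels_per_factor)

# Hypothetical level counts for six factors (not the survey's actual factors).
six_factors = [2, 2, 4, 3, 3, 2]
four_factors = six_factors[:4]  # the same design after dropping two factors

all_profiles = profile_count(six_factors)
reduced_profiles = profile_count(four_factors)
```

Under these assumed counts, dropping two factors cuts the number of profiles from 288 to 48.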
Other Findings
1) North vs East India: Preference of Carbonated Drinks
H0: Proportion preferring carbonated drinks the most is the same across North and East India
H1: Proportion preferring carbonated drinks the most is not the same across North and East India
p-value: 4.05 × 10^(-8) => Reject Null Hypothesis
Underlying Reasons:
i) Preference for fruit juice is significantly higher in North India than in East India
ii) Climatic conditions vary in North India, whereas in East India they remain constant
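The report does not state which test produced the p-value. One common choice for comparing a proportion across two independent groups is a pooled two-proportion z-test, sketched below as an assumption rather than as the method actually used. The counts are hypothetical and will not reproduce the reported p-value.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided pooled z-test for H0: the two group proportions are equal."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under H0 both groups share one proportion, estimated by pooling the samples.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: respondents preferring carbonated drinks the most
# in two regions of 200 respondents each.
z, p_value = two_proportion_z_test(60, 200, 30, 200)
```

A p-value below the chosen significance level (commonly 0.05) leads to rejecting H0, as in the finding above.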
2) Low vs High Health Consciousness: Preference of Carbonated Drinks
H0: Proportion preferring carbonated drinks the most is the same for those having high health consciousness and low health consciousness
H1: Proportion preferring carbonated drinks the most is different for those having high health consciousness and low health consciousness
p-value: 0.0016 => Reject Null Hypothesis
Underlying Reasons:
Consumers with high health consciousness are aware of the adverse effects of carbonated drinks
3) Dairy Drinks vs Energy Drinks preference for students
H0: Among students, the proportions preferring energy drinks and dairy drinks are the same
H1: Among students, the proportions preferring energy drinks and dairy drinks are not the same
p-value: 0.0016 => Reject Null Hypothesis
Underlying Reasons:
Energy drinks are priced higher than dairy drinks