A/B tests are very commonly performed by data analysts and data scientists. For this project, we will work through the results of an A/B test run by an e-commerce website. Our goal is to help the company understand whether they should implement the new page, keep the old page, or run the experiment longer before making a decision.
# Count the number of visitors in each experiment group
sns.countplot(data=df2,x='group');
print("Number of visitors in control group: ",df2.query('group=="control"').shape[0])
print("Number of visitors in treatment group: ",df2.query('group=="treatment"').shape[0])
Number of visitors in control group: 145274
Number of visitors in treatment group: 145310
# Overall conversion probability regardless of group
p_conv=df2['converted'].mean()
"{:.2%}".format(p_conv)
'11.96%'
Given that an individual was in the control group, what is the probability they converted?

control_cr=df2.query('group=="control"')['converted'].mean()
"{:.2%}".format(control_cr)
'12.04%'
Given that an individual was in the treatment group, what is the probability they converted?

treatment_cr=df2.query('group=="treatment"')['converted'].mean()
"{:.2%}".format(treatment_cr)
'11.88%'
What is the difference between new_page conversions and old_page conversions in our sample dataset?

# Calculate the actual difference (obs_diff) between the conversion rates for the two groups.
obs_diff=treatment_cr - control_cr
"{:.2%}".format(obs_diff)
'-0.16%'
Do treatment group users lead to more conversions?

If we want to know whether the new_page accounts for a larger share of the conversions, we need to find the proportion of the treatment group within the converted users, in other words we need to calculate $P(treatment \mid converted)$. By Bayes' rule this is $P(treatment) \cdot P(converted \mid treatment) / P(converted)$, which is what the two cells below compute.
# p_treatment (the proportion of users assigned to the treatment group) was computed earlier in the notebook
P_treatment_conv=p_treatment * treatment_cr / p_conv
"{:.2%}".format(P_treatment_conv)
'49.68%'
P_control_conv=p_control * control_cr / p_conv
"{:.2%}".format(P_control_conv)
'50.32%'
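As a sanity check, the same two proportions can be computed directly by counting converted users in each group. This cell is a suggested addition and simply re-derives the values above from df2:

# Direct computation of P(treatment | converted) and P(control | converted)
converted_users=df2.query('converted==1')
p_treatment_given_conv=converted_users.query('group=="treatment"').shape[0] / converted_users.shape[0]
p_control_given_conv=converted_users.query('group=="control"').shape[0] / converted_users.shape[0]
print("{:.2%}".format(p_treatment_given_conv),"{:.2%}".format(p_control_given_conv))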
The old_page leads to slightly more conversions than the new_page; however, the two rates are almost the same. In the next section we will calculate how significant this observation from our sample is and try to infer from it to the actual population (all the website users).

Since the difference between the two conversion rates is very small, let's assume as our null hypothesis $H_0$ that the new_page conversion rate $p_{new}$ is the same as or lower than the conversion rate of the old_page $p_{old}$, until the opposite is shown to be true. And so, our assumption is:

$$H_0: p_{new} - p_{old} \leq 0$$
$$H_1: p_{new} - p_{old} > 0$$
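The cells below use p_diff, an array of simulated differences in conversion rate, and a DataFrame null holding those differences, both created earlier in the notebook. A minimal sketch of how they could be generated under the null, assuming both pages share the pooled conversion rate p_conv and using 10,000 simulations, is:

import numpy as np
import pandas as pd

# Sketch (an assumed cell, not the original one): simulate the difference in conversion
# rates between two groups of the observed sizes, both converting at the pooled rate p_conv
n_new=df2.query('group=="treatment"').shape[0]
n_old=df2.query('group=="control"').shape[0]
new_page_converted=np.random.binomial(n_new,p_conv,10000)/n_new
old_page_converted=np.random.binomial(n_old,p_conv,10000)/n_old
p_diff=new_page_converted - old_page_converted
null=pd.DataFrame({'differences':p_diff})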
null_dsprt=np.random.normal(0,np.std(p_diff),p_diff.size)
sns.displot(null_dsprt,color='#aaaaaa',height=4.5,aspect=2.4);
plt.title(r'Null distribution $ H_0: p_{new} - p_{old} \leq 0 $',fontsize=15);
null_dsprt=np.random.normal(0,np.std(p_diff),p_diff.size)
sns.displot(null_dsprt,color='#aaaaaa',height=4.5,aspect=2.4);
plt.axvline(np.percentile(null_dsprt,95),color='#3d4435',lw=3)
plt.title(r'Null distribution with significance level $\alpha$ of 5% (right tail test)',fontsize=15);
plt.legend(labels=[r'$\alpha$ of 5%']);
obs="{:.2%}".format(obs_diff)
sns.displot(data=null,x='differences',color='#aaaaaa',height=4.5,aspect=2.4);
plt.axvline(obs_diff,lw=3)
plt.axvline(np.percentile(null_dsprt,95),color='#3d4435',lw=3)
plt.title('Observed difference @ {} under the null hypothesis'.format(obs),fontsize=15)
plt.legend(labels=['observed difference',r'$\alpha$ of 5%'],loc='upper right');
# Flag the simulated differences that are at least as extreme as the observed difference;
# the mean of this indicator column is the simulated p-value
null['p-value']=(null['differences'] > obs_diff).astype(int)
strin = 'The probability of observing a difference of {}\n or a more extreme one under the null hypothesis'.format(obs)
strin = strin + r" $ H_0: p_{new} - p_{old} \leq 0 $"
sns.displot(data=null,x='differences',hue='p-value',palette=['#aaaaaa','#0b5394'],legend=False,height=4.5,aspect=2.4);
plt.axvline(np.percentile(null_dsprt,95),color='#3d4435',lw=3)
plt.title(strin,fontsize=15);
plt.legend(labels=[r'$\alpha$ of 5%',"p-value = {:.1%}".format(null['p-value'].mean())],loc='upper right');
"{:.2%}".format(null['p-value'].mean())
'90.42%'
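As a cross-check on the simulated p-value (a suggested addition, not part of the original analysis), the same right-tailed test can be run with a two-proportion z-test from statsmodels; its p-value should land close to the simulated value above:

from statsmodels.stats.proportion import proportions_ztest

# Number of conversions and number of users in each group
convert_new=df2.query('group=="treatment"')['converted'].sum()
convert_old=df2.query('group=="control"')['converted'].sum()
n_new=df2.query('group=="treatment"').shape[0]
n_old=df2.query('group=="control"').shape[0]

# alternative='larger' tests H1: p_new - p_old > 0, matching the right-tail test above
z_stat,p_value=proportions_ztest([convert_new,convert_old],[n_new,n_old],alternative='larger')
print("z = {:.2f}, p-value = {:.2%}".format(z_stat,p_value))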
With a p-value of about 90%, we fail to reject the null hypothesis that the conversion rate of the new_page is the same as or lower than the conversion rate of the old_page, $H_0: p_{new} \leq p_{old}$.

Can we go further and claim that the conversion rate of the new_page is lower than the old_page, $p_{new} < p_{old}$? Not from this test; that claim would need its own one-sided test with

$$H_0: p_{new} \geq p_{old}$$
$$ H_1: p_{new} < p_{old} $$
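For completeness (a suggested check, not part of the original notebook), the left-tail p-value for that alternative can be read off the same simulated null distribution; only if it fell below 5% could we claim $p_{new} < p_{old}$:

# Suggested check: proportion of simulated differences at least as small as the observed one
left_tail_p=(null['differences'] < obs_diff).mean()
"{:.2%}".format(left_tail_p)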
Given how small the observed difference is, the most we can conclude is that the conversion rate of the new_page is practically the same as the conversion rate of the old_page, $H_0: p_{new} = p_{old}$. We therefore recommend that the company keep the old_page, since we have no evidence of a novelty effect or change aversion.