A/B tests are among the most common analyses performed by data analysts and data scientists. In this project we will work through the results of an A/B test run by an e-commerce website, with the goal of helping the company understand whether they should:
sns.countplot(data=df2,x='group');
print("Number of visitors in control group: ",df2.query('group=="control"').shape[0])
print("Number of visitors in treatment group: ",df2.query('group=="treatment"').shape[0])
Number of visitors in control group:  145274
Number of visitors in treatment group:  145310
p_conv=df2['converted'].mean()
"{:.2%}".format(p_conv)
'11.96%'
Given that an individual was in the control group, what is the probability they converted?
control_cr=df2.query('group=="control"')['converted'].mean()
"{:.2%}".format(control_cr)
'12.04%'
Given that an individual was in the treatment group, what is the probability they converted?
treatment_cr=df2.query('group=="treatment"')['converted'].mean()
"{:.2%}".format(treatment_cr)
'11.88%'
What is the difference between new_page conversions and old_page conversions in our sample dataset?
# Calculate the actual difference (obs_diff) between the conversion rates for the two groups.
obs_diff=treatment_cr - control_cr
"{:.2%}".format(obs_diff)
'-0.16%'
Do treatment group users lead to more conversions?
To find out whether the new_page leads to a higher conversion rate, we can look at the share of the treatment group within the pool of converted users; in other words, we need to calculate $P(treatment \mid converted)$.
# p_treatment is the proportion of all users in the treatment group (computed earlier)
P_treatment_conv=p_treatment * treatment_cr / p_conv
"{:.2%}".format(P_treatment_conv)
'49.68%'
P_control_conv=p_control * control_cr / p_conv
"{:.2%}".format(P_control_conv)
'50.32%'
The old_page leads to slightly more conversions than the new_page; however, the two rates are almost identical. In the next section we will try to quantify how significant this observed difference is and infer from our sample to the actual population (all the website users).

Since the difference between the two conversion rates is very small, let us take as our null hypothesis $H_0$ that the conversion rate of the new_page, $p_{new}$, is the same as or lower than the conversion rate of the old_page, $p_{old}$, until the opposite is proven. Our assumption is:
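The array `p_diff` used in the next cell is the simulated sampling distribution of the difference in conversion rates under the null. A minimal sketch of how such an array could be generated, assuming the group sizes and pooled conversion rate reported above (the notebook's own simulation may differ in details):

```python
import numpy as np

rng = np.random.default_rng(42)

n_new, n_old = 145310, 145274  # treatment / control group sizes from above
p_pooled = 0.1196              # overall conversion rate from above

# Simulate 10,000 experiments under H0 (both pages convert at the pooled rate)
# and record the difference between the simulated conversion rates each time.
new_sim = rng.binomial(n_new, p_pooled, size=10_000) / n_new
old_sim = rng.binomial(n_old, p_pooled, size=10_000) / n_old
p_diff = new_sim - old_sim
```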
null_dsprt=np.random.normal(0,np.std(p_diff),p_diff.size)
sns.displot(null_dsprt,color='#aaaaaa',height=4.5,aspect=2.4);
plt.title(r'Null distribution $ H_0: p_{new} - p_{old} \leq 0 $',fontsize=15);
null_dsprt=np.random.normal(0,np.std(p_diff),p_diff.size)
sns.displot(null_dsprt,color='#aaaaaa',height=4.5,aspect=2.4);
plt.axvline(np.percentile(null_dsprt,95),color='#3d4435',lw=3)
plt.title(r'Null distribution with significance level $\alpha$ of 5% (right tail test)',fontsize=15);
plt.legend(labels=[r'$\alpha$ of 5%']);
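The vertical line marks the 95th percentile of the simulated null distribution: for a right-tailed test at $\alpha = 5\%$, any observed difference beyond it would lead us to reject $H_0$. Since the null here is (approximately) a zero-mean normal, that cutoff sits near $1.645\,\sigma$. A small sketch, assuming an illustrative null standard deviation of 0.0012 (not the notebook's exact value):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.0012  # assumed std of the simulated null differences

null_dsprt = rng.normal(0, sigma, size=100_000)
crit = np.percentile(null_dsprt, 95)

# For a zero-mean normal, the 5% right-tail cutoff is about 1.645 * sigma.
print(crit, 1.645 * sigma)
```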
obs="{:.2%}".format(obs_diff)
sns.displot(data=null,x='differences',color='#aaaaaa',height=4.5,aspect=2.4);
plt.axvline(obs_diff,lw=3)
plt.axvline(np.percentile(null_dsprt,95),color='#3d4435',lw=3)
plt.title('Observed difference @ {} under the null hypothesis'.format(obs),fontsize=15)
plt.legend(labels=['observed difference',r'$\alpha$ of 5%'],loc='upper right');
null['p-value'] = (null['differences'] > obs_diff).astype(int)
strin = 'The probability of observing a difference of {}\nor more extreme under the null hypothesis'.format(obs)
strin = strin + r" $ H_0: p_{new} - p_{old} \leq 0 $"
sns.displot(data=null,x='differences',hue='p-value',palette=['#aaaaaa','#0b5394'],legend=False,height=4.5,aspect=2.4);
plt.axvline(np.percentile(null_dsprt,95),color='#3d4435',lw=3)
plt.title(strin,fontsize=15);
plt.legend(labels=[r'$\alpha$ of 5%',"p-value = {:.1%}".format(null['p-value'].mean())],loc='upper right');
"{:.2%}".format(null['p-value'].mean())
'90.42%'
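The simulated p-value can be cross-checked analytically: under $H_0$ the difference in sample conversion rates is approximately normal with mean 0 and a standard error computed from the pooled rate, so the one-sided p-value $P(\text{diff} > \text{obs\_diff} \mid H_0)$ follows from the normal CDF. A sketch using the rounded counts and rates reported above, so the result only approximately matches the simulated 90.42%:

```python
from math import sqrt, erf

n_old, n_new = 145274, 145310
old_cr, new_cr = 0.1204, 0.1188
p_pooled = 0.1196

obs_diff = new_cr - old_cr  # -0.0016
se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_old + 1 / n_new))
z = obs_diff / se           # about -1.33

def phi(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + erf(x / sqrt(2)))

# One-sided p-value for H1: p_new - p_old > 0 is P(Z > z) = 1 - Phi(z).
p_value = 1 - phi(z)
print(round(p_value, 4))    # close to the simulated 90.42%
```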
With a p-value of 90.42%, well above $\alpha = 5\%$, we fail to reject the null hypothesis that the conversion rate of the new_page is the same as or lower than the conversion rate of the old_page, $H_0: p_{new} \leq p_{old}$.

Is the new_page conversion rate actually lower than that of the old_page, $p_{new} < p_{old}$? Testing that direction instead,

$$H_0: p_{new} \geq p_{old}$$
$$H_1: p_{new} < p_{old}$$

the one-sided p-value is roughly $1 - 90.42\% \approx 9.6\%$, still above $\alpha$; likewise we cannot reject the hypothesis that the two rates are equal, $H_0: p_{new} = p_{old}$. We therefore recommend keeping the old_page, since we have no evidence of a novelty effect or change aversion.