Hello, and welcome to COT. This article explains how to find out whether two categorical variables are related. We will work through the concept with an example in Python. I will use the words 'variable' and 'feature' interchangeably, so don't be confused. You need some basic knowledge of the Pandas library to follow this article.

**Categorical Variables:** Variables that take one of a fixed number of values. For example, if the feature name is gender, it can only have male and female as values. The same applies to a feature named weather, which can only take values such as sunny, cloudy, and rainy.

If the variables are continuous, it is easy to look for a relationship or correlation between them using a scatter plot. With categorical variables, it is less obvious what to do. So today I am going to explain how to use the **Chi-Square Test** to find out whether there is a relationship, or dependence, between two features.

**Step by Step: How to use the Chi-Square test**

I am going to use the Titanic dataset provided by Kaggle. We will try to find out whether there is a relationship between the 'Embarked' (port of embarkation: C = Cherbourg, Q = Queenstown, S = Southampton) and 'Survival' features.

**Step 1**: Making Null Hypothesis

First, we have to make a hypothesis. In the Chi-Square test of independence, the Null Hypothesis states that the two variables are independent (no relationship). We will then use the test to decide whether the data let us reject it. What is the hypothesis in our case?

Let us call our hypothesis H0.

*H0: There is no relationship between the 'Embarked' and 'Survival' features.*

**Step 2**: Get your contingency table

Contingency table:

The Chi-Square test requires your data in the form of a contingency table. A contingency table simply holds the frequency of every combination of values of the two categorical variables. I took a sample (Table no. 1) from our dataset, and Table no. 2 shows the contingency table for that sample.

Table no. 3 is the real contingency table for the example we are considering. Here is the code for getting the contingency table for our Titanic dataset.

The Embarked column contains two null values, so we have to remove those two rows.
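The two steps above (dropping the missing rows, then counting combinations) can be sketched with Pandas as follows. This is only a minimal sketch: the tiny data frame is a made-up stand-in for the Kaggle file, and I am assuming the columns are named `Embarked` and `Survived`, so adjust the names to match your copy of the data.

```python
import pandas as pd

# Hypothetical mini-sample standing in for the Kaggle file; with the real
# data you would start from something like df = pd.read_csv("train.csv").
df = pd.DataFrame({
    "Embarked": ["S", "C", "S", "Q", None, "C", "S", "Q"],
    "Survived": [0, 1, 0, 1, 1, 1, 1, 0],
})

# Drop the rows where the port of embarkation is missing.
df = df.dropna(subset=["Embarked"])

# Count every (Embarked, Survived) combination.
contingency = pd.crosstab(df["Embarked"], df["Survived"])
print(contingency)
```

Each row of the result is a port, each column is a survival outcome, and each cell is the number of passengers with that combination.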

**Step 3**: Find Expected values.

The values stored in the contingency table are called observed values. Now we have to find the expected values, that is, the counts we would expect in each cell if H0 were true. To find them, you first need the row totals, the column totals, and the grand total of the contingency table.

Using these totals, we can build the table of expected values. The expected value for a given cell is its row total multiplied by its column total, divided by the grand total.
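The expected-value rule can be written in a few lines of NumPy. The observed counts below are an assumption on my part (what I would expect `pd.crosstab` to give for the Kaggle training file after dropping the two missing rows), so regenerate them from your own data before relying on them.

```python
import numpy as np

# Assumed observed counts. Rows: C, Q, S. Columns: did not survive, survived.
observed = np.array([[75, 93],
                     [47, 30],
                     [427, 217]])

row_totals = observed.sum(axis=1)   # one total per port
col_totals = observed.sum(axis=0)   # one total per outcome
grand_total = observed.sum()

# expected[i, j] = row_totals[i] * col_totals[j] / grand_total
expected = np.outer(row_totals, col_totals) / grand_total
print(expected.round(2))
```

The expected table has the same row and column totals as the observed one; only the way the counts are spread across the cells differs.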

The Scipy library in Python provides a function for all this, so once you understand the math there is no need to do it by hand. I will use it later in this article, keep reading.

**Step 4**: Find Degree of Freedom (DOF), and decide Alpha.

DOF: The number of values in the table that are free to vary is called the degree of freedom. For a Chi-Square test on a contingency table, DOF is always:

*DOF = (num_of_rows - 1) × (num_of_cols - 1)*

In our case, it should be:

*DOF = (3 - 1) × (2 - 1) = 2*

Alpha: This is the significance level, a probability threshold chosen before running the test. It is the risk we accept of rejecting H0 (the Null Hypothesis) when H0 is actually true. For example, with alpha = 0.05 we accept a 5% chance of rejecting a true H0.

The standard value used for alpha in the Chi-Square test is 0.05, so our value for alpha is *0.05*.

**Step 5**: Find Chi-Square value (or statistic value).

I will call this value stat for simplicity. We can find the stat value by applying the Chi-Square formula to our contingency (observed) and expected values tables: for each cell, square the difference between the observed and expected value, divide by the expected value, and add up the results over all cells.
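Written by hand, the Chi-Square formula is the sum over all cells of (observed - expected)² / expected. Here is a minimal sketch of that sum, again using observed counts that I am assuming for the Titanic example rather than reading from the file.

```python
import numpy as np

# Assumed observed counts. Rows: C, Q, S. Columns: did not survive, survived.
observed = np.array([[75, 93],
                     [47, 30],
                     [427, 217]], dtype=float)

# Expected counts under H0: row total * column total / grand total.
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Chi-Square statistic: sum over all cells of (O - E)^2 / E.
stat = ((observed - expected) ** 2 / expected).sum()
print(round(stat, 2))
```

A large stat means the observed counts are far from what independence would predict.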

We will find all the required values using Scipy in step 7.

**Step 6**: Find critical value of stat.

We have to find the critical value of stat for our problem using the DOF and alpha values from step 4. There is a standard Chi-Square table for this: you just need DOF and alpha as the row and column indexes. For DOF = 2 and alpha = 0.05, the table gives a critical value of 5.99.
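If you do not have the printed table at hand, a small slice of it can be rebuilt with `scipy.stats.chi2.ppf`, which returns the value below which a given fraction of the Chi-Square distribution lies. The particular alphas and DOF values in this sketch are just my choice for illustration.

```python
from scipy.stats import chi2

alphas = [0.10, 0.05, 0.01]

# Rows are DOF, columns are alpha, as in the standard Chi-Square table.
table = {
    dof: {alpha: chi2.ppf(1 - alpha, dof) for alpha in alphas}
    for dof in (1, 2, 3)
}

for dof, row in table.items():
    print(dof, [round(value, 2) for value in row.values()])

# Critical value for our problem: DOF = 2, alpha = 0.05.
critical = table[2][0.05]
```

Note that the function takes `1 - alpha`, because alpha is the area in the right tail of the distribution.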

All of these calculations can be done using just two Scipy functions.
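My reading is that the two functions meant here are `scipy.stats.chi2_contingency` and `scipy.stats.chi2.ppf`; that pairing is an assumption, as are the observed counts below. The first function returns the stat value, a p-value, the DOF, and the expected values table in one call.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Assumed observed counts. Rows: C, Q, S. Columns: did not survive, survived.
observed = np.array([[75, 93],
                     [47, 30],
                     [427, 217]])

# Steps 3 to 5 in one call: statistic, p-value, DOF, expected values.
stat, p_value, dof, expected = chi2_contingency(observed)

# Step 6: critical value for alpha = 0.05.
critical = chi2.ppf(1 - 0.05, dof)

print(round(stat, 2), dof, round(critical, 2))
```

`chi2_contingency` only applies its continuity correction when DOF is 1, so for our 3 x 2 table the result matches the hand calculation.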

**Step 7**: Decide whether to reject the H0 hypothesis, or not.

If the stat value is greater than the critical value (5.99), reject the hypothesis H0; otherwise, you cannot reject it. In our case, stat is greater than 5.99, so we have to reject H0, which suggests that there is a relationship between 'Embarked' and 'Survival'. You can also make this decision in Python.
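The decision rule can be wrapped in a small helper. `decide` is a hypothetical name of my own, and the stat value of 26.49 passed in below is what the assumed Titanic counts would give, not a number taken from the article.

```python
from scipy.stats import chi2

def decide(stat, dof, alpha=0.05):
    """Return (reject, critical, p_value) for a Chi-Square statistic."""
    critical = chi2.ppf(1 - alpha, dof)   # threshold from the Chi-Square table
    p_value = chi2.sf(stat, dof)          # right-tail probability of stat
    reject = bool(stat > critical)
    return reject, critical, p_value

reject, critical, p_value = decide(stat=26.49, dof=2)

if reject:
    print("Reject H0: the features appear to be related.")
else:
    print("Cannot reject H0: no evidence of a relationship.")
```

Comparing stat with the critical value and comparing the p-value with alpha are two views of the same test, so both always lead to the same decision.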

Thanks for reading the article. If you have any doubts, please ask in the comments below.
