
[The Brave and True] 11. Propensity score

by isdawell 2023. 7. 13.

 

 

👀 This is a personal study post on causal inference. Please see the attached links for the sources!

 

 

 

※ Summary 1

※ Summary 2

 

 

📜 Summary

• Propensity score = the probability of receiving the treatment
• With the propensity score, there is no need to control for the confounders directly; controlling for the propensity score alone is sufficient.

 

 

 

 

 

 

① Example

◯ Topic

• Students attend a growth mindset seminar at school. To measure how those who received the training fared academically, the study follows the seminar participants through their college years.

 

 

 

◯ Dataset

• achievement_score : standardized achievement (standardized = the variable is measured in standard deviations)

• success_expect : self-reported expectation of future success → used as a proxy for prior achievement, measured before random assignment

• intervention : whether the student attended the seminar

 

 

 

• Although this is a randomized study, only the opportunity to participate was randomized; participation itself may not have been. Students with higher expectations of success are more likely to attend the growth mindset seminar. Checking the average seminar attendance rate by success expectation, as below, confirms in the data that attendance rises with expectation ⇨ the bias is positive.

 

data.groupby("success_expect")["intervention"].mean()

 

 

◯ Treatment effect

• Use regression to check how E[Y | T=1] and E[Y | T=0] differ.

 

import statsmodels.formula.api as smf
smf.ols("achievement_score ~ intervention", data=data).fit().summary().tables[1]

 

• Simply comparing those with and without the intervention, the treated students' achievement score is on average 0.3185 above the overall mean (0.4723 - 0.1538, the intervention coefficient plus the intercept).

• Because the outcome is standardized, this means the treated group sits 0.3185 standard deviations above the mean. The gap between the treated and untreated groups is the intervention coefficient itself, 0.4723.

 

 

 

 

 

 

 

 

 

 

② Propensity score

◯ Propensity score

• Starts from the idea that conditional independence can be achieved without directly controlling for the confounders X.

↪ (Y1, Y0) ⊥ T | X

• Propensity score: the conditional probability of treatment, P(T|X), also written P(x).

• With the propensity score, there is no need to condition on the full set of confounders X to make the potential outcomes independent of the treatment. Controlling for the propensity score alone is sufficient.

 

 

 

• It can be thought of as a function that converts the variables X into the treatment T. In other words, the propensity score sits midway between X and the treatment T.

• If we pick one person from the treated group and one from the untreated control group so that both have the same probability of receiving the treatment, they become comparable. If their probabilities are exactly equal, the only reason one received the treatment and the other did not is chance (the randomization condition holds).

 

 

 

 

 

◯ Mathematical interpretation

• P(x) = E[T|X]

• To show (Y1, Y0) ⊥ T | P(x), it suffices to show that E[T | P(x), X] = E[T | P(x)].

• That is, once we condition on P(x), X provides no additional information about T.
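The equality behind these bullets follows from the law of iterated expectations. A short sketch in the same notation, using only the definition P(x) = E[T|X] and the fact that P(x) is a function of X:

```latex
\begin{aligned}
E[T \mid P(x), X] &= E[T \mid X] = P(x) \\
E[T \mid P(x)] &= E\big[\, E[T \mid P(x), X] \;\big|\; P(x) \big]
               = E[P(x) \mid P(x)] = P(x)
\end{aligned}
```

Both sides equal P(x), so knowing X on top of P(x) does not change the expected treatment.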

 

 

 

 

 

 

 

 

③ Propensity weighting

◯ IPTW (inverse probability of treatment weighting)

• A method that weights every treated person by the inverse of their probability of treatment, so that people who were very unlikely to be treated receive a high weight.

※ Groups with a low probability of treatment get a larger weight, which scales their share up, and groups with a high probability of treatment get a smaller weight, which scales it down. The aim of weighting is to make the probability of receiving the treatment similar across groups.

 

 

 

• Blue dots: untreated group; red dots: treated group

• Bottom-left plot: the propensity score P(x)

• The plot after applying IPTW shows that the red points received higher weights.
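The weighting idea can be tried out on synthetic data. The sketch below is only an illustration, not the post's analysis: the data-generating process, the true effect of 1.0, and the helper names (`simulate`, `stratum_propensity`, `iptw_ate`) are all assumptions made for the demo. The propensity is estimated as the treatment rate within each level of a single binary confounder, standing in for a fitted model.

```python
import random


def simulate(n, seed=0):
    """Hypothetical data: binary confounder x raises both treatment odds and outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        p = 0.8 if x == 1 else 0.2           # true P(T=1 | x), assumed for the demo
        t = 1 if rng.random() < p else 0
        y = 1.0 * t + 2.0 * x + rng.gauss(0.0, 1.0)  # true treatment effect is 1.0
        rows.append((x, t, y))
    return rows


def naive_difference(rows):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def stratum_propensity(rows):
    """Estimate P(T=1 | x) as the observed treatment rate within each x."""
    counts = {}
    for x, t, _ in rows:
        n_x, n_t = counts.get(x, (0, 0))
        counts[x] = (n_x + 1, n_t + t)
    return {x: n_t / n_x for x, (n_x, n_t) in counts.items()}


def iptw_ate(rows, ps):
    """IPTW estimate: mean(T*Y/P(x)) - mean((1-T)*Y/(1-P(x)))."""
    n = len(rows)
    y1 = sum(t * y / ps[x] for x, t, y in rows) / n
    y0 = sum((1 - t) * y / (1 - ps[x]) for x, t, y in rows) / n
    return y1 - y0


rows = simulate(20000)
ps = stratum_propensity(rows)
naive = naive_difference(rows)
ate = iptw_ate(rows, ps)
print("estimated propensity by x:", ps)
print("naive difference:", naive)
print("IPTW ATE:", ate)
```

In this setup the naive difference is inflated because students with x = 1 are both more likely to be treated and score higher, while the weighted estimate should land close to the assumed effect of 1.0.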

 

 

 

 

④ Propensity score estimation

◯ Estimating the propensity score

• The usual way to estimate the propensity score is logistic regression, but other machine learning methods such as gradient boosting can also be used.

• When using logistic regression, the categorical variables in the dataset must be converted to dummies.

 

import pandas as pd

# `data` is assumed to be the study DataFrame loaded earlier
categ = ["ethnicity", "gender", "school_urbanicity"]
cont = ["school_mindset", "school_achievement", "school_ethnic_minority", "school_poverty", "school_size"]

data_with_categ = pd.concat([
    data.drop(columns=categ),  # dataset without the categorical features
    pd.get_dummies(data[categ], columns=categ, drop_first=False)  # categorical features converted to dummies
], axis=1)

print(data_with_categ.shape)

#  (10391, 32)

 

 

 

• Estimating the propensity score with logistic regression looks like this.

 

from sklearn.linear_model import LogisticRegression

T = 'intervention'
Y = 'achievement_score'
X = data_with_categ.columns.drop(['schoolid', T, Y])  # all remaining covariates

# A very large C makes the L2 penalty negligible (close to an unregularized fit)
ps_model = LogisticRegression(C=1e6).fit(data_with_categ[X], data_with_categ[T])

# Column 1 of predict_proba is the estimated P(T=1 | X)
data_ps = data.assign(propensity_score=ps_model.predict_proba(data_with_categ[X])[:, 1])

data_ps[["intervention", "achievement_score", "propensity_score"]].head()

 

 

• Build the weights and check the sample sizes of the reweighted treated and untreated populations.

 

# Inverse-probability weights: 1/P(x) for the treated, 1/(1-P(x)) for the untreated
weight_t = 1/data_ps.query("intervention==1")["propensity_score"]
weight_nt = 1/(1-data_ps.query("intervention==0")["propensity_score"])
print("Original Sample Size", data.shape[0])
print("Treated Population Sample Size", sum(weight_t))
print("Untreated Population Sample Size", sum(weight_nt))

 

 

 

• The propensity score can also be used to look for evidence of confounding. If one segment of the population has a higher propensity score than another, something non-random is driving the treatment.

 

import seaborn as sns
import matplotlib.pyplot as plt
sns.boxplot(x="success_expect", y="propensity_score", data=data_ps)
plt.title("Confounding Evidence");

 

 

 

• To check whether the treated and untreated groups overlap, look at the empirical distribution of the propensity score.

 

# histplot replaces distplot, which is deprecated in recent seaborn releases
sns.histplot(data_ps.query("intervention==0")["propensity_score"], kde=False, label="Non Treated")
sns.histplot(data_ps.query("intervention==1")["propensity_score"], kde=False, label="Treated")
plt.title("Positivity Check")
plt.legend();

 

↪ Confirms that the treatment is well balanced: the two distributions overlap.

 

 

 

• Estimating the ATE

 

 

import numpy as np

# Combined IPTW weight: (T - P(x)) / (P(x) * (1 - P(x)))
weight = ((data_ps["intervention"]-data_ps["propensity_score"]) /
          (data_ps["propensity_score"]*(1-data_ps["propensity_score"])))

# Weighted mean outcomes under treatment (y1) and under no treatment (y0),
# using the inverse-probability weights weight_t and weight_nt computed earlier
y1 = sum(data_ps.query("intervention==1")["achievement_score"]*weight_t) / len(data)
y0 = sum(data_ps.query("intervention==0")["achievement_score"]*weight_nt) / len(data)

ate = np.mean(weight * data_ps["achievement_score"])

print("Y1:", y1)
print("Y0:", y0)
print("ATE", ate)

 

↪  Treated individuals should be expected to score 0.38 standard deviations higher in achievement than their untreated peers.

↪  If no one had received the treatment, the overall achievement level should be expected to be 0.12 standard deviations lower than it is now.

↪  Compared with the 0.47 obtained at the very start, controlling for X gives a smaller and more robust estimate.
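The combined `weight` in the code and the separate `y1` / `y0` terms are two ways of writing the same estimator. A short derivation, writing P for P(x):

```latex
\begin{aligned}
\frac{T - P}{P\,(1 - P)}
  &= \frac{T\,(1 - P) - (1 - T)\,P}{P\,(1 - P)}
   = \frac{T}{P} - \frac{1 - T}{1 - P} \\[4pt]
\widehat{ATE}
  &= E\!\left[ Y \, \frac{T - P}{P\,(1 - P)} \right]
   = E\!\left[ \frac{T\,Y}{P} \right] - E\!\left[ \frac{(1 - T)\,Y}{1 - P} \right]
\end{aligned}
```

The first expectation corresponds to `y1` and the second to `y0`, so `ate` should match `y1 - y0` up to floating-point error.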

 

 

 

• To account for the error introduced when estimating P(x), the bootstrap can be used to obtain the distribution of the ATE estimate.
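A minimal sketch of that bootstrap, on synthetic data rather than the post's dataset. Everything here is an assumption made for illustration: the data-generating process, the true effect of 1.0, the helper names, and the use of per-stratum treatment rates in place of a fitted logistic regression. The important part is that the propensity is re-estimated inside every resample, so its estimation error is carried into the ATE distribution.

```python
import random


def simulate(n, rng):
    """Hypothetical data: binary confounder x, true treatment effect 1.0."""
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        p = 0.8 if x == 1 else 0.2           # true P(T=1 | x), assumed for the demo
        t = 1 if rng.random() < p else 0
        y = 1.0 * t + 2.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, t, y))
    return rows


def iptw_ate(rows):
    """Re-estimate the propensity (treatment rate per x), then return the IPTW ATE."""
    counts = {}
    for x, t, _ in rows:
        n_x, n_t = counts.get(x, (0, 0))
        counts[x] = (n_x + 1, n_t + t)
    ps = {x: n_t / n_x for x, (n_x, n_t) in counts.items()}
    total = sum(t * y / ps[x] - (1 - t) * y / (1 - ps[x]) for x, t, y in rows)
    return total / len(rows)


def bootstrap_ates(rows, n_boot, rng):
    """Resample rows with replacement and recompute the whole estimate each time."""
    return [iptw_ate(rng.choices(rows, k=len(rows))) for _ in range(n_boot)]


rng = random.Random(0)
rows = simulate(2000, rng)
ates = sorted(bootstrap_ates(rows, 200, rng))

# Simple percentile interval covering roughly the central 95% of the resampled estimates
lower = ates[int(0.025 * len(ates))]
upper = ates[int(0.975 * len(ates)) - 1]
print("point estimate:", iptw_ate(rows))
print("approx. 95% interval:", (lower, upper))
```

With a real dataset, the body of `iptw_ate` would refit the propensity model on each resample instead of counting treatment rates.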

 

 

 

 

 

 

 

 

⑤ Common issues with the propensity score

• It is tempting to maximize accuracy when estimating the propensity score. In fact, maximizing the predictive power of the propensity score can hurt the causal inference goal. The propensity score does not need to predict the treatment probability well; it only needs to include all the confounding variables.

• The model should be built to control for confounders, not to predict the treatment. Including variables that predict the treatment well but have no effect on the outcome increases the variance of the estimate.
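The variance cost of a variable that predicts the treatment well but does not affect the outcome can be checked with a small simulation. This is a sketch under stated assumptions, not the post's analysis: the data-generating process, the variable `z`, the true effect of 1.0, and the helper names are all hypothetical, and the propensity is estimated as the treatment rate per stratum rather than with a regression model.

```python
import random
import statistics


def simulate(n, rng):
    """x is a confounder; z strongly drives treatment but has no effect on y."""
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        z = rng.randint(0, 1)
        p = 0.05 + 0.60 * x + 0.30 * z       # true P(T=1 | x, z), assumed for the demo
        t = 1 if rng.random() < p else 0
        y = 1.0 * t + 2.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, z, t, y))
    return rows


def iptw_ate(rows, key):
    """IPTW ATE with the propensity estimated as the treatment rate per key(row)."""
    counts = {}
    for row in rows:
        n_k, n_t = counts.get(key(row), (0, 0))
        counts[key(row)] = (n_k + 1, n_t + row[2])
    ps = {k: n_t / n_k for k, (n_k, n_t) in counts.items()}
    total = 0.0
    for row in rows:
        p, t, y = ps[key(row)], row[2], row[3]
        total += t * y / p - (1 - t) * y / (1 - p)
    return total / len(rows)


rng = random.Random(0)
by_x, by_xz = [], []
for _ in range(200):
    rows = simulate(2000, rng)
    by_x.append(iptw_ate(rows, key=lambda r: r[0]))           # confounder only
    by_xz.append(iptw_ate(rows, key=lambda r: (r[0], r[1])))  # confounder plus z

mean_x, mean_xz = statistics.mean(by_x), statistics.mean(by_xz)
sd_x, sd_xz = statistics.stdev(by_x), statistics.stdev(by_xz)
print("propensity on x only : mean", mean_x, "sd", sd_x)
print("propensity on x and z: mean", mean_xz, "sd", sd_xz)
```

Both versions target the same effect, since `z` is not a confounder here, but adding `z` pushes some estimated propensities toward 0 or 1, which produces larger weights and a wider spread of estimates.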

 

 

 

• With the propensity score in hand, there is no need to control for X. The propensity score can be seen as a kind of dimensionality reduction of the feature space, so it can be used as an input feature to other models. For example, a regression on the treatment and the propensity score:

 

smf.ols("achievement_score ~ intervention + propensity_score", data=data_ps).fit().summary().tables[1]

 

This regression estimates the ATE at 0.39, lower than the 0.47 obtained earlier.

 

 

 

 

 

 

 

 

 

 

 

 
