๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
1๏ธโƒฃ AI•DS/๐ŸฅŽ Casual inference

์ธ๊ณผ์ถ”๋ก ์˜ ๋ฐ์ดํ„ฐ ๊ณผํ•™ - ์ž ์žฌ์ ๊ฒฐ๊ณผ ํ”„๋ ˆ์ž„์›Œํฌ

by isdawell 2023. 4. 21.
728x90

 

์ฐธ๊ณ ์˜์ƒ : Bootcamp 2-1. ์ž ์žฌ์  ๊ฒฐ๊ณผ ํ”„๋ ˆ์ž„์›Œํฌ 

 

 

 

1. Potential outcome framework


 

โ—ฏ  Potential outcome framework 

 

 

•  ํŠน์ • ์›์ธ (treatment) ์˜ ์ธ๊ณผ์  ํšจ๊ณผ๋ฅผ ์ž ์žฌ์  ๊ฒฐ๊ณผ์˜ ์ฐจ์ด๋กœ์„œ ์ •์˜ํ•˜๊ณ  ๋ถ„์„ํ•˜๋Š” ๊ฒƒ 

•  ๊ทธ๋•Œ์˜ ๊ฒฐ์ •์ด ์›์ธ์ด ๋˜์–ด ์ง€๊ธˆ์˜ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜ด. ๊ทธ๋•Œ์˜ ๊ฒฐ์ •์ด ๋‹ฌ๋ž๋‹ค๋ผ๋ฉด ์ž ์žฌ์ ์ธ ๊ฒฐ๊ณผ๋Š” ์–ด๋–ป๊ฒŒ ๋˜์—ˆ์„๊นŒ โ‡จ ์šฐ๋ฆฌ๊ฐ€ ์ผ์ƒ์†์—์„œ ์‚ฌ๊ณ ํ•˜๋Š” ๋ฐฉ์‹์ด Potential outcome ์ด ์ธ๊ณผ๊ด€๊ณ„๋ฅผ ์ •์˜ํ•˜๋Š” ๋ฐฉ์‹์ด๋ผ ๋ณผ ์ˆ˜ ์žˆ์Œ 

 

•  Causal effect = (treatment ๋ฅผ ๋ฐ›์€ ์‹ค์ œ ๊ฒฐ๊ณผ) - (treatment ๋ฅผ ๋ฐ›์ง€ ์•Š์•˜๋”๋ผ๋ฉด ์žˆ์—ˆ์„ ์ž ์žฌ์  ๊ฒฐ๊ณผ) 

 

 

โ—ฏ  Counterfactual 

 

•  treatment ๋ฅผ ๋ฐ›์ง€ ์•Š์•˜๋”๋ผ๋ฉด ์žˆ์—ˆ์„ ์ž ์žฌ์  ๊ฒฐ๊ณผ๋ฅผ Counterfactual ์ด๋ผ ๋ถ€๋ฅธ๋‹ค. 

 

 

โ—ฏ  Average Treatment Effect 

 

 

•  1๋ฒˆ๊ณผ 2๋ฒˆ์ด treatment ๋ฅผ ๋ฐ›์€ ์‚ฌ๋žŒ๋“ค์ด๊ณ , 3๋ฒˆ๊ณผ 4๋ฒˆ์ด treatment ๋ฅผ ๋ฐ›์ง€ ์•Š์€ ์‚ฌ๋žŒ๋“ค์ด๋ผ ํ•˜๋ฉด, 1๋ฒˆ๊ณผ 2๋ฒˆ์—์„œ๋Š” ์ด๋ฏธ ์ฒ˜์น˜๋ฅผ ๋ฐ›์•˜๊ธฐ ๋•Œ๋ฌธ์— potential outcome ๋Š” ํ˜„์‹ค์—์„œ ๊ด€์ฐฐ ๋ถˆ๊ฐ€๋Šฅ ํ•˜๋‹ค. 3๋ฒˆ๊ณผ 4๋ฒˆ์— ๋Œ€ํ•ด์„œ๋„ treatment ๋ฅผ ๋ฐ›์•˜๋‹ค๋ฉด ์žˆ์—ˆ์„ ๊ฒฐ๊ณผ๋ฅผ ๊ด€์ฐฐํ•  ์ˆ˜๋Š” ์—†๋‹ค. 

•  ๋”ฐ๋ผ์„œ ํŠน์ • ๊ฐœ์ธ์— ๋Œ€ํ•œ individual treatment ๋Š” ๊ตฌํ•  ์ˆ˜ ์—†๋‹ค. ํ•˜์ง€๋งŒ ์ ์–ด๋„, treatment ๋ฅผ ๋ฐ›์€ ๊ทธ๋ฃน์—์„œ์˜ ํ‰๊ท ๊ณผ, ๋ฐ›์ง€ ์•Š์€ ๊ทธ๋ฃน์—์„œ์˜ ํ‰๊ท ์„ ๊ตฌํ•ด ํ‰๊ท ์ ์ธ ์ธ๊ณผํšจ๊ณผ๋Š” ์ถฉ๋ถ„ํžˆ ์ถ”์ •ํ•ด ๋ณผ ์ˆ˜ ์žˆ๋‹ค โ‡จ Average treatment effect ATE 

 

 

 

2. Fundamental problem of causal inference 


 

 

 

•  Potential outcome framework ์—์„œ ์ธ๊ณผ์ถ”๋ก ์„ ์œ„ํ•ด ์šฐ๋ฆฌ๊ฐ€ ๊ตฌํ•ด์•ผ ํ•˜๋Š” ๊ฒƒ์€ counterfactual ๋ถ€๋ถ„์ธ๋ฐ, ํ˜„์‹ค์—์„œ ์šฐ๋ฆฌ๊ฐ€ ์ง์ ‘ ๋ฐ์ดํ„ฐ๋กœ ๊ด€์ฐฐ ๊ฐ€๋Šฅํ•œ ๊ฒƒ์€ treatment ๋ฅผ ๋ฐ›์ง€ ์•Š์€ control group ์ด๋‹ค. 

 

 

•  ์ธ๊ณผ์ถ”๋ก ์ด ์–ด๋ ค์šด ์ด์œ ๋Š”, ์ด์ƒ์ ์ธ counterfactual ๊ณผ ํ˜„์‹ค์—์„œ์˜ control group ๊ฐ„์˜ ์ฐจ์ด์— ๊ธฐ์ธํ•œ๋‹ค. ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ, ์ฆ‰ counterfactual ์— ๊ฐ€๊น๋„๋ก control ์„ ์„ค์ •ํ•˜๋Š” ์ธ๊ณผ์ถ”๋ก  ๋ฐฉ๋ฒ•๋“ค์ด ๋“ฑ์žฅํ•˜๊ฒŒ ๋˜๋Š” ๊ฒƒ์ด๋‹ค. 

 

•  Potential outcome framework ๋Š” ์ธ๊ณผ์ถ”๋ก ์„ missing value ๊ด€์ ์œผ๋กœ ๋ณด๊ธฐ๋„ ํ•œ๋‹ค. treament group ๊ณผ control group ์—์„œ์˜ ๊ด€์ธก ๋ถˆ๊ฐ€๋Šฅํ•œ cell ์˜ ๊ฐ’์„ ๋ฌด์‹œํ•ด๋„ ๋œ๋‹ค๋ผ๋Š” ์ธก๋ฉด์—์„œ ์ด๋Ÿฌํ•œ ์กฐ๊ฑด์„ Ignorability ๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค. ํ˜น์€ ๋‘ ๊ทธ๋ฃน ๊ฐ„์˜ ๊ฐ’์„ ๊ตํ™˜ ๊ฐ€๋Šฅํ•˜๋‹ค๋Š” ์˜๋ฏธ์—์„œ exchangability ๋ผ๊ณ ๋„ ๋ถ€๋ฅธ๋‹ค. 

 

•  ๋ฐ˜๋ฉด, selection bias ๋‚˜ causal graph ๊ด€์ ์—์„œ์˜ ์šฉ์–ด๋กœ ๋ณด๋ฉด, treatment ์™€ result ์— ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š” ์™ธ๋ถ€ ์š”์ธ(๊ต๋ž€ ์š”์ธ, confounder) ์ด ์—†์–ด์•ผ ์ธ๊ณผ์ถ”๋ก ์˜ ์กฐ๊ฑด์ด ์„ฑ๋ฆฝํ•œ๋‹ค๋Š” ์˜๋ฏธ์—์„œ Unconfoundess ๋ผ๊ณ ๋„ ๋ถ€๋ฅธ๋‹ค. ํ†ต๊ณ„์ ์ธ ๊ด€์ ์—์„œ๋Š” Exogeneity (๋‚ด์ƒ์„ฑ์ด ์—†๋Š” ์กฐ๊ฑด, ์™ธ์ƒ์„ฑ) ๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค. 

 

โ‡จ ๊ฒฐ๊ตญ ๋ชจ๋‘ ๊ทผ๋ณธ์ ์œผ๋กœ๋Š” counterfactual ์— ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด control group ์„ ์ฐพ์Œ์œผ๋กœ์จ counterfactual ์„ ๋Œ€์‹ ํ•˜๋Š” ์ „๋žต์ด๋ผ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

•  ๋ฐ˜๋ ค๋™๋ฌผ์„ ํ‚ค์šฐ๋Š” ์‚ฌ๋žŒ๊ณผ ํ‚ค์šฐ์ง€ ์•Š๋Š” ์‚ฌ๋žŒ์„ ๋‹จ์ˆœ ๋น„๊ตํ•˜๋Š” ๊ฒƒ์ด ์™œ ์ธ๊ณผ์ถ”๋ก ์„ ์œ„ํ•ด ์ ์ ˆํ•˜์ง€ ์•Š์€์ง€ ์ด์ œ ์–ด๋Š์ •๋„ ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

3.  Selection bias


 

โ—ฏ  Selection bias 

 

 

•  ๋ฐ˜๋ ค๋™๋ฌผ์„ ํ‚ค์šฐ๋Š” ๊ฒƒ๋„ ์‚ฌ๋žŒ๋“ค์˜ ์ž๋ฐœ์ ์ธ ์„ ํƒ์ด๋‹ค. ๋งŒ์•ฝ ๊ทธ๋Ÿฌํ•œ ์‚ฌ๋žŒ๋“ค์˜ ์ž๋ฐœ์ ์ธ ์„ ํƒ์ด, ์šฐ๋ฆฌ๊ฐ€ ๊ด€์‹ฌ์žˆ๋Š” ๊ฒฐ๊ณผ์ธ '์šฐ์šธ์ฆ' ๊ณผ ๊ด€๋ จ์ด ์žˆ์„ ์ˆ˜ ์žˆ๋‹ค. ๊ฐ€๋ น, ์šฐ์šธ์ฆ์ด ๋†’์€ ์‚ฌ๋žŒ๋“ค์ด ๋ฐ˜๋ ค๋™๋ฌผ์„ ํ‚ค์šฐ๊ณ ์ž ํ•˜๋Š” ์„ฑํ–ฅ์ด ๋†’์„ ์ˆ˜ ์žˆ๋‹ค. 

•  ์ธ๊ณผ์ ์ธ ํšจ๊ณผ๋ฅผ ์ถ”์ •ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์™„๋ฒฝํ•œ counterfactual ์„ ํ˜„์‹ค์—์„œ ๊ด€์ฐฐํ•  ์ˆ˜๋Š” ์—†๊ฒ ์ง€๋งŒ, ๋ฐ˜๋ ค๋™๋ฌผ์„ ํ‚ค์šฐ๋Š” ์‚ฌ์‹ค์„ ์ œ์™ธํ•˜๊ณ ๋Š” ๋‚˜๋จธ์ง€ ์š”์ธ๋“ค์ด ๊ทธ๋‚˜๋งˆ ์ตœ๋Œ€ํ•œ ๋น„์Šทํ•œ ์‚ฌ๋žŒ๋“ค๋ผ๋ฆฌ ๋ฌถ์–ด์„œ ๊ทธ๋“ค์„ ๋น„๊ตํ•˜๋ฉด ์ตœ๋Œ€ํ•œ ๋ฐ˜๋ ค๋™๋ฌผ์„ ํ‚ค์šฐ๋Š” ๊ฒƒ์˜ ์ธ๊ณผํšจ๊ณผ๋ฅผ ์‚ดํŽด๋ณผ ์ˆ˜ ์žˆ์„ ๊ฒƒ์ด๋‹ค. 

•  ๊ทธ๋Ÿฌ๋‚˜ ์ •๋ง๋กœ ๋น„์Šทํ•œ์ง€์˜ ์—ฌ๋ถ€๋Š” ๋˜ ๋‹ค์‹œ ํ•œ๋ฒˆ ๋” ์‚ดํŽด๋ด์•ผ ํ•œ๋‹ค. 

•  Counterfactual ๊ณผ Control group ๊ฐ„์˜ ์ฐจ์ด๋ฅผ selection bias ๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค. treatment ๋ฅผ ๋ฐ›์„์ง€ ๋ง์ง€๋ฅผ ์‚ฌ๋žŒ๋“ค์ด ์„ ํƒํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๊ทธ๊ณณ์—์„œ ๋ถ€ํ„ฐ ํŽธํ–ฅ์ด ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋‹ค. selection bias ๋ฅผ ์•ผ๊ธฐํ•˜๋Š” ๊ต๋ž€ ์š”์ธ์„ confounding factor ๋ผ ๋ถ€๋ฅธ๋‹ค. ์˜ค๋ฅธ์ชฝ ๊ทธ๋ฆผ์˜ X ๋ผ๋Š” ๋ณ€์ˆ˜๊ฐ€ ๋‘ ๊ทธ๋ฃน ๊ฐ„์˜ ์ฐจ์ด selection bias ๋ฅผ ์„ค๋ช…ํ•  ์ˆ˜ ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ X ๋ฅผ confounding factor ๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค. 

 

 

 

•  ํ˜„์‹ค์—์„œ ๋ฐ์ดํ„ฐ๋ฅผ ํ†ตํ•ด ๋ถ„์„ํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒƒ์€, treatment ๋ฅผ ๋ฐ›์€ ์‚ฌ๋žŒ๋“ค์˜ outcome ๊ณผ control group ์˜ outcome ์˜ ์ฐจ์ด๋‹ค. ์—ฌ๊ธฐ์„œ ๊ฐ™์€ ๊ฐ’์ธ counterfactual ์„ ๋”ํ•˜๊ณ  ๋นผ๋ฉด ์œ„์™€ ๊ฐ™์ด ์“ธ ์ˆ˜ ์žˆ๋‹ค. ์ด ์‹์„ ๋‹ค์‹œ ์žฌ์ •๋ฆฝํ•˜๋ฉด, Causal effect ๋ถ€๋ถ„๊ณผ  selection bias (causual effect ๋ฅผ ์ถ”์ •ํ•˜๋Š” ๊ฒƒ์„ ๋ฐฉํ•ดํ•˜๋Š” ์š”์ธ, treatment ๊ทธ๋ฃน์—์„œ treat ๋ฅผ ๋ฐ›์ง€ ์•Š์•˜์„ ๋•Œ ์ƒ๊ธฐ๋Š” ๊ฒฐ๊ณผ : counterfactual ์™€ ์‹ค์ œ treat ๋ฅผ ๋ฐ›์ง€ ์•Š์€ ์ง‘๋‹จ : control ๊ฐ„์˜ ์ฐจ์ด) ๋ถ€๋ถ„์œผ๋กœ ๋‚˜๋ˆ„์–ด ๋ณผ ์ˆ˜ ์žˆ๋‹ค. 

•  Observed effect of the treatment = Causal effect + selection bias 

 

 

 

โ—ฏ  Ceteris Paribus โ“ Comparable control group 

 

 

•  selection bias ๋ฅผ ์—†์• ๊ธฐ ์œ„ํ•œ ์กฐ๊ฑด : Ceteris Paribus โ‡จ treatment ๋ฅผ ๋ฐ›์•˜๋‹ค๋Š” ์‚ฌ์‹ค๋งŒ ์ œ์™ธํ•˜๊ณ  ๋‚˜๋จธ์ง€ ๋ชจ๋“  ์š”์ธ๋“ค์ด ๋™์ผํ•ด์„œ ๋น„๊ต ๊ฐ€๋Šฅํ•˜๋‹ค๋Š” ์กฐ๊ฑด 

 

 

•  Selection bias ๋ฅผ ์ œ๊ฑฐํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•์— ๊ด€๋ จํ•œ ์—ฐ๊ตฌ ์‚ฌ๋ก€ : ์žฌํƒ๊ทผ๋ฌด์˜ ์ธ๊ณผ์ ์ธ ํšจ๊ณผ๋ฅผ ์ถ”์ •

•  ์ฝœ์„ผํ„ฐ ์ง์›๋“ค์—๊ฒŒ ์žฌํƒ๊ทผ๋ฌด ํฌ๋ง์ž๋ฅผ ์ง€์› ๋ฐ›์Œ → ์žฌํƒ ๊ทผ๋ฌด๋ฅผ ์„ ํƒํ•œ ์‚ฌ๋žŒ์ด treatment group ์ด ๋˜๊ณ , ์žฌํƒ๊ทผ๋ฌด๋ฅผ ํ•˜์ง€ ์•Š๊ฒ ๋‹ค๊ณ  ์„ ํƒํ•œ ์‚ฌ๋žŒ์ด control group ์ด ๋จ 

•  ์žฌํƒ๊ทผ๋ฌด๋ฅผ ์ง€์›ํ•œ ์‚ฌ๋žŒ๊ณผ ๊ทธ๋ ‡์ง€ ์•Š์€ ์‚ฌ๋žŒ ๊ฐ„์— ์—ฌ๋Ÿฌ ํŠน์„ฑ๋“ค์ด ๋‹ค๋ฅผ ์ˆ˜ ์žˆ์Œ (ex. ์•„์ด๋ฅผ ๊ฐ€์ง„ ์‚ฌ๋žŒ) 

•  ๊ฐ€์žฅ ์ด์ƒ์ ์ธ ์ƒํ™ฉ : ์žฌํƒ๊ทผ๋ฌด์˜ ์ธ๊ณผ์ ์ธ ํšจ๊ณผ๋ฅผ ์ถ”์ •ํ•˜๊ธฐ ์œ„ํ•ด์„œ  treament group ๊ณผ, ์žฌํƒ๊ทผ๋ฌด๋ฅผ ํ•˜์ง€ ์•Š์•˜์œผ๋ฉด ์žˆ์—ˆ์„ counterfactual ๋ฅผ ๋น„๊ต 

•  ๊ทธ๋Ÿฌ๋‚˜ ํ˜„์‹ค์—์„œ๋Š” selection bias ๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ๋ฐ–์— ์—†์Œ โ‡จ ํ˜„์‹ค์—์„œ ๋น„๊ต ๊ฐ€๋Šฅํ•œ control group ์„ ๋งŒ๋“ค์–ด์•ผ ํ•จ 

 

 

•   ์žฌํƒ๊ทผ๋ฌด์— ์ž๋ฐœ์ ์œผ๋กœ ์ง€์›ํ•œ ์‚ฌ๋žŒ๋“ค ์ค‘์—์„œ, ๋žœ๋คํ•˜๊ฒŒ treatment group ๊ณผ control group ์œผ๋กœ ๋‚˜๋ˆ„๋ฉด, randomized control group ์€ ์ด์ƒ์ ์ธ counterfactual ์ด ๋œ๋‹ค. 

•   ์œ„์™€ ๊ฐ™์ด ์‹คํ—˜์„ ์„ค๊ณ„ํ•˜๋ฉด, Randomized control group ๊ณผ self-selected control group ๊ฐ„์˜ ์ฐจ์ด์ธ selection bias ๋ฅผ ๊ตฌํ•ด๋‚ผ ์ˆ˜ ์žˆ๋‹ค. 

 

 

 

 

 

 

 

 

728x90

๋Œ“๊ธ€