
The Data Science of Causal Inference - Machine Learning Models for Causal Inference

by isdawell 2023. 6. 26.

 

👀 This is a personal study post on causal inference. Please refer to the attached link for the source!

 

 

 

•  Econ ML : Microsoft EconML 

•  Causal Graph : Microsoft DoWhy 

 

→ Recommended: read the manuals first, then the papers

 

•  Multi-armed Bandits : building the online pipeline is the hard part, and if you can code that pipeline yourself, the choice of framework hardly matters

 

 

 

①  EconML models (Potential outcomes) 


 

•   Assume that the causal relationships/structure are already known. 

 

 

•   The models take different approaches but exist to achieve similar goals (Same Goal, Different approach). EconML is a good fit for environments where A/B testing is possible. 

 

 

 

◯  Why use ML-based causal models 

 

•   show better performance (but only when data is abundant) 

 

•   measure heterogeneous treatment effects (CATE rather than only the ATE) 

↪ The average difference between the two groups, split by whether they received a coupon: ATE 
↪ With ML, conditions can be attached. Instead of lumping everyone into a single average, the estimate can reflect that the effect may differ with each person's characteristics (CATE). 

 

•   address high-dimensional and sparse data (e.g. lasso with Double ML) 

•   address flexible data forms (e.g. continuous treatment effects) : new features can be extracted from image or text data and used to estimate causal effects. 

 

•   have less restrictive assumptions (e.g. non-parametric) 
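The coupon example contrasts the ATE with the CATE. The minimal sketch below makes that contrast concrete on simulated A/B test data. The covariate `new_customer`, the effect sizes, and the data-generating process are all hypothetical. Because the coupon is randomized, a difference in means estimates the ATE. The same difference computed inside each segment estimates the CATE for that segment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical covariate: 0 = existing customer, 1 = new customer
new_customer = rng.integers(0, 2, size=n)
# Randomized treatment (A/B test): coupon issued with probability 0.5
coupon = rng.integers(0, 2, size=n)
# Hypothetical outcome: the coupon adds 1 to spend for existing customers, 5 for new ones
true_effect = np.where(new_customer == 1, 5.0, 1.0)
spend = 10.0 + 2.0 * new_customer + true_effect * coupon + rng.normal(0, 1, size=n)


def diff_in_means(y, t):
    """Mean outcome of treated units minus mean outcome of control units."""
    return y[t == 1].mean() - y[t == 0].mean()


# ATE: one number for the whole population
ate = diff_in_means(spend, coupon)

# CATE: the same contrast computed within each covariate segment
cate = {g: diff_in_means(spend[new_customer == g], coupon[new_customer == g]) for g in (0, 1)}

print(f"ATE: {ate:.2f}")
print(f"CATE existing customers: {cate[0]:.2f}, new customers: {cate[1]:.2f}")
```

Here the ATE comes out near 3, which hides the fact that the coupon works far better for new customers. ML-based CATE estimators generalize this segment-wise contrast to many, possibly continuous, covariates.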

 

 

 

◯  Multiple models can also be combined. 

 

Characteristics of each model

 

 

 

◯  Motivation for EconML 

 

•   Heterogeneous treatment effects let you group targets with similar effects, which supports business tasks such as Targeting, Segmentation and Personalization. 

 

 

 

◯  EconML - Example : Uplift 

 

•   Assigning a campaign at random incurs losses, so it is important to select the users for whom it is highly effective. 

•   Raise efficiency as close to the potential as possible = Uplift 

 

 

•   Uplift process : run an A/B test → estimate each individual's heterogeneous effect from their characteristics → rank users by effect and apply the treatment from the top down (the budget goes to the highest-ranked people first) 

•   In the graph, spending the budget with the uplift approach under top-quantile ranking produced 40% of the effect with only 20% of the budget, and 70% of the effect with 40% of the budget.
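The ranking step can be sketched as follows. This is a simulation, not the data behind the graph. The individual effects (`true_uplift`) and the noisy scores standing in for a fitted CATE model (`pred_uplift`) are hypothetical.

The function reports what share of the total uplift is captured when the budget covers only the top fraction of users, ranked by predicted uplift. Random targeting captures, in expectation, about the same share as the budget fraction. With real data the individual effects are unobserved, so this curve is usually estimated from held-out A/B test data instead (for example as an uplift or Qini curve).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical individual effects: most users respond a little, a few respond a lot
true_uplift = rng.exponential(scale=1.0, size=n)
# Stand-in for a fitted CATE model: the true effect plus estimation noise
pred_uplift = true_uplift + rng.normal(0, 0.5, size=n)


def captured_share(pred, truth, budget_fraction):
    """Share of the total uplift captured when only the top `budget_fraction`
    of users, ranked by predicted uplift, receive the campaign."""
    k = int(len(pred) * budget_fraction)
    top = np.argsort(-pred)[:k]  # indices sorted from highest predicted uplift
    return truth[top].sum() / truth.sum()


for frac in (0.2, 0.4, 1.0):
    share = captured_share(pred_uplift, true_uplift, frac)
    print(f"budget {frac:.0%}: uplift ranking captures {share:.0%} (random targeting ~{frac:.0%})")
```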

 

 

 

◯  How to implement EconML 

 

 

 

•   What matters in CATE is X (the covariates, e.g. demographic information) 

 

•   Implementation itself is not the problem. In the end, theory and domain knowledge for modeling are essential: ML/DL causal models still hinge on manual design, and the data is what matters. There must be no important unobserved confounders ⇨ run experiments in an agile way, repeating A/B tests while changing the treatment! In practice, teams often start by applying uplift modeling to experimental data. 
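For reference, EconML's `LinearDML` (in `econml.dml`) fits this kind of model with `fit(Y, T, X=X, W=W)` and returns per-unit effects with `effect(X)`. The numpy sketch below shows what happens inside without depending on the library. It implements the Double ML partialling-out estimator with 2-fold cross-fitting.

Assumptions:
- The data is simulated and hypothetical.
- The nuisance models are plain least squares on hand-picked polynomial features, rather than lasso or other ML models.
- The final stage models the CATE as linear in X, θ(X) = b0 + b1·X.

It recovers the true effect only because every confounder (W) is observed, which is exactly the no-unobserved-confounder requirement.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000

# Simulated observational data (hypothetical data-generating process)
X = rng.normal(size=n)                                  # effect modifier, e.g. a demographic score
W = rng.normal(size=(n, 3))                             # observed confounders
T = 0.5 * W[:, 0] - 0.3 * W[:, 1] + 0.2 * X + rng.normal(size=n)  # continuous treatment
theta = 1.0 + 2.0 * X                                   # true CATE: theta(X) = 1 + 2X
Y = theta * T + W @ np.array([1.0, 0.5, -0.5]) + 0.3 * X + rng.normal(size=n)


def features(X, W):
    """Nuisance features: intercept, X, W, X*W and X^2. In this simulation
    they make the linear nuisance models correctly specified."""
    return np.column_stack([np.ones_like(X), X, W, X[:, None] * W, X ** 2])


def ols_fit_predict(A_train, y_train, A_test):
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


# Stage 1: cross-fitted residuals (each fold's nuisances are fit on the other fold)
F = features(X, W)
folds = rng.integers(0, 2, size=n)
y_res = np.empty(n)
t_res = np.empty(n)
for k in (0, 1):
    train, test = folds != k, folds == k
    y_res[test] = Y[test] - ols_fit_predict(F[train], Y[train], F[test])
    t_res[test] = T[test] - ols_fit_predict(F[train], T[train], F[test])

# Stage 2: regress the Y-residual on the T-residual interacted with [1, X],
# which gives the final model theta(X) = b0 + b1 * X
Z = t_res[:, None] * np.column_stack([np.ones(n), X])
(b0, b1), *_ = np.linalg.lstsq(Z, y_res, rcond=None)
print(f"estimated theta(X) = {b0:.2f} + {b1:.2f} * X   (true: 1 + 2 * X)")
```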

 

 

 

◯  Which model should you choose?

 

•   Machine learning 

 

•   EconML models differ from machine learning in philosophy, but they do the same job. With deep learning, you cannot explain why it works well. We can simply implement them all and use whichever model works best (cross validation). 

•   Focus more on the value and impact of the application! 

 

 

 

◯  Automated models 

 

•   Google's Bayesian structural time series (a.k.a Causal Impact) 

 

It shows that running the ads had a good effect

 

↪  Its drawback is that it can only capture obvious patterns: it works well only when there is an obvious trend/seasonal/cyclical pattern. 
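Google's CausalImpact fits a Bayesian structural time-series model. The sketch below deliberately replaces that model with ordinary least squares, to show only the counterfactual idea underneath it:
- Learn how the target series tracks an unaffected control series before the intervention.
- Predict what the target would have done afterwards without the intervention.
- Read the effect off the gap between the actual and predicted series.

The series, the launch day and the +15 lift are simulated and hypothetical. The clear weekly pattern in the control series is what makes the counterfactual easy to predict, which mirrors the obvious-pattern limitation.

```python
import numpy as np

rng = np.random.default_rng(7)
days = np.arange(120)
campaign_start = 90  # hypothetical ad launch day

# Control series (unaffected by the ad) with a weekly seasonal pattern
control = 100 + 10 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 1, days.size)
# Target series tracks the control; the ad adds +15 per day after launch (simulated)
target = 20 + 1.2 * control + rng.normal(0, 1, days.size)
target[campaign_start:] += 15

pre, post = days < campaign_start, days >= campaign_start

# Fit target ~ a + b * control on the pre-period only
A = np.column_stack([np.ones(days.size), control])
coef, *_ = np.linalg.lstsq(A[pre], target[pre], rcond=None)

# Counterfactual: what the target would have been without the ad
counterfactual = A[post] @ coef
pointwise_effect = target[post] - counterfactual
print(f"average effect per day: {pointwise_effect.mean():.1f}")
print(f"cumulative effect:      {pointwise_effect.sum():.1f}")
```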

 

 

•   Transfer Entropy : Multivariate TS features 

 

A model worth trying when the data is clean and there is a clear causal pattern
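Transfer entropy measures how much the past of a series X reduces uncertainty about the next value of Y. It counts only what goes beyond the information in Y's own past.

Below is a minimal plug-in estimator for discrete series with a history length of 1. The binary toy series, in which y copies x with a one-step lag, is hypothetical. Real multivariate time series would first need discretization (or a continuous estimator) and longer histories.

```python
from collections import Counter

import numpy as np


def transfer_entropy(source, target):
    """TE(source -> target) in bits for discrete series, history length 1:
    TE = sum p(y1, y0, x0) * log2[ p(y1 | y0, x0) / p(y1 | y0) ]
    where y1 = next target value, y0 = current target value, x0 = current source value.
    """
    y1, y0, x0 = target[1:], target[:-1], source[:-1]
    n = len(y1)
    c_yyx = Counter(zip(y1, y0, x0))
    c_yx = Counter(zip(y0, x0))
    c_yy = Counter(zip(y1, y0))
    c_y = Counter(y0)
    te = 0.0
    for (a, b, c), cnt in c_yyx.items():
        p_joint = cnt / n
        p_full = cnt / c_yx[(b, c)]        # p(y1 | y0, x0)
        p_self = c_yy[(a, b)] / c_y[b]     # p(y1 | y0)
        te += p_joint * np.log2(p_full / p_self)
    return te


rng = np.random.default_rng(3)
x = rng.integers(0, 2, size=10_000)
y = np.empty_like(x)
y[0] = 0
y[1:] = x[:-1]  # y copies x with a one-step lag, so x drives y

print(f"TE(x -> y) = {transfer_entropy(x, y):.3f} bits")
print(f"TE(y -> x) = {transfer_entropy(y, x):.3f} bits")
```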

 

 

 

 

②  Graph-Based causal models 


 

•  The person who created Bayesian networks (Judea Pearl) 

 

 

 

•  Bayesian Network can incorporate complex relationships : the recommended approach when data is abundant and the causal relationships are clear 

 

 

↪  Even when a refrigerator sells, dispatching a technician eats up the margin through labor costs. For refrigerator breakdowns, the call center encoded the troubleshooting manual as causal relations and used it to guide consultations. By uncovering many cases that could be fixed without a technician, the company cut costs.  
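The call-center idea can be sketched as a tiny Bayesian network in which the fault type causes the observed symptoms. Everything here is hypothetical: the fault list, the probabilities, and which faults need a technician are made up for illustration.

Given the symptoms a customer reports, inference by enumeration (Bayes' rule over the network) gives the posterior over faults. That posterior in turn gives the probability that a technician visit is actually needed.

```python
# Hypothetical prior over fault causes (all numbers are made up)
prior = {"door_seal": 0.5, "thermostat_setting": 0.3, "compressor": 0.2}

# Hypothetical CPTs for the edges fault -> symptom: P(symptom present | fault)
p_not_cooling = {"door_seal": 0.6, "thermostat_setting": 0.9, "compressor": 0.95}
p_loud_noise = {"door_seal": 0.05, "thermostat_setting": 0.05, "compressor": 0.7}

# Hypothetical: which faults require a technician visit
needs_technician = {"door_seal": False, "thermostat_setting": False, "compressor": True}


def posterior(not_cooling: bool, loud_noise: bool) -> dict:
    """P(fault | symptoms) for the network fault -> not_cooling, fault -> loud_noise."""
    unnorm = {}
    for fault, p in prior.items():
        like_c = p_not_cooling[fault] if not_cooling else 1 - p_not_cooling[fault]
        like_n = p_loud_noise[fault] if loud_noise else 1 - p_loud_noise[fault]
        unnorm[fault] = p * like_c * like_n
    z = sum(unnorm.values())
    return {fault: v / z for fault, v in unnorm.items()}


def p_technician(not_cooling: bool, loud_noise: bool) -> float:
    """Probability that the reported symptoms come from a fault needing a visit."""
    post = posterior(not_cooling, loud_noise)
    return sum(p for fault, p in post.items() if needs_technician[fault])


print(f"not cooling, quiet: P(technician needed) = {p_technician(True, False):.2f}")
print(f"not cooling, noisy: P(technician needed) = {p_technician(True, True):.2f}")
```

With these made-up numbers, a "not cooling but quiet" report points mostly to faults that can be fixed over the phone. Adding "loud noise" shifts most of the probability to the compressor.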

 

 

•  Do-calculus 
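Do-calculus provides rules for rewriting an interventional quantity such as P(y | do(x)) in terms of observational probabilities, given the causal graph. One of its best-known consequences is the backdoor adjustment, P(y | do(x)) = Σ_z P(y | x, z) P(z), which holds when Z blocks every backdoor path from X to Y.

A minimal sketch of that adjustment follows, using a hypothetical confounded ad example in which heavy users both see the ad more often and buy more. All probabilities are made up.

```python
# Hypothetical discrete model with graph  Z -> X,  Z -> Y,  X -> Y
# Z = 1: heavy user, X = 1: saw the ad, Y = 1: purchased (all numbers are made up)
p_z = {0: 0.7, 1: 0.3}
p_x1_given_z = {0: 0.2, 1: 0.8}               # heavy users see the ad far more often
p_y1_given_xz = {(0, 0): 0.10, (1, 0): 0.15,  # keys are (x, z)
                 (0, 1): 0.50, (1, 1): 0.55}


def p_x_given_z(x, z):
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]


def observational(x):
    """P(Y=1 | X=x): plain conditioning, which mixes in the confounder Z."""
    num = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    den = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return num / den


def interventional(x):
    """P(Y=1 | do(X=x)) via backdoor adjustment: sum_z P(Y=1 | x, z) P(z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


print(f"naive difference:  {observational(1) - observational(0):.3f}")
print(f"causal difference: {interventional(1) - interventional(0):.3f}")
```

Here the ad raises the purchase probability by 5 points in every segment. Naive conditioning suggests about 26 points, because Z confounds the comparison.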

 

•   Graph-based models are somewhat hard to apply in the social sciences. They are mainly applicable only when the data is clean and the causal relationships are clear. 

 

→ A paper with examples of graph-based causal models used in deep learning

 

 

 

③  Multi-armed Bandits 


 

•  Its foundations lie in reinforcement learning. 

 

•  Exploration (trying what you have not tried yet) and Exploitation (focusing on what you currently know)

 

 

•  MAB : how to get the maximum result with the minimum effort by combining Exploration & Exploitation appropriately 

 

 

•  MAB is an advanced form of A/B testing. 

 

e.g. the ad-channel allocation problem

 

•  Two naive approaches : pure exploration (= Random) and pure exploitation (= Greedy) 

 

 

•  Epsilon-greedy approach : mixes the two by randomly choosing between exploring and exploiting. With probability epsilon it explores (picks a random arm); otherwise it exploits (picks the best arm so far). 
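A minimal epsilon-greedy sketch for the ad-channel example follows. It assumes three channels with hypothetical conversion rates and Bernoulli (convert / no convert) rewards. Setting epsilon to 1 or 0 recovers the two naive approaches: pure random exploration and pure greedy exploitation.

```python
import numpy as np


def epsilon_greedy(true_rates, epsilon, n_rounds, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit; return total reward and pull counts.
    epsilon=1 -> pure exploration (random), epsilon=0 -> pure exploitation (greedy)."""
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    counts = np.zeros(k)
    values = np.zeros(k)  # running mean reward per arm
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = int(rng.integers(k))      # explore: random arm
        else:
            arm = int(np.argmax(values))    # exploit: best arm so far (ties -> lowest index)
        reward = float(rng.random() < true_rates[arm])
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total, counts


rates = [0.03, 0.05, 0.08]  # hypothetical conversion rates of three ad channels
for eps in (1.0, 0.1, 0.0):
    total, counts = epsilon_greedy(rates, eps, n_rounds=20_000)
    print(f"epsilon={eps}: reward={total:.0f}, pulls per channel={counts.astype(int)}")
```

All estimates start at zero, and ties go to the first channel. As a result, the pure-greedy run never leaves channel 0, which is the worst channel here. This illustrates why pure exploitation is risky.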

 

 

•  Because the estimates come from sampling, put a confidence interval on each one and keep exploring wherever the estimate is not yet trustworthy : UCB (Upper Confidence Bound) 
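A minimal sketch of the classic UCB1 rule on a Bernoulli bandit follows, with hypothetical conversion rates for three ad channels. Each round it plays the arm with the highest upper confidence bound, mean + sqrt(2 ln t / n_i). Arms with few pulls have wide bounds, so they keep getting explored until their bound falls below the leader's.

```python
import math

import numpy as np


def ucb1(true_rates, n_rounds, seed=0):
    """UCB1 on a Bernoulli bandit: play each arm once, then always pick the arm
    with the highest mean + sqrt(2 ln t / n_i)."""
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    counts = np.zeros(k)
    values = np.zeros(k)  # running mean reward per arm
    total = 0.0
    for t in range(1, n_rounds + 1):
        if t <= k:
            arm = t - 1                                  # initial pass over all arms
        else:
            bonus = np.sqrt(2 * math.log(t) / counts)    # large for under-sampled arms
            arm = int(np.argmax(values + bonus))
        reward = float(rng.random() < true_rates[arm])
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total, counts


rates = [0.03, 0.05, 0.08]  # hypothetical conversion rates of three ad channels
total, counts = ucb1(rates, n_rounds=20_000)
print(f"UCB1: reward={total:.0f}, pulls per channel={counts.astype(int)}")
```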

 

 

 

 

•  Sampling from a distribution (each arm's posterior) : Thompson Sampling 

↪ It tends to work somewhat better, so it is often used as a baseline 

 

For binary rewards, a Beta distribution is generally used
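A minimal Beta-Bernoulli Thompson Sampling sketch follows, with hypothetical conversion rates for three ad channels. Each arm keeps a Beta(alpha, beta) posterior over its conversion rate, starting from a uniform Beta(1, 1) prior. Each round, the algorithm draws one sample per arm, plays the arm with the largest draw, and updates that arm's posterior with the observed success or failure.

```python
import numpy as np


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling; returns total reward and pulls per arm."""
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    alpha = np.ones(k)  # 1 + number of successes per arm
    beta = np.ones(k)   # 1 + number of failures per arm
    total = 0.0
    for _ in range(n_rounds):
        samples = rng.beta(alpha, beta)  # one posterior draw per arm
        arm = int(np.argmax(samples))
        reward = float(rng.random() < true_rates[arm])
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total += reward
    pulls = alpha + beta - 2  # subtract the prior pseudo-counts
    return total, pulls


rates = [0.03, 0.05, 0.08]  # hypothetical conversion rates of three ad channels
total, pulls = thompson_sampling(rates, n_rounds=20_000)
print(f"Thompson: reward={total:.0f}, pulls per channel={pulls.astype(int)}")
```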

 

 

•  Taking the channels' covariates into account (the CATE idea) : Contextual bandits 
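One common contextual-bandit algorithm is disjoint LinUCB, sketched below. Each arm keeps a ridge-regression model of its reward given the context. Arms are scored by predicted reward plus an exploration bonus, as in UCB but per context.

Assumptions:
- The context is a single hypothetical user attribute (`is_new_user`) plus an intercept.
- The two channels' conversion rates are made up.

The expected result is that the learned models send existing users to channel 0 and new users to channel 1.

```python
import numpy as np


def conversion_rate(x, arm):
    """Hypothetical conversion rate of each channel given context x = [1, is_new_user]."""
    is_new = x[1]
    return (0.30 - 0.25 * is_new) if arm == 0 else (0.05 + 0.20 * is_new)


def linucb(contexts, n_arms, alpha=1.0, seed=0):
    """Disjoint LinUCB: one ridge-regression reward model per arm; for each context x,
    play the arm maximizing theta_a . x + alpha * sqrt(x^T A_a^{-1} x)."""
    rng = np.random.default_rng(seed)
    d = contexts.shape[1]
    A = [np.eye(d) for _ in range(n_arms)]    # I + sum of x x^T for the arm
    b = [np.zeros(d) for _ in range(n_arms)]  # sum of reward * x for the arm
    for x in contexts:
        scores = []
        for a in range(n_arms):
            A_inv = np.linalg.inv(A[a])
            theta = A_inv @ b[a]
            scores.append(theta @ x + alpha * np.sqrt(x @ A_inv @ x))
        arm = int(np.argmax(scores))
        reward = float(rng.random() < conversion_rate(x, arm))
        A[arm] += np.outer(x, x)
        b[arm] += reward * x
    return [np.linalg.solve(A[a], b[a]) for a in range(n_arms)]  # final theta per arm


rng = np.random.default_rng(1)
is_new = rng.integers(0, 2, size=5_000)
contexts = np.column_stack([np.ones(is_new.size), is_new]).astype(float)
thetas = linucb(contexts, n_arms=2)

for label, x in (("existing user", np.array([1.0, 0.0])), ("new user", np.array([1.0, 1.0]))):
    best = int(np.argmax([theta @ x for theta in thetas]))
    print(f"{label}: send channel {best}")
```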

 

 
