1. Machine Learning-Based Estimation of Heterogeneous Treatment Effects and Motivating Examples

by isdawell 2023. 6. 27.

◯ Colab notebook from following along with the hands-on exercises

 

https://colab.research.google.com/drive/1gM9uiyTwW80ZhWGqXkK_PbVXQGGfacTR?usp=sharing 

 


 

 

 

 

① ML-based estimation of heterogeneous treatment effects


 

•  One of the biggest promises of machine learning is to automate decision-making across a wide range of domains.

 

•  The core problem in most data-driven, personalized decision-making scenarios is estimating heterogeneous treatment effects: what is the causal effect of an intervention on an outcome of interest for a unit with a particular set of features?

 

•  For example, the problem arises in personalized pricing, where the goal is to estimate the effect of a price discount on demand as a function of customer characteristics. It also arises in medical trials, where the goal is to estimate the effect of a drug treatment on a patient's clinical response as a function of the patient's characteristics.

 

•  In settings like these, observational data is often plentiful, the treatment was assigned by an unknown policy, and the ability to run A/B tests is limited.

 

 

•  The EconML package implements recent techniques, at the intersection of econometrics and machine learning, for estimating heterogeneous treatment effects with ML-based approaches. These methods offer considerable flexibility in modeling effect heterogeneity through techniques such as random forests, boosting, lasso, and neural nets, while using ideas from causal inference and econometrics to preserve the causal interpretation of the learned model and, often, to provide statistical validity by constructing valid confidence intervals.
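To make the point about observational data concrete, here is a minimal simulation in plain Python (standard library only). Everything in it is hypothetical: a binary customer feature `x`, a discount `t` assigned by an unknown policy that favors customers with `x = 1`, and true effects of 1.0 and 3.0 in the two groups. A naive treated-vs-untreated comparison mixes the discount effect with the baseline gap between groups, while comparing within each value of `x` recovers the heterogeneous effects. EconML's estimators generalize this kind of adjustment to many, possibly continuous, features by using ML models; this sketch is not EconML code.

```python
import random

def simulate(n=20_000, seed=0):
    """Hypothetical observational data: returns a list of (x, t, y) tuples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0              # customer feature (e.g. loyal = 1)
        p_treat = 0.8 if x == 1 else 0.2                # unknown policy: loyal customers get discounts more often
        t = 1 if rng.random() < p_treat else 0          # treatment: discount offered or not
        effect = 3.0 if x == 1 else 1.0                 # true heterogeneous treatment effect
        y = 5.0 * x + effect * t + rng.gauss(0.0, 1.0)  # loyal customers buy more anyway (confounding)
        rows.append((x, t, y))
    return rows

def mean(values):
    return sum(values) / len(values)

def naive_difference(rows):
    """Treated minus untreated mean outcome, ignoring x (confounded)."""
    return (mean([y for _, t, y in rows if t == 1])
            - mean([y for _, t, y in rows if t == 0]))

def stratified_effect(rows, x_value):
    """Difference in means within one value of x: an estimate of the CATE at that x."""
    return (mean([y for x, t, y in rows if x == x_value and t == 1])
            - mean([y for x, t, y in rows if x == x_value and t == 0]))

rows = simulate()
naive = naive_difference(rows)
cate = {x: stratified_effect(rows, x) for x in (0, 1)}
share = {x: mean([1.0 if xi == x else 0.0 for xi, _, _ in rows]) for x in (0, 1)}
ate = sum(share[x] * cate[x] for x in (0, 1))           # average of the CATEs over the x distribution

print(f"naive difference: {naive:.2f}")
print(f"CATE(x=0): {cate[0]:.2f}, CATE(x=1): {cate[1]:.2f}, ATE: {ate:.2f}")
```

With these settings the naive difference lands far above the true average effect of 2.0, because treated customers are mostly loyal customers who would have bought more anyway.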

 

 

 

② Motivating Examples


 

•  EconML is a package designed to measure the causal effect of a treatment variable T on an outcome variable Y while controlling for a set of features X.
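In EconML's documentation this setup is commonly written as a partially linear model. Besides the features X that drive effect heterogeneity, it includes additional controls W (confounders that are adjusted for but not used to describe heterogeneity); W is an addition to the description above. A sketch of that model:

```latex
\begin{aligned}
Y &= \theta(X)\,T + g(X, W) + \varepsilon, & \mathbb{E}[\varepsilon \mid X, W] &= 0,\\
T &= f(X, W) + \eta, & \mathbb{E}[\eta \mid X, W] &= 0.
\end{aligned}
```

Here $\theta(x)$ is the conditional average treatment effect (CATE) at features $x$; for a binary treatment it equals $\mathbb{E}[Y(1) - Y(0) \mid X = x]$. The nuisance functions $g$ and $f$ can be fit with arbitrary ML models, and the estimation target is $\theta$.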

 

 

⑴  Recommendation A/B testing

 

 Interpret experiments with imperfect compliance

 

•  Question. A travel website wants to know whether joining its membership program causes users to spend more time on the site.

•  Problem. Customers who choose to become members are likely to be more engaged than other users to begin with, so simply comparing members and non-members in the existing data is misleading. A direct A/B test is not possible either, because users cannot be forced to sign up for membership.

•  Solution. The company had previously run an experiment to test the value of a new, faster sign-up process. EconML's DRIV estimator can use this experimental nudge toward membership as an instrumental variable that generates random variation in the likelihood of membership. Because not every customer who was offered the quick sign-up actually became a member, the DRIV model accounts for this imperfect compliance and returns the effect of membership rather than the effect of receiving the quick sign-up.
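The instrument logic behind this solution can be sketched with the simplest IV estimator, the Wald (ratio) estimator. This is deliberately not DRIV: DRIV additionally fits ML models for the nuisance quantities, handles covariates, and combines them in a doubly robust way. The simulation is hypothetical: `z` is the randomized quick sign-up offer, `t` is membership, `y` is time on site, and an unobserved engagement level `u` drives both membership and time on site. Because `z` is randomized and affects `y` only through `t`, dividing the effect of `z` on `y` by the effect of `z` on `t` recovers the membership effect (a constant 2.0 here), while a naive member vs. non-member comparison does not.

```python
import random

def simulate_nudge(n=100_000, seed=1):
    """Hypothetical data: (z, t, y) = (sign-up offer, membership, time on site)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0               # randomized quick sign-up offer (instrument)
        u = rng.gauss(0.0, 1.0)                          # unobserved baseline engagement (confounder)
        # Engaged users join more often; the offer raises everyone's chance of joining.
        t = 1 if u + 1.5 * z + rng.gauss(0.0, 1.0) > 1.0 else 0
        # True membership effect is 2.0; the offer has no direct effect on y (exclusion).
        y = 2.0 * t + 2.0 * u + rng.gauss(0.0, 1.0)
        data.append((z, t, y))
    return data

def mean(values):
    return sum(values) / len(values)

def wald_estimate(data):
    """Ratio of intent-to-treat effects: (E[Y|Z=1] - E[Y|Z=0]) / (E[T|Z=1] - E[T|Z=0])."""
    itt_y = mean([y for z, _, y in data if z == 1]) - mean([y for z, _, y in data if z == 0])
    first_stage = mean([t for z, t, _ in data if z == 1]) - mean([t for z, t, _ in data if z == 0])
    return itt_y / first_stage, first_stage

data = simulate_nudge()
naive = mean([y for _, t, y in data if t == 1]) - mean([y for _, t, y in data if t == 0])
iv_effect, first_stage = wald_estimate(data)
print(f"naive member vs non-member difference: {naive:.2f}")
print(f"first stage (extra sign-ups caused by the offer): {first_stage:.2f}")
print(f"IV estimate of the membership effect: {iv_effect:.2f}")
```

When effects vary across users, the Wald ratio instead identifies the average effect among compliers (users who join only because of the nudge), under the usual monotonicity and exclusion assumptions.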

 

 

◯ Code practice

 

◯ Case study: TripAdvisor

 

 

 

 

⑵  Customer segmentation

 

Estimate individualized responses to incentives

 

•  Question. A media subscription service wants to offer targeted discounts through personalized pricing plans.

•  Problem. The company observes many customer characteristics, but it is hard to know which customers will respond most to a lower price.

•  Solution. EconML's DML estimator uses the price variation that already exists in past data to estimate heterogeneous price sensitivities that vary with multiple customer features. A tree interpreter then summarizes the key features that explain the largest differences in responsiveness to a discount.
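The core of DML is "residual-on-residual" regression: predict both the outcome and the treatment from the controls, then regress the outcome residuals on the treatment residuals. A minimal sketch in plain Python, with two-fold cross-fitting and a final stage that is linear in a one-hot segment indicator, so each segment gets its own price sensitivity. EconML's DML classes (for example `LinearDML` or `CausalForestDML`) fit flexible ML models for the nuisance parts; here they are replaced by per-segment simple linear regressions so the example runs without dependencies. The segment feature `x`, the confounder `w`, the sensitivities, and all function names are hypothetical.

```python
import random

def simulate_pricing(n=30_000, seed=2):
    """Hypothetical data: (x, w, t, y) = (segment, past usage, price change, usage)."""
    rng = random.Random(seed)
    true_theta = {0: -0.5, 1: -1.5, 2: -3.0}           # price sensitivity differs by segment
    rows = []
    for _ in range(n):
        x = rng.randrange(3)                            # X: customer segment (drives heterogeneity)
        w = rng.gauss(0.0, 1.0)                         # W: confounder, e.g. past usage
        t = 0.5 * w + 0.3 * x + rng.gauss(0.0, 1.0)     # T: price set by an unknown policy
        y = true_theta[x] * t + 2.0 * w + x + rng.gauss(0.0, 1.0)
        rows.append((x, w, t, y))
    return rows, true_theta

def simple_ols(xs, ys):
    """Closed-form least squares for y = a + b * x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((v - mx) ** 2 for v in xs)
    sxy = sum((v - mx) * (u - my) for v, u in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fit_nuisance(rows):
    """Per segment, regress Y on W and T on W (stand-ins for flexible ML models)."""
    models = {}
    for seg in {r[0] for r in rows}:
        part = [r for r in rows if r[0] == seg]
        ws = [r[1] for r in part]
        models[seg] = (simple_ols(ws, [r[3] for r in part]),   # model for E[Y | X, W]
                       simple_ols(ws, [r[2] for r in part]))   # model for E[T | X, W]
    return models

def residualize(models, rows):
    """Residuals (x, t - E[T|X,W], y - E[Y|X,W]) using models fit on the other fold."""
    out = []
    for x, w, t, y in rows:
        (ay, by), (at, bt) = models[x]
        out.append((x, t - (at + bt * w), y - (ay + by * w)))
    return out

def dml_segment_effects(rows, seed=0):
    """Two-fold cross-fitted DML with a final stage linear in one-hot(X)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    fold_a, fold_b = shuffled[:half], shuffled[half:]
    res = residualize(fit_nuisance(fold_a), fold_b) + residualize(fit_nuisance(fold_b), fold_a)
    effects = {}
    for seg in sorted({r[0] for r in res}):
        num = sum(tr * yr for x, tr, yr in res if x == seg)
        den = sum(tr * tr for x, tr, _ in res if x == seg)
        effects[seg] = num / den                        # residual-on-residual slope per segment
    return effects

rows, true_theta = simulate_pricing()
effects = dml_segment_effects(rows)
naive = {seg: simple_ols([r[2] for r in rows if r[0] == seg],
                         [r[3] for r in rows if r[0] == seg])[1] for seg in true_theta}
for seg in sorted(true_theta):
    print(f"segment {seg}: true {true_theta[seg]:+.2f}, DML {effects[seg]:+.2f}, naive {naive[seg]:+.2f}")
```

The naive per-segment slope of usage on price ignores that the pricing policy and usage both depend on `w`, so it is biased upward in every segment; partialling out `w` first removes that bias.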

 

◯ Code practice

 

 

 

 

⑶  Multi-investment attribution

 

Distinguish the effects of multiple outreach efforts

 

•  Question. A startup wants to know the most effective approach for recruiting new customers: price discounts, technical support to ease adoption, or a combination of the two.

•  Problem. Experimenting with these outreach efforts is costly because of the risk of losing customers. In addition, the company has been offering incentives strategically (for example, larger businesses are more likely to receive technical support), so simple comparisons between customers who did and did not receive an incentive are confounded.

•  Solution. EconML's Doubly Robust Learner jointly estimates the effects of multiple discrete treatments. It uses flexible functions of the observed customer features to filter out confounding correlations and provides the causal effect of each outreach effort on revenue.
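A doubly robust estimate for several discrete treatments can be sketched as follows. The outreach data, the firm-size feature `x`, the assignment policy, and the effect sizes are hypothetical, and the nuisance models are simple cell frequencies and means rather than the flexible ML models and cross-fitting that EconML's `DRLearner` uses. For each arm `t` compared with no outreach, every unit gets the pseudo-outcome mu_t(x) - mu_0(x) + 1{T=t}(y - mu_t(x))/p_t(x) - 1{T=0}(y - mu_0(x))/p_0(x), and averaging it within each value of `x` gives the effect for that segment.

```python
import random
from collections import defaultdict

def simulate_outreach(n=40_000, seed=3):
    """Hypothetical data: (x, t, y) = (large firm?, treatment arm, revenue).
    Arms: 0 = none, 1 = discount, 2 = tech support, 3 = both."""
    rng = random.Random(seed)
    policy = {0: [0.4, 0.3, 0.2, 0.1],                  # P(T | small firm)
              1: [0.1, 0.2, 0.4, 0.3]}                  # P(T | large firm): support is favored
    true_effect = {0: {1: 1.0, 2: 0.5, 3: 2.0},         # effect vs. no outreach, small firms
                   1: {1: 1.0, 2: 3.0, 3: 4.0}}         # effect vs. no outreach, large firms
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        t = rng.choices(range(4), weights=policy[x])[0]
        y = 10.0 * x + true_effect[x].get(t, 0.0) + rng.gauss(0.0, 1.0)
        rows.append((x, t, y))
    return rows, true_effect

def fit_nuisances(rows):
    """Propensity P(T=t | X=x) and outcome model E[Y | X=x, T=t] as cell frequencies and means."""
    counts, sums, totals = defaultdict(int), defaultdict(float), defaultdict(int)
    for x, t, y in rows:
        counts[(x, t)] += 1
        sums[(x, t)] += y
        totals[x] += 1
    propensity = {key: counts[key] / totals[key[0]] for key in counts}
    outcome = {key: sums[key] / counts[key] for key in counts}
    return propensity, outcome

def dr_effects(rows, treatment):
    """Effect of `treatment` vs. arm 0 per value of x, by averaging doubly robust pseudo-outcomes."""
    propensity, outcome = fit_nuisances(rows)
    scores = defaultdict(list)
    for x, t, y in rows:
        mu_t, mu_0 = outcome[(x, treatment)], outcome[(x, 0)]
        psi = mu_t - mu_0                                   # outcome-model (direct) part
        if t == treatment:
            psi += (y - mu_t) / propensity[(x, treatment)]  # inverse-propensity correction
        elif t == 0:
            psi -= (y - mu_0) / propensity[(x, 0)]
        scores[x].append(psi)
    return {x: sum(v) / len(v) for x, v in scores.items()}

rows, true_effect = simulate_outreach()
estimates = {t: dr_effects(rows, t) for t in (1, 2, 3)}
naive_support = (sum(y for _, t, y in rows if t == 2) / sum(1 for _, t, _ in rows if t == 2)
                 - sum(y for _, t, y in rows if t == 0) / sum(1 for _, t, _ in rows if t == 0))
for t in (1, 2, 3):
    print(f"arm {t}: small firms {estimates[t][0]:.2f}, large firms {estimates[t][1]:.2f}")
print(f"naive tech-support difference: {naive_support:.2f}")
```

Because the cell-mean outcome model here is saturated, the correction terms average to zero within each cell and the estimate reduces to a stratified difference in means; the correction matters when the outcome model is misspecified or regularized, which is the situation doubly robust methods are designed for. The naive comparison is inflated because large firms both earn more and receive support more often.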

 

 

◯ Code practice

 

 
