๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
1๏ธโƒฃ AI•DS/๐Ÿ“— NLP

[cs224n] 5๊ฐ• ๋‚ด์šฉ ์ •๋ฆฌ

by isdawell 2022. 3. 22.

๐Ÿ’ก ์ฃผ์ œ : Dependency Parsing 

 

๐Ÿ“Œ ํ•ต์‹ฌ 

  • Task : ๋ฌธ์žฅ์˜ ๋ฌธ๋ฒ•์ ์ธ ๊ตฌ์„ฑ, ๊ตฌ๋ฌธ์„ ๋ถ„์„
  • Dependency Parsing : ๋‹จ์–ด ๊ฐ„ ๊ด€๊ณ„๋ฅผ ํŒŒ์•…ํ•˜์—ฌ ๋‹จ์–ด์˜ ์ˆ˜์‹ (๋ฌธ๋ฒ•) ๊ตฌ์กฐ๋ฅผ ๋„์ถœํ•ด๋‚ด๊ธฐ 

 

๐Ÿ“Œ ๋ชฉ์ฐจ  

1. Dependency Parsing ์ด๋ž€ 

 

(1) Parsing 

 

โœ” ์ •์˜ 

 

  • ๊ฐ ๋ฌธ์žฅ์˜ ๋ฌธ๋ฒ•์ ์ธ ๊ตฌ์„ฑ์ด๋‚˜ ๊ตฌ๋ฌธ์„ ๋ถ„์„ํ•˜๋Š” ๊ณผ์ •
  • ์ฃผ์–ด์ง„ ๋ฌธ์žฅ์„ ์ด๋ฃจ๋Š” ๋‹จ์–ด ํ˜น์€ ๊ตฌ์„ฑ ์š”์†Œ์˜ ๊ด€๊ณ„๋ฅผ ๊ฒฐ์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ, parsing์˜ ๋ชฉ์ ์— ๋”ฐ๋ผ Consitituency parsing๊ณผ Dependency parsing์œผ๋กœ ๊ตฌ๋ถ„

 

โœ” ๋น„๊ต 

 

  • ํ† ํฌ๋‚˜์ด์ง• : ๋ฌธ์žฅ์ด ๋“ค์–ด์˜ค๋ฉด ์˜๋ฏธ๋ฅผ ๊ฐ€์ง„ ๋‹จ์œ„๋กœ ์ชผ๊ฐœ์ฃผ๋Š” ๊ฒƒ 
  • pos-tagging : ํ† ํฐ๋“ค์— ํ’ˆ์‚ฌ tag ๋ฅผ ๋ถ™์—ฌ์ฃผ๋Š” ๊ณผ์ • 
  • Paring : ๋ฌธ์žฅ ๋ถ„์„ ๊ฒฐ๊ณผ๊ฐ€ Tree ํ˜•ํƒœ๋กœ ๋‚˜์˜ค๋Š” ๊ฒƒ 

 

 

(2) Constituency Parsing (Lecture 18)

 

โœ” ์ •์˜ 

 

  • ๋ฌธ์žฅ์„ ๊ตฌ์„ฑํ•˜๋Š” ๊ตฌ(phrase) ๋ฅผ ํŒŒ์•…ํ•˜์—ฌ ๋ฌธ์žฅ ๊ตฌ์กฐ๋ฅผ ๋ถ„์„ → ๋ฌธ์žฅ์˜ ์™ธํ˜•์ ์ธ ๊ตฌ์กฐ๋ฅผ ํŒŒ์•… 
  • ์˜์–ด์™€ ๊ฐ™์€ ์–ด์ˆœ์ด ๊ณ ์ •์ ์ธ ์–ธ์–ด์—์„œ ์ฃผ๋กœ ์‚ฌ์šฉ 
  • ์žฌ๊ท€์ ์œผ๋กœ ์ ์šฉ์ด ๊ฐ€๋Šฅ 
  • (๋‹จ์–ด) - (๊ตฌ) - (๋ฌธ์žฅ) 

 

 

 

โœ” ๊ธฐ๋ณธ ๊ฐ€์ • 

 

  • ๋ฌธ์žฅ์ด ํŠน์ • ๋‹จ์–ด(ํ† ํฐ) ๋“ค๋กœ ๋ญ‰์ณ์ ธ ์ด๋ฃจ์–ด์ ธ ์žˆ๋‹ค๊ณ  ๋ณด๋Š” ๊ฒƒ 

 

 

(3) Dependency Parsing โญโญ 

 

โœ” ์ •์˜ 

 

  • ๋ฌธ์žฅ์˜ ์ „์ฒด์ ์ธ ๊ตฌ์„ฑ/๊ตฌ์กฐ ๋ณด๋‹ค๋Š” ๊ฐ ๊ฐœ๋ณ„๋‹จ์–ด ๊ฐ„์˜ '์˜์กด๊ด€๊ณ„' ๋˜๋Š” '์ˆ˜์‹๊ด€๊ณ„' ์™€ ๊ฐ™์€ ๋‹จ์–ด๊ฐ„ ๊ด€๊ณ„๋ฅผ ํŒŒ์•…ํ•˜๋Š” ๊ฒƒ์ด ๋ชฉ์ 
  • ํ•œ๊ตญ์–ด์™€ ๊ฐ™์€ ์ž์œ  ์–ด์ˆœ์„ ๊ฐ€์ง€๊ฑฐ๋‚˜ ๋ฌธ์žฅ ์„ฑ๋ถ„ ์ƒ๋žต์ด ๊ฐ€๋Šฅํ•œ ์–ธ์–ด์—์„œ ์„ ํ˜ธํ•˜๋Š” ๋ฐฉ์‹ 

 

โœ” ๊ฒฐ๊ณผ ํ˜•ํƒœ 

 

  • ๊ฐœ๋ณ„ ๋‹จ์–ด ๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ํŒŒ์•…ํ•˜์—ฌ 'ํ™”์‚ดํ‘œ' ์™€ '๋ผ๋ฒจ' ๋กœ ๊ด€๊ณ„๋ฅผ ํ‘œ์‹œ 

 

 

  • ์ˆ˜์‹๋ฐ›๋Š” ๋‹จ์–ด๋ฅผ head ํ˜น์€ governor ๋ผ๊ณ  ๋ถ€๋ฅด๊ณ , ์ˆ˜์‹ํ•˜๋Š” ๋‹จ์–ด๋ฅผ dependent ํ˜น์€ modifier ๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค. 
  • nsubj, dobj, det ๊ณผ ๊ฐ™์€ ๋ ˆ์ด๋ธ”์„ ํ†ตํ•ด ๊ฐœ๋ณ„ ๋‹จ์–ด ์‚ฌ์ด์˜ ์ˆ˜์‹๊ด€๊ณ„๋ฅผ ํ‘œ์‹œํ•œ๋‹ค. 

 

๐Ÿ‘€ ํ•ต์‹ฌ : Constituency parsing ์€ ๋ฌธ์žฅ ๊ตฌ์กฐ๋ฅผ ํŒŒ์•…ํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•, Dependency parsing ์€ ๋‹จ์–ด ๊ฐ„ ๊ด€๊ณ„๋ฅผ ํŒŒ์•…ํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ• 

 

 

2. Dependency Parsing ์ด ํ•„์š”ํ•œ ์ด์œ  

 

๐Ÿ‘€ ์ธ๊ฐ„์€ ์ž‘์€ ๋‹จ์–ด๋“ค์„ ํฐ ๋‹จ์–ด๋กœ ์กฐํ•ฉํ•˜๋ฉด์„œ ๋ณต์žกํ•œ ์•„์ด๋””์–ด๋ฅผ ํ‘œํ˜„ํ•˜๊ณ  ์ „๋‹ฌํ•œ๋‹ค. 

๐Ÿ‘€ ๋ฌธ์žฅ์˜ ์˜๋ฏธ๋ฅผ ๋ณด๋‹ค ์ •ํ™•ํ•˜๊ฒŒ ํŒŒ์•…ํ•˜๊ธฐ ์œ„ํ•˜์—ฌ, Parsing ์„ ํ†ตํ•ด ๋ชจํ˜ธ์„ฑ์„ ์—†์• ์ž!

 

 

(1) Phrase Attachment Ambiguity 

  • ํ˜•์šฉ์‚ฌ๊ตฌ, ๋™์‚ฌ๊ตฌ, ์ „์น˜์‚ฌ๊ตฌ ๋“ฑ์ด ์–ด๋–ค ๋‹จ์–ด๋ฅผ ์ˆ˜์‹ํ•˜๋Š”์ง€์— ๋”ฐ๋ผ ์˜๋ฏธ๊ฐ€ ๋‹ฌ๋ผ์ง€๋Š” ๋ชจํ˜ธ์„ฑ 

 

 

(์™ผ) ๊ณผํ•™์ž๋“ค์€ ์šฐ์ฃผ์—์„œ ๊ณ ๋ž˜๋ฅผ ๊ด€์ธกํ–ˆ๋‹ค. "count" → from space     (์˜ค) ๊ณผํ•™์ž๋“ค์€ ์šฐ์ฃผ์—์„œ ์˜จ ๊ณ ๋ž˜๋ฅผ ๊ด€์ธกํ–ˆ๋‹ค. "whale" → from space

 

 

(2) Coordination Scope Ambiguity 

  • ํŠน์ • ๋‹จ์–ด๊ฐ€ ์ˆ˜์‹ํ•˜๋Š” ๋Œ€์ƒ์˜ ๋ฒ”์œ„๊ฐ€ ๋‹ฌ๋ผ์ง์— ๋”ฐ๋ผ ์˜๋ฏธ๊ฐ€ ๋ณ€ํ•˜๋Š” ๋ชจํ˜ธ์„ฑ 
  • ์ค‘์˜์ ์œผ๋กœ ํ•ด์„๋  ์—ฌ์ง€๊ฐ€ ์žˆ์Œ 

 

 

๐Ÿ‘€ ์šฐ๋ฆฌ๊ฐ€ ์˜์–ด๋ฅผ ์ฝ์„ ๋•Œ / ๋กœ ๋Š์–ด์ฝ๊ณ  ์ˆ˜์‹๋ฐ›๋Š” ๋ถ€๋ถ„์„ ๊ด„ํ˜ธ์ณ์„œ ํ™”์‚ดํ‘œ๋กœ ์ด์–ด์ฃผ๋“ฏ, ์–ธ์–ด๋ฅผ ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด์„œ ๋ฌธ์žฅ์˜ ๊ตฌ์กฐ๋ฅผ ์ดํ•ดํ•  ํ•„์š”๊ฐ€ ์žˆ์œผ๋ฉฐ ๋ฌด์—‡๊ณผ ๋ฌด์—‡์ด ์—ฐ๊ฒฐ๋˜์–ด ์žˆ๋Š”์ง€์— ๋Œ€ํ•œ ์ดํ•ด๊ฐ€ ํ•„์š”ํ•˜๋‹ค. 

 

 

 

3. Dependency Grammar

(1) Structure 

  • Can be expressed in two forms: as a sequence or as a tree. 

 

(์™ผ) ์ˆ˜์‹ํ•˜๋Š” ๋‹จ์–ด๋ฅผ ํ™”์‚ดํ‘œ๋กœ ํ‘œํ˜„ํ•˜๋Š” ๋ฐฉ์‹ (์˜ค) ํŠธ๋ฆฌํ˜•ํƒœ๋กœ parsing output ์„ ๋„์ถœํ•˜๋Š” ๋ฐฉ์‹

 

 

โœ” ๊ทœ์น™ 

  • ํ™”์‚ดํ‘œ๋Š” ์ˆ˜์‹์„ ๋ฐ›๋Š” ๋‹จ์–ด (head) ์—์„œ ์ˆ˜์‹์„ ํ•˜๋Š” ๋‹จ์–ด (dependent) ๋กœ ํ–ฅํ•œ๋‹ค.
    • ์˜ˆ์‹œ. The ball ๐Ÿ‘‰ the ๋Š” ์ˆ˜์‹ํ•˜๋Š” ๋‹จ์–ด, ball ์€ ์ˆ˜์‹ ๋ฐ›๋Š” ๋‹จ์–ด์ด๋ฏ€๋กœ ball ์—์„œ the ๋ฐฉํ–ฅ์œผ๋กœ ํ™”์‚ดํ‘œ 

 

 

  • ํ™”์‚ดํ‘œ ์œ„์˜ label ์€ ๋‹จ์–ด๊ฐ„ ๋ฌธ๋ฒ•์  ๊ด€๊ณ„ (dependency) ๋ฅผ ์˜๋ฏธํ•˜๋ฉฐ ํ™”์‚ดํ‘œ๋Š” ์ˆœํ™˜ํ•˜์ง€ ์•Š๋Š”๋‹ค → Tree ํ˜•ํƒœ๋กœ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋Š” ์ด์œ  

 

  • ์–ด๋– ํ•œ ๋‹จ์–ด์˜ ์ˆ˜์‹๋„ ๋ฐ›์ง€ ์•Š๋Š” ๋‹จ์–ด๋Š” ๊ฐ€์ƒ์˜ ๋…ธ๋“œ์ธ ROOT ์˜ dependent ๋กœ ๋งŒ๋“ค์–ด ๋ชจ๋“  ๋‹จ์–ด๊ฐ€ ์ตœ์†Œ 1๊ฐœ ๋…ธ๋“œ์˜ dependent ๊ฐ€ ๋˜๋„๋ก ํ•œ๋‹ค. 

 

 

 

(2) ๋ณดํŽธ์ ์ธ ํŠน์ง•  

 

โœ” ํŠน์ง• 

  • Bilexical affinities : the plausibility of the actual semantic relation between two words ( discussion → issue is a plausible pair ) 
  • Dependency distance : the distance spanned by a dependency; dependency relations mostly form between nearby positions 
  • Intervening material : dependency relations rarely form across intervening punctuation such as periods and semicolons 
  • Valency of heads : how many dependents a head takes on its left and right 

 

(3) Tree Bank 

  • Tree bank ์—ฐ๊ตฌ : ์งง์€ ๋ฌธ์žฅ์˜ ๊ทน์„ฑ์„ ์˜ˆ์ธกํ•˜๊ณ  ๋ฌธ์žฅ์˜ ์–ด์ˆœ์„ ๋ฌด์‹œํ•˜๋Š” BoW ์ ‘๊ทผ ๋ฐฉ์‹์„ ์ทจํ•˜์—ฌ ์–ด๋ ค์šด ๋ถ€์ • ์˜ˆ์ œ๋ฅผ ๋ถ„๋ฅ˜ํ•˜๋Š” ํšจ์œจ์ ์ธ ๋ชจ๋ธ์„ ์ƒ์„ฑํ–ˆ๋‹ค. 
  • Tree bank dataset ์€ ์‚ฌ๋žŒ์ด ์ง์ ‘ ๋ฌธ์žฅ๋“ค์˜ dependency ๋ฅผ ํŒŒ์•…ํ•˜์—ฌ dependency structure ๋ฅผ ๊ตฌ์„ฑํ•œ ๋ฐ์ดํ„ฐ์…‹
  • ์˜์–ด ์™ธ์— ๋‹ค์–‘ํ•œ ์–ธ์–ด๋“ค์— ๋Œ€ํ•ด์„œ ์ƒ์„ฑํ–ˆ๋‹ค. 
  • ๊ฐ์„ฑ๋ถ„์„ ์ž‘์—…์˜ ์ด์ง„ ๋ถ„๋ฅ˜ ์ •ํ™•๋„๊ฐ€ ์ƒ์Šนํ•˜๋Š” ๊ฒฐ๊ณผ๋ฅผ ๋ณด์˜€๋‹ค. 

 

 

 

4. Dependency Parsing Methods 

 

(1) Graph Based

  • ๊ฐ€๋Šฅํ•œ ์˜์กด ๊ด€๊ณ„๋ฅผ ๋ชจ๋‘ ๊ณ ๋ คํ•œ ๋’ค ๊ฐ€์žฅ ํ™•๋ฅ ์ด ๋†’์€ ๊ตฌ๋ฌธ๋ถ„์„ ํŠธ๋ฆฌ๋ฅผ ์„ ํƒ 
  • ๋ชจ๋“  ๊ฐ€๋Šฅํ•œ ๊ฒฝ์šฐ์˜ ํŠธ๋ฆฌ ์ˆ˜๋ฅผ ๊ณ ๋ คํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์†๋„๋Š” ๋Š๋ฆฌ์ง€๋งŒ ์ •ํ™•๋„๋Š” ๋†’์Œ 

 

 

 

(2) Transition Based 

 

โœ” ์ •์˜ 

  • ๋‘ ๋‹จ์–ด์˜ ์˜์กด์—ฌ๋ถ€๋ฅผ ์ฐจ๋ก€๋Œ€๋กœ ๊ฒฐ์ • ํ•˜๋ฉฐ ์ ์ง„์ ์œผ๋กœ ๊ตฌ๋ฌธ ๋ถ„์„ ํŠธ๋ฆฌ๋ฅผ ๊ตฌ์„ฑ 

 

 

  • ๋ฌธ์žฅ์— ์กด์žฌํ•˜๋Š” sequence ๋ฅผ ์ฐจ๋ก€๋Œ€๋กœ ์ž…๋ ฅํ•˜๋ฉด์„œ ๋‹จ์–ด ์‚ฌ์ด์— ์กด์žฌํ•˜๋Š” dependency ๋ฅผ ๊ฒฐ์ •ํ•ด๋‚˜์•„๊ฐ 
  • Graph based ๋ฐฉ๋ฒ•์— ๋น„ํ•ด ์†๋„๋Š” ๋น ๋ฅด์ง€๋งŒ sequence ๋ผ๋Š” ํ•œ ๋ฐฉํ–ฅ์œผ๋กœ๋งŒ ๋ถ„์„์ด ์ด๋ฃจ์–ด์ง€๊ธฐ ๋•Œ๋ฌธ์— ๋ชจ๋“  ๊ฒฝ์šฐ์˜ ์ˆ˜๋ฅผ ๊ณ ๋ คํ•˜์ง„ ๋ชปํ•˜์—ฌ ์ •ํ™•๋„๋Š” ๋‚ฎ์Œ 

 

(2)-1. Greedy transition-based parsing

 

โœ” 3๊ฐ€์ง€ ๊ตฌ์กฐ

  • BUFFER : holds the word tokens of the sentence; words enter and leave in First-In-First-Out order. 
  • STACK : receives the words popped out of the buffer; initially it contains only ROOT, and words enter and leave in Last-In-First-Out order. The ROOT token is not considered for decisions until the BUFFER is empty. 
  • Set of Arcs : where the parsing output accumulates (initially the empty set) 
  • When a sentence comes in, parsing runs through these three structures to produce the output. 

โœ” State(c) , c = (buffer, stack, set of arcs) 

  • ๋ชจ๋“  decision ์€ State c ๋ฅผ input ์œผ๋กœ ํ•˜๋Š” ํ•จ์ˆ˜ f(c) ๋ฅผ ํ†ตํ•ด ์ด๋ฃจ์–ด์ง€๋ฉฐ, ์ด๋•Œ ํ•จ์ˆ˜ f ๋Š” dependency ๋ฅผ ๊ฒฐ์ • (ํ™”์‚ดํ‘œ ๋ฐฉํ–ฅ, ์˜์กด ๊ด€๊ณ„ label) ํ•˜๊ฒŒ ๋˜๋Š” ํ•จ์ˆ˜๋กœ SVM, NN ๋“ฑ์ด ์‚ฌ์šฉ๋  ์ˆ˜ ์žˆ๋‹ค. 

 

โœ” Decision 

  • Shift : buffer ์—์„œ stack ์œผ๋กœ ์ด๋™ํ•˜๋Š” ๊ฒฝ์šฐ 
  • Right - Arc : Stack ์˜ ๋‘๋ฒˆ์งธ ๋‹จ์–ด์—์„œ ์ฒซ๋ฒˆ์งธ ๋‹จ์–ด๋กœ ๊ฐ€๋Š” ๊ฒƒ์œผ๋กœ ์šฐ์ธก์œผ๋กœ dependency ๊ฐ€ ๊ฒฐ์ •๋˜๋Š” ๊ฒฝ์šฐ 
  • Left - Arc : Stack ์˜ ์ฒซ ๋ฒˆ์งธ ๋‹จ์–ด์—์„œ ๋‘๋ฒˆ์งธ ๋‹จ์–ด๋กœ ๊ฐ€๋Š” ๊ฒƒ์œผ๋กœ ์ขŒ์ธก์œผ๋กœ dependency ๊ฐ€ ๊ฒฐ์ •๋˜๋Š” ๊ฒฝ์šฐ 
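These three transitions can be simulated in a few lines. A sketch of my own (not the lecture's code) of an arc-standard parser driven by a hand-written oracle sequence for "I ate fish", the example the lecture uses:

```python
def parse(words, transitions):
    """Apply a transition sequence; return the arcs as (head, dependent)."""
    stack, buffer, arcs = ["ROOT"], list(words), []
    for t in transitions:
        if t == "SHIFT":                 # buffer front -> stack top
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":            # head = stack top, dep = second word
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif t == "RIGHT-ARC":           # head = second word, dep = stack top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

arcs = parse(["I", "ate", "fish"],
             ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"])
print(arcs)   # [('ate', 'I'), ('ate', 'fish'), ('ROOT', 'ate')]
```

The final RIGHT-ARC attaches the leftover root word to ROOT, leaving only ROOT on the stack with an empty buffer, which is the termination condition.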

 

 

 

โœ” Embedding 

  • state ๋ฅผ ํ•จ์ˆ˜์˜ input ์œผ๋กœ ๋ฐ›๊ธฐ ์œ„ํ•œ state ์ž„๋ฒ ๋”ฉ ๊ณผ์ •์ด ํ•„์š”ํ•˜๊ฒŒ ๋œ๋‹ค. 
  • 2005 ๋…„์— ๋ฐœํ‘œ๋œ ๋…ผ๋ฌธ์—์„œ ์ œ์‹œ๋œ ๋ฐฉ๋ฒ• : Indicator feature ์กฐ๊ฑด๋“ค์„ ๋งŒ์กฑํ•˜๋ฉด 1 , ์•„๋‹ˆ๋ฉด 0 ์œผ๋กœ ํ‘œํ˜„ 
  • s1 ๐Ÿ‘‰ stack ์˜ ์ฒซ๋ฒˆ์งธ ๋‹จ์–ด , b1 ๐Ÿ‘‰ buffer ์˜ ์ฒซ๋ฒˆ์งธ ๋‹จ์–ด , lr() ๐Ÿ‘‰ ๊ด„ํ˜ธ์•ˆ ๋‹จ์–ด์˜ left child ๋‹จ์–ด , rc() ๐Ÿ‘‰ ๊ด„ํ˜ธ์•ˆ ๋‹จ์–ด์˜ right child ๋‹จ์–ด , w ๐Ÿ‘‰ ๋‹จ์–ด, t ๐Ÿ‘‰ ํƒœ๊น…  ๊ฐ™์€ notation ์˜ ์˜๋ฏธ๋ฅผ ์ž˜ ์•Œ์•„๋‘์–ด์•ผํ•จ! 
  • ๋ณดํ†ต ํ•˜๋‚˜์˜ state ๋ฅผ 10^6 ์ฐจ์›์˜ ๋ฒกํ„ฐ๋กœ ํ‘œํ˜„ํ•˜๊ฒŒ ๋จ 

 

 

์˜ค๋ฅธ์ชฝ์˜ parsing state ๋ฅผ ์™ผ์ชฝ์˜ ์›ํ•ซ ์ธ์ฝ”๋”ฉ ๋ฒกํ„ฐ๋กœ ์ž„๋ฒ ๋”ฉ

 

โž• ๊ฐ•์˜์—์„œ๋Š” 'I ate fish' ๋กœ ๊ณผ์ •์„ ์„ค๋ช…ํ•จ 

 

 

 

(2)-2. Neural Dependency Parser โญ โญ 

 

โœ” Chen and Manning (2014) ๋…ผ๋ฌธ์—์„œ ๋ฐœํ‘œ๋œ ๋ชจ๋ธ 

โœ” dense feature ๋ฅผ ์‚ฌ์šฉํ•œ ์‹ ๊ฒฝ๋ง ๊ธฐ๋ฐ˜์˜ trainsition-based parser ๋ฅผ ์ œ์•ˆํ•˜์—ฌ ์†๋„์™€ ์„ฑ๋Šฅ์„ ๋ชจ๋‘ ํ–ฅ์ƒ

โœ” word vector , POS, arc decision ์„ ์ž…๋ ฅํ•˜์—ฌ ๊ฐ feature vector ๋ฅผ concatenate ํ•œ๋‹ค. 

 

 

 

 

 

โœ” Input layer

  • Words, POS tags, and arc labels come in as input. 

example

 

1. Word features 

 

  • STACK๊ณผ BUFFER์˜ TOP 3 words (6๊ฐœ) + STACK TOP 1, 2 words์˜ ์ฒซ๋ฒˆ์งธ, ๋‘๋ฒˆ์งธ left & right child word (8๊ฐœ) + STACK TOP 1,2 words์˜ left of left & right of right child word (8๊ฐœ)

2. POS tag features 

  • word features ์˜ ๊ฐ POS tag ๋“ค (18๊ฐœ)

3. Arc labels 

  • word features ์˜ ๊ฐ arc - label ๋“ค (18๊ฐœ) 

 

 

 

 

โœ” Embedding

 

 

โœ” Hidden Layer 

  • ์ผ๋ฐ˜์ ์ธ Feed forward network 
  • ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋กœ cube function ์„ ์‚ฌ์šฉ ๐Ÿ‘‰ word, POS tag, arc-label ๊ฐ„ ์ƒํ˜ธ์ž‘์šฉ์„ ๋ฐ˜์˜ํ•  ์ˆ˜ ์žˆ์Œ 
  • cube function์„ ์ ์šฉํ•˜๊ฒŒ ๋˜๋ฉด input์œผ๋กœ ๋“ค์–ด๊ฐ€๋Š” 3๊ฐœ์˜ feature์ธ word, POS tag, arc-label์˜ ์กฐํ•ฉ์ด ๊ณ„์‚ฐ๋˜๋ฉด์„œ feature๊ฐ„์˜ ์ƒํ˜ธ๊ด€๊ณ„๋ฅผ ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ๋‹ค. xi*xj*xk ๐Ÿ‘‰ ๋‹ค๋ฅธ ๋น„์„ ํ˜• ํ•จ์ˆ˜ ๋Œ€๋น„ ์„ฑ๋Šฅ์ด ์ข‹์€ ๊ฒƒ์œผ๋กœ ์†Œ๊ฐœ๋จ 

 

 

 

โœ” Output layer 

 

  • ์€๋‹‰์ธต์„ ๊ฑฐ์นœ feature vector ๋ฅผ linear projection ํ•œ ํ›„ ์†Œํ”„ํŠธ๋งฅ์Šค ํ•จ์ˆ˜๋ฅผ ์ ์šฉํ•œ๋‹ค. 
  • softmax ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด ๊ฐ€๋Šฅํ•œ label ์˜ ๋ชจ๋“  ๊ฒฝ์šฐ์˜ ์ˆ˜์— ๋Œ€ํ•œ ํ™•๋ฅ ์„ ๊ตฌํ•˜๊ฒŒ ๋˜๊ณ , Shift, Left-Arc, Right-Arc ์ค‘ ๊ฐ€์žฅ ๋†’์€ ํ™•๋ฅ ๋กœ ๋ถ„๋ฅ˜๋  decision ์ด ์„ ํƒ๋˜๊ฒŒ ๋œ๋‹ค. 

 

 

โœ” ์„ฑ๋Šฅ๋น„๊ต 

 

1. Ablation study results

 

 

  • The cube function recorded higher performance than other activation functions 
  • Using pre-trained word vectors (word2vec) recorded higher performance than random initialization 
  • Using all three kinds of information (word, POS, label) recorded the highest performance 

 

 

2. POS and Label Embedding 

 

์œ ์‚ฌํ•œ ์˜๋ฏธ๋ฅผ ๊ฐ€์ง„ POS, Arc label ๋“ค์ด ๊ตฐ์ง‘ํ™”๋œ ๊ฒƒ์„ ํ™•์ธํ•ด๋ณผ ์ˆ˜ ์žˆ์Œ

 

  • The randomly initialized POS tag and arc-label embeddings come to encode semantic similarity as training proceeds. 
  • When projected into 2-D with t-SNE, similar elements can be seen to sit close together. 

 

 

3. Treebank dataset experiment results 

 

  • UAS : predicts only the arc direction 
  • LAS : predicts the arc direction plus the label (the number of decisions grows) 
  • Comparing parser performance, the first parser (transition-based with conventional features) is much faster than the graph-based ones but performs slightly worse 
  • The last model, which uses a neural network, matches the graph-based parsers in performance while staying fast!

 

 

์ถ”๊ฐ€ parsing model 

 

โž•  Dynamic programming

  • ๊ธด๋ฌธ์žฅ์ด ์žˆ์œผ๋ฉด ๊ทธ ๋ฌธ์žฅ๋“ค์„ ๋ช‡๊ฐœ๋กœ ๋‚˜๋ˆ„์–ด ํ•˜์œ„ ๋ฌธ์ž์—ด์— ๋Œ€ํ•œ ํ•˜์œ„ ํŠธ๋ฆฌ๋ฅผ ๋งŒ๋“ค๊ณ  ์ตœ์ข…์ ์œผ๋กœ ๊ทธ๊ฒƒ๋“ค์„ ํ•ฉ์น˜๋Š” parsing ๋ฐฉ์‹ 

 

โž•  Constraint Satisfaction 

  • ๋ฌธ๋ฒ•์  ์ œํ•œ ์กฐ๊ฑด์„ ์ดˆ๊ธฐ์— ์„ค์ •ํ•˜๊ณ  ๊ทธ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋ฉด ๋‚จ๊ธฐ๊ณ  ๋ชปํ•˜๋ฉด ์ œ๊ฑฐํ•˜์—ฌ ์กฐ๊ฑด์„ ๋งŒ์กฑ์‹œํ‚ค๋Š” ๋‹จ์–ด๋งŒ parsing ํ•˜๋Š” ๋ฐฉ์‹ 

 

 

๐Ÿ‘€ ๋” ์ฐพ์•„๋ณด๊ธฐ : projectivity ํˆฌ์‚ฌ์„ฑ ) dependency arc ๊ฐ€ ์„œ๋กœ ๊ฒน์น˜์ง€ ์•Š๋Š” ์„ฑ์งˆ. non-projectivity ์˜ ๊ฒฝ์šฐ์—”?

 

 

๐Ÿ“Œ ์‹ค์Šต ์ž๋ฃŒ 

 

๐Ÿ‘€ spacy ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ (ํ•œ๊ตญ์–ด ์ง€์› x) 

 

์ž์—ฐ์–ด์ฒ˜๋ฆฌ(NLP) 29์ผ์ฐจ (spaCy ์†Œ๊ฐœ)

2019.08.06

omicro03.medium.com

 

 

spacy๋ฅผ ์ด์šฉํ•ด์„œ ์ž์—ฐ์–ด์ฒ˜๋ฆฌํ•˜์ž.

intro

frhyme.github.io

 

 

Hitchhiker's Guide to NLP in spaCy

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

www.kaggle.com

 

 

๐ŸŒฑ (์ด๋ฏธ์ง€) ์†Œ์†๋œ ๋™์•„๋ฆฌ week5 ์„ธ์…˜ ์ฐธ๊ณ  ์ž๋ฃŒ :  https://github.com/Ewha-Euron/2022-1-Euron-NLP

 


๋Œ“๊ธ€