๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
1๏ธโƒฃ AI•DS/๐Ÿ“— NLP

[cs224n] 10๊ฐ• ๋‚ด์šฉ ์ •๋ฆฌ

by isdawell 2022. 5. 13.

๐Ÿ’ก ์ฃผ์ œ : Question Answering 


๐Ÿ“Œ ํ•ต์‹ฌ 

 

  • Task : QA ์งˆ๋ฌธ ์‘๋‹ต, reading comprehension, open-domain QA
  • SQuAD dataset
  • BiDAF , BERT

 

 

 

1๏ธโƒฃ Introduction 


 

1. Motivation : QA

 

โœ” QA ์™€ IR system ์˜ ์ฐจ์ด 

 

โ—ฝ IR = information retrieval ์ •๋ณด๊ฒ€์ƒ‰ 

 

๐Ÿ’จ QA : Query (specifit) → Answer : ๋ฌธ์„œ์—์„œ ์ •๋‹ต ์ฐพ๊ธฐ

   ex. ์šฐ๋ฆฌ๋‚˜๋ผ ์ˆ˜๋„๋Š” ์–ด๋””์•ผ? - ์„œ์šธ

 

๐Ÿ’จ IR : Query (general) → Document list : ์ •๋‹ต์„ ํฌํ•จํ•˜๊ณ  ์žˆ๋Š” ๋ฌธ์„œ ์ฐพ๊ธฐ 

   ex. ๊น€์น˜๋ณถ์Œ๋ฐฅ์€ ์–ด๋–ป๊ฒŒ ๋งŒ๋“ค์–ด? - ์œ ํŠœ๋ธŒ ์˜์ƒ ๋ฆฌ์ŠคํŠธ, ๋ธ”๋กœ๊ทธ ๋ฆฌ์ŠคํŠธ 

 

๐Ÿ‘‰ ์ตœ๊ทผ์—๋Š” ์Šค๋งˆํŠธํฐ, ์ธ๊ณต์ง€๋Šฅ ์Šคํ”ผ์ปค ๊ธฐ๋ฐ˜์˜ ์ •๋ณด์ทจ๋“์ด ๋งŽ๊ธฐ ๋•Œ๋ฌธ์— ์ฆ‰๊ฐ์ ์ธ ์งˆ๋ฌธ์— ๋‹ตํ•˜๋Š” QA ๋ชจ๋ธ์˜ ์ค‘์š”์„ฑ์ด ๋†’์•„์ง€๊ณ  ์žˆ๋‹ค. 

 

 

์‚ฌ๋žŒ์˜ ์–ธ์–ด๋กœ ๋œ "์งˆ๋ฌธ"์— ์ž๋™์ ์œผ๋กœ "๋‹ต"ํ•  ์ˆ˜ ์žˆ๋Š” ์‹œ์Šคํ…œ์„ ๋งŒ๋“ค๊ธฐ 

โœ” QA 2 step 

 

1. Finding documents that contain an answer
💨 Find the documents likely to contain the answer to the question : related to traditional IR / web search


2. Finding an answer in the documents 
💨 Find the answer within the retrieved documents : related to machine reading comprehension 

 

 

 

๐Ÿ’จ ๋Œ€๋ถ€๋ถ„์˜ state-of-the-art-question answering ์‹œ์Šคํ…œ๋“ค์€ end-to-end train ๊ณผ ์‚ฌ์ „ํ›ˆ๋ จ๋œ language ๋ชจ๋ธ ์œ„์— build ๋œ๋‹ค. 

 

 

 

โœ” Beyond textual QA problems 

 

 

๐Ÿ’จ ์˜ค๋Š˜๋‚ ์—๋Š” ๊ตฌ์กฐํ™”๋˜์ง€ ์•Š์€ text ์— ๊ธฐ๋ฐ˜ํ•œ ์งˆ๋ฌธ์— ๋‹ตํ•˜๋Š” ์œ ํ˜•์ด ์ฃผ๋ชฉ๋ฐ›๊ณ  ์žˆ๋‹ค. 

 

โ—ฝ Knowledge based QA : ๋Œ€๋Ÿ‰์˜ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์— ๋Œ€ํ•ด์„œ QA ๋ฅผ ๊ตฌ์ถ• 

โ—ฝ Visual QA : ์ด๋ฏธ์ง€์— ๊ธฐ๋ฐ˜ํ•œ QA

2. Machine Reading comprehension 

 

โœ” ์ •์˜

 

(P,Q) ๐Ÿ‘‰ A : Text ๋กœ ์ด๋ฃจ์–ด์ง„ ๋ฌธ๋‹จ์„ ์ดํ•ดํ•˜๊ณ  ํ•ด๋‹น ๋‚ด์šฉ์— ๋Œ€ํ•œ ์งˆ๋ฌธ์— ๋‹ตํ•˜์ž 

 

  • ๊ธฐ๊ณ„๊ฐ€ ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ์ฃผ์–ด์ง„ ๋ฌธ์„œ๋ฅผ ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” Task 
  • passage ์™€ question ์ด ์ฃผ์–ด์กŒ์„ ๋•Œ, ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ์ •๋‹ต์„ ์ถœ๋ ฅํ•˜๋Š”์ง€ ํ™•์ธํ•˜์—ฌ ๊ธฐ๊ณ„์˜ reading ability ๋ฅผ ์ธก์ •ํ•œ๋‹ค. 
  • ๋งŽ์€ ๋‹ค๋ฅธ NLP task ๋“ค๋„ reading comprehension ๋ฌธ์ œ๋กœ ๋‹จ์ˆœํ™”์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค. 

 

์ถœ์ฒ˜ : https://www.youtube.com/watch?v=7u6Ys7I0z2E&list=PLm3NA2yoJ4bpUQFsxkWqTrLpemjjR62j6&index=33

 

 

๐Ÿ“‘ ๋‹ต๋ณ€ 1 → ๊ธฐ๊ณ„๊ฐ€ ์งˆ๋ฌธ/๋ฌธ๋‹จ์„ ์ œ๋Œ€๋กœ ์ดํ•ดํ–ˆ๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ์Œ 

๐Ÿ“‘ ๋‹ต๋ณ€ 2 → ๊ธฐ๊ณ„๊ฐ€ ์งˆ๋ฌธ/๋ฌธ๋‹จ์„ ์ œ๋Œ€๋กœ ์ดํ•ดํ–ˆ๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์—†์Œ

 

 

โ—ฝ 2015๋…„ ์ด์ „์—๋Š” MRC ์— ์ ํ•ฉํ•œ ๋ฐ์ดํ„ฐ์…‹ (passage, question, answer) ๊ณผ NLP ์‹œ์Šคํ…œ์ด ์กด์žฌํ•˜์ง€ ์•Š์•˜๋‹ค. 

โ—ฝ 2015๋…„ ์ดํ›„ MRC ๋ฅผ ์œ„ํ•œ ๋ฐ์ดํ„ฐ์…‹๊ณผ NLP ์‹œ์Šคํ…œ์ด ๊ณ„์†ํ•ด์„œ ๋“ฑ์žฅ, ๋ฐœ์ „ํ•˜๊ณ  ์žˆ๋‹ค. 

 

 

 

โœ” ์—ญ์‚ฌ 

 

  • 2013๋…„ Machine comprehension : MCT test corpus ๋ฅผ ๊ฐ€์ง€๊ณ  ์ง€๋ฌธ์— ์žˆ๋Š” ๋‹ต์„ ๊ทธ๋Œ€๋กœ ์ฐพ๋Š” ๋ฌธ์ œ 

 

 

โ—ฝ Passage (P) : Document, Context 

โ—ฝ Answer (A) : Extractive AQ, Sub-sequence 

โ—ฝ ๋‹ต๋ณ€์€ ํ•ญ์ƒ ๋ฌธ๋‹จ ์† ํ•˜์œ„ ๋ฌธ์žฅ ์ผ๋ถ€๋กœ ๊ตฌ์„ฑ๋จ 

โ—ฝ ๊ณผ๊ฑฐ QA ๋ชจ๋ธ์€ ์ฃผ๋กœ NER ๊ธฐ๋ฐ˜์œผ๋กœ ์ ‘๊ทผํ•˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์•˜๊ธฐ ๋•Œ๋ฌธ์— ์ˆ˜์ž‘์—…์ด ๋งŽ๊ณ  ๋ณต์žกํ•œ ํ˜•ํƒœ๋ฅผ ๋„๊ณ  ์žˆ์—ˆ๋‹ค. 

 

 

 

โœ” ์งˆ๋ฌธ ์œ ํ˜•  

 

ํฌ๊ฒŒ 6๊ฐ€์ง€ ํƒ€์ž…์œผ๋กœ ๋ถ„๋ฅ˜ ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

 

โœ” MT vs MRC 

 

Machine Translation
  • Source sentence, target sentence
  • Autoregressive decoder : generates the target sentence word by word
  • Which source word is most relevant to the current target word?

Reading Comprehension
  • Passage, question
  • Two classifiers : predict only the start and end positions of the answer
  • Which passage words are most relevant to the question words?

 

โญ ๋‘ NLP task ๋ชจ๋‘ Attention ์ด ํ•ต์‹ฌ!

3. SQuAD dataset 

 

โœ” ์ •์˜

 

โ—ฝ Stanford Question Answering Dataset ๐Ÿ‘‰ large scale supervised dataset 

โ—ฝ 10๋งŒ๊ฐœ ์ด์ƒ์˜ ์งˆ๋ฌธ-๋‹ต๋ณ€ ๋ฐ์ดํ„ฐ๊ฐ€ ์กด์žฌ , ๋‹ต๋ณ€์€ ๋ฐ˜๋“œ์‹œ passage ๋‚ด์—์„œ span ์œผ๋กœ ๋‚˜์˜ค๋„๋ก (passage ์— ๋“ฑ์žฅํ•œ ๋ฌธ๊ตฌ์—ฌ์•ผ ํ•œ๋‹ค๋Š” ๋œป) ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค. 

โ—ฝ SQuAD dataset ์ด ๊ต‰์žฅํžˆ ์ •๊ตํ•˜๊ฒŒ ๊ตฌ์ถ•๋˜์–ด QA task ์— ํฌ๊ฒŒ ๊ธฐ์—ฌํ•˜์˜€๊ณ , ์ง€๊ธˆ๊นŒ์ง€ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ๋‹ค. 

  • Passage : ์œ„ํ‚คํ”ผ๋””์•„์˜ 100๊ฐœ ~ 150๊ฐœ ๋‹จ์–ด๋กœ ์ด๋ฃจ์–ด์ง„ ๋ฌธ๋‹จ๋“ค (= paragraph, context ๋ผ๊ณ  ์ง€์นญํ•˜๊ธฐ๋„ ํ•จ) ๐Ÿ‘‰ passage ๋‚ด์— ์ •๋‹ต์ด ์žˆ์Œ 
  • Question : ํด๋ผ์šฐ๋“œ ์†Œ์Šค ๋ฐฉ๋ฒ•์œผ๋กœ ๋งŒ๋“ค์–ด์ง„ ์งˆ๋ฌธ - ์‚ฌ๋žŒ๋“ค์ด ๋ฌธ๋‹จ์„ ์ฝ๊ณ  ์งˆ๋ฌธ๊ณผ ๊ทธ์— ์•Œ๋งž๋Š” ๋‹ต์„ ๋งŒ๋“œ๋Š” ๋ฐฉ์‹ (= query ๋ผ๊ณ  ์ง€์นญํ•˜๊ธฐ๋„ ํ•จ)
  • Answer : 3๊ฐœ์˜ ๊ฐ€๋Šฅํ•œ ์ •๋‹ต ๋‹ต์•ˆ์„ ๋„ฃ์–ด์คŒ 

  • query ๊ฐ€ context ๋ณด๋‹ค ์งง์Œ (N๊ฐœ ํ† ํฐ > M๊ฐœ ํ† ํฐ) 
  • SQuAD ๋ฅผ ํ‘ธ๋Š” ๋‘ ๊ฐ€์ง€ ์ข…๋ฅ˜์˜ ์‹ ๊ฒฝ๋ง ๋ชจ๋ธ : 2018๋…„ ์ด์ „๊นŒ์ง€๋Š” LSTM ๊ธฐ๋ฐ˜์˜ attetion ๋ชจ๋ธ์ด ์ฃผ๋ฅผ ์ด๋ฃจ์—ˆ๊ณ , 2019๋…„๋„ ์ดํ›„ ๋ถ€ํ„ฐ๋Š” BERT ์™€ ๊ฐ™์€ Pre-trained ๋œ ๋ชจ๋ธ์—์„œ Fine tuning ํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ๋ฐ”๋€Œ์—ˆ๋‹ค. 

 

 

 

โœ” version 1.1

 

์ธ๊ฐ„์˜ ์„ฑ๋Šฅ์„ ๋›ฐ์–ด๋„˜์Œ

 

โ—ฝ 3 gold answer : ๋‹ต๋ณ€ ์œ ํ˜•์˜ ๋ณ€ํ˜•์—๋„ ์ž˜ ๋Œ€์ฒ˜ํ•˜๊ธฐ ์œ„ํ•ด ์„ธ ์‚ฌ๋žŒ์—๊ฒŒ ๋‹ต๋ณ€์„ ์–ป์Œ

 

โ—ฝ ํ‰๊ฐ€์ง€ํ‘œ 

 

  • Exact Match : binary accuracy - 1 if the prediction matches one of the 3 (human-written) answers, 0 otherwise 👉 since even human-written answers vary, the more forgiving F1 score is also computed 
  • F1 : compute a word-level F1 score against each of the 3 answers, take the max as the per-question F1, then macro-average over questions 
  • Punctuation and the articles a, an, the are ignored. 

 

 

โœ” ํ‰๊ฐ€์ง€ํ‘œ ๊ณ„์‚ฐ ์˜ˆ์‹œ 

 

 

(1) ์˜ˆ์ธก๋œ ๋‹ต๋ณ€์„ ๊ฐ gold answer ๊ณผ ๋น„๊ตํ•œ๋‹ค. ์ด๋•Œ a, an, the, . (๊ตฌ๋‘์ ) ์€ ์ œ๊ฑฐํ•œ๋‹ค.

(2) Max score ๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค. 

(3) ๋ชจ๋“  ์˜ˆ์ œ์— ๋Œ€ํ•ด EM ๊ณผ F1์„ ํ‰๊ท ํ•œ ๊ฐ’์„ ๊ตฌํ•œ๋‹ค. 


โœ” version 2.0 ๐Ÿ‘‰ unanswerable question ์ด ์ถ”๊ฐ€๋จ

 

์ธ๊ฐ„์˜ ์„ฑ๋Šฅ์„ ๋›ฐ์–ด๋„˜์Œ

 

 

โ—ฝ 1.1 ์—์„œ๋Š” ๋ชจ๋“  ์งˆ๋ฌธ์— ๋‹ต๋ณ€์ด ํ•ญ์ƒ ์กด์žฌ (answerable) ํ•˜๋‹ค๋ณด๋‹ˆ, ๋ฌธ๋‹จ ๋‚ด์—์„œ ๋ฌธ๋งฅ์„ ์ดํ•ดํ•˜์ง€ ์•Š๊ณ  ๋‹จ์ˆœํžˆ ranking task ๋กœ ์ž‘๋™ํ•˜๋Š” ๋ฌธ์ œ์ ์ด ์กด์žฌํ•จ (๋‹ต์— ๊ทผ์ ‘ํ•ด ๋ณด์ด๋Š” span ์„ ์ฐพ์„ ๋ฟ)

โ—ฝ v1.1 ์— ์ƒˆ๋กœ์šด 5๋งŒ๊ฐœ ์ด์ƒ์˜ ์‘๋‹ต ๋ถˆ๊ฐ€๋Šฅํ•œ (unanswerable) ์งˆ๋ฌธ์„ ๋ณ‘ํ•ฉํ•˜์˜€๋‹ค : v2.0 ์—์„œ๋Š” dev/test ๋ฐ์ดํ„ฐ ์ ˆ๋ฐ˜์€ passage ์— answer ๊ฐ€ ํฌํ•จ๋˜์–ด ์žˆ๊ณ  ์ ˆ๋ฐ˜์€ ํฌํ•จ๋˜์–ด ์žˆ์ง€ ์•Š๋‹ค. 

โ—ฝ ์˜จ๋ผ์ธ์—์„œ ์‚ฌ๋žŒ๋“ค์ด unanswerable question ์„ ์ง์ ‘ ์ƒ์„ฑํ•˜์—ฌ ์„ฑ๋Šฅ์ด ๋†’๋‹ค. 

โ—ฝ ํ‰๊ฐ€ ์‹œ no answer ๋ฅผ no answer ๋ผ๊ณ  ํ•ด์•ผ ๋งž๊ฒŒ ์˜ˆ์ธกํ•œ ๊ฒƒ์ž„ ๐Ÿ‘‰ threshold ๋ฅผ ๋‘๊ณ  ๊ทธ ์ดํ•˜์ผ ๋•Œ๋Š” ์˜ˆ์ธกํ•œ answer ๋ฅผ ๋ฑ‰์ง€ ์•Š๋Š”๋‹ค. 

 

 

 

โœ” ํ•œ๊ณ„์ 

 

๐Ÿ‘€ SQuAD ๋ฌธ์ œ๋ฅผ ์ž˜ ํ‘ผ๋‹ค๊ณ  ํ•ด์„œ ๋…ํ•ด๋ฅผ ์ž˜ํ•œ๋‹ค๊ณ  ๋งํ•  ์ˆ˜ ์—†๋‹ค. 

 

โ—ฝ Span-based answers ๋งŒ ์กด์žฌ :  yes/no, counting, implicit ์งˆ๋ฌธ๋“ค์— ๋Œ€ํ•œ ๋‹ต๋ณ€์˜ ๊ตฌ์ฒด์ ์ธ ์ด์œ ๋ฅผ ์ฐพ๊ธฐ ์–ด๋ ต๋‹ค. 

 

โ—ฝ Passage ๋‚ด์—์„œ๋งŒ ์ •๋‹ต์„ ์ฐพ๋„๋ก ํ•˜๋Š” ์งˆ๋ฌธ๊ตฌ์„ฑ 

  - ์—ฌ๋Ÿฌ ๋ฌธ์„œ๋“ค์„ ๋น„๊ตํ•ด ์ง„์งœ ์ •๋‹ต์„ ์ฐพ์•„๋‚ผ ํ•„์š”๊ฐ€ ์—†์Œ 

  - ์‹ค์ œ ๋งˆ์ฃผํ•˜๊ฒŒ๋  ์งˆ๋ฌธ-๋‹ต๋ณ€ (๋ฐ์ดํ„ฐ) ๋ณด๋‹ค, ์‰ฝ๊ฒŒ ๋‹ต๋ณ€์„ ์ฐพ์„ ์ˆ˜ ์žˆ๋Š” ๊ตฌ์กฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค๋Š” ํ•œ๊ณ„ 

 

โ—ฝ ๋™์ผ ์ง€์‹œ์–ด (coreference) ๋ฌธ์ œ๋ฅผ ์ œ์™ธํ•˜๊ณ ๋Š” Multi-fact ๋ฌธ์ œ, ๋ฌธ์žฅ ์ถ”๋ก  ๋ฌธ์ œ๊ฐ€ ๊ฑฐ์˜ ์—†๋‹ค. 

 

๐Ÿคธ‍โ™€๏ธ ๊ทธ๋Ÿผ์—๋„ ์ง€๊ธˆ๊นŒ์ง€ QA ๋ชจ๋ธ์— ๊ฐ€์žฅ ๋งŽ์ด ์‚ฌ์šฉ๋œ well structed, clean ํ•œ ๋ฐ์ดํ„ฐ์…‹์ด๋‹ค. 

โœ” KorQuAD 2.0 

 

โ—ฝ ํ•œ๊ตญ์–ด ์œ„ํ‚ค๋ฐฑ๊ณผ๋กœ ๋ฐ์ดํ„ฐ ๊ตฌ์ถ• 

2๏ธโƒฃ QA models


0. Bi-LSTM

 

๐Ÿ‘€ ๋Œ€์‘ํ•˜๋Š” ๋‹จ์–ด์˜ ์ฃผ๋ณ€์ •๋ณด๋ฅผ ๊ท ํ˜•์žˆ๊ฒŒ ๋‹ด์•„๋‚ด๊ธฐ ์œ„ํ•ด ์–‘๋ฐฉํ–ฅ์œผ๋กœ ์‚ดํŽด๋ณธ๋‹ค. ์–‘๋ฐฉํ–ฅ LSTM ์€ ์ง€๊ธˆ๊นŒ์ง€์˜ LSTM ๊ณ„์ธต์— ์—ญ๋ฐฉํ–ฅ์œผ๋กœ ์ฒ˜๋ฆฌํ•˜๋Š” LSTM ๊ณ„์ธต๋„ ์ถ”๊ฐ€ํ•œ๋‹ค. 

 

https://ranghee.github.io/deep-leaning/Bi-LSTM-post/

 

๋ฐ‘๋ฐ”๋‹ฅ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜๋Š” ๋”ฅ๋Ÿฌ๋‹2

 

โ—ฝ ์ฃผํ™ฉ์ƒ‰ ๋ธ”๋ก : forward ์ •๋ฐฉํ–ฅ LSTM, ์ดˆ๋ก์ƒ‰ ๋ธ”๋ก : backward ์—ญ๋ฐฉํ–ฅ LSTM 

 

โ—ฝ model input ํ˜•ํƒœ : Tokenizing ๊ณผ Embedding ๊ณผ์ •์ด ์™„๋ฃŒ๋œ sequence ๐Ÿ‘‰ forward, backward ๋ฐฉํ–ฅ์˜ LSTM ์— ๊ฐ๊ฐ ์ž…๋ ฅ๋œ๋‹ค. 

 

โ—ฝ forward LSTM Cell ๊ณผ backward LSTM Cell ์˜ ์€๋‹‰๊ฐ’์„ Concat (๊ฐ timestep๋ณ„๋กœ) ํ•˜์—ฌ, ์ดํ›„ ๋ชจ๋“  concat ๊ฒฐ๊ณผ๋ฌผ์„ ๊ฐ€์ง€๊ณ  ์ตœ์ข… ์ถœ๋ ฅ์„ ๋ฝ‘๊ฒŒ ๋œ๋‹ค. 

 

 

backward ๋ฅผ ๊ตฌํ˜„ํ•˜๋Š” ๋ฐฉ๋ฒ• - by indexing

1. Stanford attentive reader +

 

โœ” ๊ฐœ์š” 

 

โ—ฝ simplest neural question answering system 

โ—ฝ Bi-LSTM ๊ตฌ์กฐ๋ฅผ ์‚ฌ์šฉํ•ด ๊ฐ ๋ฐฉํ–ฅ์˜ ์ตœ์ข… hidden state ๋“ค์„ concat ํ•˜์—ฌ question vector ๋กœ ์‚ฌ์šฉํ•œ๋‹ค. 

 

 

โœ” ๋ชจ๋ธ ๊ตฌ์กฐ : 3 parts 

 

1๏ธโƒฃ Question vector ์ƒ์„ฑ

  (1) Glove ์‚ฌ์ „ ์ž„๋ฒ ๋”ฉ ์‚ฌ์šฉ
  (2) One layer ์˜ Bi-LSTM ์‚ฌ์šฉ 
  (3) ์–‘๋ฐฉํ–ฅ์˜ ๋งˆ์ง€๋ง‰ hidden state ๋“ค์„ concatenate
  (4) question vector ์ƒ์„ฑ 


  • ์ฃผ์–ด์ง„ ๋ฌธ์žฅ์— ์žˆ๋Š” ๋‹จ์–ด๋“ค์„ ์‚ฌ์ „์— ํ•™์Šต๋œ 300 ์ฐจ์›์˜ GloVe word embedding ์—์„œ lookup ์„ ์ง„ํ–‰ํ•œ ํ›„, 1-layer ์–‘๋ฐฉํ–ฅ LSTM ๋ชจ๋ธ์— ์ง‘์–ด๋„ฃ๋Š”๋‹ค. ๊ฐ ๋ฐฉํ–ฅ์˜ ๋งˆ์ง€๋ง‰ hidden state ๋“ค์„ concatenate ํ•˜์—ฌ ๊ณ ์ •๋œ size ์˜ question vector ๋ฅผ ์–ป๋Š”๋‹ค. 

 

 

2๏ธโƒฃ Passage vector ์ƒ์„ฑ

   (1) Glove ์‚ฌ์ „ embedding ์‚ฌ์šฉ
   (2) One layer Bi-LSTM ์‚ฌ์šฉ 
   (3) ์–‘๋ฐฉํ–ฅ์˜ hidden state position ๋ณ„๋กœ concatenatepassage ๋‹จ์–ด ๊ฐœ์ˆ˜๋งŒํผ!
   (4) Passage vector ์ƒ์„ฑ 

 

 

 

  • passage ์˜ ๊ฐ ๋‹จ์–ด ๋ฒกํ„ฐ๋“ค๋„ ๋˜‘๊ฐ™์ด Bi-LSTM ์„ ์‚ฌ์šฉํ•˜์—ฌ, ๊ฐ ๋‹จ์–ด ์‹œ์ ์˜ ๋‘ ๋ฐฉํ–ฅ hidden state ๋ฅผ concat ํ•˜์—ฌ passage word vector ๋กœ ์‚ฌ์šฉํ•œ๋‹ค. 

3๏ธโƒฃ Attention ์ ์šฉ 

โญ passage ์—์„œ ์–ด๋””๊ฐ€ answer ์‹œ์ž‘์ด๊ณ  ๋์ธ์ง€ ์˜ˆ์ธก โญ

  (1) αi : apply attention between the i passage vectors p and the single question vector q, then softmax 
  (2) Os : multiply each αi by the vector pi and sum them all 
  (3) as : apply a linear transform to Os 

 

๐Ÿ’จ Attention : ํ•ด๋‹น ์‹œ์ ์—์„œ ์˜ˆ์ธกํ•ด์•ผํ•  ๋‹จ์–ด์™€ ์—ฐ๊ด€์ด ์žˆ๋Š” ์ž…๋ ฅ ๋‹จ์–ด ๋ถ€๋ถ„์„ ์ข€ ๋” ์ง‘์ค‘(attention)ํ•ด์„œ ๋ณด๊ธฐ

 

 

  • 1 ๊ฐœ์˜ question vector ์™€ i ๊ฐœ์˜ passage vector ๋“ค์— ๋Œ€ํ•ด attention ์„ ์ ์šฉํ•˜์—ฌ passage ์—์„œ ์–ด๋””๊ฐ€ answer ์‹œ์ž‘์ด๊ณ  ๋์ธ์ง€๋ฅผ ํ•™์Šตํ•˜๋Š” ๋ฐฉ์‹
  • softmax ๋ฅผ ์ทจํ•œ ๊ฐ’์ธ αi ์™€ passage vector Pi ๋ฅผ ๊ณฑํ•œ ๋’ค,  ์ „๋ถ€ ๋”ํ•˜๋ฉด output vector Os ๊ฐ€ ๋‚˜์˜ค๊ณ , ์ด๋ฅผ linear transform ์„ ์‹œ์ผœ์ค€ ๊ฐ’์ด start token ์— ๋Œ€ํ•ด ์˜ˆ์ธก์„ ์ˆ˜ํ–‰ํ•œ๋‹ค. ์ด์™€ ๋˜‘๊ฐ™์€ ๋ฐฉ๋ฒ•์œผ๋กœ end token ์— ๋Œ€ํ•ด์„œ๋„ ์˜ˆ์ธก์„ ์ˆ˜ํ–‰ํ•œ๋‹ค.
  • start token ๊ณผ end token ์˜ loss ๊ฐ’์„ ๋”ํ•œ ๊ฒƒ์ด ์ตœ์ข… ๋ชฉ์ ํ•จ์ˆ˜๊ฐ€ ๋˜๊ณ  ์ด๋ฅผ ์ค„์—ฌ ๋‚˜์•„๊ฐ€๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ ๋ชจ๋ธ์„ ํ•™์Šตํ•œ๋‹ค. 

2. Stanford attentive reader ++ (DrQA)

 

โœ” ๋ชจ๋ธ ๊ตฌ์กฐ

 

 

 

 

๐Ÿ’จ 3 layer Bi-LSTM 

 

โญ ์ด์ „์—๋Š” ์ตœ์ข… hidden state ๋งŒ ๊ฐ€์ ธ์™”๋˜ ๊ฒƒ์„ ์ง€๊ธˆ์€ question์˜ ๋ชจ๋“  ๋‹จ์–ด ์‹œ์  hidden state ์˜ attention ์„ ๊ตฌํ•ด์„œ ๊ทธ ๊ฐ€์ค‘ํ•ฉ์„ question vector ๋กœ ์‚ฌ์šฉ 

โœ” Paragraph Encoding 

 

 

 

โ—พ  3-layer Bi-LSTM 

 

โ—พ  input embedding ์— ๋Œ€ํ•ด GloVe vector ๋งŒ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹Œ, ๋‹จ์–ด๋ณ„๋กœ ๋ณ„๋„์˜ ์ถ”๊ฐ€ feature ๋ฅผ ๋ถ™์—ฌ์ฃผ์—ˆ๋‹ค ๐Ÿ‘‰ ์–ธ์–ดํ•™์  ํŠน์„ฑ์„ ๋‹ด์€ ํ’ˆ์‚ฌ์ •๋ณด (POS) ์™€  NER ํƒœ๊ทธ๋“ค์„ ์›ํ•ซ์ธ์ฝ”๋”ฉํ•œ ๋ฒกํ„ฐ๋กœ ๋ถ™์—ฌ์คŒ  

 

โ—พ  Unigram probability ์ถ”๊ฐ€ ๐Ÿ‘‰ Term frequency ๋ฐ˜์˜ 

 

โ—พ  EM : ๋ฌธ๋‹จ ๋‚ด ํ•ด๋‹น ๋‹จ์–ด๊ฐ€ ์งˆ๋ฌธ์— ๋“ฑ์žฅํ•˜๋Š”์ง€ ์•„๋‹Œ์ง€๋ฅผ ํŒ๋‹จํ•˜๋Š” feature 

    โ—ฝ 3 binary features : exact (์ •ํ™•ํžˆ ์ผ์น˜), uncased (์—†์Œ), lemma (์–ด๊ทผ์ถ”์ถœ์„ ํ•˜๊ฒŒ๋˜๋ฉด ์ผ์น˜ํ•˜๋Š”์ง€ ์—ฌ๋ถ€) 

 

โ—พ  Aligned question embedding 

    โ—ฝ Question ์˜ ๋‹จ์–ด qi ์™€ paragraph ๋‹จ์–ด์˜ pi ์˜ ์œ ์‚ฌ๋„๋ฅผ ํฌ์ฐฉํ•˜๊ธฐ ์œ„ํ•จ 

    โ—ฝ soft alignmnet : ์ •ํ™•ํžˆ ๋‹จ์–ด์˜ ํ˜•ํƒœ๊ฐ€ ์ผ์น˜ํ•˜๊ฑฐ๋‚˜, lemma ๋ฅผ ๊ฑฐ์ณ๋„ ํ˜•ํƒœ๊ฐ€ ์ผ์น˜ํ•œ ๋‹จ์–ด๋Š” ์•„๋‹ˆ์ง€๋งŒ ์„œ๋กœ ์œ ์‚ฌํ•œ ์˜๋ฏธ๋ฅผ ๊ฐ€์ง„ ๋‹จ์–ด

    โ—ฝ soft alignmnet ๋ฅผ ์ถ”๊ฐ€ํ•ด์ค„ ์ˆ˜ ์žˆ๋‹ค. 

 

 

๐Ÿ‘‰ ์ดํ›„ attention ๊ณผ์ •์€ SAR+ ๊ณผ ๊ณผ์ •์ด ๊ฐ™์Œ 

 

 

๐Ÿ‘€ BiDAF ๊ฐ™์€ ๋ณต์žกํ•œ ๊ตฌ์กฐ๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ ๋„ ์„ฑ๋Šฅ์ด ๋†’์•˜๋˜ ๋ชจ๋ธ 

3. BiDAF

 

โœ” ๊ฐœ์š” 

 

๐Ÿ‘€ Bidirectional Attention Flow → BERT ๊ฐ€ ๋“ฑ์žฅํ•˜๊ธฐ ์ „์— ๊ฐ€์žฅ ์œ ๋ช…ํ•œ reading comprehension ๋ชจ๋ธ ์ค‘ ํ•˜๋‚˜์˜€์Œ 

โญ attention ์ด (pharagraph <-> question) ์–‘๋ฐฉํ–ฅ์œผ๋กœ ์ „๋‹ฌ๋˜๋Š” ์ 

 

  • Query (question) ๊ณผ Context (passage) ์‚ฌ์ด์— attention flow layer ๊ฐ€ bi-directional ์œผ๋กœ ๋™์ž‘ํ•˜๋Š” ๊ฒƒ์ด ํ•ต์‹ฌ 
  • ๊ฐ question word ์™€ passage word ์„œ๋กœ ๊ฐ„์˜ ์œ ์‚ฌ๋„ ๊ธฐ๋ฐ˜ 

 

 

 

โœ” model architecture 

 

 

โ‘  Embedding layer 

 

 

  • Glove ๋กœ ์ž„๋ฒ ๋”ฉํ•˜์—ฌ ๊ฐ ๋‹จ์–ด์— ๋Œ€ํ•ด word vector ๋„์ถœ + Character level CNN ์„ ํ†ตํ•ด ๊ณ ์ •๋œ ํฌ๊ธฐ์˜ Character vector ๋„์ถœ ๐Ÿ’จ Concatenate 
  • Concat ํ•œ ๋ฒกํ„ฐ๋ฅผ Two-layer Highway Network (f) ๋ฅผ ๊ฑฐ์นœ ํ›„ Bi-directional LSTM ์— ๋„ฃ์–ด์ค€๋‹ค. 

 

 

  • Bi-LSTM ์œผ๋กœ ๋„์ถœ๋œ ์–‘๋ฐฉํ–ฅ hidden state ๋ฅผ concat ๐Ÿ’จ Contextual embedding  ์ƒ์„ฑ → Attention layer ์˜ ์ž…๋ ฅ์ด ๋จ 

 

 

 

โ‘ก Attention Flow layer 

 

 

๐Ÿ’จ input : query ์™€ context ์˜ contextual vector representation (ci, qj) 

๐Ÿ’จ output : context ๋‹จ์–ด๋“ค์˜ query-aware vector representation (gi) , ์ด์ „ layer ์˜ contextual embedding 

  • ๋ชจ๋“  (ci, qj) ์Œ์— ๋Œ€ํ•ด ์œ ์‚ฌ๋„๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค. 
  • ์–‘๋ฐฉํ–ฅ attention ์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•˜์—ฌ Shared similarity matrix S ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. 
  • S ๋Š” α ๋ผ๋Š” ํ•จ์ˆ˜๋ฅผ ๊ฑฐ์ณ ๋„์ถœ๋œ๋‹ค. 
  • α : h ์™€ u ๊ทธ๋ฆฌ๊ณ  h ์™€ u ๋ฅผ element wise ๋กœ ๊ณฑํ•œ ์ด 3๊ฐœ์˜ ๋ฒกํ„ฐ๋ฅผ concat ํ•œ ํ›„ linear transform ์„ ํ•ด์ค€๋‹ค. 

 

2-(1). Context2Query Attention 

 

 

โ—พ ๊ฐ context ๋‹จ์–ด์— ๋Œ€ํ•ด ๊ฐ€์žฅ ์œ ์‚ฌํ•œ query ๋‹จ์–ด ์ฐพ๊ธฐ (์งˆ๋ฌธ์˜ ์–ด๋–ค ๋‹จ์–ด๋“ค์ด ci ์™€ ์—ฐ๊ด€์ด ์žˆ๋Š”๊ฐ€) 

 

 

 

2-(2). Query2Context Attention 

 

 

โ—พ query ๋‹จ์–ด ์ค‘ ํ•˜๋‚˜์™€ ๊ฐ€์žฅ ๊ด€๋ จ์žˆ๋Š” context ๋‹จ์–ด๋“ค ์„ ํƒํ•˜๊ธฐ (๋ฌธ๋‹จ์—์„œ ์–ด๋–ค ๋‹จ์–ด๋“ค์ด ์งˆ๋ฌธ ๋‹จ์–ด์™€ ์—ฐ๊ด€์ด ์žˆ๋Š”๊ฐ€)

 

 

ํ–‰๋ณ„๋กœ Max ๊ฐ’๋งŒ์„ ๊ฐ€์ ธ์™€์„œ ๋ฒกํ„ฐ๋ฅผ ๋ณ„๋„๋กœ ๊ตฌ์„ฑํ•˜๊ณ  ์ด์— ๋Œ€ํ•ด softmax ๋ฅผ ์ทจํ•จ

 

 

2-(3). Combine contextual embeddings and attention vectors 

 

concat ๋งŒ ํ•ด์คŒ (linear transform X)

โ‘ข start token, end token ์˜ˆ์ธก 

 

 

๐Ÿ’จ Modeling layer 

 

โ—พ gi ๋ฅผ ๋˜ ๋‹ค๋ฅธ bidirectional LSTM ์˜ 2๊ฐœ layer ๋กœ ์ „๋‹ฌ 

โ—พ Context ๋‹จ์–ด๋“ค ์‚ฌ์ด์˜ interation ์„ ๋ชจ๋ธ๋ง 

 

๐Ÿ’จ Output layer 

 

โ—พ start, end ์˜ ์œ„์น˜๋ฅผ ์˜ˆ์ธกํ•˜๋Š” classifier 

โ—พ Training : start point ์˜ Negative log likelihood ์™€ end point ์˜ Negative log likelihood ์˜ ํ•ฉ 


โ‘ฃ Attention visualization 

4. ELMo, BERT 👉 covered in the 2021 version of the lecture 
