
[cs224n] Lecture 3 Summary

by isdawell 2022. 3. 14.

💡 Topic : Word Window Classification, NN and Matrix Calculus

📌 Key points

  • Task : Classification - Named Entity Recognition (NER)

📌 Outline

1. Classification Review / introduction 

  • Classification problems in NLP

   👉 Input data : words, sentences, documents, etc.

   👉 Class : sentiment classification, named entity classification, grouping words with the same meaning or part of speech, etc.

   👉 Learn the weights W that determine the decision boundary

 

  • Supervised learning

   👉 Train set → Loss function → Validation / Test set

 

  • Loss function

   👉 Train so that the predicted probability distribution (y hat) becomes close to the true probability distribution (y)

        : MLE (maximize the probability of all training samples)

   👉 Entropy and cross entropy

       ✔ A softmax classifier is optimized by using cross entropy as its cost function.

       ✔ Cross entropy is the formula for the difference between the softmax-applied prediction S and the true label L, i.e., the cost.
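The relationship between softmax and cross entropy can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the lecture: the three class scores and the choice of class 0 as the true label are made up, and the label L is assumed to be one-hot, so the cross entropy reduces to the negative log probability of the true class.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities S that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Cross entropy against a one-hot label L: -log S[true_class]."""
    return -math.log(probs[true_class])

# Hypothetical scores for 3 classes; class 0 is assumed to be the true label.
scores = [2.0, 0.5, -1.0]
probs = softmax(scores)
loss = cross_entropy(probs, true_class=0)
print(probs, loss)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is why minimizing it pulls the predicted distribution toward the true one.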

 

 

2. Neural Networks (classifiers) 

  • Overview

  👉 Activation function : by using a nonlinear function such as Sigmoid or ReLU, the network can learn decision boundaries that are not linear.

  • Decision boundary

  👉 In NLP, the parameters W and the word vectors X are learned together.

       ✔ embedding vectors (Word2vec, GloVe, ELMo, BERT, etc.)

  👉 Pre-trained word vectors are used → (Chapter 4)

 

 

3. Neural Networks in NLP (Window classification)

Named Entity Recognition (NER) 

 

1️⃣ Overview

👉 NER classifies the named entities (proper nouns) in a sentence, and decides the entity type by looking at the context.

👉 Verbs and other words relate to one another around named entities, so named entities are grammatically important and play a key role in understanding context. This is why NER is considered an important NLP task.

👉 Models commonly used for NER : CRF, RNN

ORG (organization), O (not a proper noun), PER (person)

 

 

2️⃣ Limitations

 🧐 It is hard to classify the exact entity type of an entity as it is used within a sentence

      ✨ This led to window classification, which also takes context into account!

            👀 Idea : use the center word together with its neighboring words for the classification problem

 

 

✔ Method 1. Compute the average of the word vectors in the window. Drawback: position information is lost.

✔ Method 2. Concatenate the word vectors into an Xwindow vector, take it as the input, and train a multilayer perceptron with a softmax classifier to perform the classification.
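The difference between the two methods can be seen in a small sketch. The 2-dimensional word vectors below are hypothetical values chosen only for illustration, and the helper names `average_window` and `concat_window` are not from the lecture.

```python
def average_window(vectors):
    """Method 1: element-wise mean of the word vectors in the window."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def concat_window(vectors):
    """Method 2: concatenate the word vectors into one long X_window vector."""
    return [x for v in vectors for x in v]

# Hypothetical 2-dimensional embeddings, for illustration only.
emb = {"museums": [1.0, 0.0], "in": [0.0, 1.0], "Paris": [1.0, 1.0]}

a = [emb[w] for w in ["museums", "in", "Paris"]]
b = [emb[w] for w in ["Paris", "in", "museums"]]  # same words, reversed order

print(average_window(a) == average_window(b))  # averaging ignores word order
print(concat_window(a) == concat_window(b))    # concatenation keeps positions
```

With a window of 5 words and d-dimensional vectors, concatenation yields a 5d-dimensional input, so each position in the window gets its own slice of the weight matrix.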

  • NER Location example : classify X(Paris) as a Location
    • Define the score so that only the X(window) vector for "museums in Paris are amazing" is treated as True and receives a high score, while any other window vector (such as "Not all museums in Paris") receives a low score

s = score('museums in Paris are amazing')

Apply softmax to the score (s), compute the error function, and update W.
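As a rough sketch of what computing the score could look like, the function below assumes the single-hidden-layer form s = u^T f(Wx + b) used in the cs224n lecture. The names z, h, W and b match the backpropagation notes in this post; the output vector u, the choice of sigmoid for f, and every numeric value are assumptions made here for illustration.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def score(x_window, W, b, u):
    """Score of one window: s = u^T f(W x + b), with f = sigmoid (assumed)."""
    # z = W x + b, one entry per hidden unit
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x_window)) + b_i
         for row, b_i in zip(W, b)]
    # h = f(z), applied element-wise
    h = [sigmoid(z_i) for z_i in z]
    # s = u^T h, a single unnormalized number
    return sum(u_i * h_i for u_i, h_i in zip(u, h))

# Hypothetical 4-dimensional window vector and a 2-unit hidden layer.
x_window = [0.1, -0.2, 0.3, 0.4]
W = [[0.5, -0.1, 0.2, 0.0],
     [0.3, 0.8, -0.5, 0.1]]
b = [0.0, 0.1]
u = [1.0, -2.0]
s = score(x_window, W, b, u)
print(s)
```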

 

 

 

Softmax (Logistic) 

  👉 An activation function used in the output layer of a deep neural network to express the output as probabilities; it is used for multi-class classification problems. (It is the function on which the decision boundary is based.)

  👉 Takes Xi as input and estimates the probability Pi of belonging to each class. y is classified by picking out the max probability in a "soft" way

  👉 Because the probabilities are normalized to sum to 1, it is easy to see which class a given input x is most likely to belong to.

  cf. Sigmoid is used for binary (two-class) classification
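The link between the two functions can be checked numerically: softmax over the two scores [z, 0] gives the same probability for the first class as sigmoid(z). The sketch below uses an arbitrary score value; nothing in it comes from the lecture code.

```python
import math

def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 1.7  # arbitrary score, for illustration only
p_softmax = softmax([z, 0.0])[0]  # two-class softmax, second score fixed at 0
p_sigmoid = sigmoid(z)
print(p_softmax, p_sigmoid)
```

In other words, sigmoid can be read as the two-class special case of softmax in which one of the scores is pinned to zero.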

 

➕ https://joey09.tistory.com/53 : softmax, and why nonlinear functions are needed

 

 

Max-margin loss 

 

👉 Because the score function was defined directly in the NER example above, the raw output is used as the score, and the max-margin loss is used as the loss function: it looks for the margin that maximizes the distance between the correct and incorrect answers.

A loss function J that maximizes the difference between the correct class and the incorrect class for an input X
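A common way to write this loss, and the one used in the cs224n lecture, is J = max(0, 1 - s + s_c), where s is the score of the true window and s_c is the score of a corrupted window. The sketch below assumes that form; the function name, the `margin` parameter and the sample scores are hypothetical.

```python
def max_margin_loss(s_true, s_corrupt, margin=1.0):
    """J = max(0, margin - s + s_c): zero once the true window wins by the margin."""
    return max(0.0, margin - s_true + s_corrupt)

# The true window already beats the corrupted one by more than the margin.
print(max_margin_loss(3.0, 0.5))
# The corrupted window scores higher, so the loss is positive.
print(max_margin_loss(0.2, 0.4))
```

The loss only pushes the scores apart until the margin is reached, after which that pair contributes nothing to the gradient.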

 

4. Matrix calculus 

✔  Jacobian matrix : Generalization of Gradient

➕ https://angeloyeo.github.io/2020/07/24/Jacobian.html : Jacobian determinant

 

  • Chain Rule
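For reference, the Jacobian and the chain rule can be written out as follows. This is the standard textbook formulation rather than something copied from the post, and the symbols f, x, h and z are chosen here for illustration.

```latex
% Jacobian of f: R^n -> R^m, an m x n matrix of partial derivatives
\frac{\partial \mathbf{f}}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix},
\qquad
\left( \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \right)_{ij}
= \frac{\partial f_i}{\partial x_j}

% Chain rule: when h depends on x only through z, multiply the Jacobians
\frac{\partial \mathbf{h}}{\partial \mathbf{x}} =
\frac{\partial \mathbf{h}}{\partial \mathbf{z}} \,
\frac{\partial \mathbf{z}}{\partial \mathbf{x}}
```

When m = 1 the Jacobian is a single row, which is the gradient written as a row vector; this is the sense in which it generalizes the gradient.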

 

 

✔  Computing the backpropagation weight updates in window classification

We need to compute ds/dW and ds/db!

1. Let's compute ds/db!

Apply the chain rule
Result computed using the Jacobian formulation
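Written out, the derivation could look like the following. It assumes the network from the cs224n lecture, s = u^T h with h = f(z) and z = Wx + b; the post itself only names s, h, z, W and b, so the output vector u is an assumption taken from the lecture.

```latex
% Chain rule over s -> h -> z -> b
\frac{\partial s}{\partial \mathbf{b}}
= \frac{\partial s}{\partial \mathbf{h}} \,
  \frac{\partial \mathbf{h}}{\partial \mathbf{z}} \,
  \frac{\partial \mathbf{z}}{\partial \mathbf{b}}

% The three Jacobians
\frac{\partial s}{\partial \mathbf{h}} = \mathbf{u}^{\top}, \qquad
\frac{\partial \mathbf{h}}{\partial \mathbf{z}} = \operatorname{diag}\!\left( f'(\mathbf{z}) \right), \qquad
\frac{\partial \mathbf{z}}{\partial \mathbf{b}} = \mathbf{I}

% Result
\frac{\partial s}{\partial \mathbf{b}}
= \mathbf{u}^{\top} \operatorname{diag}\!\left( f'(\mathbf{z}) \right) \mathbf{I}
= \left( \mathbf{u} \odot f'(\mathbf{z}) \right)^{\top}
```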

 

2. Let's compute ds/dW!

Apply the chain rule again

👀 But the ds/dh * dh/dz part of the computation is the same as in ds/db!

👀 To avoid the duplicated computation, the shared part (highlighted in blue) is grouped and defined as the local error signal

Backpropagation reuses the results that were already computed, which keeps the amount of computation from growing.
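The reuse of the local error signal can be sketched in plain Python. The code assumes s = u^T f(Wx + b) with f = sigmoid, as in the cs224n lecture; the vector u, the sigmoid choice, the function names and all numeric values are assumptions made for illustration. The signal delta = ds/dh * dh/dz is computed once and then used for both gradients.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, W, b, u):
    """Return z = Wx + b, h = sigmoid(z), and the score s = u^T h."""
    z = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    h = [sigmoid(zi) for zi in z]
    s = sum(ui * hi for ui, hi in zip(u, h))
    return z, h, s

def gradients(x, W, b, u):
    """Compute ds/dW and ds/db, sharing the local error signal delta."""
    _, h, _ = forward(x, W, b, u)
    # delta_i = u_i * f'(z_i); for sigmoid, f'(z_i) = h_i * (1 - h_i)
    delta = [ui * hi * (1.0 - hi) for ui, hi in zip(u, h)]
    grad_b = list(delta)                                # ds/db = delta
    grad_W = [[di * xj for xj in x] for di in delta]    # ds/dW = outer(delta, x)
    return grad_W, grad_b

# Hypothetical 3-dimensional input and a 2-unit hidden layer.
x = [0.5, -1.0, 2.0]
W = [[0.1, 0.2, -0.3], [0.0, -0.5, 0.4]]
b = [0.05, -0.1]
u = [1.5, -2.0]
grad_W, grad_b = gradients(x, W, b, u)
print(grad_W, grad_b)
```

Each row of ds/dW is just the corresponding entry of delta times the input x, so once delta is known the weight gradient costs only one outer product.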

 

 

📌 Practice

3) Named Entity Recognition

Learn about Named Entity Recognition, which recognizes the type of each entity in a corpus. Using named entity recognition, from a corpus ...

wikidocs.net

 

 

Named Entity Recognition NER using spaCy | NLP | Part 4

Text Processing using spaCy | NLP Library

towardsdatascience.com

 

 

 

GitHub - monologg/KoBERT-NER: NER Task with KoBERT (with Naver NLP Challenge dataset)

NER Task with KoBERT (with Naver NLP Challenge dataset) - GitHub - monologg/KoBERT-NER: NER Task with KoBERT (with Naver NLP Challenge dataset)

github.com

 

 

 

GitHub - billpku/NLP_In_Action: Do NLP tasks with some SOTA methods

Do NLP tasks with some SOTA methods. Contribute to billpku/NLP_In_Action development by creating an account on GitHub.

github.com

 

 

Custom NER using SpaCy

Explore and run machine learning code with Kaggle Notebooks | Using data from Custom NER in spaCy

www.kaggle.com

 

 

 

Feedback Prize - Evaluating Student Writing | Kaggle

 

www.kaggle.com
