
[cs224n] 2๊ฐ• ๋‚ด์šฉ ์ •๋ฆฌ

by isdawell 2022. 3. 14.

💡 Topic : Word Vectors and Word Senses

📌 Key points

  • Task : word embedding - Word2vec (Lecture 2), GloVe (Lecture 3)

 

📌 Outline

1. ์ตœ์ ํ™” 

  •  Gradient Descent
  •  Stochastic Gradient Descent
    • Randomly draws one training sample at a time, computes its gradient, and updates the parameters
    • Cheap per step, learns quickly, and can avoid getting stuck in local minima
    • Each step touches only the word vectors in the current window, so the gradient is very sparse → updating the full parameter matrix causes unnecessary computation
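The sparsity point above can be seen in a tiny sketch (toy numbers; `fake_grad` stands in for the real word2vec gradient):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 1000, 50                          # vocabulary size, embedding dim
W = rng.normal(scale=0.1, size=(V, d))   # center-word vectors
W_before = W.copy()

# A window contains only a handful of words, so one SGD step produces
# nonzero gradients for just those rows: the full V x d gradient is sparse.
window_words = [7, 42, 311]
lr = 0.05
for w_id in window_words:
    fake_grad = np.ones(d) * 0.01        # stand-in for the real gradient
    W[w_id] -= lr * fake_grad            # update only the rows in the window

changed = np.where((W != W_before).any(axis=1))[0]
print(changed.tolist())                  # only the window rows changed
```

Updating just those rows (instead of the whole matrix) is what makes sparse SGD updates efficient in practice.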

 

2. Making Word2vec computation more efficient (besides SGD)

  • Negative Sampling
    • Motivation : computing the softmax at the output layer requires a vector dot product and exp over the entire vocabulary → heavy computation
    • Key idea : sample negative words and update only their parameters (i.e., compute with just a subset of words) ➕ turn the final step into a binary logistic regression problem (label the words in the context window as positive, and the randomly sampled words as negative)
    • 👀 Negative sample : a word that does not appear within the user-specified window size

Here, pizza is a negative sample
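A minimal sketch of the negative-sampling objective as binary logistic regression (toy vectors; the names are illustrative, not from the lecture code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_c, u_pos, u_negs):
    """Negative-sampling objective for one (center, context) pair:
    push sigma(u_o . v_c) toward 1 for the true context word and
    sigma(-u_k . v_c) toward 1 for k randomly drawn negative words."""
    loss = -np.log(sigmoid(u_pos @ v_c))          # positive (in-window) word
    for u_k in u_negs:
        loss -= np.log(sigmoid(-u_k @ v_c))       # sampled negative words
    return loss

rng = np.random.default_rng(0)
d = 8
v_c = rng.normal(size=d)                          # center-word vector
u_pos = v_c + 0.1 * rng.normal(size=d)            # similar -> small loss term
u_negs = [rng.normal(size=d) for _ in range(5)]   # random negatives
print(neg_sampling_loss(v_c, u_pos, u_negs))
```

Only the positive word and the k negatives are touched per step, instead of the whole vocabulary.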

 

 

  • Subsampling
    • Words that appear very often, such as "is", "the", and "a", carry less information than rare words, so frequent words in the corpus are probabilistically trained less
    • f(wi) : the word's count divided by the total number of words in the corpus
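A sketch of subsampling using one common form of the keep probability, P(keep) = sqrt(t / f(w)) capped at 1 (the word2vec paper and code use slightly different variants; toy corpus below):

```python
import math
from collections import Counter

def keep_prob(word, counts, total, t=1e-5):
    """Probability of keeping a word during subsampling.
    f(w) is the word's relative frequency in the corpus; very frequent
    words get a small keep probability, rare words are almost always kept."""
    f = counts[word] / total
    return min(1.0, math.sqrt(t / f))

corpus = ["the"] * 9000 + ["pizza"] * 10   # toy corpus dominated by "the"
counts = Counter(corpus)
total = len(corpus)
print(keep_prob("the", counts, total), keep_prob("pizza", counts, total))
```

The frequent word "the" is discarded far more often than the rare word "pizza", which matches the motivation above.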

 

  • Hierarchical softmax

 

3. Word Prediction Methods

 

  • Count based vs Direct Prediction 

 

4. GloVe : Global Vectors for Word Representation

 

โญ Direct Prediction๊ณผ count based ๋ฐฉ์‹์„ ํ•ฉ์นœ ์›Œ๋“œ ๋ฒกํ„ฐํ™” ๋ฐฉ์‹ 

โญ co-occurence Matrix

💨 Words with similar usage/meaning end up with similar vector compositions. Similar words are used in similar environments/contexts, so they appear adjacent to similar words.

💨 However, this forms a sparse matrix, and the matrix dimensions grow as the vocabulary grows → reduce dimensionality with SVD

๐Ÿ’จ ๋†’์€ ๋™์‹œ๋“ฑ์žฅ count ๋ฅผ ๊ฐ€์ง€๋Š” ํ–‰๋ ฌ ๊ฐ’์„ ์ค‘์‹ฌ์œผ๋กœ ์ฐจ์›์„ ์ถ•์†Œํ•œ๋‹ค. 

 

 

โญ co-occurence probabilities 

  • P( k | i ) : a conditional probability computed from the co-occurrence matrix by counting the total number of occurrences of word i, and the number of times word k appears when word i appears

Let's build word vectors using the properties of co-occurrence probability ratios
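The counts and conditional probability above can be sketched on a toy corpus (the three example sentences are illustrative):

```python
import numpy as np

# Build a toy co-occurrence matrix X with a window of 1, then compute
# P(k | i) = X[i, k] / sum_k X[i, k].
sentences = [["i", "like", "deep", "learning"],
             ["i", "like", "nlp"],
             ["i", "enjoy", "flying"]]
vocab = sorted({w for s in sentences for w in s})
idx = {w: j for j, w in enumerate(vocab)}

X = np.zeros((len(vocab), len(vocab)))
for s in sentences:
    for pos, w in enumerate(s):
        # neighbors one position to the left and right
        for ctx in s[max(0, pos - 1):pos] + s[pos + 1:pos + 2]:
            X[idx[w], idx[ctx]] += 1

P = X / X.sum(axis=1, keepdims=True)     # row-normalize: P(k | i)
print(P[idx["i"], idx["like"]])          # P(like | i) = 2/3
```

"i" co-occurs with "like" twice and with "enjoy" once, so P(like | i) = 2/3.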

 

 

โญ objective function 

  • ์ž„๋ฒ ๋”ฉ๋œ ๋‘ ๋‹จ์–ด๋ฒกํ„ฐ ์ค‘์‹ฌ๋‹จ์–ด์™€ ์ฃผ๋ณ€๋‹จ์–ด ๋ฒกํ„ฐ์˜ ๋‚ด์ ์ด corpus ์ „์ฒด์—์„œ์˜ ๋™์‹œ์— ๋“ฑ์žฅํ•˜๋Š” ํ™•๋ฅ ์˜ ๋กœ๊ทธ๊ฐ’์ด ๋˜๋„๋ก ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋ฅผ ๋งŒ๋“ ๋‹ค. 

◾ Wi : embedding vector of center word i

◾ Wj : embedding vector of context word j

◾ P(j|i) : probability that context word j appears within the window when center word i appears

 

  • ๋ชจ๋ธ ๋“ฑ์žฅ motivation 
  • ๋ชฉ์  ํ•จ์ˆ˜ ์œ ๋„ 
  • ๋‹ค๋ฅธ ๋ชจ๋ธ๊ณผ์˜ ๊ด€๊ณ„ 
  • ๊ณ„์‚ฐ ๋ณต์žก๋„ 
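The objective above can be sketched as a weighted least-squares loss, assuming the usual GloVe weighting f(x) = min(1, (x/x_max)^alpha) (random toy inputs):

```python
import numpy as np

def glove_loss(W, W_ctx, b, b_ctx, X, x_max=100, alpha=0.75):
    """GloVe objective:
    J = sum_ij f(X_ij) * (w_i . w_j + b_i + b_j - log X_ij)^2,
    where f down-weights rare pairs and caps very frequent ones."""
    loss = 0.0
    for i, j in zip(*np.nonzero(X)):     # only nonzero co-occurrences
        f = min(1.0, (X[i, j] / x_max) ** alpha)
        diff = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(X[i, j])
        loss += f * diff ** 2
    return loss

rng = np.random.default_rng(0)
V, d = 5, 4                               # toy vocabulary and dimension
X = rng.integers(0, 10, size=(V, V)).astype(float)   # fake co-occurrences
W, W_ctx = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, b_ctx = np.zeros(V), np.zeros(V)
print(glove_loss(W, W_ctx, b, b_ctx, X))
```

Training minimizes this sum, so the dot products approach the log co-occurrence counts.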

โญ ์žฅ๋‹จ์   

  • Introduces the notion of co-occurrence probability → allows efficient use of global statistical information, but the memory cost is high
  • Fast training; performs well on both big and small corpora
  • Does not solve the polysemous-word problem

โญ Hyperparameter 

 

from glove import Corpus, Glove

corpus = Corpus()

# Build the co-occurrence matrix GloVe will train on
# (`result` is assumed to be a list of tokenized sentences)
corpus.fit(result, window = 5)

glove = Glove(no_components = 30, learning_rate = 0.05, alpha = 0.75, max_count = 100, max_loss = 10.0, random_state = None)

 

Parameter          Description
no_components      dimensionality of the word vectors
learning_rate      learning rate (used during SGD estimation)
alpha, max_count   used in the weighting function that assigns weights to co-occurrence counts
random_state       state used for initialization during optimization

 

 

 

โญ ์‹œ๊ฐํ™” 

 

๋ฐ˜์˜์–ด ๊ด€๊ณ„์— ์žˆ๋Š” ๋‹จ์–ด์Œ 2์ฐจ์› ์‹œ๊ฐํ™” (๋น„์Šทํ•œ ๊ฐ„๊ฒฉ์œผ๋กœ ๊ณต๊ฐ„ ๋‚ด์— ์œ„์น˜)

 

 

5. Word vector evaluation methods

 

  • Extrinsic vs Intrinsic 

โญ intrinsic : ์˜ฌ๋ฐ”๋ฅด๊ฒŒ task ๋ฅผ ํ•ด๊ฒฐํ–ˆ๋Š”์ง€ ํ™•์ธํ•˜๋Š” ๋ฐฉ๋ฒ•. word vector analogy

โญ Extrinsic : ์‹ค์ œ ์‹œ์Šคํ…œ์—์„œ ์‚ฌ์šฉํ•ด์„œ ์„ฑ๋Šฅ์„ ํ™•์ธํ•˜๋Š” ๋ฐฉ๋ฒ•. Named Entity Recognition (NER)

 

→ GloVe shows quite good results under both evaluation methods

 

6. Word senses and Word sense ambiguity 

 

  • ๋™์ผํ•œ ๋‹จ์–ด์˜ ์„œ๋กœ ๋‹ค๋ฅธ ์˜๋ฏธ๋ฅผ ํ‘œํ˜„ํ•˜๋Š” ๋ฐฉ๋ฒ•
  • Multiple sensors for a word (clustering - re labeling) 
  • Weighted average 
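The weighted-average idea can be sketched as follows (hypothetical sense vectors and frequencies):

```python
import numpy as np

# A polysemous word's single vector as the frequency-weighted average of
# its sense vectors: v_word = sum_i (f_i / f_total) * v_sense_i.
sense_vecs = np.array([[1.0, 0.0],    # e.g. one sense of the word
                       [0.0, 1.0]])   # e.g. another sense of the word
sense_freqs = np.array([1.0, 3.0])    # how often each sense occurs

weights = sense_freqs / sense_freqs.sum()
word_vec = weights @ sense_vecs
print(word_vec.tolist())              # -> [0.25, 0.75]
```

The resulting vector is a superposition of the senses, dominated by the more frequent one.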

📌 Practice code

 

 

  • 05) GloVe (wikidocs.net) : GloVe (Global Vectors for Word Representation) is a methodology that uses both count-based and prediction-based approaches, published in 2014 by Stanford ...
  • GloVe, word representation (lovit.github.io) : GloVe is a word embedding method used as often as Word2Vec; covers its differences from Word2Vec, the characteristics of the Python implementation glove_python, and how to use it
  • Sentiment analysis using LSTM and GloVe embeddings (ichi.pro) : sentiment analysis provides useful insight into a customer base; builds a sentiment analysis model using LSTM and GloVe word embeddings

 

 

👀 Understanding how Kaggle code loads GloVe in the form below, without using the library : https://lsjsj92.tistory.com/455

import os
import numpy as np

glove_dir = '../input/glove-global-vectors-for-word-representation/'
embedding_index = {}

# Each line of glove.6B.50d.txt is: word v1 v2 ... v50
f = open(os.path.join(glove_dir,'glove.6B.50d.txt'))
for line in f:
    values = line.split()
    word = values[0]                                 # first token is the word
    coefs = np.asarray(values[1:],dtype='float32')   # the rest is its vector
    embedding_index[word] = coefs
f.close()

print('found word vecs: ',len(embedding_index))
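After loading, a common next step (assumed here, not shown in the original code) is to pack the vectors into a matrix that can initialize an embedding layer; `word_index` below is a toy stand-in for a tokenizer vocabulary:

```python
import numpy as np

embedding_dim = 50
word_index = {"the": 1, "pizza": 2}                # toy tokenizer vocabulary
embedding_index = {"the": np.ones(embedding_dim)}  # toy stand-in for GloVe

# Row i holds word i's vector; words missing from GloVe stay all-zero.
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    vec = embedding_index.get(word)
    if vec is not None:
        embedding_matrix[i] = vec

print(embedding_matrix[1].sum(), embedding_matrix[2].sum())  # 50.0 0.0
```

Row 0 is conventionally reserved for padding, which is why the matrix has one extra row.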

 

