💡 Topic: Word Vectors and Word Senses
📌 Key points
- Task: word embeddings with Word2vec (Lecture 2) and GloVe (Lecture 3)
📚 Outline
1. Optimization
- Gradient Descent
- Stochastic Gradient Descent
  - Randomly samples one training example (or a small batch) at a time, computes the gradient on it, and updates the parameters
  - Low computational cost per step, fast training, and the noise can keep it from getting stuck in a local minimum
  - For word vectors the gradient is sparse (only the words in the sampled window have nonzero gradients), so updating the full embedding matrices wastes computation; see the sketch below
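A minimal numpy sketch of this sparse-update idea for one (center, context) pair; the matrices `V` (center words) and `U` (context words) and all sizes here are illustrative, not the lecture's exact code:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, lr = 1000, 50, 0.05
V = rng.normal(scale=0.1, size=(vocab_size, dim))  # center-word vectors
U = rng.normal(scale=0.1, size=(vocab_size, dim))  # context-word vectors

def sgd_step(center, context):
    """One SGD step on a single (center, context) pair.

    Only two rows of the parameter matrices are read and written;
    the other vocab_size - 2 rows stay untouched, which is why a
    dense update over the whole matrix would be wasted work.
    """
    v_c, u_o = V[center].copy(), U[context].copy()
    score = 1.0 / (1.0 + np.exp(-(u_o @ v_c)))  # sigmoid(u_o . v_c)
    grad = score - 1.0                          # gradient of -log sigmoid
    U[context] -= lr * grad * v_c
    V[center] -= lr * grad * u_o

sgd_step(center=3, context=17)
```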
2. Making Word2vec computationally efficient (beyond SGD)
- Negative Sampling
  - Background: computing the softmax at the output layer requires a dot product and an exp against every word in the vocabulary, so the cost grows with vocabulary size
  - Key idea: draw negative samples and update parameters only for them (i.e., compute with only a subset of words) ⇒ the final step becomes a binary logistic regression problem (the actual context words are labeled positive, the randomly sampled words negative); see the sketch after this list
  - 📍 Negative sample: a word that does not appear within the user-specified window size
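A minimal numpy sketch of the resulting skip-gram negative-sampling loss for one (center, context) pair; the vector names and the number of negatives K are illustrative, and in practice the negatives are drawn from a unigram noise distribution:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(v_c, u_o, U_neg):
    """Binary-classification loss: -log s(u_o . v_c) - sum_k log s(-u_k . v_c).

    v_c   : center-word vector, shape (dim,)
    u_o   : true context-word vector, shape (dim,)   -> positive label
    U_neg : K sampled noise-word vectors, (K, dim)   -> negative labels
    """
    pos = np.log(sigmoid(u_o @ v_c))             # pull the true pair together
    neg = np.log(sigmoid(-(U_neg @ v_c))).sum()  # push sampled pairs apart
    return -(pos + neg)

rng = np.random.default_rng(0)
dim, K = 50, 5
print(sgns_loss(rng.normal(size=dim), rng.normal(size=dim), rng.normal(size=(K, dim))))
```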
- Subsampling
  - Very frequent words such as "is", "the", "a" carry less information than rare words, so words that appear often in the corpus are probabilistically skipped during training; see the formula below
  - f(w_i): the relative frequency of the word (count of w_i / total number of tokens)
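For reference, the discard probability from the word2vec paper (Mikolov et al., 2013), sketched in Python; t is a threshold, typically around 1e-5:

```python
import numpy as np

def discard_prob(freq, t=1e-5):
    """P(discard w_i) = max(0, 1 - sqrt(t / f(w_i))): frequent words are dropped more often."""
    return np.maximum(0.0, 1.0 - np.sqrt(t / freq))

print(discard_prob(0.05))  # a "the"-like word is dropped almost every time
print(discard_prob(1e-6))  # a rare word is never dropped
```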
- Hierarchical softmax: a tree-structured softmax that reduces the per-word cost from O(V) to O(log V)
3. Word prediction methods
- Count-based vs. direct prediction
4. GloVe: Global Vectors for Word Representation
✔ A word-vectorization method that combines the direct-prediction and count-based approaches
✔ Co-occurrence matrix
  - Words with similar usage/meaning end up with similar vector representations: similar words are used in similar environments/contexts, so they appear next to similar neighboring words
  - However, the matrix is sparse, and its dimension grows as the vocabulary grows, so the dimensionality is reduced with SVD; see the sketch below
  - The SVD reduction is centered on the entries with high co-occurrence counts (the top singular directions capture most of the count mass)
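A minimal sketch of this count-based pipeline on a toy corpus: build a windowed co-occurrence matrix, then keep the top-k singular directions as embeddings (the corpus and k are illustrative):

```python
import numpy as np

corpus = [["i", "like", "deep", "learning"],
          ["i", "like", "nlp"],
          ["i", "enjoy", "flying"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts with window size 1
X = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if i != j:
                X[idx[w], idx[sent[j]]] += 1

# Truncated SVD: scale the top-k left singular vectors into k-dim embeddings
U, S, Vt = np.linalg.svd(X)
k = 2
embeddings = U[:, :k] * S[:k]
print({w: np.round(embeddings[idx[w]], 2) for w in vocab})
```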
✔ Co-occurrence probabilities
- P(k | i): a conditional probability computed from the co-occurrence matrix by counting the total number of occurrences of word i, and the number of times word k appears when word i appears; see below
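In the notation of the GloVe paper, with X the co-occurrence matrix:

$$P(k \mid i) = \frac{X_{ik}}{X_i}, \qquad X_i = \sum_j X_{ij}$$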
✔ Objective function
- The embedding vectors are trained so that the dot product of a center-word vector and a context-word vector matches the log of how often the two words co-occur across the whole corpus
  ▪ w_i: embedding vector of center word i
  ▪ w̃_j: embedding vector of context word j
  ▪ P(j | i): the probability that context word j appears inside the window when center word i appears
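Written out, this gives the weighted least-squares objective from the GloVe paper (Pennington et al., 2014), with bias terms b_i and b̃_j; the weighting function f is controlled by the alpha and max_count (x_max) hyperparameters that appear in the code further below:

$$J = \sum_{i,j=1}^{V} f(X_{ij})\left(w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2, \qquad f(x) = \begin{cases}(x/x_{\max})^{\alpha} & x < x_{\max} \\ 1 & \text{otherwise}\end{cases}$$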
- Motivation for the model
- Derivation of the objective function
- Relation to other models
- Computational complexity
✔ Pros and cons
- Introduces the notion of co-occurrence probability → makes efficient use of global statistical information, but the memory cost of storing the co-occurrence matrix is high
- Fast training, and performs well on both large and small corpora
- Does not solve the problem of polysemous words
✔ Hyperparameters
```python
from glove import Corpus, Glove

corpus = Corpus()
# Build the co-occurrence matrix that GloVe will train on
# (`result` is assumed to be a list of tokenized sentences)
corpus.fit(result, window=5)

glove = Glove(no_components=30, learning_rate=0.05, alpha=0.75,
              max_count=100, max_loss=10.0, random_state=None)
```
| Parameter | Description |
|---|---|
| no_components | dimensionality of the word vectors |
| learning_rate | learning rate (used in the SGD updates) |
| alpha, max_count | used when assigning the weights f(X_ij) |
| random_state | random state used to initialize the optimization |
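Continuing the snippet above once the parameters are set, a minimal sketch of training and querying; the epochs and no_threads values are illustrative, and the calls follow the glove_python API as I understand it (Glove.fit on the matrix, add_dictionary, most_similar):

```python
# Train on the co-occurrence matrix built above, then attach the
# word-to-index dictionary so words can be queried by string
glove.fit(corpus.matrix, epochs=20, no_threads=4, verbose=True)
glove.add_dictionary(corpus.dictionary)

# Nearest neighbors of a word in the learned embedding space
print(glove.most_similar("language", number=5))
```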
✔ Visualization
5. How to evaluate word vectors
- Extrinsic vs. intrinsic
  ✔ Intrinsic: check whether the vectors solve a specific subtask correctly, e.g., word vector analogies; see the sketch below
  ✔ Extrinsic: plug the vectors into a real system and measure its performance, e.g., Named Entity Recognition (NER)
→ GloVe shows fairly good results under both evaluation schemes
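A minimal sketch of the analogy test ("man : king = woman : ?") by cosine similarity; the toy vectors are made up for illustration, and a real evaluation would use the full trained embedding table:

```python
import numpy as np

# Toy embeddings; in practice these come from a trained GloVe/Word2vec model
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.8, 0.2, 0.1]),
    "woman": np.array([0.7, 0.2, 0.9]),
    "queen": np.array([0.8, 0.8, 0.9]),
}

def analogy(a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c) by cosine similarity."""
    target = emb[b] - emb[a] + emb[c]
    cos = lambda u, v: (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return max((w for w in emb if w not in (a, b, c)), key=lambda w: cos(emb[w], target))

print(analogy("man", "king", "woman"))  # expected: queen
```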
6. Word senses and word sense ambiguity
- How to represent the different meanings of one and the same word
  - Multiple senses per word: cluster the contexts in which the word appears, then re-label each cluster as a separate pseudo-word (e.g., bank_1, bank_2) and train as usual
  - Weighted average of the sense vectors; see the formula below
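As presented in the lecture, the vector of a polysemous word in a standard embedding ends up as a frequency-weighted average of its sense vectors, where f_i is the frequency of sense i:

$$v_w = \sum_i \frac{f_i}{\sum_j f_j}\, v_{w_i}$$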
💻 Practice code
- Basic: https://wikidocs.net/22885
- GloVe vs. Word2Vec and the glove_python implementation: lovit.github.io
- Sentiment analysis of IMDB movie review data with LSTM and GloVe: https://ichi.pro/ko/lstm-mich-glove-imbeding-eul-sayonghan-gamjeong-bunseog-168372289928846
📍 Understanding how Kaggle code loads GloVe vectors in the form below, without using the library: https://lsjsj92.tistory.com/455
```python
import os
import numpy as np

glove_dir = '../input/glove-global-vectors-for-word-representation/'

# Map each word to its pretrained 50-dimensional GloVe vector
embedding_index = {}
with open(os.path.join(glove_dir, 'glove.6B.50d.txt')) as f:
    for line in f:
        values = line.split()
        word = values[0]                                 # first token is the word itself
        coefs = np.asarray(values[1:], dtype='float32')  # the rest are the vector entries
        embedding_index[word] = coefs

print('found word vecs: ', len(embedding_index))
```
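A quick usage check on the loaded index; "language" is an arbitrary example word:

```python
vec = embedding_index.get('language')  # a 50-d numpy array, or None if absent
print(vec.shape if vec is not None else 'not in vocabulary')
```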