
[CS224W] Graph Neural Network

by isdawell 2022. 11. 24.

 

1️⃣  Lecture 6 Review 


 

πŸ”Ή Main Topic : Graph Neural Networks 

 

 

β‘  Review : Node embedding 

 

• κ·Έλž˜ν”„μ—μ„œ μœ μ‚¬ν•œ λ…Έλ“œλ“€μ΄ ν•¨μˆ˜ f λ₯Ό 거쳐 d μ°¨μ›μœΌλ‘œ μž„λ² λ”© λ˜μ—ˆμ„ λ•Œ, μž„λ² λ”© 곡간 λ‚΄μ—μ„œ κ°€κΉŒμ΄ μœ„μΉ˜ν•˜λ„λ‘ λ§Œλ“œλŠ” 것 

 

β†ͺ Encoder : maps each node to a low-dimensional vector 

β†ͺ Similarity function : ensures that node similarity in the original graph matches the dot product of the node vectors in the embedding space 
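In symbols, one common way to write this objective (with ENC the encoder and z_u, z_v the embeddings of nodes u and v) is:

\mathrm{similarity}(u, v) \approx \mathbf{z}_v^{\top} \mathbf{z}_u, \qquad \mathbf{z}_v = \mathrm{ENC}(v)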

 

 

• Shallow Encoding (embedding lookup) : the embedding matrix stores each node's embedding vector in its own column, so "encoding" a node is just reading that vector off → 🀨 Since no parameters are shared between nodes, the matrix keeps growing as the number of nodes grows; nodes not seen during training cannot be embedded; and node feature information is not used. A minimal sketch follows below. 
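For illustration, a minimal sketch of such an embedding-lookup encoder (the sizes and node id here are made up, not from the lecture code):

import torch

num_nodes, dim = 2708, 64                    # illustrative sizes
Z = torch.nn.Embedding(num_nodes, dim)       # one learnable d-dimensional vector per node
z_v = Z(torch.tensor([5]))                   # "encoding" node 5 is just a table lookup
# the parameter count is num_nodes * dim, and a node id outside [0, num_nodes) has no embedding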

 

 

 

β‘‘ GNN 

 

• To overcome the limitations of simple look-up embeddings, use an encoder built from multiple layers. 

Tasks : Node classification, Link prediction, Community detection, Network similarity 

 

 

λ„€νŠΈμ›Œν¬λ₯Ό DNN ꡬ쑰에 ν†΅κ³Όμ‹œμΌœ λ…Έλ“œ μž„λ² λ”©μ„ 생성

 

 

πŸ‘€ But there are problems:

 

β†ͺ λ„€νŠΈμ›Œν¬λŠ” μž„μ˜μ˜ 크기λ₯Ό 가지고 있으며 λ³΅μž‘ν•œ topological ꡬ쑰λ₯Ό 가진닀. 

β†ͺ νŠΉμ •ν•œ κΈ°μ€€μ μ΄λ‚˜ 정해진 μˆœμ„œκ°€ μ—†λ‹€. 

β†ͺ 동적이며 multimodal feature λ₯Ό 가진닀. 

 

 

 

 

 

πŸ”Ή Deep learning for Graphs 

 

 

β‘  Notation 

 

 

• V : the set of nodes 

• A : the adjacency matrix (binary entries indicating whether an edge exists) 

• X : the node feature matrix 

• N(v) : the set of neighbors of v 
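As a toy illustration of these objects in PyTorch (a made-up 3-node path graph, not from the lecture):

import torch

A = torch.tensor([[0, 1, 0],     # adjacency matrix: edges 0-1 and 1-2
                  [1, 0, 1],
                  [0, 1, 0]])
X = torch.tensor([[1.0, 0.0],    # node feature matrix, one row per node
                  [0.0, 1.0],
                  [1.0, 1.0]])
N_1 = A[1].nonzero().flatten()   # N(1), the neighbors of node 1 -> tensor([0, 2])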

 

 

β‘‘ Convolutional Networks 

 

β€» For a summary, see : https://manywisdom-career.tistory.com/71 

 


 

 

•  Convolution : slide a window across the input and sum the information it collects to produce the output. 

 

 

•  μ΄μ›ƒλ…Έλ“œμ˜ 정보λ₯Ό λ³€ν™˜ν•˜κ³  κ²°ν•©ν•˜μ—¬ νŠΉμ • λ…Έλ“œλ₯Ό μž„λ² λ”©ν•œλ‹€. 

 

Every node defines its own computation graph based on its neighborhood.

 

 

•  Layer-k embedding : the embedding obtained by pulling in information from neighbors up to k hops away. 

 

layer 0 μ—μ„œλŠ” λ…Έλ“œ v 의 feature κ°’μœΌλ‘œ 초기 μž„λ² λ”©μ„ μ„€μ •ν•œλ‹€.

 

 

•  Neighborhood aggregation : how information is aggregated from the neighbors is what differs between approaches. The aggregation function must be permutation invariant (unaffected by the order of its inputs); the most commonly used choice is simply averaging the neighbors' messages, as in the sketch below. 
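A minimal sketch of one round of mean aggregation on a toy 3-node path graph (not the lecture's reference code); because the mean does not depend on the order of the neighbors, it is permutation invariant:

import torch

A = torch.tensor([[0., 1., 0.],   # toy adjacency matrix: path graph 0-1-2
                  [1., 0., 1.],
                  [0., 1., 0.]])
H0 = torch.tensor([[1., 0.],      # layer-0 embeddings = node features
                   [0., 1.],
                   [1., 1.]])
deg = A.sum(dim=1, keepdim=True)  # |N(v)| for every node
H1 = A @ H0 / deg                 # each row is the mean of that node's neighbor embeddings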

 

β‘’ Math formulas 

 

 

πŸ‘‰ Wk, Bk : learnable weight matrices. Wk is applied to the information aggregated from the neighboring nodes, and Bk is applied to the node's own embedding from the previous layer (i.e., to itself). Together they determine how much the update relies on neighbor information versus the node's own transformed value. Because the same parameters are shared across all nodes when embedding any particular node, the model can generalize to new nodes and even new graphs. 
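Written out, the per-layer update described here takes the standard mean-aggregation form (as in the lecture), where h_v^(k) is node v's embedding at layer k and Οƒ is a nonlinearity such as ReLU:

h_v^{(0)} = x_v
h_v^{(k+1)} = \sigma\!\left( W_k \sum_{u \in N(v)} \frac{h_u^{(k)}}{|N(v)|} + B_k\, h_v^{(k)} \right), \qquad k = 0, \dots, K-1
z_v = h_v^{(K)}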

 

 

β‘£ How a GNN is trained 

 

Goal : node embeddings Zv 

Input : a graph 

 

β†ͺ Unsupervised setting : use the graph structure itself as supervision 

 

β†ͺ Supervised setting [Node classification] : for the node labels y, define a loss between the true labels and the labels predicted from the node embeddings, and train on that loss; one common way to write it is sketched below. 
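For instance, with cross-entropy the supervised objective can be written as (a common formulation; here ΞΈ denotes the classification weights applied to the embedding z_v and y_v is the true label of node v):

\mathcal{L} = \sum_{v \in V} \mathrm{CE}\!\left(y_v,\ \mathrm{softmax}(\theta^{\top} z_v)\right)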

 

 

 

 

 

 

 

 

 

2️⃣  Code review 


https://colab.research.google.com/drive/1DsdBei9OSz4yRZ-KIGEU6iaHflTTRGY-?usp=sharing 

 


 

 

πŸ”Ή Dataset 

 

• Cora dataset 

 

  • A citation network: edges represent one paper citing another.
  • 2,708 scientific publications, each belonging to one of 7 classes.
  • 5,429 links (edges).
  • Each node is a binary bag-of-words vector over a dictionary: 0 if the word is absent, 1 if it is present. The dictionary contains 1,433 words πŸ‘‰ node_features = 1433 (see the quick check below).
  • Main task : node classification (CrossEntropyLoss, multi-class)
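As a quick sanity check of these numbers (a hedged sketch using the same Planetoid loader as the notebook; note that PyTorch Geometric counts each undirected edge in both directions):

from torch_geometric.datasets import Planetoid

dataset = Planetoid("/tmp/Cora", name="Cora")
data = dataset[0]
print(data.num_nodes, dataset.num_classes, dataset.num_node_features)  # 2708 nodes, 7 classes, 1433 features
print(data.num_edges)  # number of directed edges (each citation link counted twice)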

 

 

β‘  Data Normalization 

 

 

→ Feature normalization step: T.NormalizeFeatures() row-normalizes each node's feature vector so that it sums to 1. (The degree-based normalization of the GCN update itself is applied inside GCNConv.) 

 

from torch_geometric.datasets import Planetoid
import torch_geometric.transforms as T

dataset = Planetoid("/tmp/Cora", name="Cora")
print(f'Row sums of the feature matrix without normalization : {dataset[0].x.sum(dim=-1)}')

dataset = Planetoid("/tmp/Cora", name="Cora", transform=T.NormalizeFeatures())  #🐾
print(f'Row sums of the feature matrix after normalization : {dataset[0].x.sum(dim=-1)}')  # dim = axis; every row now sums to 1

 

 

β‘‘ GCN model architecture 

 

import torch
from torch import Tensor
from torch_geometric.nn import GCNConv


class GCN(torch.nn.Module):

    def __init__(self, num_node_features: int, num_classes: int, hidden_dim: int = 16, dropout_rate: float = 0.5):
        super().__init__()
        self.dropout1 = torch.nn.Dropout(dropout_rate)
        self.conv1 = GCNConv(num_node_features, hidden_dim)
        # (conv1): GCNConv(1433, 16)
        self.relu = torch.nn.ReLU(inplace=True)
        self.dropout2 = torch.nn.Dropout(dropout_rate)
        self.conv2 = GCNConv(hidden_dim, num_classes)
        # (conv2): GCNConv(16, 7)

    def forward(self, x: Tensor, edge_index: Tensor) -> Tensor:
        x = self.dropout1(x)
        x = self.conv1(x, edge_index)
        x = self.relu(x)
        x = self.dropout2(x)
        x = self.conv2(x, edge_index)
        return x

hidden units = 16

 

 

β‘’ Training and Evaluation 

 

from typing import Callable, Tuple

from torch_geometric.data import Data

# Type aliases (assumed from earlier cells of the notebook)
LossFn = Callable[[Tensor, Tensor], Tensor]
Stage = str  # 'train', 'val' or 'test'


def train_step(model: torch.nn.Module, data: Data, optimizer: torch.optim.Optimizer, loss_fn: LossFn) -> Tuple[float, float]:
    model.train()
    optimizer.zero_grad()
    mask = data.train_mask
    logits = model(data.x, data.edge_index)[mask]
    preds = logits.argmax(dim=1)
    y = data.y[mask]
    loss = loss_fn(logits, y)
    acc = (preds == y).sum().item() / y.numel()  #βœ” numel: returns the number of elements in the tensor
    loss.backward()
    optimizer.step()
    return loss.item(), acc


@torch.no_grad()
def eval_step(model: torch.nn.Module, data: Data, loss_fn: LossFn, stage: Stage) -> Tuple[float, float]:
    model.eval()
    mask = getattr(data, f'{stage}_mask')
    logits = model(data.x, data.edge_index)[mask]
    preds = logits.argmax(dim=1)
    y = data.y[mask]
    loss = loss_fn(logits, y)
    acc = (preds == y).sum().item() / y.numel()  #βœ”
    return loss.item(), acc

 

•  optimizer.zero_grad() : zeroes the gradients accumulated from the previous step before the new backward pass 

•  preds = logits.argmax(dim=1) : returns the index of the largest logit πŸ‘‰ one of the 7 class indices (see the toy example below) 
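For instance (a toy illustration, not from the notebook):

logits = torch.tensor([[0.1, 2.3, -0.5, 0.0, 0.7, 0.2, 1.1]])  # one node, 7 class scores
preds = logits.argmax(dim=1)                                    # tensor([1]) -> predicted class is index 1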

 

 

β‘£ Define the train function and run training 

 

import numpy as np

SEED = 42
MAX_EPOCHS = 200
LEARNING_RATE = 0.01
WEIGHT_DECAY = 5e-4
EARLY_STOPPING = 10


def train(model: torch.nn.Module, data: Data, optimizer: torch.optim.Optimizer,
          loss_fn: LossFn = torch.nn.CrossEntropyLoss(), max_epochs: int = 200,
          early_stopping: int = 10, print_interval: int = 20, verbose: bool = True):

    history = {'loss': [], 'val_loss': [], 'acc': [], 'val_acc': []}

    for epoch in range(max_epochs):
        loss, acc = train_step(model, data, optimizer, loss_fn)
        val_loss, val_acc = eval_step(model, data, loss_fn, 'val')
        history['loss'].append(loss)
        history['acc'].append(acc)
        history['val_loss'].append(val_loss)
        history['val_acc'].append(val_acc)

        # stop if the validation loss is worse than its recent average
        if epoch > early_stopping and val_loss > np.mean(history['val_loss'][-(early_stopping + 1):-1]):
            if verbose:
                print('\n early stopping ...')
            break

        if verbose and epoch % print_interval == 0:
            print(f'\nEpoch : {epoch} \n----------- ')
            print(f'Train loss : {loss:.4f} | Train acc : {acc:.4f}')
            print(f'Val loss : {val_loss:.4f} | Val acc : {val_acc:.4f}')

    test_loss, test_acc = eval_step(model, data, loss_fn, 'test')
    if verbose:
        print(f'\nEpoch: {epoch}\n----------')
        print(f'Train loss: {loss:.4f} | Train acc: {acc:.4f}')
        print(f'  Val loss: {val_loss:.4f} |   Val acc: {val_acc:.4f}')
        print(f' Test loss: {test_loss:.4f} |  Test acc: {test_acc:.4f}')

    return history

 

•  Define the loss function and the hyperparameters (max_epochs, early stopping). 

•  Define a function that reports accuracy and loss. 

 

 

import matplotlib.pyplot as plt

torch.manual_seed(SEED)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = GCN(dataset.num_node_features, dataset.num_classes).to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY)
history = train(model, data, optimizer, max_epochs=MAX_EPOCHS, early_stopping=EARLY_STOPPING)

plt.figure(figsize=(12, 4))
plot_history(history, 'GCN')
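plot_history is defined earlier in the notebook and is not shown in this post; a minimal sketch of what it might look like (assumed, not the notebook's exact code):

def plot_history(history, title):
    # left panel: loss curves, right panel: accuracy curves
    plt.subplot(1, 2, 1)
    plt.plot(history['loss'], label='train')
    plt.plot(history['val_loss'], label='val')
    plt.title(f'{title} loss'); plt.xlabel('epoch'); plt.legend()
    plt.subplot(1, 2, 2)
    plt.plot(history['acc'], label='train')
    plt.plot(history['val_acc'], label='val')
    plt.title(f'{title} accuracy'); plt.xlabel('epoch'); plt.legend()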

 

epoch 120μ—μ„œ μ‘°κΈ°μ’…λ£Œ

 

 
