
[CS224W] Graph Neural Network

by isdawell 2022. 11. 24.

 

1๏ธโƒฃ  6๊ฐ• ๋ณต์Šต 


 

๐Ÿ”น Main Topic : Graph Neural Networks 

 

 

โ‘  Review : Node embedding 

 

โ€ข ๊ทธ๋ž˜ํ”„์—์„œ ์œ ์‚ฌํ•œ ๋…ธ๋“œ๋“ค์ด ํ•จ์ˆ˜ f ๋ฅผ ๊ฑฐ์ณ d ์ฐจ์›์œผ๋กœ ์ž„๋ฒ ๋”ฉ ๋˜์—ˆ์„ ๋•Œ, ์ž„๋ฒ ๋”ฉ ๊ณต๊ฐ„ ๋‚ด์—์„œ ๊ฐ€๊นŒ์ด ์œ„์น˜ํ•˜๋„๋ก ๋งŒ๋“œ๋Š” ๊ฒƒ 

 

โ†ช Encoder : maps each node to a low-dimensional vector 

โ†ช Similarity function : specifies how similarity in the original graph relates to similarity in the embedding space; the embeddings are trained so that the dot product of two node vectors approximates the similarity of those nodes in the original graph 
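In symbols, with z_u denoting the embedding of node u, the objective is that the dot product in the embedding space approximates similarity in the graph:

$$ \text{similarity}(u, v) \approx \mathbf{z}_v^{\top} \mathbf{z}_u $$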

 

 

โ€ข Shallow Encoding (embedding lookup) : store each node's embedding vector as one column of an embedding matrix and simply read the vector out โ†’ ๐Ÿคจ Since no parameters are shared between nodes, the matrix keeps growing as the number of nodes increases, and no embedding can be produced for nodes not seen during training. Node feature information is also not used. 
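A minimal sketch (names are illustrative) makes this weakness concrete: shallow encoding is just indexing a learned matrix, one trainable column per node, nothing shared.

import torch

num_nodes, embed_dim = 2708, 64
Z = torch.nn.Parameter(torch.randn(embed_dim, num_nodes))  # one column per node

def encode(node_id: int) -> torch.Tensor:
    # pure lookup: no parameter sharing, and an unseen node id has no column
    return Z[:, node_id]

z_v = encode(42)  # d-dimensional embedding of node 42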

 

 

 

โ‘ก GNN 

 

โ€ข Uses an encoder composed of multiple layers to overcome the limitations of the simple lookup-style embedding 

โ€ข Tasks : Node classification, Link prediction, Community detection, Network similarity 

 

 

๋„คํŠธ์›Œํฌ๋ฅผ DNN ๊ตฌ์กฐ์— ํ†ต๊ณผ์‹œ์ผœ ๋…ธ๋“œ ์ž„๋ฒ ๋”ฉ์„ ์ƒ์„ฑ

 

 

๐Ÿ‘€ However, there are problems

 

โ†ช ๋„คํŠธ์›Œํฌ๋Š” ์ž„์˜์˜ ํฌ๊ธฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์œผ๋ฉฐ ๋ณต์žกํ•œ topological ๊ตฌ์กฐ๋ฅผ ๊ฐ€์ง„๋‹ค. 

โ†ช ํŠน์ •ํ•œ ๊ธฐ์ค€์ ์ด๋‚˜ ์ •ํ•ด์ง„ ์ˆœ์„œ๊ฐ€ ์—†๋‹ค. 

โ†ช ๋™์ ์ด๋ฉฐ multimodal feature ๋ฅผ ๊ฐ€์ง„๋‹ค. 

 

 

 

 

 

๐Ÿ”น Deep learning for Graphs 

 

 

โ‘  Notation 

 

 

โ€ข V : set of nodes 

โ€ข A : adjacency matrix (binary : indicates whether two nodes are connected) 

โ€ข X : node feature matrix 

โ€ข N(v) : set of neighbors of node v 

 

 

โ‘ก Convolutional Networks 

 

โ€ป Reference notes : https://manywisdom-career.tistory.com/71 

 


 

 

โ€ข  Convolution operation : slide a window over the input and sum the information it covers to produce the output (see the 1-D sketch below) 
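For intuition, a tiny 1-D example of a sliding-window sum in PyTorch (the all-ones kernel simply sums each window):

import torch
import torch.nn.functional as F

x = torch.tensor([[[1., 2., 3., 4., 5.]]])  # (batch, channels, length)
kernel = torch.ones(1, 1, 3)                # window of size 3; all ones = "sum the window"
print(F.conv1d(x, kernel))                  # tensor([[[ 6.,  9., 12.]]])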

 

 

โ€ข  ์ด์›ƒ๋…ธ๋“œ์˜ ์ •๋ณด๋ฅผ ๋ณ€ํ™˜ํ•˜๊ณ  ๊ฒฐํ•ฉํ•˜์—ฌ ํŠน์ • ๋…ธ๋“œ๋ฅผ ์ž„๋ฒ ๋”ฉํ•œ๋‹ค. 

 

Every node defines its own computation graph based on its neighborhood.

 

 

โ€ข  Layer-k embedding : the embedding incorporates information from neighbors up to k hops away 

 

At layer 0, the initial embedding of node v is set to its feature vector.

 

 

โ€ข  Neighborhood aggregation : how information is aggregated from neighbor nodes differs from network to network. The aggregation function must be permutation invariant (unaffected by the order of its inputs); the most common choice is simply averaging the neighbors' messages, as in the sketch below. 
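A quick check (illustrative tensors) that mean aggregation is permutation invariant, i.e. shuffling the neighbors does not change the result:

import torch

neighbor_msgs = torch.randn(5, 16)      # messages from 5 neighbors, 16-dim each
perm = torch.randperm(5)                # a random reordering of the neighbors

agg_original = neighbor_msgs.mean(dim=0)
agg_shuffled = neighbor_msgs[perm].mean(dim=0)
print(torch.allclose(agg_original, agg_shuffled))  # True : order does not matter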

 

โ‘ข Math formulation 
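The update rule (shown as an image in the original post) is the mean-aggregation update from the lecture, restated here in LaTeX; h_v^k is node v's layer-k embedding, ฯƒ is a nonlinearity such as ReLU, and K is the number of layers:

$$ h_v^{0} = x_v $$

$$ h_v^{k+1} = \sigma\left( W_k \sum_{u \in N(v)} \frac{h_u^{k}}{|N(v)|} + B_k\, h_v^{k} \right), \quad k = 0, \ldots, K-1 $$

$$ z_v = h_v^{K} $$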

 

 

๐Ÿ‘‰ Wk, Bk : the weight parameters to be learned. Wk is the weight applied to the information aggregated from the neighbor nodes, and Bk is the weight applied to the node's own embedding from the previous layer. Together they determine how much to focus on neighbor information versus on transforming the node's own value. Because these parameters are shared across all nodes when embedding any particular node, the model can generalize to new nodes and even new graphs. 

 

 

โ‘ฃ Training a GNN 

 

โ€ข Goal : node embedding z_v 

โ€ข Input : graph 

 

โ†ช Unsupervised setting : use the graph structure itself as the supervision signal 

 

โ†ช Supervised setting [Node classification] : given node labels y, define a loss function between the true labels and the labels predicted from the node embeddings, and train to minimize it 
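For multi-class node classification (the setting in the code review below, via CrossEntropyLoss), a standard form of this loss over the labeled training nodes is:

$$ \mathcal{L} = - \sum_{v \in V_{\text{train}}} \sum_{c=1}^{C} y_{v,c} \, \log \hat{y}_{v,c} $$

where the predicted distribution for node v is the softmax of the classifier output computed from z_v.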

 

 

 

 

 

 

 

 

 

2๏ธโƒฃ  ์ฝ”๋“œ๋ฆฌ๋ทฐ 


https://colab.research.google.com/drive/1DsdBei9OSz4yRZ-KIGEU6iaHflTTRGY-?usp=sharing 

 


 

 

๐Ÿ”น Dataset 

 

โ€ข Cora dataset 

 

  • A citation network : edges represent one paper citing another
  • 2,708 scientific publications, each belonging to one of 7 classes
  • 5,429 links (edges)
  • Each node is a binary bag-of-words vector over a dictionary : 0 (word absent) or 1 (word present). The dictionary contains 1,433 words ๐Ÿ‘‰ node_features = 1433
  • Main task : node classification (CrossEntropyLoss - multi-class classification); these numbers can be checked with the sketch below
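A small inspection sketch (assumes torch_geometric is installed) that verifies the dataset statistics directly from the PyG data object:

from torch_geometric.datasets import Planetoid

dataset = Planetoid("/tmp/Cora", name="Cora")
data = dataset[0]

print(data.num_nodes)             # 2708 papers
print(data.num_edges)             # edge entries (PyG stores each undirected edge in both directions)
print(dataset.num_node_features)  # 1433 dictionary words
print(dataset.num_classes)        # 7 paper classes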

 

 

โ‘  Data Normalization 

 

 

โ†’ Feature normalization : T.NormalizeFeatures() rescales each node's feature vector so that its entries sum to 1 (this is separate from the degree-based normalization that the GCN layer applies internally) 

 

from torch_geometric.datasets import Planetoid
import torch_geometric.transforms as T

dataset = Planetoid("/tmp/Cora", name="Cora")
print(f'Row sums of the feature matrix without normalization : {dataset[0].x.sum(dim=-1)}')

dataset = Planetoid("/tmp/Cora", name="Cora", transform=T.NormalizeFeatures())  # ๐Ÿพ
print(f'Row sums of the feature matrix with normalization applied : {dataset[0].x.sum(dim=-1)}')  # dim = axis

 

 

โ‘ก GCN model architecture 

 

import torch
from torch import Tensor
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):

    def __init__(self, num_node_features: int, num_classes: int,
                 hidden_dim: int = 16, dropout_rate: float = 0.5):
        super().__init__()
        self.dropout1 = torch.nn.Dropout(dropout_rate)
        self.conv1 = GCNConv(num_node_features, hidden_dim)
        # (conv1): GCNConv(1433, 16)
        self.relu = torch.nn.ReLU(inplace=True)
        self.dropout2 = torch.nn.Dropout(dropout_rate)
        self.conv2 = GCNConv(hidden_dim, num_classes)
        # (conv2): GCNConv(16, 7)

    def forward(self, x: Tensor, edge_index: Tensor) -> Tensor:
        x = self.dropout1(x)
        x = self.conv1(x, edge_index)
        x = self.relu(x)
        x = self.dropout2(x)
        x = self.conv2(x, edge_index)
        return x

hidden units = 16

 

 

โ‘ข Training and Evaluation 

 

from typing import Callable, Literal, Tuple
import torch
from torch import Tensor
from torch_geometric.data import Data

LossFn = Callable[[Tensor, Tensor], Tensor]
Stage = Literal['train', 'val', 'test']

def train_step(model: torch.nn.Module, data: Data, optimizer: torch.optim.Optimizer,
               loss_fn: LossFn) -> Tuple[float, float]:
    model.train()
    optimizer.zero_grad()
    mask = data.train_mask
    logits = model(data.x, data.edge_index)[mask]
    preds = logits.argmax(dim=1)
    y = data.y[mask]
    loss = loss_fn(logits, y)
    acc = (preds == y).sum().item() / y.numel()  # โœ” numel : number of elements in the tensor
    loss.backward()
    optimizer.step()
    return loss.item(), acc


@torch.no_grad()
def eval_step(model: torch.nn.Module, data: Data, loss_fn: LossFn, stage: Stage) -> Tuple[float, float]:
    model.eval()
    mask = getattr(data, f'{stage}_mask')
    logits = model(data.x, data.edge_index)[mask]
    preds = logits.argmax(dim=1)
    y = data.y[mask]
    loss = loss_fn(logits, y)
    acc = (preds == y).sum().item() / y.numel()  # โœ”
    return loss.item(), acc

 

โ€ข  optimizer.zero_grad() : resets the accumulated gradients (PyTorch otherwise accumulates gradients across backward passes) 

โ€ข  preds = logits.argmax(dim=1) : returns the index of the largest logit ๐Ÿ‘‰ one of the 7 class indices 

 

 

โ‘ฃ Defining the train function and training 

 

SEED = 42
MAX_EPOCHS = 200
LEARNING_RATE = 0.01
WEIGHT_DECAY = 5e-4
EARLY_STOPPING = 10


import numpy as np

def train(model: torch.nn.Module, data: Data, optimizer: torch.optim.Optimizer,
          loss_fn: LossFn = torch.nn.CrossEntropyLoss(), max_epochs: int = 200,
          early_stopping: int = 10, print_interval: int = 20, verbose: bool = True):

    history = {'loss': [], 'val_loss': [], 'acc': [], 'val_acc': []}

    for epoch in range(max_epochs):
        loss, acc = train_step(model, data, optimizer, loss_fn)
        val_loss, val_acc = eval_step(model, data, loss_fn, 'val')
        history['loss'].append(loss)
        history['acc'].append(acc)
        history['val_loss'].append(val_loss)
        history['val_acc'].append(val_acc)

        # stop when the validation loss rises above its mean over the last `early_stopping` epochs
        if epoch > early_stopping and val_loss > np.mean(history['val_loss'][-(early_stopping + 1):-1]):
            if verbose:
                print('\n early stopping ...')
            break

        if verbose and epoch % print_interval == 0:
            print(f'\nEpoch : {epoch} \n----------- ')
            print(f'Train loss : {loss:.4f} | Train acc : {acc:.4f}')
            print(f'Val loss : {val_loss:.4f} | Val acc : {val_acc:.4f}')

    test_loss, test_acc = eval_step(model, data, loss_fn, 'test')
    if verbose:
        print(f'\nEpoch: {epoch}\n----------')
        print(f'Train loss: {loss:.4f} | Train acc: {acc:.4f}')
        print(f'  Val loss: {val_loss:.4f} |   Val acc: {val_acc:.4f}')
        print(f' Test loss: {test_loss:.4f} |  Test acc: {test_acc:.4f}')

    return history

 

โ€ข  defines the loss function and the hyperparameters (max_epochs, early_stopping) 

โ€ข  prints accuracy and loss at regular intervals during training 

 

 

import torch
import matplotlib.pyplot as plt

torch.manual_seed(SEED)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = GCN(dataset.num_node_features, dataset.num_classes).to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY)
history = train(model, data, optimizer, max_epochs=MAX_EPOCHS, early_stopping=EARLY_STOPPING)

plt.figure(figsize=(12, 4))
plot_history(history, 'GCN')  # plot_history : notebook helper that draws the loss/accuracy curves
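plot_history is a small helper defined in the notebook; a minimal sketch of what it might look like (the actual implementation may differ):

import matplotlib.pyplot as plt

def plot_history(history: dict, title: str) -> None:
    # left panel : loss curves, right panel : accuracy curves
    plt.subplot(1, 2, 1)
    plt.plot(history['loss'], label='train')
    plt.plot(history['val_loss'], label='val')
    plt.title(f'{title} loss')
    plt.legend()

    plt.subplot(1, 2, 2)
    plt.plot(history['acc'], label='train')
    plt.plot(history['val_acc'], label='val')
    plt.title(f'{title} accuracy')
    plt.legend()
    plt.show()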

 

Early stopping at epoch 120

 

 


'1๏ธโƒฃ AIโ€ขDS > ๐Ÿ“˜ GNN' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

[cs224w] Frequent Subgraph Mining with GNNs  (0) 2023.01.27
[cs224w] Theory of Graph Neural Networks  (0) 2023.01.06
[CS224W] Message Passing and Node classification  (0) 2022.11.17
[CS224W] PageRank  (0) 2022.11.02
[CS224W] 1๊ฐ• Machine Learning With Graphs  (0) 2022.10.11

๋Œ“๊ธ€