👀 I studied this topic using the code from the blog post below and the PyTorch Geometric library documentation.
https://baeseongsu.github.io/posts/pytorch-geometric-introduction/
https://pytorch-geometric.readthedocs.io/en/latest/notes/introduction.html#
1️⃣ Data Handling of Graphs
🔹 Graph
• A data structure that brings together nodes and the edges connecting them
• G = (V,E)
🔹 PyTorch Geometric
• A single graph → represented by the torch_geometric.data.Data class
• Attributes
data.x | • Node feature matrix • [num_nodes, num_node_features] |
data.edge_index | • Graph connectivity • [2, num_edges] |
data.edge_attr | • Edge feature matrix • [num_edges, num_edge_features] |
data.y | • Target to learn • Node level → [num_nodes, *] • Graph level → [1, *] |
data.pos | • Node position matrix • [num_nodes, num_dimensions] |
→ All of the attributes above are optional, so you can pick the ones you need and model a graph in many different ways.
① Example: creating graph data
import torch
from torch_geometric.data import Data
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)  # shape (2, 4): 4 directed edges
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)  # shape (3, 1): 3 nodes
data = Data(x=x, edge_index=edge_index)
✔ edge_index : a (2, 4) matrix → 4 directed edges, i.e. 2 undirected edges each stored in both directions
✔ x : a (3, 1) matrix → 3 nodes, each holding a single value
② Example: creating graph data when edges are given as ordered pairs of nodes
- When edges are entered as (v1, v2) pairs, transpose the tensor with t() and call contiguous() before passing it to Data.
# edges written as ordered pairs of nodes
edge_index = torch.tensor([[0, 1],
                           [1, 0],
                           [1, 2],
                           [2, 1]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)
data = Data(x=x, edge_index=edge_index.t().contiguous())  # 💡 transposed to shape (2, 4)
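The transpose above can be pictured without PyTorch. The sketch below is plain Python, and `pairs_to_coo` is a hypothetical helper name used only for illustration (it is not part of PyTorch Geometric). It shows how a list of (source, target) pairs maps to the [2, num_edges] layout that edge_index expects.

```python
# Plain-Python illustration; no PyTorch required.
def pairs_to_coo(pairs):
    """Turn [(src, dst), ...] into [[src, ...], [dst, ...]] (2 rows, num_edges columns)."""
    sources = [src for src, _ in pairs]
    targets = [dst for _, dst in pairs]
    return [sources, targets]


pairs = [(0, 1), (1, 0), (1, 2), (2, 1)]  # same edges as in the example above
coo = pairs_to_coo(pairs)
print(coo)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```

The first row holds the source nodes and the second row the target nodes, which matches the edge_index from the first example.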
• Functions
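data.keys | • Names of the attributes that are set |
data.num_nodes | • Total number of nodes |
data.num_edges | • Total number of edges |
data.has_isolated_nodes() | • Whether the graph contains isolated nodes (contains_isolated_nodes() in older releases) |
data.has_self_loops() | • Whether the graph contains self-loops (contains_self_loops() in older releases) |
data.is_directed() | • Whether the graph is directed |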
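# Functions
print(data.keys)  # names of the attributes that are set (in recent PyG releases keys is a method, so call data.keys())
👉 ['edge_index', 'x']
print(data['x'])  # node feature values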
π tensor([[-1.],
[ 0.],
[ 1.]])
for key, item in data:
    print(f'{key} found in data')
    print(f'{item} found in data')
    print()
👉 Result
x found in data
tensor([[-1.],
[ 0.],
[ 1.]]) found in data
edge_index found in data
tensor([[0, 1, 1, 2],
[1, 0, 2, 1]]) found in data
'edge_attr' in data  # is there an edge feature matrix? No.
👉 False
data.num_nodes  # number of nodes
👉 3
data.num_edges  # number of edges (directed entries in edge_index)
👉 4
data.num_node_features  # number of features per node
👉 1
data.has_isolated_nodes()  # whether any node is isolated
👉 False
data.has_self_loops()  # whether any node has an edge pointing back to itself
👉 False
data.is_directed()  # whether the graph is directed
👉 False
# Transfer data object to GPU.
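device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # fall back to the CPU when no GPU is present
data = data.to(device)  # move the data to the selected device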
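As a rough illustration of what the checks above report, here is a plain-Python sketch that works on an edge_index given as two lists. The function names mirror the Data methods, but the bodies are simplified assumptions and not PyG's actual implementation.

```python
def has_self_loops(edge_index):
    """True if some edge starts and ends at the same node."""
    src, dst = edge_index
    return any(s == d for s, d in zip(src, dst))


def has_isolated_nodes(edge_index, num_nodes):
    """True if some node has no edge to any other node (self-loops are ignored)."""
    src, dst = edge_index
    connected = set()
    for s, d in zip(src, dst):
        if s != d:
            connected.update((s, d))
    return len(connected) < num_nodes


def is_undirected(edge_index):
    """True if every edge (s, d) also appears as (d, s)."""
    src, dst = edge_index
    edges = set(zip(src, dst))
    return all((d, s) in edges for s, d in edges)


edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]  # the 3-node graph from the example
print(has_self_loops(edge_index))         # False
print(has_isolated_nodes(edge_index, 3))  # False
print(is_undirected(edge_index))          # True
```

In this sketch is_directed() would simply be the negation of is_undirected(), which is why the example graph reports False for it.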
2️⃣ Common Benchmark Datasets
🔹 Datasets
• PyTorch Geometric ships with a wide range of common benchmark datasets.
• The graph attributes differ from dataset to dataset, so the functions that apply may differ too.
https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html
→ List of datasets
🔹 ENZYMES dataset example
• A dataset of 600 protein tertiary structures obtained from an enzyme database; it stores information related to the enzyme nomenclature.
• The graphs cover 6 enzyme classes.
from torch_geometric.datasets import TUDataset
dataset = TUDataset(root = '/tmp/ENZYMES', name = 'ENZYMES')
dataset
print(len(dataset))  # number of graphs
print(dataset.num_classes)  # number of graph classes
print(dataset.num_node_features)  # number of features per node
600
6
3
→ 600 graphs spread over 6 classes
• Inspecting a graph by indexing
data = dataset[0]  # index into the dataset to get a single graph
print(data)
👉 Result
Data(edge_index=[2, 168], x=[37, 3], y=[1])
✔ edge_index=[2, 168] : 2 rows (source and target node indices) and 168 columns. The graph is undirected, so each edge is stored in both directions: 84 undirected edges.
✔ x=[37, 3] : 37 nodes with 3 features each (every node holds 3 values)
✔ y=[1] : a graph-level target
data.is_undirected()  # is the graph undirected? Yes
True
train_dataset = dataset[:540]
test_dataset = dataset[540:]
print(train_dataset)
print(test_dataset)
👉 Result
ENZYMES(540)
ENZYMES(60)
dataset = dataset.shuffle()  # shuffle the dataset
print(dataset)
👉 Result
ENZYMES(600)
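Note that the slices above were taken before shuffle() was called. If the graphs are stored in some systematic order (an assumption worth checking for any dataset), the last 60 graphs may not be representative, so it is usually safer to shuffle first and split afterwards. A minimal plain-Python sketch of that idea on index lists follows; `shuffled_split` and the seed value are illustrative choices, not part of PyG.

```python
import random


def shuffled_split(num_items, num_train, seed=0):
    """Shuffle the indices 0..num_items-1, then cut them into train and test parts."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    return indices[:num_train], indices[num_train:]


train_idx, test_idx = shuffled_split(600, 540)
print(len(train_idx), len(test_idx))  # 540 60
```

With PyG the same effect should come from calling dataset.shuffle() before taking dataset[:540] and dataset[540:].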
🔹 Cora dataset example
• A dataset of 2,708 scientific papers
• Papers cite other papers, and the structure formed by these links is called a citation network.
from torch_geometric.datasets import Planetoid
dataset = Planetoid(root = '/tmp/Cora', name = 'Cora')
print(len(dataset))  # the whole dataset is a single graph
1
print(dataset.num_classes)  # 7 classes
7
print(dataset.num_node_features)  # 1433 node features (each node holds 1433 values)
1433
✔ The whole dataset is one graph with 7 node classes, and each node stores 1,433 values.
data = dataset[0]
data
👉 Result
Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])
✔ edge_index=[2, 10556] : 10,556 directed entries, i.e. 5,278 undirected edges each stored in both directions
✔ x=[2708, 1433] : 2,708 nodes with 1,433 features each (every node holds 1,433 values)
✔ y=[2708] : node-level targets (one class label per node)
print(data.is_undirected())  # the graph is undirected
True
data.train_mask.sum().item()  # nodes used for training
140
data.val_mask.sum().item()  # nodes used for validation
500
data.test_mask.sum().item()  # nodes used for testing
1000
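The three masks are boolean vectors with one entry per node, and summing a mask counts the nodes it selects. The toy sketch below uses plain Python lists with made-up labels and masks for 6 nodes (all values are illustrative assumptions) to show how a mask picks out a subset, and then checks the Cora arithmetic from the counts above.

```python
def select(values, mask):
    """Keep only the entries whose mask flag is True."""
    return [v for v, keep in zip(values, mask) if keep]


# Toy data: 6 nodes with made-up class labels.
labels = [3, 0, 6, 2, 2, 5]
train_mask = [True, True, False, False, False, False]
val_mask = [False, False, True, False, False, False]
test_mask = [False, False, False, False, True, True]

print(select(labels, train_mask))  # [3, 0]
print(select(labels, test_mask))   # [2, 5]

# Nodes that belong to none of the three masks.
unused = sum(1 for flags in zip(train_mask, val_mask, test_mask) if not any(flags))
print(unused)  # 1

# Cora: 140 + 500 + 1000 masked nodes out of 2708.
cora_unmasked = 2708 - (140 + 500 + 1000)
print(cora_unmasked)  # 1068
```

So in Cora only a small fraction of the nodes carries a training label, while the features and edges of all 2,708 nodes are still available to the model.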
3️⃣ Mini-batches
🔹 Training in batches
• PyTorch Geometric builds a mini-batch by stacking the graphs into one sparse block-diagonal adjacency matrix, which lets a whole batch be processed in parallel.
• The feature and target matrices are concatenated along the node dimension in the same way.
• torch_geometric.loader.DataLoader handles the data batch by batch (older releases exposed it as torch_geometric.data.DataLoader).
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader
dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES', use_node_attr=True)
loader = DataLoader(dataset, batch_size=32, shuffle=True)  # ✔ 32 graphs per mini-batch
for batch in loader:
    print(batch)
    print(batch.num_graphs)
...
• batch = [984] : a vector that assigns each of the 984 nodes in the mini-batch to one of its 32 graphs
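The block-diagonal idea can be sketched in plain Python: node indices of each graph are shifted by the number of nodes that came before it, and a batch vector records which graph every node belongs to. The `collate` function below is a hypothetical illustration of that bookkeeping, not PyG's actual collation code.

```python
def collate(graphs):
    """Merge graphs given as (num_nodes, [src_list, dst_list]) into one big graph.

    Returns the merged edge_index and a batch vector mapping nodes to graph ids.
    """
    src_all, dst_all, batch = [], [], []
    offset = 0
    for graph_id, (num_nodes, (src, dst)) in enumerate(graphs):
        src_all += [s + offset for s in src]  # shift node ids into the merged graph
        dst_all += [d + offset for d in dst]
        batch += [graph_id] * num_nodes
        offset += num_nodes
    return [src_all, dst_all], batch


g1 = (3, [[0, 1, 1, 2], [1, 0, 2, 1]])  # path graph with 3 nodes
g2 = (2, [[0, 1], [1, 0]])              # single edge between 2 nodes
edge_index, batch = collate([g1, g2])
print(edge_index)  # [[0, 1, 1, 2, 3, 4], [1, 0, 2, 1, 4, 3]]
print(batch)       # [0, 0, 0, 1, 1]
```

Because no edge connects nodes of different graphs, message passing on the merged graph treats every graph independently, which is what makes this form of batching work.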
4️⃣ Data Transforms
🔹 Data transformation
• torch_geometric.transforms makes it easy to transform data.
• torch_geometric.transforms.Compose chains several transforms into one.
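To make the idea of chaining concrete, here is a minimal plain-Python sketch of a Compose-like class. The class and the two toy transforms (`center`, `normalize_scale`) are illustrative assumptions that operate on a list of points; they are not the PyG implementations.

```python
class Compose:
    """Apply a list of transforms one after another."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, data):
        for transform in self.transforms:
            data = transform(data)
        return data


def center(points):
    """Shift the points so that their mean sits at the origin."""
    dims = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    return [[p[d] - mean[d] for d in range(dims)] for p in points]


def normalize_scale(points):
    """Scale the points so that the largest absolute coordinate is 1."""
    largest = max(abs(c) for p in points for c in p)
    return [[c / largest for c in p] for p in points]


pipeline = Compose([center, normalize_scale])
result = pipeline([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]])
print(result)  # [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
```

The order matters: scaling before centering would give a different result, which is why a composed pipeline is written as an explicit list.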
🔹 Example: applying transforms to the ShapeNet dataset
• It contains about 17,000 3D shapes stored as point clouds, organized into 16 categories.
from torch_geometric.datasets import ShapeNet
dataset = ShapeNet(root = '/tmp/ShapeNet', categories = ['Airplane'])
dataset[0]
👉 Result
Data(x=[2518, 3], y=[2518], pos=[2518, 3], category=[1])
✔ pos = [2518, 3] : 2,518 points in 3 dimensions
✔ There is no edge_index → the data has no connectivity
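import torch_geometric.transforms as T
from torch_geometric.datasets import ShapeNet
# convert the point clouds to graphs
dataset = ShapeNet(root='/tmp/ShapeNet', categories=['Airplane'],
                   pre_transform=T.KNNGraph(k=6),
                   transform=T.RandomTranslate(0.01))
✔ pre_transform = T.KNNGraph(k=6) : builds a k-nearest-neighbor graph from the points, so the data gains an edge_index (connectivity is created). A pre_transform runs only once, when the dataset is first processed and cached on disk.
✔ transform = T.RandomTranslate(0.01) : shifts each node position by a small random offset every time a sample is accessed.
dataset[0]
👉 Result
Data(x=[2518, 3], y=[2518], pos=[2518, 3], category=[1])
✔ The output above shows no edge_index, which is what happens when /tmp/ShapeNet was already processed without the pre_transform. Deleting the processed files and loading again should add edge_index=[2, 15108] (2,518 nodes × 6 neighbors).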
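For intuition about how a k-nearest-neighbor graph turns a point cloud into edges, here is a brute-force plain-Python sketch. `knn_graph` is a hypothetical helper written for this note; the edge direction (neighbor → center) is a choice made here, and a real implementation would use a much faster search.

```python
import math


def knn_graph(points, k):
    """Connect every point to its k nearest neighbours (brute force)."""
    src, dst = [], []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))  # closest first
        for j in others[:k]:
            src.append(j)  # neighbour
            dst.append(i)  # centre point
    return [src, dst]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
edge_index = knn_graph(points, k=2)
print(len(edge_index[0]))  # 8 edges: 4 points × 2 neighbours

# Same arithmetic for the airplane sample: 2518 points × 6 neighbours.
print(2518 * 6)  # 15108
```

Each point contributes exactly k edges, so the edge count grows linearly with the number of points.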
5️⃣ Learning Methods on Graphs
🔹 Learning on graphs
• So far: graph data handling, data loaders, and data transforms 👉 now let's train a model on a graph
🔹 Example: building GCN layers and applying them to the Cora dataset
• Task : node classification. Using only the words that appear in each paper and the citation links, predict which category the paper belongs to.
(1) Download the data
from torch_geometric.datasets import Planetoid
dataset = Planetoid(root='/tmp/Cora', name='Cora')
• Citation Network 👉 Node = paper, Edge = citation
• The 1,433 unique words that appear in the papers form a vocabulary, and each paper gets a feature vector recording whether each word appears in it. This vector is the node feature.
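The word-presence feature described above can be sketched in a few lines of plain Python. The five-word vocabulary and the sample title are made up for illustration; Cora's real vocabulary has 1,433 entries.

```python
def bag_of_words(text, vocabulary):
    """Return a 0/1 vector: 1 if the vocabulary word occurs in the text, else 0."""
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in vocabulary]


vocabulary = ["graph", "neural", "network", "protein", "kernel"]  # hypothetical
features = bag_of_words("Graph neural network for graph classification", vocabulary)
print(features)  # [1, 1, 1, 0, 0]
```

A repeated word still yields 1, since the vector records presence rather than counts.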
(2) Build the GNN
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # two GCNConv layers: input features → 16 hidden units → class scores
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)  # active only in training mode
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)  # log-probabilities over the classes
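To give a feel for what one graph convolution does to the node features, the sketch below applies the symmetric normalization commonly associated with GCN (self-loops added, each message scaled by 1/sqrt(deg(source) · deg(target))) in plain Python. It is a simplified illustration with scalar features and it leaves out the learnable weight matrix and bias, so it is not a reimplementation of GCNConv.

```python
import math


def gcn_aggregate(x, edge_index, num_nodes):
    """Normalized neighbourhood aggregation with self-loops (no learnable weights)."""
    src, dst = edge_index
    src = list(src) + list(range(num_nodes))  # add a self-loop for every node
    dst = list(dst) + list(range(num_nodes))
    deg = [0] * num_nodes
    for d in dst:
        deg[d] += 1
    out = [0.0] * num_nodes
    for s, d in zip(src, dst):
        out[d] += x[s] / math.sqrt(deg[s] * deg[d])
    return out


x = [-1.0, 0.0, 1.0]                       # node values from the first example
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]  # path graph 0 - 1 - 2
aggregated = gcn_aggregate(x, edge_index, num_nodes=3)
print(aggregated)  # [-0.5, 0.0, 0.5]
```

Every node ends up with a degree-weighted mix of its own value and its neighbors' values; stacking two such layers, as Net does, lets information travel two hops.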
(3) Train
# select the device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)
data = dataset[0].to(device)
# create the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
# training loop
model.train()  # switch to training mode
for epoch in range(200):
    optimizer.zero_grad()  # reset the gradients
    out = model(data)  # predictions (log-probabilities)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])  # loss on the training nodes only
    loss.backward()  # backpropagation
    optimizer.step()  # update the parameters
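The loss line above combines log-softmax outputs with a mask so that only training nodes contribute. A plain-Python sketch of that computation follows; the logits, labels, and mask are made-up toy values, and the helper names are illustrative.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of one row of scores."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]


def masked_nll_loss(logits, labels, mask):
    """Mean negative log-likelihood over the nodes selected by the mask."""
    losses = [-log_softmax(row)[y]
              for row, y, keep in zip(logits, labels, mask) if keep]
    return sum(losses) / len(losses)


logits = [[2.0, 0.0], [0.0, 0.0], [0.0, 3.0]]  # toy scores for 3 nodes, 2 classes
labels = [0, 1, 0]
train_mask = [True, True, False]               # node 2 is excluded from the loss
loss = masked_nll_loss(logits, labels, train_mask)
print(round(loss, 4))  # 0.41
```

Node 2 is badly misclassified in this toy data, yet it does not affect the loss because its mask entry is False.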
(4) Evaluate the model on the test data
# model evaluation
model.eval()
pred = model(data).argmax(dim=1)  # predicted class for every node
correct = (pred[data.test_mask] == data.y[data.test_mask]).sum().item()
acc = correct / data.test_mask.sum().item()
print('Accuracy: {:.4f}'.format(acc))
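The evaluation follows the same masking pattern: take the highest-scoring class per node, then compare with the labels on the test nodes only. A plain-Python sketch with made-up scores, labels, and mask:

```python
def masked_accuracy(scores, labels, mask):
    """Fraction of masked nodes whose highest-scoring class equals the label."""
    correct = total = 0
    for row, y, keep in zip(scores, labels, mask):
        if keep:
            pred = max(range(len(row)), key=row.__getitem__)  # argmax over classes
            correct += int(pred == y)
            total += 1
    return correct / total


scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]  # toy class scores
labels = [1, 1, 1, 0]
test_mask = [False, True, True, True]
acc = masked_accuracy(scores, labels, test_mask)
print('Accuracy: {:.4f}'.format(acc))  # Accuracy: 0.6667
```

Node 0 is predicted correctly but is outside the test mask, so it does not count toward the accuracy.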
https://colab.research.google.com/drive/1HDtOE5sZUPvA93ZT-OzXHuGgrSeU7hNk?usp=sharing