λ³Έλ¬Έ λ°”λ‘œκ°€κΈ°
1️⃣ AI•DS/πŸ“˜ GNN

[cs224w] Frequent Subgraph Mining with GNNs

by isdawell 2023. 1. 27.
728x90

 

1️⃣  12κ°• 볡슡 


 

πŸ”Ή Main Topic : Subgraph Mining - Identifying and Counting Motfits in Networks 

 

• Subgraph 와 motifs λŠ” κ·Έλž˜ν”„ ꡬ쑰에 λŒ€ν•œ μΈμ‚¬μ΄νŠΈλ₯Ό μ–»κ²Œ ν•΄μ£ΌλŠ” μš”μ†Œμ΄λ‹€. 

• νŠΉμ • κ·Έλž˜ν”„λ“€ μ‚¬μ΄μ—μ„œ Subgraph Isomorphism 관계에 μžˆλŠ”μ§€ μ˜ˆμΈ‘ν•˜λŠ” Task 에 λŒ€ν•΄ 신경망을 ν™œμš©ν•œλ‹€. 

• Subgraph 관계λ₯Ό νŒλ³„ν•˜λŠ” 방법 : Order embedding 

 

 

πŸ”Ή Subgraph and Motifs

 

• Building Blocks of Networks → Subgraph (=Mofits) λŠ” λ„€νŠΈμ›Œν¬μ˜ function μ΄λ‚˜ behavior κ³Ό 같은 μš”μ†Œλ₯Ό κ²°μ •ν•œλ‹€. 이λ₯Ό 톡해 κ·Έλž˜ν”„κ°€ μ–΄λ–»κ²Œ μž‘λ™ν•˜λŠ”μ§€ μ•Œ 수 있고, κ·Έλž˜ν”„ ν˜•νƒœμ— 따라 μ˜ˆμΈ‘μ„ 진행할 수 μžˆλ‹€. 

 

     πŸ‘‰ νŠΉμ • mofits κ°€ 전체 κ·Έλž˜ν”„μ—μ„œ μ–Όλ§ˆλ‚˜ λΉˆλ²ˆν•˜κ²Œ (recurring) λ‚˜νƒ€λ‚˜λŠ”μ§€, μ–Όλ§ˆλ‚˜ μ€‘μš”ν•œμ§€μ— (significant) 집쀑 

 

 

mofits 은 μš°λ¦¬κ°€ νŠΉμ •ν•˜κ²Œ 보고싢은 subgraph ν˜•νƒœλ₯Ό 의미

 

 

Graph Isomorphism : 두 κ·Έλž˜ν”„κ°€ μ£Όμ–΄μ‘Œμ„ λ•Œ, identical ν•œμ§€ νŒλ³„ν•˜λŠ” 것 

 

 

Subgraph Isomorphism : G1 κ³Ό G2 λŠ” ꡬ쑰적으둜 λ™μΌν•œ subgraph 뢀뢄이 μ‘΄μž¬ν•œλ‹€. 

 

 

 Subgraph Frequency : subgraph 의 λΉˆλ„λ₯Ό μ„ΈλŠ” λ°©μ‹μ˜ μ ‘κ·Όμ—λŠ” Graph-level , Node-level, Multiple graphs κ°€ μžˆλ‹€. 

 

   πŸ‘‰ Graph - level 

 

   πŸ‘‰ Node - level 

 

node anchor λ₯Ό κΈ°μ€€μœΌλ‘œ μ‚΄νŽ΄λ΄„

 

   πŸ‘‰ Multiple graphs

 

 

 

 

 Subgraph Significance : subgraph 의 μ€‘μš”λ„λ₯Ό νŒλ‹¨ν•˜κΈ° μœ„ν•΄, λžœλ€ν•˜κ²Œ μƒμ„±λœ κ·Έλž˜ν”„ λ„€νŠΈμ›Œν¬μ—μ„œ mofit λΉˆλ„λ³΄λ‹€ real network 의 mofit λΉˆλ„κ°€ 더 λ§Žμ„ κ²ƒμ΄λΌλŠ” 가정을 λ„μž…ν•΄ μ‹€ν—˜ν•œλ‹€. 

 

   πŸ‘‰ Random graph λ₯Ό μƒμ„±ν•˜λŠ” 방법 : ER, Configuration model, Switching

   πŸ‘‰ μ‹€μ œ κ·Έλž˜ν”„μ™€ λžœλ€ν•˜κ²Œ μƒμ„±λœ κ·Έλž˜ν”„μ—μ„œ 각 motif κ°€ μ–Όλ§ˆλ‚˜ μ€‘μš”λ„λ₯Ό κ°€μ§€λŠ”μ§€ Z-score λ₯Ό μ‚¬μš©ν•΄ ν‰κ°€ν•œλ‹€. 

 

 

 

 

πŸ”Ή Subgraph Matching 

 

• Query κ·Έλž˜ν”„κ°€ Target κ·Έλž˜ν”„μ˜ subgraph isomorphism 인지 ν™•μΈν•˜λŠ” Task → Use GNN and embedding space 

 

β‘  node anchor λ₯Ό μ •μ˜ν•˜μ—¬ μž„λ² λ”©μ„ 비ꡐ 

 

 

 

β‘‘ anchor node 의 μ£Όλ³€ 이웃 λ…Έλ“œλ“€μ„ μ‚΄νŽ΄λ΄„

n-hop 내에 μžˆλŠ” μ΄μ›ƒλ…Έλ“œλ“€μ˜ μž„λ² λ”©μ„ 비ꡐ

 

 

Order embedding Space : μž„λ² λ”© κ³΅κ°„μ—μ„œ 두 κ·Έλž˜ν”„ 쀑 ν•˜λ‚˜κ°€ λ‹€λ₯Έ ν•˜λ‚˜μ˜ μ’Œν•˜λ‹¨μ— μœ„μΉ˜ν•˜λ©΄, μ„œλΈŒκ·Έλž˜ν”„λΌ λ³Ό 수 μžˆλ‹€. 

 

 

  πŸ‘‰ Loss function : max-margin loss 

 

값이 0이면 Gqκ°€ Gt 의 subgraph

 

 

 

πŸ”Ή Finding Frequent subgraphs

 

• 자주 λ“±μž₯ν•˜λŠ” mofit 을 μ°ΎκΈ° → Representation learning 을 톡해 μž„λ² λ”© 곡간 λ‚΄μ—μ„œ GNN 을 ν™œμš© 

 

 

 

 

 

 

 

 

 

2️⃣  gSpan 


 

• algorithm for Mining frequent subgraphs 

• Supports undirected graphs

DFS Lexicographic order : use lexicographic order (사전식 μˆœμ„œ) embedding and explore depth-first search (DFS) method 

 

 

(b) and (d) are isomorphic

 

 

DFS 방법을 톡해 subgraph λ₯Ό μ°ΎλŠ” κ³Όμ •

 

 

• Using chemical compound datasets, it finds the common chemical compound substructures 

 

이전에 μ œμ•ˆλœ FSG λͺ¨λΈλ³΄λ‹€, gSpan 이 훨씬 더 λΉ λ₯Έ 속도λ₯Ό 보이고 있음

 

 

728x90

'1️⃣ AIβ€’DS > πŸ“˜ GNN' μΉ΄ν…Œκ³ λ¦¬μ˜ λ‹€λ₯Έ κΈ€

[cs224w] Theory of Graph Neural Networks  (0) 2023.01.06
[CS224W] Graph Neural Network  (0) 2022.11.24
[CS224W] Message Passing and Node classification  (0) 2022.11.17
[CS224W] PageRank  (0) 2022.11.02
[CS224W] 1κ°• Machine Learning With Graphs  (0) 2022.10.11

λŒ“κΈ€