๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
1๏ธโƒฃ AI•DS/๐Ÿ“˜ GNN

[CS224W] PageRank

by isdawell 2022. 11. 2.
728x90

 

1๏ธโƒฃ  PageRank Citation Ranking ๋…ผ๋ฌธ 


 

๐Ÿ”น Summary 

 

The PageRank Citation Ranking.pdf
0.76MB

 

 

•  ์ž์‹ ์˜ ๊ฒ€์ƒ‰์–ด์™€ ๊ฐ€์žฅ ์ž˜ ์–ด์šธ๋ฆฌ๋Š” ํŽ˜์ด์ง€๋ฅผ ์ฐพ๊ธฐ์œ„ํ•œ PageRank 

•  ์–ผ๋งˆ๋‚˜ ๋งŽ์€ ์‚ฌ์ดํŠธ๋“ค์ด ์ฐธ์กฐํ–ˆ๋Š”์ง€ + ์ฐธ์กฐํ•œ ์‚ฌ์ดํŠธ๋“ค์˜ ์˜ํ–ฅ๋ ฅ์€ ์–ด๋–ป๊ฒŒ ๋˜๋Š”์ง€ = Relative importance of Web pages 

 

•  PageRank : ์›น ํŽ˜์ด์ง€๋“ค์˜ ์ˆœ์œ„๋ฅผ ๋งค๊ธฐ๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜ from link structure (backlinks)

โ†ช  helps search engines and users quickly make sense of the vast heterogeneity of the WWW

โ†ช  Applications : Search, browsing, traffic estimation 

 

 

rank ๊ณ„์‚ฐ ๊ณต์‹

 

 

 

โ†ช  ๊ฐ ํŽ˜์ด์ง€์˜ rank ๋Š” ๊ณ ๋ฅด๊ฒŒ ๋ถ„๋ฐฐ๋œ๋‹ค. 

 

 

•  Random surfer model : ์›นํŽ˜์ด์ง€๋ฅผ ํƒ๋ฐฉํ•˜๋Š” ์œ ์ €์˜ ํ–‰๋™์„ random ํ•˜๊ฒŒ ๋ณด๋Š” ๊ฒƒ์œผ๋กœ, loop ์— ๋น ์ง€๊ฑฐ๋‚˜ ๋” ์ด์ƒ ์‚ดํŽด๋ณผ ํŽ˜์ด์ง€๊ฐ€ ์—†์œผ๋ฉด ๋žœ๋คํ•˜๊ฒŒ ๋‹ค๋ฅธ ํŽ˜์ด์ง€๋ฅผ ์ฐพ์•„๋‚˜์„œ๋Š” ํ–‰๋™์„ ๋ชจ๋ธ์— ๋ฐ˜์˜ 

 

 

 

• Personalized PageRank : PageRank๋ฅผ ๊ฐœ์ธํ™”ํ•˜๊ธฐ ์œ„ํ•ด ์ž„์˜์˜ ์„œํผ ๋ชจ๋ธ์—์„œ teleport ํ™•๋ฅ ๊ณผ transition ํ™•๋ฅ ์„ ๊ฒฐ์ •ํ•˜๋Š” ๋…ธ๋“œ ๊ฐ€์ค‘์น˜ ๋˜๋Š” ์—ฃ์ง€ ๊ฐ€์ค‘์น˜๋ฅผ ์กฐ์ • (๋…ธ๋“œ ๊ทผ์ ‘์„ฑ์„ ์ด์šฉ) 

 

 

 

 

 

 

2๏ธโƒฃ  ์ฝ”๋“œ ๋ฆฌ๋ทฐ 


 

https://colab.research.google.com/drive/1g6DStJfwq_IhTROWX2zZPTafeuSofJaX?usp=sharing 

 

4์ฃผ์ฐจ ๋ณต์Šต๊ณผ์ œ.ipynb

Colaboratory notebook

colab.research.google.com

 

 

 

 

๐Ÿ”น ๋„˜ํŒŒ์ด ๊ตฌํ˜„ 

 

(1) ๊ณ ์œ ๊ฐ’ ๋ถ„ํ•ด๋ฅผ ํ†ตํ•ด ๊ตฌํ•˜๋Š” ๋ฐฉ๋ฒ•

 

 ๊ฐ ์‚ฌ์ดํŠธ์—์„œ ๋‹ค๋ฅธ ์‚ฌ์ดํŠธ๋กœ ์ด๋™ํ•  ํ™•๋ฅ ๊ฐ’์„ ๊ฐ€์ง„ ํ–‰๋ ฌ์— ๋Œ€ํ•ด ๊ณ ์œ ๊ฐ’๊ณผ ๊ณ ์œ ๋ฒกํ„ฐ๋ฅผ ๊ตฌํ•˜๊ณ  ์ฒซ๋ฒˆ์งธ๋กœ ํฐ ๊ณ ์œ ๊ฐ’์— ๋Œ€์‘ํ•˜๋Š” ๊ณ ์œ ๋ฒกํ„ฐ๋ฅผ ์ •๊ทœํ™”ํ•˜์—ฌ ๊ฐ ์‚ฌ์ดํŠธ์˜ ์ค‘์š”๋„๋ฅผ ๊ตฌํ•œ๋‹ค. 

 

import numpy as np 
L = np.array([
    [0,1/2,1/3,0,0,0], 
    [1/3,0,0,0,1/2,0], 
    [1/3,1/2,0,1,0,1/2],
    [1/3,0,1/3,0,1/2,1/2],
    [0,0,0,0,0,0],
    [0,0,1/3,0,0,0]
]) # 6๊ฐœ์˜ ์‚ฌ์ดํŠธ๋งŒ ์กด์žฌํ•˜๋Š” ๋„คํŠธ์›Œํฌ๋ง์ด๋ผ ๊ฐ€์ • 

# L ์— ๋Œ€ํ•œ eigenvalue, eigenvector ๊ณ„์‚ฐ 
eigen_value , eigen_vector = np.linalg.eig(L)

# eigen value ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ 
order = np.absolute(eigen_value).argsort()[::-1] 

# ์ •๋ ฌ ์ˆœ์„œ์— ๋”ฐ๋ผ ์žฌ์ •๋ ฌ 
eigen_value = eigen_value[order] 
eigen_vector = eigen_vector[:,order] 

# ์ฒซ๋ฒˆ์งธ eigen value ์— ๋Œ€ํ•œ eigenvector ์ถ”์ถœ ๋ฐ ๋น„์ค‘ํ™•์ธ 
r = eigen_vector[:,0]  # 0๋ฒˆ์งธ ์—ด 
print(100*np.real(r/np.sum(r))) ## np.real : ๋ณต์†Œ์ˆ˜ ์ธ์ˆ˜์˜ ์‹ค์ˆ˜๋ถ€๋ฅผ ๋ฐ˜ํ™˜

# ์„ธ๋ฒˆ์งธ ์‚ฌ์ดํŠธ์˜ ์ค‘์š”๋„๊ฐ€ ๊ฐ€์žฅ ๋†’๊ฒŒ ๋‚˜์˜ด

 

 

 

(2) Power iteration ์œผ๋กœ ๊ตฌํ•˜๋Š” ๋ฐฉ๋ฒ• 

 

์ž„์˜ ์ ‘์†์ž๋“ค์ด ์‚ฌ์ดํŠธ๋ฅผ ๋ฌด์ž‘์œ„๋กœ ์ ‘์†ํ•œ๋‹ค ๊ฐ€์ •ํ•˜๊ณ  ์ฃผ์–ด์ง„ ๊ฒฝ๋กœ๋กœ ์ด๋™ํ•  ํ™•๋ฅ  ํ–‰๋ ฌ์„ ํ†ตํ•ด ์žฌ๊ท€์ ์œผ๋กœ ์—…๋ฐ์ดํŠธ๋ฅผ ์ˆ˜ํ–‰ํ•˜์—ฌ ๊ฐ ์‚ฌ์ดํŠธ์˜ ์ ‘์†์ž ์ˆ˜๊ฐ€ ํŠน์ • ์ž„๊ณ„์น˜ ์ดํ•˜๋กœ ์ˆ˜๋ ดํ• ๋•Œ๊นŒ์ง€ ๋ฐ˜๋ณตํ•ด ์‚ฌ์ดํŠธ๋ณ„ ์ค‘์š”๋„๋ฅผ ๊ทผ์‚ฌํ•œ๋‹ค.

 

  • r' = L*r

 

# 100๋ช… ์ ‘์† ๊ฐ€์ •, ์ดˆ๊ธฐ ์ ‘์†๋ฅ ์€ ๋ชจ๋‘ ๋™์ผํ•˜๊ฒŒ ๊ตฌ์„ฑ 

r = 100*np.ones(6)/6 
lastR = r 
r = L@r

i = 0 
while np.linalg.norm(lastR - r) > 0.01 : # ์ž„๊ณ„์ ๊นŒ์ง€ ์ˆ˜ํ–‰ 
  lastR = r 
  r = L@r 
  i+=1 

print(f'{str(i)} iterations to convergence') 
print(r)

 

 

 

 

โ€ป ์ฐธ๊ณ  : https://big-dream-world.tistory.com/49 

 

 

๐Ÿ”น networkX ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ๊ตฌํ˜„ 

 

 

pagerank(G, alpha = 0.85, personalization = None, max_iter = 100,tol=1e-06, nstart=None, weight='weight', dangling=None)

 

•  G : ๊ทธ๋ž˜ํ”„ 

•  alpha : Damping parameter (for teleport) 

•  max_iter : Maximum number of iterations in power method eigenvalue solver

•  Personalization : personalized pagerank 

 

 

(1) ๊ทธ๋ž˜ํ”„ ์ƒ์„ฑํ•˜๊ธฐ 

 

G = nx.DiGraph() 

[G.add_node(k) for k in ["A", "B", "C", "D", "E", "F", "G"]]  

G.add_edges_from([
    ('G','A'), ('A','G'),('B','A'),
    ('C','A'),('A','C'),('A','D'),
    ('E','A'),('F','A'),('B','D'),
    ('D','F')
])

 

 

 

(2) pagerank ์‹คํ–‰ 

 

pr1 = nx.pagerank(G) 
print(pr1)

 

{'A': 0.40001520046189115, 'B': 0.021428571428571432, 'C': 0.1347663991011727, 'D': 0.14387354195831553, 'E': 0.021428571428571432, 'F': 0.1437213165203047, 'G': 0.134766399101172}

 

 

 

 

 

(3) dampening ์กฐ์ •ํ•˜๊ธฐ (teleports) 

 

•  alpha

 

pr_09 = nx.pagerank(G, alpha = 0.9) 
pr_08 = nx.pagerank(G, alpha = 0.8) 
res = pd.DataFrame({"alpha=0.9": pr_09, "alpha=0.8": pr_08}) 
res

 

 

 

 

(4) Personal pagerank 

 

•  personalization 

 

pr_e = nx.pagerank(G, alpha = 0.9, personalization = {'E' : 1})  
print(pr_e)

 

{'A': 0.4147475495142858, 'B': 0.0, 'C': 0.12442358835485172, 'D': 0.12442358835485172, 'E': 0.09999999999999998, 'F': 0.11198168542115904, 'G': 0.1244235883548517}

 

 

 

๐Ÿ‘‰ ๋…ธ๋“œ E ๋กœ๋ถ€ํ„ฐ ๋…ธ๋“œ B ๋Š” ์ ‘๊ทผ์ด ์•„์˜ˆ ๋ถˆ๊ฐ€๋Šฅํ•˜๊ธฐ ๋•Œ๋ฌธ์— E์˜ ๊ด€์ ์—์„œ ์ค‘์š”๋„๊ฐ€ 0์ด ๋จ

 

 

 

(5) ์˜ํ™” ์ถ”์ฒœ์‹œ์Šคํ…œ ๊ตฌํ˜„ ํ”„๋กœ์ ํŠธ 

 

•  Movielens data ์‚ฌ์šฉ (๋ถˆ๋Ÿฌ์˜ค๊ธฐ๋Š” ์ƒ๋žต) 

 

 

 

 

•  ๊ทธ๋ž˜ํ”„ ์ƒ์„ฑ ๋ฐ ๋…ธ๋“œ์ถ”๊ฐ€ (์‚ฌ์šฉ์ž, ์˜ํ™”) , ์—ฃ์ง€ ๊ฐ€์ค‘์น˜ (ํ‰์ ) ์ถ”๊ฐ€ 

 

# ๊ทธ๋ž˜ํ”„ ์ƒ์„ฑํ•˜๊ธฐ 

G = nx.Graph() 

# ๋…ธ๋“œ ์ถ”๊ฐ€ํ•˜๊ธฐ 
G.add_nodes_from(rdata.userId, bipartite = 0) 
G.add_nodes_from(rdata.movieId, bipartite = 1) 

# ์—ฃ์ง€ ๊ฐ€์ค‘์น˜ ์ถ”๊ฐ€ํ•˜๊ธฐ 
G.add_weighted_edges_from([(uId, mId, rating) for (uId, mId, rating) in rdata[['userId', 'movieId', 'rating']].to_numpy()])

 

 

•  ํŽ˜์ด์ง€๋žญํฌ ์‹คํ–‰ ๋ฐ ์ƒ์œ„ ์˜ํ™” ์ถœ๋ ฅ 

 

# ํŽ˜์ด์ง€๋žญํฌ ์‹คํ–‰ 
movie_rank = nx.pagerank(G, alpha=0.85)

# ์ƒ์œ„ ์˜ํ™” ์ถœ๋ ฅ 

def return_top_movies(movie_rank) : 
  movie_rank = dict(sorted(movie_rank.items(), key = lambda item : item[1], reverse = True)) 
  top_10_movies = [] 
  for key, value in movie_rank.items() : 
    if 'i' in key : 
      top_10_movies.append(key) 
      if len(top_10_movies) == 10 : 
        break 
  return item_data.loc[top_10_movies]


return_top_movies(movie_rank)

์ธ๊ธฐ์˜ํ™”๋“ค์ด ์ถœ๋ ฅ

 

 

•  Personalized Recommend : 196๋ฒˆ ์œ ์ €์˜ ๊ด€์ ์—์„œ ์ค‘์š”๋„๋ฅผ ๋‹ค์‹œ ์„ ์ •ํ•จ 

 

user_query = 'u196' # 196๋ฒˆ ์œ ์ €์˜ ๊ด€์ ์—์„œ ์ค‘์š”๋„๋ฅผ ๋‹ค์‹œ ์‚ฐ์ • 
recommendation = nx.pagerank(G, alpha=0.85, personalization={user_query:1})
return_top_movies(recommendation)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

728x90

'1๏ธโƒฃ AIโ€ขDS > ๐Ÿ“˜ GNN' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

[CS224W] Graph Neural Network  (0) 2022.11.24
[CS224W] Message Passing and Node classification  (0) 2022.11.17
[CS224W] 1๊ฐ• Machine Learning With Graphs  (0) 2022.10.11
[CS224W] NetworkX , pytorch geometric Tutorial  (2) 2022.10.07
Pytorch Geometric Basic code  (1) 2022.09.30

๋Œ“๊ธ€