8 Link Analysis
Pagerank—link analysis Early web search: extract keywords-->queries match -->page ranked by occurrences of keywords Term Spam: (TS) Disguise a page as sth it is not about Pagerank: vs TS Rank pages by linkage too How many pages point to a page= # incoming links => importance Link spam (fade incoming pointers) Manually identify some important pages and assign them page ranks , then let ranks propagate to other nodes. Random surfer model Random surfer start from any page follow its outgoing links randomly 来,去,都是随机的 Pagerank = Prob ( a random surfer lands on the page ) More robust than manual approach , a collective voting scheme Limiting distribution Graph is strongly connected , a node can reach any other node in the graph But cannot have dead end & spider traps A--->B, C, D; B--->A, D,(A,D->C); C--->A(->B,C,D); ...