Community detection

Using library you can easily use state-of-the-art methods for community detection.

SCAN (PSCAN)

Implementation is based on [Zhao]. PSCAN bject implements the whole logic of algorithm. Method computeConnectedComponents(<graph>,<epsilon>), takes two parameters:

  • graph - on with algorithm will be executed
  • \(\epsilon\) - used for graph pruning based on similarity measure of edges.

Mentioned similarity is computed as follows:

\(sim(v,u)=\frac{|N(v)\cap{} N(u)|}{\sqrt{|N(v)||N(u)|}}\)

where \(N(v)\) is neighbours set of vertex \(v\) . Edeges with similarity lower than \(\epsilon\) (\(sim(v,u)<\epsilon\)) are removed from graph before main part of community detection.

Main part is based on label propagation and is implemented using apropriate data structures and PREGEL operator

import ml.sparkling.graph.operators.OperatorsDSL._
import ml.sparkling.graph.operators.algorithms.pscan.PSCAN.ComponentID
import org.apache.spark.SparkContext
import org.apache.spark.graphx.Graph

implicit ctx:SparkContext=???
// initialize your SparkContext as implicit value
val graph =???
// load your graph (for example using Graph loading API)

val components: Graph[ComponentID, Int] = PSCAN.computeConnectedComponents(graph)
// Graph where each vertex is associated with its component identifier

You can also use more readable method using DSL

import ml.sparkling.graph.operators.OperatorsDSL._
import ml.sparkling.graph.operators.algorithms.pscan.PSCAN.ComponentID
import org.apache.spark.SparkContext
import org.apache.spark.graphx.Graph

implicit ctx:SparkContext=???
// initialize your SparkContext as implicit value
val graph =???
// load your graph (for example using Graph loading API)

val components: Graph[ComponentID, Int] = graph.PSCAN(epsilon=0.5)
// Graph where each vertex is associated with its component identifier

References:

[Zhao]Zhao, W., Martha, V., & Xu, X. (2013, March). PSCAN: a parallel Structural clustering algorithm for big networks in MapReduce. In Advanced Information Networking and Applications (AINA), 2013 IEEE 27th International Conference on (pp. 862-869). IEEE. PDF