Mirko Kämpf looks at using GraphFrames on Spark:
Next, we’ll define a DataFrame by loading data from a CSV file, which is stored in HDFS.
Our datafile facebook_combined.txt contains two columns to represent links between network nodes. The first column is called source (src), and the second is the destination (dst) of the link. (Some other systems, such as Gephi, use “source” and “target” instead.) First we define a custom schema, and then we load the DataFrame, using SQLContext.
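As a rough illustration of the step described in the excerpt, here is a minimal PySpark-style sketch. It is not the article's own code. The helper name load_edges, the string column types, and the single-space separator are assumptions made for this example, and it uses the newer SparkSession reader in place of the SQLContext mentioned in the excerpt.

```python
# Schema as a DDL string: two columns named the way GraphFrames expects edges.
# Assumption: node IDs are kept as strings; a numeric type could be used instead.
EDGE_SCHEMA = "src STRING, dst STRING"


def load_edges(spark, path, sep=" "):
    """Load an edge list into a DataFrame with columns src and dst.

    `spark` is expected to be a SparkSession (hypothetical caller-supplied object).
    `sep` is an assumption about the file layout; adjust it if the file is
    comma- or tab-separated.
    """
    return spark.read.csv(path, schema=EDGE_SCHEMA, sep=sep, header=False)
```

With a session from SparkSession.builder.getOrCreate(), calling load_edges(spark, "facebook_combined.txt") should yield a DataFrame whose columns are src and dst, which are the edge column names GraphFrames looks for.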
It sounds like Spark graph database engines are early in their lifecycle, but they might already be useful for simple analysis.