Next, we’ll define a DataFrame by loading data from a CSV file, which is stored in HDFS.
facebook_combined.txt contains two columns to represent links between network nodes. The first column is called source (src), and the second is the destination (dst) of the link. (Some other systems, such as Gephi, use “source” and “target” instead.)
First we define a custom schema, and then we load the DataFrame, using
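As a rough illustration of the two steps just described (define a schema, then load the file), here is a minimal PySpark sketch. It is not the original listing: the helper name load_edges, the space delimiter, the integer column types, and the use of a DDL string for the schema are all assumptions made for this example.

```python
# Minimal sketch, not the original code from this article.
# Assumptions: node IDs are integers and the two columns are separated by a
# single space; adjust both if your copy of the file differs.

# Custom schema as a DDL string; an equivalent StructType would also work.
EDGE_SCHEMA = "src INT, dst INT"


def load_edges(spark, path):
    """Load the edge list at `path` into a DataFrame with columns src and dst.

    `spark` is an existing SparkSession; `path` may be an HDFS URI.
    """
    return spark.read.csv(
        path,
        schema=EDGE_SCHEMA,  # apply the custom schema instead of inferring one
        sep=" ",             # assumed delimiter
        header=False,        # assumed: the file has no header row
    )
```

With a running session, a call such as load_edges(spark, "hdfs:///data/facebook_combined.txt") would return the edge DataFrame; the HDFS path shown is hypothetical. Naming the columns src and dst matches the convention described above for the link endpoints.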
It sounds like Spark graph database engines are early in their lifecycle, but they might already be useful for simple analysis.