Graph Analytics With Spark

Mirko Kämpf looks at using GraphFrames on Spark:

Next, we’ll define a DataFrame by loading data from a CSV file, which is stored in HDFS.

Our data file facebook_combined.txt contains two columns representing links between network nodes. The first column is the source (src) of the link, and the second is the destination (dst). (Some other systems, such as Gephi, use “source” and “target” instead.)

First we define a custom schema, and then we load the DataFrame using SQLContext.
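
A minimal sketch of that load, assuming Spark 1.6 with the spark-csv package and GraphFrames available; the HDFS path, delimiter, and column types here are illustrative, not the post's exact code:

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{LongType, StructField, StructType}
import org.graphframes.GraphFrame

val sqlContext = new SQLContext(sc)   // sc is the SparkContext already available in the shell

// Custom schema: the two columns of the edge list, named src and dst as GraphFrames expects
val edgeSchema = StructType(Seq(
  StructField("src", LongType, nullable = false),
  StructField("dst", LongType, nullable = false)
))

// Load the space-delimited edge list from HDFS via the spark-csv data source
// (path and delimiter are assumptions for this sketch)
val edges = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("delimiter", " ")
  .schema(edgeSchema)
  .load("hdfs:///user/demo/facebook_combined.txt")

// Build the graph; fromEdges derives the vertex DataFrame from the
// distinct ids that appear in the src and dst columns
val g = GraphFrame.fromEdges(edges)

Using GraphFrame.fromEdges saves a manual union-and-distinct step to construct the vertex DataFrame by hand.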

It sounds like Spark graph database engines are early in their lifecycle, but they might already be useful for simple analysis.
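As a rough sketch of what that simple analysis might look like, assuming the GraphFrame g built in the snippet above:

import org.apache.spark.sql.functions.desc

// Degree of each node: a first look at how connected the network is
g.degrees.orderBy(desc("degree")).show(10)

// PageRank as a rough importance measure over the friendship graph
val ranks = g.pageRank.resetProbability(0.15).maxIter(10).run()
ranks.vertices.orderBy(desc("pagerank")).show(10)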

