Things To Know About Databricks UAP

Kara Annanie has five things you should know about the Databricks Unified Analytics Platform:

4.     A Spark Dataframe is not the same as a Pandas/R Dataframe
Spark Dataframes are specifically designed to use distributed memory to perform operations across a cluster whereas Pandas/R Dataframes can only run on one computer. This means that you need to use a Spark Dataframe to realize the benefits of the cluster when coding in Python or R within Databricks.

This is a nice set of tips for people getting started with Spark and Databricks.

Related Posts

Dropping Database Objects with Aplomb

Pamela Mooney has a two-part series on dropping database objects. Part one includes a big setup script: Some months ago, a fellow DBA came to me and expressed her concern over the normal housecleaning process that occurred at her company.  Developers or product owners would submit changes for objects which were no longer used, only […]

Read More

Contrasting Flink with Kafka Streams

Sourabh Verma contrasts Apache Flink with Kafka Streams: Initially, I would like you all to focus on a few questions before comparing the frameworks:1. Is there any comparison or similarity between Flink and the Kafka?2. What could be better in Flink over the Kafka?3. Is it the problem or system requirement to use one over […]

Read More

Categories

February 2019
MTWTFSS
« Jan Mar »
 123
45678910
11121314151617
18192021222324
25262728