Things To Know About Databricks UAP

Kara Annanie has five things you should know about the Databricks Unified Analytics Platform:

4.     A Spark Dataframe is not the same as a Pandas/R Dataframe
Spark Dataframes are specifically designed to use distributed memory to perform operations across a cluster whereas Pandas/R Dataframes can only run on one computer. This means that you need to use a Spark Dataframe to realize the benefits of the cluster when coding in Python or R within Databricks.

This is a nice set of tips for people getting started with Spark and Databricks.

Related Posts

Managing Database Changes with Scripts

Ed Elliott continues a series on automated database deployments with a quick change process: In my blog post here https://the.agilesql.club/blogs/ed-elliott/2019-06-10/steps-to-automated-database-deployments I described the steps you need to go through so you can build up your confidence that you are capable of deploying databases using automation. I mean, afterall, knowing that it is possible to automate your deployments […]

Read More

Analyzing Data by the Numbers

I am close to wrapping up my series on forensic accounting techniques: Round number analysis focuses on the final digits of a number, specifically looking at how many 0s there are at the end after rounding off the decimal. We’re interested in the percentage of records in each rank (or Type). For example, $10,820 is […]

Read More

Categories

February 2019
MTWTFSS
« Jan Mar »
 123
45678910
11121314151617
18192021222324
25262728