Working with Columns in Spark

Achilleus has a two-parter on working with columns in Spark. Part 1 covers some of the basic syntax and several functions:

Also, we can have typed columns which is basically a column with an expression encoder specified for the expected input and return type.

scala> val name = $"name".as[String]
name: org.apache.spark.sql.TypedColumn[Any,String] = name
scala> val name = $"name"
name: org.apache.spark.sql.ColumnName = name

There are more than 50 methods(67 the last time I counted ) that can be used for transformations on the column object. We will be covering some of the important methods that are generally used.

Part 2 covers other functions including window functions:

17) over
This is one of the most important function that is used in many of the window operations.We can talk about the window function in detail when discuss about aggregation in spark but for now, it will be fair enough to say that over method provides a way to apply an aggregation over a window specification which in turn can be used to specify partition, order and frame boundaries of the aggregation.

Check out both of these posts for useful tidbits.

Related Posts

Comparing Performance: HBase1 vs HBase2

Surbhi Kochhar takes us through performance improvements between HBase version 1 and HBase version 2: We are loading the YCSB dataset with 1000,000,000 records with each record 1KB in size, creating total 1TB of data. After loading, we wait for all compaction operations to finish before starting workload test. Each workload tested was run 3 […]

Read More

The Transaction Log in Delta Tables

Burak Yavuz, et al, explain how the transaction log works with Delta Tables in Apache Spark: When a user creates a Delta Lake table, that table’s transaction log is automatically created in the _delta_log subdirectory. As he or she makes changes to that table, those changes are recorded as ordered, atomic commits in the transaction log. Each commit […]

Read More

Categories

March 2019
MTWTFSS
« Feb Apr »
 123
45678910
11121314151617
18192021222324
25262728293031