AWS Glue Now Supports Scala

Mehul Shah, et al, announce that AWS Glue officially supports Scala:

We are excited to announce AWS Glue support for running ETL (extract, transform, and load) scripts in Scala. Scala lovers can rejoice because they now have one more powerful tool in their arsenal. Scala is the native language for Apache Spark, the underlying engine that AWS Glue offers for performing data transformations.

Beyond its elegant language features, writing Scala scripts for AWS Glue has two main advantages over writing scripts in Python. First, Scala is faster for custom transformations that do a lot of heavy lifting because there is no need to shovel data between Python and Apache Spark’s Scala runtime (that is, the Java virtual machine, or JVM). You can build your own transformations or invoke functions in third-party libraries. Second, it’s simpler to call functions in external Java class libraries from Scala because Scala is designed to be Java-compatible. It compiles to the same bytecode, and its data structures don’t need to be converted.

To illustrate these benefits, we walk through an example that analyzes a recent sample of the GitHub public timeline available from the GitHub archive. This site is an archive of public requests to the GitHub service, recording more than 35 event types ranging from commits and forks to issues and comments.

Functional languages tend to be very good for ETL tasks, and Scala is a great choice due to its relationship with Spark.

Related Posts

The Basics Of Bash: Writing Data

Mark Wilkinson hits us with some basic Bash output management: If you have experience with PowerShell, some properties of Bash variables will feel familiar. In Bash, variables are denoted with a $ just like in PowerShell, but unlike PowerShell the $ is only needed when they are being referenced. When you are assigning a value to a variable, the $ is […]

Read More

Using The Power Query SDK

Chris Webb shows how to build M queries in Visual Studio: Writing M in the Advanced Editor in Excel or Power BI can be a frustrating experience unless you’re the kind of masochist who loves writing code in Notepad. There are some options for writing M code outside Excel and Power BI, for example Lars […]

Read More

Categories

January 2018
MTWTFSS
« Dec Feb »
1234567
891011121314
15161718192021
22232425262728
293031