AWS Glue Now Supports Scala

Mehul Shah, et al, announce that AWS Glue officially supports Scala:

We are excited to announce AWS Glue support for running ETL (extract, transform, and load) scripts in Scala. Scala lovers can rejoice because they now have one more powerful tool in their arsenal. Scala is the native language for Apache Spark, the underlying engine that AWS Glue offers for performing data transformations.

Beyond its elegant language features, writing Scala scripts for AWS Glue has two main advantages over writing scripts in Python. First, Scala is faster for custom transformations that do a lot of heavy lifting because there is no need to shovel data between Python and Apache Spark’s Scala runtime (that is, the Java virtual machine, or JVM). You can build your own transformations or invoke functions in third-party libraries. Second, it’s simpler to call functions in external Java class libraries from Scala because Scala is designed to be Java-compatible. It compiles to the same bytecode, and its data structures don’t need to be converted.

To illustrate these benefits, we walk through an example that analyzes a recent sample of the GitHub public timeline available from the GitHub archive. This site is an archive of public requests to the GitHub service, recording more than 35 event types ranging from commits and forks to issues and comments.

Functional languages tend to be very good for ETL tasks, and Scala is a great choice due to its relationship with Spark.

Related Posts

Selecting a List of Columns from Spark

Unmesha SreeVeni shows us how we can create a list of column names in Scala to pass into a Spark DataFrame’s select function: Now our example dataframe is ready.Create a List[String] with column names.scala> var selectExpr : List[String] = List("Type","Item","Price") selectExpr: List[String] = List(Type, Item, Price) Now our list of column names is also created.Lets […]

Read More

Case Classes In Scala

Shubham Dangare explains what case classes are in Scala: Case class is scale way to allow pattern matching on an object without requiring a large amount of boilerplate. All you need to do is add a single case keyword modifier to each class that you want to pattern matching using such modifier makes scala compiler […]

Read More


January 2018
« Dec Feb »