Press "Enter" to skip to content

Author: Kevin Feasel

Wrapping up a Spark Advent Calendar

Tomaz Kastrun did it: 25 posts in 25 days on Spark. Part 23 looks at Delta Live Tables:

Delta Live Tables is a framework for building reliable, maintainable, and testable data processing pipelines. User defines the transformations to be performed on the datasources and data, and the framework manages all the data engineering tasks: task orchestrations, cluster management, monitoring, data quality, and event error handling.

Delta Live Tables framework helps and manages how data is being transformed with help of target schema and can is a slight different experience with Databricks Tasks (with Apache Spark tasks in the background).

Part 24 takes us through a bit of visualization:

You can use any of the popular Python packages to do the visualisation; Plotly, Dash, Seaborn, Matplotlib, Bokeh, Leather, Glam, to name the couple and many others. Once the data is persisted in dataframe, you can use any of the packages. With the use of PySpark, plugin the Matplotlib. Here is an example

And part 25 wraps things up with links to additional resources:

To wrap up this year’s Advent of Spark 2021 – series of blogposts on Spark – it is essential to look at the list of additional learning resources for you to continue with this journey. Let’s divide this list not by type of the resource (book, on-line documentation, on-line courses, articles, Youtube channels, Discord channels, and others) but rather divide them by language flavour. Scala/Spark, R, and Python.

Great job on Tomaz’s part for gutting it out.

Comments closed

Functional Programming in Python

Mehreen Saeed tries to tempt me into liking Python:

Python is a fantastic programming language. It is likely to be your first choice for developing a machine learning or data science application. Python is interesting because it is a multi-paradigm programming language that can be used for both object-oriented and imperative programming. It has a simple syntax that is easy to read and comprehend.

In computer science and mathematics, the solution of many problems can be more easily and naturally expressed using the functional programming style. In this tutorial, we’ll discuss Python’s support for the functional programming paradigm, and Python’s classes and modules that help you program in this style.

Click through to see how you can write functional-friendly programs in Python.

Comments closed

Hybrid Tables in Power BI

Paul Turley is excited about the latest Power BI update:

The December 2021 Power BI Desktop update introduced a long-awaited upgrade to the partitioning and Incremental Refresh feature set. The update introduces Hybrid Tables, a new Premium feature that combines the advantages of in-memory Import Mode storage with real-time DirectQuery data access; this is a big step forward for large model management and real-time analytic reporting.

Read on to see why.

Comments closed

When Query Store Plan Forcing Fails

Deepthi Goguri explains why that forced query plan isn’t behaving the way you might expect:

One of the advantages of using Query store is the ability to force a plan. Once you force a plan, there may be reasons when the optimizer still chose to create a new execution plan without using the forced plan. During this time, a failure reason is stored in the Query Store. When the forced plan cannot be used, the optimizer goes through the regular compilation and optimization phases to create a new plan but it doesn’t actually fail the query itself. It executes in a normal way with the new plan being created.

But what causes these failures and how to know the reason behind these forced plan failures?

Read on to find out.

Comments closed

Using containerd as a Kubernetes Container Runtime

Anthony Nocentino has a guide for us:

In this post, I’m going to show you how to install containerd as the container runtime in a Kubernetes cluster. I will also cover setting the cgroup driver for containerd to systemd which is the preferred cgroup driver for Kubernetes. In Kubernetes version 1.20 Docker was deprecated and will be removed after 1.22. containerd is a CRI compatible container runtime and is one of the supported options you have as a container runtime in Kubernetes in this post Docker Kubernetes world. I do want to call out that you can use containers created with Docker in containerd. This post was previously published in February of 2021, this is an updated version with the latest installation and configuration steps.

Click through for instructions on installation and configuration.

Comments closed

Spark in Azure Databricks

Tomaz Kastrun starts winding down a series on Apache Spark. Part 22 covers Spark in Azure Databricks:

Azure Databricks is a platform build on top of Spark based analytical engine, that unifies data, data manipulation, analytics and machine learning.

Databricks uses notebooks to tackle all the tasks and is therefore made easy to collaborate. Let’s dig in and start using a Python API on top of Spark API.

Read on for that primer.

Comments closed