Press "Enter" to skip to content

May 18, 2023

Query Snowflake Data from Spark

The Big Data in Real World team crosses data platforms:

If your organization is working with lots of data, you might be leveraging Spark for distributed computation. You could also potentially have some or all of your data in a Snowflake data warehouse.

In a situation like this, you might have to expose data in Snowflake to the processes that run on Spark. This is made possible using the Spark Connector for Snowflake.

In this post, we will see what the Spark Connector for Snowflake is and how to use it from Spark to connect to Snowflake and access Snowflake data in your Spark cluster.

Read on for a high-level architecture of how it works and the configuration you’ll need to do to get it running.
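
For a sense of what this looks like in practice, here is a minimal PySpark sketch that reads a Snowflake table into a Spark DataFrame through the connector. It assumes the Spark Connector for Snowflake and the Snowflake JDBC driver are already on the cluster's classpath; the account URL, credentials, warehouse, and table name are all placeholders.

```python
# Minimal sketch: read a Snowflake table into a Spark DataFrame via the
# Spark Connector for Snowflake. All connection values are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-read-example").getOrCreate()

sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",  # hypothetical account URL
    "sfUser": "my_user",
    "sfPassword": "my_password",
    "sfDatabase": "MY_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",
}

df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "ORDERS")  # or .option("query", "SELECT ...") to push a query down
    .load()
)

df.show()
```

Broadly, the connector can push work down into Snowflake and moves data between the two systems via Snowflake stages, which is why the configuration is mostly about authentication and pointing at the right database, schema, and warehouse.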


Common Date and Time Operations in R

Steven Sanderson works with dates:

Dates and times are essential components in many programming tasks, and R provides various functions and packages to handle them effectively. In this post, we’ll explore some common operations using both the base R functions and the lubridate package, comparing their simplicity and ease of understanding.

I personally prefer the lubridate style of date operation, but it’s nice to have options.
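
To give a flavor of the comparison, here is a small sketch of a few operations in base R next to their lubridate equivalents; the specific dates are arbitrary examples.

```r
# A handful of common date operations, base R vs. lubridate
library(lubridate)

# Parsing a date
as.Date("2023-05-18")        # base R
ymd("2023-05-18")            # lubridate

# Extracting a component
format(Sys.Date(), "%Y")     # base R: year, returned as character
year(today())                # lubridate: year, returned as a number

# Date arithmetic
Sys.Date() + 30              # base R: add 30 days
today() %m+% months(1)       # lubridate: add one calendar month, clamping month-ends

# Differences between dates
difftime(as.Date("2023-05-18"), as.Date("2023-01-01"), units = "days")
as.Date("2023-05-18") - as.Date("2023-01-01")
```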


Importing Code into Polyglot Notebooks

Matt Eland brings some code to the party:

We’ve seen that Polyglot Notebooks allow you to mix together markdown and code (including C# code) in an interactive notebook and these notebooks allow you to share data between cells and between languages. However, frequently in programming you want to reference code that others have written without having to redefine everything yourself.

In this article we’ll explore how Polyglot Notebooks allows you to import dotnet code from stand-alone files, DLLs, and NuGet packages so your notebooks can take advantage of external code files and the same libraries that you can work with from your code in Visual Studio.

The syntax, by the way, is very similar to the F# Interactive (and the short-lived C# Interactive) tool, particularly #i and #r.
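
For reference, the directives look roughly like this in a C# notebook cell; the package, version, and file paths below are placeholders rather than anything from the article.

```csharp
// Reference a NuGet package (package name and version are placeholders)
#r "nuget: Newtonsoft.Json, 13.0.3"

// Reference a compiled assembly by path (path is a placeholder)
#r "bin/Debug/net7.0/MyLibrary.dll"

// Pull in another notebook or script file via the #!import magic command
// (path is hypothetical)
#!import ./shared-setup.dib

using Newtonsoft.Json;

var json = JsonConvert.SerializeObject(new { Greeting = "Hello from a notebook" });
Console.WriteLine(json);
```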


Recursive Common Table Expressions in Snowflake

Kevin Wilkie is too fancy for simple joins:

Today, I want to talk about that fun edge case when you’re having to join a table to itself in Snowflake. Does it happen often? Not unless your architect just hates you.

Let’s use the normal pieces of data that everyone uses for this kind of thing – employee/manager relationships. We have our employee table that we’ve been working with, and we’ll play with it for this example.

The syntax is a bit different from T-SQL, but the concept is still the same.
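
To give a rough idea of the shape, here is a sketch of a recursive CTE walking an employee/manager hierarchy in Snowflake; the table and column names are made up for illustration. Note the explicit RECURSIVE keyword, which is one visible difference from T-SQL.

```sql
-- Sketch: walk an employee/manager hierarchy with a recursive CTE.
-- Table and column names are hypothetical.
WITH RECURSIVE org_chart AS (
    -- Anchor: employees with no manager (the top of the tree)
    SELECT employee_id, manager_id, employee_name, 1 AS org_level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive member: employees whose manager is already in the result set
    SELECT e.employee_id, e.manager_id, e.employee_name, oc.org_level + 1
    FROM employees e
    INNER JOIN org_chart oc
        ON e.manager_id = oc.employee_id
)
SELECT employee_id, employee_name, org_level
FROM org_chart
ORDER BY org_level, employee_name;
```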


Encrypting SQL Server Backups

Matthew McGiffen lays out the requirements:

When we talk about protecting our at-rest data, the item that we are likely to be most concerned about is the security of our backups. Backups are generally – and should be – stored off the server itself, and often we will ship copies offsite to a third party where we don’t have control over who can access the data, even if we trust that it will be well managed.

From SQL Server 2014 the product has included the ability to encrypt data while creating a backup. This feature is available in both the standard and enterprise editions of SQL Server, so it is something you can use even when TDE may not be a feature that is available to you.

Click through for a primer on the topic.
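
As a rough sketch of the moving parts: you need a database master key and a certificate (or asymmetric key) in the master database, and you reference that certificate when taking the backup. The names, password, and path below are placeholders.

```sql
-- Sketch of setting up backup encryption; names, passwords, and paths are placeholders.
USE master;
GO

-- Database master key in master (skip if one already exists)
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Str0ng_Placeholder_P@ssword!';
GO

-- Certificate that protects the backups; back this certificate up and store it safely,
-- because you cannot restore an encrypted backup without it
CREATE CERTIFICATE BackupEncryptionCert
    WITH SUBJECT = 'Certificate for encrypting database backups';
GO

-- Take an encrypted backup that references the certificate
BACKUP DATABASE [MyDatabase]
TO DISK = N'D:\Backups\MyDatabase.bak'
WITH COMPRESSION,
     ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupEncryptionCert);
GO
```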


Against Keys in Fact Tables

Marc Lelijveld searches for keys under the lamppost:

Another blog post based on recent client experiences. Last week, I visited a client where we had extensive discussions on data model optimization. As you might know, data modeling in Power BI is one of my favorite topics, so I had an excellent day. It’s also not the first time that I’ve blogged about data modeling and optimization. If you haven’t read it yet, I recommend reading my previous blog on this topic.

This blog will focus on the need for keys in your tables, primarily the fact tables in your data model. I keep running into data models at customers which are flooded with keys in all tables. For each of them, you should ask: do I really need this, and could I save it in a different data type for further optimization? In this blog, I will further elaborate on keys in your data model, typical use cases, and how these cases can be solved in different manners.

Read the whole thing. The really short version is classic Kimball-style advice: keys for dimensions, not for facts. And in Power BI, removing a unique column from a fact table can speed things up by shrinking the compressed fact table size.
