Month: May 2023

Analyzing Big-O Notation in Polyglot Notebooks

Matt Eland brings me back to college:

Polyglot Notebooks is a great way of running interactive code experiments mixed together with rich markdown documentation.

In this short article I want to introduce you to the #!time magic command and show you how you can easily measure the execution time of a block of code.

This can be helpful for understanding the rough performance characteristics of a block of code inside of your Polyglot Notebook.

In fact, we’ll use this to explore the programming concepts behind Big O notation and how code performance changes based on the number of items.

I like this for two reasons. First, a visual indicator of Big-O notation is helpful for students learning about the topic. Second, that’s not the only thing you can do with the #!time magic.
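
The #!time magic is specific to Polyglot Notebooks, but the same kind of experiment can be sketched in plain Python. The example below is a minimal illustration, not Matt’s code. It times two hypothetical functions, one linear and one quadratic, and uses `time.perf_counter` in place of #!time.

```python
import time


def linear_sum(items):
    """O(n): touches each item exactly once."""
    total = 0
    for x in items:
        total += x
    return total


def pairwise_count(items):
    """O(n^2): compares every item with every other item."""
    count = 0
    for a in items:
        for b in items:
            if a < b:
                count += 1
    return count


def time_call(func, items):
    """Return elapsed wall-clock seconds for a single call to func(items)."""
    start = time.perf_counter()
    func(items)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Double n each time and watch how each timing grows.
    for n in (400, 800, 1600):
        data = list(range(n))
        print(
            f"n={n:5d}  O(n): {time_call(linear_sum, data):.6f}s  "
            f"O(n^2): {time_call(pairwise_count, data):.6f}s"
        )
```

Each time n doubles, the O(n) timing should roughly double and the O(n^2) timing should roughly quadruple. Single runs on small inputs are noisy, so treat the numbers as a rough guide rather than a benchmark.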

Managing Database Privileges in Postgres

Ryan Booz limits database access:

We discussed how this can be done manually with a GRANT command each time an object is created; however, that is time-consuming to manage, and it is easy to miss a detail.

Instead, PostgreSQL provides a method for setting default privileges which are granted on behalf of the object owner as database objects are created. Using default privileges, a role can prepare the database ahead of time to ensure that consistent access privileges are applied while easing the management burden over time.

But how do you go about creating a set of roles and default privileges that will provide the right level of control and access? Let’s dig a little deeper.

RBAC on groups is definitely the way to go, and Ryan shows us how.
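
Here is a rough sketch of the statements the group-role approach produces. The Python helper below only builds SQL text and never connects to a database. The role and schema names (`app_owner`, `app`, `app_read`, `app_write`) are hypothetical. Identifiers are checked against a simple lowercase pattern instead of being quoted. Treat it as an illustration of the statement shapes rather than a production tool. For each group role it emits three statements: schema USAGE, a grant on existing tables, and a default privilege for tables the owner creates later.

```python
import re

# Accept only simple, unquoted lowercase identifiers (an assumption for this sketch).
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def _check(name):
    if not _IDENT.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def default_privilege_statements(owner, schema, readers, writers):
    """Build PostgreSQL statements that give group roles access to a schema.

    owner   -- role that will create tables in the schema
    readers -- group roles that should get SELECT
    writers -- group roles that should get SELECT, INSERT, UPDATE, DELETE
    """
    owner, schema = _check(owner), _check(schema)
    grants = [(r, "SELECT") for r in readers]
    grants += [(w, "SELECT, INSERT, UPDATE, DELETE") for w in writers]

    statements = []
    for role, privileges in grants:
        role = _check(role)
        # Table privileges are of little use without USAGE on the schema.
        statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {role};")
        # Default privileges do not touch tables that already exist.
        statements.append(
            f"GRANT {privileges} ON ALL TABLES IN SCHEMA {schema} TO {role};"
        )
        # Tables that `owner` creates in this schema later get these grants automatically.
        statements.append(
            f"ALTER DEFAULT PRIVILEGES FOR ROLE {owner} IN SCHEMA {schema} "
            f"GRANT {privileges} ON TABLES TO {role};"
        )
    return statements


if __name__ == "__main__":
    for stmt in default_privilege_statements("app_owner", "app", ["app_read"], ["app_write"]):
        print(stmt)
```

The FOR ROLE clause matters because default privileges apply only to objects created by that role. If several roles create tables in the schema, each one needs its own ALTER DEFAULT PRIVILEGES statement.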

ALL and ALLSELECTED in DAX

Reza Rad compares and contrasts:

These two functions in DAX, ALL and ALLSELECTED, are often used in place of each other. Their behavior can be similar in some contexts but different in others. In this article and video, I’ll explain the difference between these two functions and when to use each in DAX for Power BI, Analysis Services, or Power Pivot.

Reza has a video as well as a blog post to describe the differences.

The Myth of the DBA-Free Cloud

Matthew McGiffen lays out an explanation:

I was chatting with a cloud consultant who was advising on a large-scale migration to AWS. He told me that one of the advantages of going for a PaaS offering (Platform as a Service) was that DBAs were no longer required, as backups and restores were handled for you. PaaS services for SQL Server include AWS RDS and Azure SQL Database or Azure SQL Managed Instance.

I found it quite a funny conversation, partly as I don’t think he realised being a DBA was part of my job role, but also because I don’t know a single DBA who spends a significant amount of their time doing backups and restores.

I still remember (through others, as I wasn’t in this space yet) the advertising campaign claiming that SQL Server 2005 would completely eliminate the need for a DBA because everything would just work on its own, even sweet database tuning using the Database Engine Tuning Advisor. The same thing applies today: even those DBA-free databases eventually need somebody to optimize them along various dimensions, ensure they are running smoothly, and correct issues if they are not. Perhaps we could call this role the Administrator of a Database, or AoD, so as not to scare the DBA-free database vendors. “No, we don’t have DBAs; we just need you to have a few AoDs on staff.”

Query Snowflake Data from Spark

The Big Data in Real World team crosses data platforms:

If your organization is working with lots of data you might be leveraging Spark to compute distribution. You could also potentially have some or all your data in a Snowflake data warehouse.

In a situation like this, you might have to expose data in Snowflake to the processes that run on Spark. This is made possible using the Spark Connector for Snowflake.

In this post, we will see what the Spark connector for Snowflake is and how to use it from Spark to connect to Snowflake and access Snowflake data in your Spark cluster.

Read on for a high-level architecture of how it works and the configuration you’ll need to do to get it running.
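
Here is a minimal PySpark sketch of what that configuration looks like. It assumes a SparkSession that already has the Snowflake Spark connector and the Snowflake JDBC driver on its classpath. The helper names, account URL, credentials, and table name are placeholders of mine, not values from the article. The `sf*` option keys, the `dbtable` and `query` options, and the `net.snowflake.spark.snowflake` source name follow the connector’s documented usage.

```python
# Data source name registered by the Snowflake Spark connector.
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"


def snowflake_options(url, user, password, database, schema, warehouse):
    """Build the connection options the connector reads (placeholder values)."""
    return {
        "sfURL": url,  # e.g. "<account_identifier>.snowflakecomputing.com"
        "sfUser": user,
        "sfPassword": password,  # load from a secret store in real jobs
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def read_snowflake_table(spark, options, table):
    """Return a Spark DataFrame backed by a whole Snowflake table."""
    return (
        spark.read.format(SNOWFLAKE_SOURCE_NAME)
        .options(**options)
        .option("dbtable", table)
        .load()
    )


def read_snowflake_query(spark, options, query):
    """Return a Spark DataFrame for a query that Snowflake executes."""
    return (
        spark.read.format(SNOWFLAKE_SOURCE_NAME)
        .options(**options)
        .option("query", query)
        .load()
    )
```

With a session in hand, `read_snowflake_table(spark, opts, "EMPLOYEE")` returns an ordinary DataFrame that you can filter and join like any other. The query variant lets Snowflake do filtering or aggregation before the data reaches Spark.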

Common Date and Time Operations in R

Steven Sanderson works with dates:

Dates and times are essential components in many programming tasks, and R provides various functions and packages to handle them effectively. In this post, we’ll explore some common operations using both the base R functions and the lubridate package, comparing their simplicity and ease of understanding.

I personally prefer the lubridate style of date operations, but it’s nice to have options.

Importing Code into Polyglot Notebooks

Matt Eland brings some code to the party:

We’ve seen that Polyglot Notebooks allow you to mix together markdown and code (including C# code) in an interactive notebook and these notebooks allow you to share data between cells and between languages. However, frequently in programming you want to reference code that others have written without having to redefine everything yourself.

In this article we’ll explore how Polyglot Notebooks allows you to import dotnet code from stand-alone files, DLLs, and NuGet packages so your notebooks can take advantage of external code files and the same libraries that you can work with from your code in Visual Studio.

The syntax, by the way, is very similar to that of F# Interactive (and the short-lived C# Interactive), particularly the #i and #r directives.

Recursive Common Table Expressions in Snowflake

Kevin Wilkie is too fancy for simple joins:

Today, I want to talk about that fun edge case when you’re having to join a table to itself in Snowflake. Does it happen often? Not unless your architect just hates you.

Let’s use the normal pieces of data that everyone uses for this kind of thing – employee/manager relationships. We have our employee table that we’ve been working off that we’ll play with for this example.

The syntax is a bit different from T-SQL, but the concept is still the same.
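
The shape of a recursive CTE is easiest to see when it runs. The sketch below is not Snowflake code. It uses SQLite through Python’s built-in sqlite3 module and a hypothetical five-person employee table, so check Kevin’s post for Snowflake’s exact syntax. The pattern is the same, though. An anchor query picks the top of the hierarchy, and a recursive member joins the employee table back to the rows found so far until no new rows appear.

```python
import sqlite3

# Hypothetical employee/manager data: (employee_id, name, manager_id)
EMPLOYEES = [
    (1, "Ana", None),  # top of the hierarchy
    (2, "Ben", 1),
    (3, "Cho", 1),
    (4, "Dee", 2),
    (5, "Eli", 4),
]

HIERARCHY_SQL = """
WITH RECURSIVE org (employee_id, name, manager_id, depth) AS (
    -- Anchor member: employees with no manager
    SELECT employee_id, name, manager_id, 0
    FROM employee
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: attach each employee to a manager already in org
    SELECT e.employee_id, e.name, e.manager_id, o.depth + 1
    FROM employee AS e
    JOIN org AS o ON e.manager_id = o.employee_id
)
SELECT employee_id, name, depth FROM org ORDER BY depth, employee_id
"""


def org_chart(rows):
    """Load rows into an in-memory table and return (id, name, depth) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employee ("
            "employee_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
        )
        conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
        return conn.execute(HIERARCHY_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for emp_id, name, depth in org_chart(EMPLOYEES):
        print("  " * depth + f"{name} (id {emp_id})")
```

One caveat: a cycle in manager_id would keep this query recursing. Real-world hierarchies may need a depth guard such as `WHERE o.depth < 20` in the recursive member.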
