
Month: September 2023

Creating a Function in Snowflake

Kevin Wilkie creates a function:

Sometimes, you’ll need to create functions that do a particular thing. They’re not always pretty. Some of them look like they should have been thrown out with the bathwater.

Unlike in SQL Server, user-defined functions in Snowflake can be written in several languages: Java, JavaScript, Python, Scala, or plain old SQL.

That means Postgres users will feel a bit more at home here than SQL Server users.
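To make the multi-language point concrete, here is a minimal sketch of a Python UDF, assuming Snowflake's `CREATE FUNCTION ... LANGUAGE PYTHON ... HANDLER` form. The function name `add_one`, the handler name `add_one_py`, and the runtime version are illustrative assumptions. The DDL is kept as a string because it runs inside Snowflake; the handler itself is ordinary Python and can be tested locally first.

```python
# Hypothetical example: a Snowflake Python UDF is an ordinary Python handler
# function wrapped in CREATE FUNCTION DDL. The DDL is kept as a string here;
# it would be executed in Snowflake, not by this script.

CREATE_UDF_SQL = """
CREATE OR REPLACE FUNCTION add_one(i INT)
  RETURNS INT
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.10'  -- assumption: pick a runtime your account supports
  HANDLER = 'add_one_py'
AS
$$
def add_one_py(i):
    return i + 1
$$;
"""


def add_one_py(i):
    """Same logic as the handler inside the $$ ... $$ block (keep the two in sync)."""
    return i + 1


if __name__ == "__main__":
    # Exercise the handler locally before deploying the UDF.
    print(add_one_py(41))
```

Once created, the function is called like any scalar function, e.g. `SELECT add_one(41);`. A plain-SQL UDF would drop the `LANGUAGE PYTHON`, `RUNTIME_VERSION`, and `HANDLER` clauses and put a SQL expression between the `$$` delimiters instead.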


Choosing a Data Serialization Format

Rathish Kumar says more than “JSON and Parquet”:

In the world of software, we often work with different types of data like lists, tables, and more. These data structures are designed to be fast and efficient when our computer programs use them. However, sometimes we need to move this data out of our computer’s memory, like when we want to save it to a file or send it over the internet. To do this, we have to change the data into a special format made up of 0s and 1s, which is quite different from data structures. This process is what we call encoding or serialization. 

In this article, we’ll explore the world of encoding and decoding, which is the reverse process of turning that special format back into usable data. We’ll also take a look at different ways to do encoding and decoding, as well as important things to think about when choosing the right method for your software projects.

Sadly, ORC (Optimized Row Columnar) doesn’t make the cut, as Parquet ended up taking over that market.
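As a small illustration of the encode/decode round trip described above, the sketch below serializes the same in-memory list twice: once as self-describing JSON text and once with a fixed binary layout via Python's `struct` module. The record layout (uint32 id, 8-byte name, uint16 quantity) is an arbitrary assumption chosen for the example, not a format from the article.

```python
import json
import struct

# An in-memory data structure: a list of dictionaries.
orders = [
    {"id": 1, "item": "widget", "qty": 3},
    {"id": 2, "item": "gadget", "qty": 1},
]

# Text encoding: JSON is self-describing (field names travel with the data).
json_bytes = json.dumps(orders, separators=(",", ":")).encode("utf-8")
from_json = json.loads(json_bytes.decode("utf-8"))

# Binary encoding: a fixed layout per record that both sides must agree on:
# little-endian uint32 id, 8-byte item name (null-padded, truncated if longer), uint16 qty.
RECORD = struct.Struct("<I8sH")
binary_bytes = b"".join(
    RECORD.pack(o["id"], o["item"].encode("ascii"), o["qty"]) for o in orders
)
from_binary = [
    {"id": i, "item": name.rstrip(b"\x00").decode("ascii"), "qty": q}
    for i, name, q in RECORD.iter_unpack(binary_bytes)
]

# Both round trips recover the original data; the binary form is smaller.
print(len(json_bytes), len(binary_bytes))
```

The trade-off is the usual one: the binary form is compact but only decodable by a reader that already knows the schema, and it silently truncates names longer than 8 bytes, while JSON is larger but readable and tolerant of added fields.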


A Primer on Database Constraints in MySQL

Robert Sheldon creates some keys:

MySQL supports six basic types of constraints for ensuring data integrity: PRIMARY KEY, NOT NULL, DEFAULT, CHECK, UNIQUE, and FOREIGN KEY. In this article, I introduce you to each constraint type and provide examples for how they work. The examples include a series of CREATE TABLE statements that demonstrate different ways to incorporate constraints into your table definitions. If you’re not familiar with the CREATE TABLE statement or how to create tables in a MySQL database, refer to an earlier article in this series that introduces you to the statement.

In short, they support the same set that SQL Server users are used to. But do read on to see the nuances behind each of these.
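For a quick, self-contained feel for all six constraint types, here is a sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL. This is a deliberate substitution: the DDL reads almost the same, but behavior differs in places (MySQL only enforces CHECK constraints from 8.0.16 onward and enforces foreign keys with InnoDB, while SQLite needs `PRAGMA foreign_keys = ON`). The table and column names are hypothetical.

```python
import sqlite3

# SQLite stand-in for MySQL: the six constraint types look much the same in DDL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE departments (
    dept_id   INTEGER PRIMARY KEY,                 -- PRIMARY KEY
    dept_name TEXT NOT NULL UNIQUE                 -- NOT NULL and UNIQUE
);
CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  INTEGER NOT NULL CHECK (salary > 0),   -- CHECK
    status  TEXT NOT NULL DEFAULT 'active',        -- DEFAULT
    dept_id INTEGER NOT NULL,
    FOREIGN KEY (dept_id) REFERENCES departments (dept_id)  -- FOREIGN KEY
);
""")

conn.execute("INSERT INTO departments (dept_id, dept_name) VALUES (1, 'Sales')")
conn.execute(
    "INSERT INTO employees (emp_id, name, salary, dept_id) VALUES (1, 'Ana', 5000, 1)"
)
# status was omitted, so the DEFAULT constraint supplies it.
default_status = conn.execute("SELECT status FROM employees WHERE emp_id = 1").fetchone()[0]


def rejected(sql):
    """Return True if the statement fails with a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Each statement below breaks exactly one constraint.
violations = {
    "PRIMARY KEY": "INSERT INTO departments VALUES (1, 'Ops')",
    "UNIQUE": "INSERT INTO departments VALUES (2, 'Sales')",
    "NOT NULL": "INSERT INTO employees (emp_id, name, salary, dept_id) VALUES (2, NULL, 4000, 1)",
    "CHECK": "INSERT INTO employees (emp_id, name, salary, dept_id) VALUES (3, 'Bo', -1, 1)",
    "FOREIGN KEY": "INSERT INTO employees (emp_id, name, salary, dept_id) VALUES (4, 'Cy', 4000, 99)",
}
results = {name: rejected(sql) for name, sql in violations.items()}
print(default_status, results)
```

Every entry in `results` should come back `True`, meaning the database refused the bad row; the nuances Robert covers (naming constraints, column-level vs. table-level syntax, MySQL-specific behavior) are where the two engines diverge.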


Don’t Try These with SQL MI and Private Endpoints

Zoran Rilak wraps up a series on Azure SQL Managed Instance and its support for private endpoints:

The first two installments of this mini-series discussed a couple of basic and advanced scenarios involving private endpoints. Today we’ll look at some ways private endpoints cannot be used to implement scenarios where one might expect otherwise.

Read on for four of these in total, laying out things you cannot do via private endpoint to a SQL Managed Instance. In fairness, Zoran also provides what I would consider reasonable work-arounds for each of those: have a VM jumpbox in the same virtual network for DAC connections, peer your virtual networks for replication, and so on.


Appropriate Uses of Jitter in Graphs

Steven Sanderson shakes things up:

As an R programmer, one of the most useful functions to know is the jitter function. The jitter function is used to add random noise to a numeric vector, which can be helpful when visualizing data in a scatterplot. By using the jitter function, we can get a better picture of the true underlying relationship between two variables in a dataset.

Read on to get an idea of how to use jitter. I do recommend making it very clear to chart viewers that you are, in fact, using jitter, as it is easy to mistake jittered points for actual value locations.
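To show what jitter actually does to the data, here is a rough Python analogue of R's `jitter()`: it adds uniform noise in `[-amount, amount]` to each value. It mimics only R's documented default (`amount = factor * d / 5`, with `d` the smallest gap between distinct values); R's internal rounding step and its `amount = 0` rule are deliberately left out, and the fallback for a single distinct value is an approximation.

```python
import random


def jitter(x, factor=1.0, amount=None, seed=None):
    """Add uniform noise in [-amount, amount] to each value, loosely after R's jitter().

    Only R's default-amount rule is mimicked (amount = factor * d / 5, where d is the
    smallest gap between distinct values); R's rounding step and amount = 0 case are skipped.
    """
    if not x:
        return []
    rng = random.Random(seed)
    if amount is None:
        distinct = sorted(set(x))
        gaps = [hi - lo for lo, hi in zip(distinct, distinct[1:])]
        if gaps:
            d = min(gaps)
        else:
            # Single distinct value: fall back to a tenth of its magnitude (0.1 if it is zero).
            d = abs(distinct[0]) / 10 if distinct[0] != 0 else 0.1
        amount = factor * d / 5
    return [v + rng.uniform(-amount, amount) for v in x]


# Discrete ratings overplot heavily; a small jitter spreads them out for a scatterplot.
ratings = [1, 1, 1, 2, 2, 3, 3, 3, 3]
jittered = jitter(ratings, seed=42)
```

With integer ratings the smallest gap is 1, so each point moves by at most 0.2: enough to separate stacked points, but small enough that no point drifts halfway to a neighboring value. Because the plotted positions are no longer the true values, labeling the chart as jittered matters.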


Building a Weierstrass Function in R

Tomaz Kastrun won’t let you take a derivative:

Building on the simple sine function (remember Fourier series?), German mathematician Karl Weierstrass became the first to publish an example of a continuous, nowhere differentiable function. The Weierstrass function (originally defined as a Fourier series) was the first published example designed to challenge the notion that every continuous function is differentiable except at a set of isolated points. It is an example of a fractal in a function (known as a fractal function) and also of a pathological function (one that runs counter to intuition).

Click through for an example of this in R.
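For readers who want to poke at the function outside R, here is a minimal Python sketch of a partial sum of Weierstrass's classic cosine form, W(x) = sum over n ≥ 0 of a^n cos(b^n π x), with 0 < a < 1, b a positive odd integer, and ab > 1 + 3π/2. The parameter choices (a = 0.5, b = 13, 12 terms) are assumptions for illustration and may not match the post's R code. Any finite partial sum is smooth; only the infinite series is nowhere differentiable.

```python
import math


def weierstrass(x, a=0.5, b=13, n_terms=12):
    """Partial sum of W(x) = sum_{n=0}^{N-1} a**n * cos(b**n * pi * x).

    Weierstrass's original conditions: 0 < a < 1, b a positive odd integer,
    and a*b > 1 + 3*pi/2 (here 0.5 * 13 = 6.5 > ~5.71).
    """
    if not 0 < a < 1:
        raise ValueError("a must lie strictly between 0 and 1")
    # Keep n_terms modest: b**n * pi * x quickly exceeds the range where a float
    # can resolve the cosine's phase, and the tail is bounded by a**N / (1 - a) anyway.
    return sum(a**n * math.cos(b**n * math.pi * x) for n in range(n_terms))


# Sample the curve on [-2, 2] for plotting; finer grids reveal ever more wiggles.
xs = [i / 1000 for i in range(-2000, 2001)]
ys = [weierstrass(x) for x in xs]
```

Zooming into any region of the plotted `ys` shows the same jagged structure at every scale, which is the fractal behavior the quote describes.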


Troubleshooting a Downed SQL Server

Kevin Hill takes us through an issue:

This is a stand-alone bare-metal server located at a hosting facility, with some local storage as well as NAS storage. SQL Server 2016 Standard Edition, 8 cores, 128 GB RAM. The primary database is nearing 1 TB and has multiple filegroups (Active, Archive, Indexes, etc.). This database and the apps that touch it ARE the company.

Read on for Kevin’s process, which was a solid bit of troubleshooting.


Creating a Power BI Dataset and Report via DirectLake

Gilbert Quevauvilliers finishes up a series:

In the final part of my series, I am going to be creating the Power BI DirectLake dataset and report from my tables that I had previously loaded into the lakehouse.

In this series, I show you all the steps I took to achieve the successful outcome I had with my client, where I created the dataset (measures and fields) and the Power BI report.

Click through for links to the prior posts, as well as a walkthrough on creating a DirectLake asset in Power BI.


Transaction Log Files and Instant File Initialization

Erik Darling preps us for SQL Server 2022:

Look, I don’t blame you if you haven’t dug deep into what SQL Server 2022 has to offer just yet. It’s hard enough to keep up with all the problems fixed and caused by cumulative updates.

One thing you may want to pay attention to is how transaction log files are grown and VLFs are created, especially for new databases.

Read on to see what has changed there.
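As background, SQL Server 2022 lets transaction log growths of up to 64 MB use instant file initialization, and it also changed how many VLFs a small growth creates. The sketch below encodes my hedged reading of Microsoft's transaction log architecture guide: a growth smaller than 1/8 of the current log creates one VLF; otherwise growths up to 64 MB create 4 VLFs in SQL Server 2014 through 2019 but 1 VLF in SQL Server 2022, growths up to 1 GB create 8, and larger growths create 16. The exact boundary handling at 64 MB and 1 GB, and ignoring the VLFs of the initial log, are assumptions; check the documentation before relying on the numbers.

```python
def vlfs_per_growth(growth_mb, current_log_mb, sql2022=True):
    """Approximate number of VLFs one log growth event creates.

    Hedged reading of Microsoft's transaction log architecture guide; boundary
    handling at exactly 64 MB and 1 GB is an assumption.
    """
    if growth_mb < current_log_mb / 8:
        return 1  # small growth relative to the current log: one VLF
    if growth_mb <= 64:
        return 1 if sql2022 else 4  # the SQL Server 2022 change
    if growth_mb <= 1024:
        return 8
    return 16


def vlfs_from_growths(initial_mb, growth_mb, growths, sql2022=True):
    """Total VLFs added by a series of fixed-size growths (initial log VLFs ignored)."""
    size, total = initial_mb, 0
    for _ in range(growths):
        total += vlfs_per_growth(growth_mb, size, sql2022)
        size += growth_mb
    return total


# Grow an 8 MB log by 64 MB sixteen times (to just over 1 GB).
print(vlfs_from_growths(8, 64, 16, sql2022=True), vlfs_from_growths(8, 64, 16, sql2022=False))
```

Under these assumptions, the SQL Server 2022 rules add 16 VLFs for that growth pattern versus 40 under the older rules, because the first eight 64 MB growths each create one VLF instead of four.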


Value and Hash Encoding in VertiPaq

Kristyna Hughes looks at column encodings:

Power BI encoding is a powerful optimization option that is often overlooked because it’s visible in neither Power BI Desktop nor the Power BI Service. Natively, the VertiPaq engine in Power BI examines every column in the data model and determines how to store that data most efficiently. To achieve maximum compression, the VertiPaq engine starts by encoding each column, which determines the method of compression applied to that column. There are two types of encoding: value and hash.

Read on to learn the difference, as well as how to push your columns to use a specific type of encoding.
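To make the distinction concrete, here is a conceptual Python sketch of the two ideas. It is an assumed simplification, not VertiPaq's actual implementation: hash encoding builds a dictionary of distinct values and stores a small integer ID per row, while value encoding keeps numeric data as numbers but shrinks them, modeled here as offsets from the column minimum. The sample columns are made up.

```python
def hash_encode(values):
    """Dictionary ('hash') encoding: store each distinct value once, plus an integer ID per row."""
    dictionary = {}
    ids = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)  # next unused ID
        ids.append(dictionary[v])
    return list(dictionary), ids


def hash_decode(dictionary, ids):
    return [dictionary[i] for i in ids]


def value_encode(values):
    """Value encoding (conceptual): keep integers, but stored as offsets from the column minimum."""
    base = min(values)
    return base, [v - base for v in values]


def value_decode(base, offsets):
    return [base + o for o in offsets]


colors = ["red", "blue", "red", "red", "green", "blue"]
years = [2019, 2021, 2019, 2023, 2020]

dictionary, ids = hash_encode(colors)
base, offsets = value_encode(years)

# Offsets need far fewer bits than the raw values.
print(max(years).bit_length(), max(offsets).bit_length())
```

Hash encoding works for any data type but pays for a dictionary lookup, whereas value encoding only applies to numeric columns and avoids the dictionary entirely. That is why steering an integer key column toward value encoding, or a sparse-valued numeric column toward hash encoding, can change model size.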
