Press "Enter" to skip to content

Curated SQL Posts

Forking GitHub Repos and Contributing to Open Source Projects

Rob Sewell takes us through the process of contributing to an open source project:

– Fork the repository into your own GitHub

– Clone the repository to your local machine

– Create a new branch for your changes

– Make some changes and commit them with useful messages

– Push the changes to your repository

– Create a Pull Request from your repository back to the original one

You will need to have git.exe available which you can download and install from https://git-scm.com/downloads if required

For bonus points, we learn that Shane O’Neill doesn’t use the Oxford comma.

Comments closed

Shrinking Your Power BI Dataset Sizes

Gilbert Quevauvilliers wanted to reduce Power BI memory usage:

I had already applied all the best practices in terms of reducing the cardinality, removing unwanted columns and making sure that only the data required is being brought into the dataset. Even at this point the dataset size was consuming 90GB of memory in Azure Analysis Services. With the steps below I got my dataset size down to a whopping 37GB of memory!

I used the awesome tools from SQLBI.COM and DAX Studio to see which columns were consuming the most space, and because my dataset had currency converted values, this meant that the cardinality was very high. (The reason that I decided to store the currency conversion values, is when trying to do it on the fly in a large dataset it is very slow)

Two simple tricks led to a pretty nice reduction in size.

Comments closed

A Git Cheat Sheet

Kendra Little has a cheat sheet for working with Git:

I created a cheat sheet for the Git Command Line Interface to go along with my Git tutorial for SQL Change Automation video. I find the Git CLI to be very friendly and easier to learn than a GUI interface.

Given the number of “How do I extricate myself from this Git mess?” messages in my company chat, I’m not sure I’d call the Git CLI friendly. Nonetheless, Kendra does a great job of putting together most of the common commands in an easy guide.

Comments closed

Creating an Azure Data Factory

Cathrine Wilhelmsen continues a series on Azure Data Factory:

In the introduction to Azure Data Factory, we learned a little bit about the history of Azure Data Factory and what you can use it for. In this post, we will be creating an Azure Data Factory and getting familiar with the user interface.

Spoiler alert! Creating an Azure Data Factory is a fairly quick click-click-click process, and you’re done. But! Before you can do that, you need an Azure Subscription, and the right permissions on that subscription. Let’s get that sorted out first.

This post is all about setup and getting an overview of the ADF canvas.

Comments closed

Common Mistakes When Moving to the Cloud

Dave Wentzel takes us through common issues companies experience when adopting a cloud provider:

Don’t make these mistakes:

– Don’t try to use pricing calculators and expect their answers to be close to what your actual spend will be. Cloud expenses are buried everywhere. Instead, have a rough budget to move ONE app to the cloud. Migrate it. Wait a month and examine the bill. What line items were you NOT expecting to see? Is data egress higher than you thought? That’s common. Now, how can you creatively fix that?

– PaaS is never cheaper, at least initially. I call this The PaaS Tax. It will cost you more to use PaaS than to run the same workload in IaaS. Initially. Remember, the paradigm is different from “datacenter” to “cloud”. PaaS becomes cheaper when you leverage PaaS scaling. Since you can’t really scale something like SQL Server in your data center, most people forget this. But in the cloud you can scale down your SQL Server when it is lightly used. That’s how you save money.

Click through for the full story.

Comments closed

United States Maps in R

Laura Ellis shows how to use the usmap package in R:

Today, I’d like to share the package ‘usmap’ which enables incredibly easy and fast creation of US maps in R.

In honor of US Thanksgiving tomorrow, I’m going to make this blog Thanksgiving themed! In this tutorial, we will use the gTrendsR package to pull US Google search results on the keyword “thanksgiving” and plot the popularity by state.

Click through for that demo, as well as links to more demos on map usage.

Comments closed

Comparing Dates in a WHERE Clause

Erik Darling has been diving into issues with date comparison lately, including comparing date columns in the WHERE clause:

A common dilemma is when you have two date columns, and you need to judge the gap between them for something.

For instance, say you have a table of orders and you want to figure out how long on average it takes an ordered item to ship, or a shipped item to be delivered.

You’re not left with many good ways to write the query to take advantage of indexes.

Read on to see what Erik means, and one way you can speed it up a bit.

Comments closed

Structuring Databricks Notebooks

Paul Andrew has put together a basic structure for Databricks notebooks using titles, markdown, and widgets:

For me, one of the hardest parts of developing anything is when you need to pick up and rework code that has been created by someone else. That said, my preferred Notebook structure shown below is not about technical performance or anything complicated. This is simply for ease of sharing and understanding, as well as some initial documentation for work done.

In my example I created a Scala Notebook, but this could of course apply to any flavour.

This makes good use of markdown capabilities without being too heavy. I like it. The same general principles apply if you’re putting together Jupyter notebooks outside of Databricks.

Comments closed

The Problems with RDPing into SQL Server Hosts

Emanuele Meazzo enumerates the issues with using RDP to connect directly to a server hosting SQL Server (or any other database system):

When there are SERIOUS performance problems, i.e. your instance is totally pinned as far as resources go, the only way to log into the instance to do something about it is using DAC; even if you are on the same machine, if you’re trying to log in, as usual, it won’t make any difference.
Enable Remote DAC and save yourself from future troubles, now.

There are several problems. Better is to run client tools from a client machine and something like SSH or Powershell remoting to do things against a server itself.

Comments closed

Editing Rows in SSMS–Behind the Scenes

Randolph West explains what happens when you use the editor built into SQL Server Management Studio to modify rows directly:

A customer recently brought up an interesting thesis, that if you edit a table’s values using SQL Server Management Studio (SSMS) using the edit feature, that the table is dropped and recreated in the background when you commit the changes.

This is false, but there had to be a good reason why they were under this misapprehension.

Read on to understand what does happen and where the customer might have gotten this idea.

Comments closed