Press "Enter" to skip to content

Curated SQL Posts

“Big” Data

Buck Woody explains that “Big Data” is just data:

A few years ago it was all the rage to talk about “Big Data”. Lots of descriptions of “Big Data” popped up, including the “V’s” (Variety, Velocity, Volume, etc.) that proved very helpful. I even have my own definition:

Big Data is any data you can’t process
in the time you want
with the systems you have

This post is quite reasonable in its depiction of the problem. I extend it a bit further than that and talk about difficulty of processing the data. Nonetheless, read Buck’s full thoughts and check out the Big Data Clusters workshop.

Comments closed

Asymmetric Crosstabs in Power BI

Teo Lachev shows how we can implement asymmetric crosstabs in Power BI:

The Internet column shows the sales amount from FactInternetSales. Then, the matrix pivots on the BusinessType column in the FactResellerSales. Because, Internet sales don’t relate to BusinessType, it doesn’t make sense to pivot it. Instead, we want to show Internet sales in a single static column before the crosstab portion starts.

Implementing such a report in SSRS is easy thanks to its support of adjacent groups and static columns but not so much in Power BI. The issue is that Matrix would happily pivot both measures and the InternetSalesAmount would be repeated for each business type.

The solution isn’t awful, but it does involve knowledge of DAX.

Comments closed

Power BI Practices: Good and Best

Paul Turley has a great document plus checklist on Power BI practices:

I find there there are so many things to remember when starting a project that a checklist is handy. I’ve been collecting the following as notes for some time. Bare with me as I work on consolidating this article into a concise checklist.

The purpose of this article is to outline a set of guidelines and recommended practices for managing Microsoft Power BI projects. This guide is primarily focused on the work performed by the IT-managed BI Solution Developers for formally-managed BI projects.

If you work heavily with Power BI, you’ll really want to read this and review Paul’s checklist at the end.

Comments closed

On the Certification Debate

Grant Fritchey has thoughts on certifications:

However, you frequently see people, usually the ones with 47 certificates, going on and on about how, just one more cert, just one more, this time, I’ll get the job I want.

Nope.

Let me break the news. Experience, a proven track record, and knowledge are what get you jobs. And yes, I understand, how do you get experience without first getting a job? That is indeed the hurdle. I’m just telling you that certifications are not the rocket in your bottom that will throw you over that hurdle.

Also connections (which Grant also points out in the post which you should read). Connections land many more jobs than certifications. Most certifications are as much noise as signal, which greatly dilutes the value of the thing. Once again I lament the loss of the MCM, one of the few certifications with a near-zero percent noise rate due to how difficult it was and how many things you needed to understand to get past it. But even with it, experience and networking will get you much further.

Comments closed

Reviewing the AMD EPYC Line for SQL Server

Glenn Berry takes a look at whether we want to invest in AMD’s latest server-grade processor:

On August 7, 2019, AMD finally unveiled their new 7nm EPYC 7002 Series of server processors, formerly code-named “Rome” at the AMD EPYC Horizon Event in San Francisco. This is the second generation EPYC server processor that uses the same Zen 2 architecture as the AMD Ryzen 3000 Series desktop processors. These new processors are socket compatible with the previous generation AMD EPYC 7001 Series processors, so they will work in existing model servers (with a BIOS update). Despite that, you will need a new model server to be able to use PCIe 4.0 support from the newer processors.

The AMD EPYC 7002 series includes 19 public launch SKUs that have anywhere from 8 to 64 physical cores, plus SMT, for twice the number of logical cores per processor. There are fourteen SKUs that will work in both one-socket and two-socket servers. There are also five less expensive processor SKUs (which have a “P” suffix) that only work in one-socket servers. This processor family has enough compute horsepower, memory bandwidth and capacity, and I/O bandwidth to support large server workloads on a single-socket server.

It certainly looks competitive. And that’s a great thing for consumers, even those who never make the switch, as it will force Intel to up its game.

Comments closed

SQL Server Database Recovery Models

John McCormack goes into the three database recovery models available in SQL Server:

This post is about database recovery models for SQL Server databases. Having the correct recovery model for a database is crucial in terms of your backup and restore strategy for the database. It also defines if you need to do maintenance of the transaction log or if you can leave this task to SQL Server. Let’s look at the various recovery models and how they work.

Click through for a description of each, including recommendations of when to choose each.

Comments closed

Configuring an Azure Databricks Cluster

Jon Gurgul explains cluster settings in Azure Databricks:

We need compute to run our notebooks and this is achieved by creating a cluster. A cluster is merely a number of Virtual Machines behind the scenes used to form this compute resource. The benefit of Azure Databricks is that compute is only chargeable when on.

Let’s now click the Clusters icon and set up a simple cluster. Once you have loaded the page you can use the “Create Cluster” button.

Click through for an explanation of what each of the settings means.

Comments closed

The Basics of Apache Airflow

Divyansh Jain explains what Apache Airflow is and takes us through a sample solution:

Airflow is a platform to programmatically author, schedule & monitor workflows or data pipelines. These functions achieved with Directed Acyclic Graphs (DAG) of the tasks. It is an open-source and still in the incubator stage. It was initialized in 2014 under the umbrella of Airbnb since then it got an excellent reputation with approximately 800 contributors on GitHub and 13000 stars. The main functions of Apache Airflow is to schedule workflow, monitor and author.

It’s another interesting product in the Hadoop ecosystem and has additional appeal outside of that space.

Comments closed

Azure SQL DB Serverless

John Morehouse takes us through the current public preview of Azure SQL Database Serverless:

Microsoft continues to expand the Azure ecosystem rapidly.  One of the most recent delivered improvements was Azure SQL Database Serverless option.    The new serverless model, which is currently in public preview, provides a compute tier for a single database that scales dynamically with the workload.   This new compute tier is optimal for single database that have unpredictable usage patterns.  Previously, you would provision the compute tier to your Azure SQL Database which allowed to have more granular control on scalability. The serverless tier scaling is effectively controlled by the service itself and will scale when needed.

This looks like it’d be good for a dev server, where the occasional startup cost is fine and there can be significant usage requirement differences based on time—after the devs go home, you’re not likely to need much compute.

Comments closed