Press "Enter" to skip to content

Month: April 2021

Star Schemas and Power BI Go Together

Marco Russo and Alberto Ferrari explain why star schemas make so much sense for Power BI:

Why should I have products, sales, date and customers as separate tables? Wouldn’t it be better to store everything in a single table named Sales that contains all the information? After all, every query I will ever run will always start from Sales. By storing everything in a single table, I avoid paying the price of relationships at query time, therefore my model will be faster.

There are multiple reasons why a single, large table is not better than a star schema. Here anyway, the focus is strictly on performance. Is it true that a single table is faster than a star schema? After all, we all know that joining two tables is an expensive operation. So it seems reasonable to think that removing the problem of joins ends up in the model being faster. Besides, with the advent of NOSQL and big data, there are so many so-called data lakes holding information within one single table… Isn’t it tempting to use those data sources without any transformation?

Read on to see why this is not the case.

Comments closed

Running Jupyter Notebooks from Powershell

Rob Farley has a change of heart:

The concept is that if I have a notebook with a bunch of queries in it, I can easily call that using Invoke-SqlNotebook, and get the results of the queries to be stored in an easily-viewable file. But I can also just call Invoke-SqlCmd and get the results stored. Or I can create an RDL to create something that will email me based on a subscription. And I wasn’t sure I needed another method for running something.

Read on to see what changed Rob’s mind.

Comments closed

Reverse ETL in a Modern Data Warehouse

James Serra reverses the polarity:

“Reverse ETL” is the process of moving data from a modern data warehouse into third party systems to make the data operational. Traditionally data stored in a data warehouse is used for analytical workloads and business intelligence (i.e. identify long-term trends and influencing long-term strategy), but some companies are now recognizing that this data can be further utilized for operational analytics. Operational analytics helps with day-to-day decisions with the goal of improving the efficiency and effectiveness of an organization’s operations. In simpler terms, it’s putting a company’s data to work so everyone can make better and smarter decisions about the business. As examples, if your MDW ingested customer data which was then cleaned and mastered, that customer data can then by copied into multiple SaaS systems such as Salesforce to make sure there is a consistent view of the customer across all systems. Customer info can also be copied to a customer support system to provide better support to that customer by having more info about that person, or copied to a sales system to give the customer a better sales experience. As a last example, you can identify at-risk customers by surfacing customer usage data in a CRM.

Click through for more details, including information on a few startups working on reverse ETL projects.

Comments closed

Schema Modification Lock-Driven Deadlocks

Taryn Pratt diagnoses a deadlocking issue:

As expected, with the announcement that Stack Overflow for Teams is free for up to 50 users, we saw an incredible spike in sign-ups. Along with the spike in sign-ups, I started to receive a huge increase in alerts about deadlocks on the primary SQL Server for Teams. In the two weeks it took us to resolve the deadlocks, we hit at least 200 deadlocks.

Click through for the low-down on how they discovered and corrected the issue.

Comments closed

Predicting Insurance Prices with ML.NET

Chandra Kudumula shows off ML.NET:

There are three ways to begin with ML.NET

– API Model: You can start ML.NET through a Framework API and write code in C# or F#
– GUI Model: Use ML.NET Model builder in Visual Studio.
– CLI Model: For cross-platform development like Mac and Linux, use ML.NET CLI.

Let’s get started with API Model for predicting the insurance premium using ML.NET Framework.

I’m using Microsoft (MS) Visual Studio 2019 and creating a Console Application. Be sure that you have the latest version of VS and that .NET 5 SDK is installed.

Click through for the demo in Visual Studio using C#.

Comments closed

Simulating Prediction Intervals

Bryan Shalloway continues a series:

Part 1 of my series of posts on building prediction intervals used data held-out from model training to evaluate the characteristics of prediction intervals. In this post I will use hold-out data to estimate the width of the prediction intervals directly. Doing such can provide more reasonable and flexible intervals compared to analytic approaches.

Click through for the article, and be sure to check out part 1 if you haven’t already.

Comments closed

Bulk Copying Lots of Rows into SQL Server

Esat Erkec shows us how to use the Bulk Copy Program (BCP) to bulk load data into SQL Server:

If the installed version is older than the last version, we can download and install the latest version from the Microsoft website. The main capability of the SQL Server BCP is not much complex because it can only run with several arguments. The syntax of the BCP is like below:

bcp {table|view|”query”} {out|queryout|in|format} {data_file|nul} {[optional_argument]…}

For example, if we want to export any data of a table to a text file, we have to specify the table name, the out option, and the data file. The following command will export the Production table into the specified text file.

bcp AdventureWorks2017.Production.Product out C:\ExportedData\Product.txt -S localhost -T –w

I don’t know if I’m the only person for which this is true, but the data file format has always been a royal pain for me to get right, to the point where I’d happily build an SSIS package to perform bulk loading over having to use BCP myself.

2 Comments

Finding and Clearing the Recycle Bin with Powershell

Jack Vamvas answers a question:

I use Powershell extensively to manage SQL Server and the Windows OS. A common problem is to identify location of the  Windows Recycle Bin and clear the contents down – particuarly if there is a space issue. 

How can I locate the Windows Recycle Bin and clear it down?

Read on for the Powershell v5 solution as well as the solution which works for earlier versions.

Comments closed

Equalizing Matrix Column Widths in Power BI

Nick Edwards has bad news and a workaround:

Have you ever come across an issue where your Power BI matrix column widths just aren’t the same width and visually just don’t look right?

Unfortunately (as of April 21’) there is no easy way to make all column widths equal in the format pane of a matrix visual.

However there is a hack to set the width of your all columns in a matrix so that they are all equal and pixel perfect with DAX!

Click through to see how. It’s a hack in all the finest ways: clever, unexpected, and a bit unwieldy.

Comments closed