Press "Enter" to skip to content

Curated SQL Posts

Monitoring Power BI On-Premises Gateways

Jeff Pries shows us how we can monitor Power BI Enterprise Gateways:

To tie all of these steps together, I put together a brief C# console application which is capable of using stored credentials or prompting for credentials interactively, then performs the 4 steps listed above. Finally, the application receives the JSON response and de-serializes it and outputs it in a friendly way to the screen (or optionally inserts it into a SQL database table.

For the basics on how I perform the authentication and API query in the application, see my blog post here.

There are a few more steps than I would have anticipated.

Comments closed

Investigating the Big Data Cluster Data Pool

Mohammad Darab takes us through Big Data Cluster data pools:

Data pools enable the creation of scale-out data marts. Whether your data is being ingested from Spark jobs or SQL, it is stored into the data pool. Data is distributed across one, or two, SQL Server instances running queries against it is more efficient.

Whether the data is being ingested from IoT device, Kafka, another relational data source (like Oracle or Teradata), it all is stored into the data pool instances and are available as “data marts” for the consumer to work with. There is no need to go back out to the original data source each time you want to query the data. It is all available inside the data pool instances.

This lets you cache data brought in via PolyBase and spread it across a number of instances. That’s pretty powerful.

Comments closed

Don’t Miss These Settings in Azure SQL DB

Arun Sirpal takes us through a few things administrators tend to miss in Azure SQL Database:

2. Allow Azure Services and resources to access this server setting set to on/off?

I always set this to off. I do not like it ON.

Why? Because I like to control things via vnets (maybe IPs if really needed – it depends on your solution). Nowadays you can use private endpoint connections which allow connections from within a vnet to a private IP.  Sure, you may want to use IP addresses, if you do then I suggest database level firewall rules over server level, especially if you use failover groups.

There are several good ones here.

Comments closed

Static Analysis of Hadoop Libraries

Maxim Stefanov ran a static analysis of several Hadoop libraries and here are the findings:

After the analysis was completed, I chose the most interesting warnings and noticed that I had the same number of warnings in production code and in tests. Normally, I don’t consider analyzer warnings from tests. But when I divided them, I couldn’t leave ‘tests’ warnings unattended. “Why not take a look at them,” I thought, “because bugs in tests might also have adverse consequences.” They can lead to incorrect or partial testing, or even to mishmash. 

After I selected the most intriguing warnings, I divided them by the following groups: production, test and the four main Hadoop modules. And now I’m glad to offer the review of analyzer warnings.  

Read on for the list. Hopefully Maxim submitted a few pull requests or at least Jira tickets for the projects.

Comments closed

Differences Between Kaggle and Real Life

Sergii Makarevych explains the differences between a Kaggle competition and a business-world data science project:

There are some very important differences between a Kaggle competition and real-life project which beginner Data Scientists should know about. Kaggle creates a fantastic competition spirit. Its leaderboard drives people to deliver better and better solutions pushing accuracy to the limit. Kaggle’s Notebooks and Discussions make it easy to share knowledge and learn. However real-life projects are somewhat different. I hope this article will be helpful for people who consider moving into Data Science starting with Kaggle competitions. I remember I was a little bit overwhelmed when on my first real-life project all the models, that typically worked well on Kaggle, miserably failed. I wish I was prepared for this.

It’s a sensible list of differences. Kaggle emphasizes one part of the data science process, but businesses end up needing the whole thing.

Comments closed

Getting a List of Power BI Pro Licensed Users

Brent Powell shares a Powershell script to retrieve Power BI Pro licensed users:

Per the Power BI licensing documentation, a pro license is required for publishing and editing content in app workspaces. If the app workspace is not assigned to a premium capacity, even the users viewing/consuming the content will require a pro license.

Pro license assignments are also very important from a governance perspective. An organization that has provisioned premium capacity would generally want to limit the number users with pro licenses to users who A) have a clear need for developing and publishing Power BI artifacts (dashboards, reports, dataflows, datasets) on an ongoing basis and B) have received some form of training or certification on using Power BI effectively as well as the organization’s policies for using Power BI.

As one (very) simple example for an organization with premium capacity, two users in a department of ten could be determined to be the content creators for their department – perhaps one will build datasets and the other will build reports and dashboards. These two users, along with maybe a backup user, could be assigned pro licenses. Other users on the team without a pro license could still make development and test related contributions to their team’s projects via Power BI Desktop and the Viewer workspace role but they would rely on the pro users in their department for publishing and distributing content.

Click through for the script and a detailed explanation.

Comments closed

Backing Up Extended Events Sessions

Jason Brimhall shows us how to back up Extended Events sessions using Powershell:

Quite some time ago, I shared a few articles that peeled back the top layer of how to use PowerShell (PoSh) with Extended Events (XEvents). Among those articles, I showed how to retrieve the metadatapredicates and actions, and targets (to mention a few). Those are prime examples of articles showing some of the basics, which means there is plenty of room for some deeper dive articles involving both PoSh and XEvents. One topic that can help us bridge to the deeper end of the XEvents pool is how to generate scripts for our XEvent Sessions.

In this article, I will venture to show how to generate good backup scripts of our sessions using PoSh. That said, there are some caveats to using PoSh to generate these scripts and I will share those as well.

Read the whole thing, especially because there is one doozy of a caveat at the end.

Comments closed

Labeling Queries in Azure Synapse Analytics

Niko Neugebauer touches on something I want for on-premises SQL Server:

In Azure Synapse Analytics (Azure SQL DW) we have a tool that can help us – the query labels. Firing up the same analytical query, but this time with the OPTION (LABEL = ‘QueryLabelIdentification’) can help us with the identification of the processing. So for the test example I have simply included the format QL – [Query Pupose] where QL stands for Query Labelling:

I think this would have a lot of value on-prem, especially if you are using Query Store.

Comments closed

sqlcmd and Complex Passwords

Randolph West hits one of my bugbears with respect to the Windows command shell:

Using accepted good practice, the password and script were escaped with double quotes. (note that instancepassword and database are the replacement values in question):

sqlcmd -S instance -U maintenanceUser -P "password" -Q "dbcc checkdb ('database') with DATA_PURITY, NO_INFOMSGS;"

Unfortunately, one of the passwords started with a double quotation mark which led to the command failing for one specific Express Edition instance.

Read on to see the mess as well as a way to extricate yourself from the mess.

Comments closed