Press "Enter" to skip to content

Author: Kevin Feasel

Static Analysis of Hadoop Libraries

Maxim Stefanov ran a static analysis of several Hadoop libraries and here are the findings:

After the analysis was completed, I chose the most interesting warnings and noticed that I had the same number of warnings in production code and in tests. Normally, I don’t consider analyzer warnings from tests. But when I divided them, I couldn’t leave the ‘test’ warnings unattended. “Why not take a look at them,” I thought, “because bugs in tests might also have adverse consequences.” They can lead to incorrect or partial testing, or even to a mishmash.

After I selected the most intriguing warnings, I divided them into the following groups: production, test, and the four main Hadoop modules. And now I’m glad to offer a review of the analyzer warnings.

Read on for the list. Hopefully Maxim submitted a few pull requests or at least Jira tickets for the projects.

Backing Up Extended Events Sessions

Jason Brimhall shows us how to back up Extended Events sessions using PowerShell:

Quite some time ago, I shared a few articles that peeled back the top layer of how to use PowerShell (PoSh) with Extended Events (XEvents). Among those articles, I showed how to retrieve the metadata, predicates and actions, and targets (to mention a few). Those are prime examples of articles showing some of the basics, which means there is plenty of room for some deeper dive articles involving both PoSh and XEvents. One topic that can help us bridge to the deeper end of the XEvents pool is how to generate scripts for our XEvent Sessions.

In this article, I will venture to show how to generate good backup scripts of our sessions using PoSh. That said, there are some caveats to using PoSh to generate these scripts and I will share those as well.

Read the whole thing, especially because there is one doozy of a caveat at the end.
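
If you want a quick safety net while you work through Jason’s approach, dbatools offers a shortcut. This is a minimal sketch rather than Jason’s method, and it assumes the dbatools module and its Export-DbaXESessionTemplate command are available; the instance name and path are placeholders.

# Minimal sketch (not the SMO route Jason walks through); dbatools is assumed to be installed.
Import-Module dbatools

# List the Extended Events sessions defined on the instance.
Get-DbaXESession -SqlInstance "SQL01" | Select-Object Name, Status

# Export the session definitions to XML template files, which Import-DbaXESessionTemplate
# can bring back onto the same or another instance later.
Export-DbaXESessionTemplate -SqlInstance "SQL01" -Path "C:\Backup\XESessions"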

Getting a List of Power BI Pro Licensed Users

Brent Powell shares a PowerShell script to retrieve Power BI Pro licensed users:

Per the Power BI licensing documentation, a pro license is required for publishing and editing content in app workspaces. If the app workspace is not assigned to a premium capacity, even the users viewing/consuming the content will require a pro license.

Pro license assignments are also very important from a governance perspective. An organization that has provisioned premium capacity would generally want to limit the number of users with pro licenses to those who A) have a clear need for developing and publishing Power BI artifacts (dashboards, reports, dataflows, datasets) on an ongoing basis and B) have received some form of training or certification on using Power BI effectively, as well as on the organization’s policies for using Power BI.

As one (very) simple example for an organization with premium capacity, two users in a department of ten could be determined to be the content creators for their department – perhaps one will build datasets and the other will build reports and dashboards. These two users, along with maybe a backup user, could be assigned pro licenses. Other users on the team without a pro license could still make development and test related contributions to their team’s projects via Power BI Desktop and the Viewer workspace role but they would rely on the pro users in their department for publishing and distributing content.

Click through for the script and a detailed explanation.
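
For a rough sense of the shape such a script can take, here is a sketch (not Brent’s script) built on the MSOnline module; it assumes the Power BI Pro license appears with an AccountSkuId ending in POWER_BI_PRO, which is worth verifying against your tenant.

# Minimal sketch, not Brent's script: assumes the MSOnline module is installed and that the
# Power BI Pro SKU shows up as something like "tenantname:POWER_BI_PRO".
Import-Module MSOnline
Connect-MsolService

Get-MsolUser -All |
    Where-Object { $_.Licenses.AccountSkuId -match 'POWER_BI_PRO' } |
    Select-Object DisplayName, UserPrincipalName |
    Sort-Object DisplayName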

sqlcmd and Complex Passwords

Randolph West hits one of my bugbears with respect to the Windows command shell:

Using accepted good practice, the password and script were escaped with double quotes (note that instance, password, and database are the replacement values in question):

sqlcmd -S instance -U maintenanceUser -P "password" -Q "dbcc checkdb ('database') with DATA_PURITY, NO_INFOMSGS;"

Unfortunately, one of the passwords started with a double quotation mark which led to the command failing for one specific Express Edition instance.

Read on to see the mess as well as a way to extricate yourself from the mess.

Labeling Queries in Azure Synapse Analytics

Niko Neugebauer touches on something I want for on-premises SQL Server:

In Azure Synapse Analytics (Azure SQL DW) we have a tool that can help us – the query labels. Firing up the same analytical query, but this time with OPTION (LABEL = ‘QueryLabelIdentification’), can help us identify the processing. So for the test example I have simply included the format QL – [Query Purpose], where QL stands for Query Labelling:

I think this would have a lot of value on-prem, especially if you are using Query Store.
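
For reference, the pattern looks something like the sketch below. This is not Niko’s example: the server, database, table, and label names are made up, and the T-SQL is wrapped in Invoke-Sqlcmd from the SqlServer module so it can run from a script.

# Minimal sketch; server, database, tables, and label values are hypothetical.
$server = "myworkspace.sql.azuresynapse.net"

# Run an analytical query tagged with a label.
$labelled = @"
SELECT c.CustomerKey, SUM(s.SalesAmount) AS TotalSales
FROM dbo.FactSales AS s
JOIN dbo.DimCustomer AS c ON c.CustomerKey = s.CustomerKey
GROUP BY c.CustomerKey
OPTION (LABEL = 'QL - Customer Sales Aggregation');
"@
Invoke-Sqlcmd -ServerInstance $server -Database "SalesDW" -Query $labelled   # add -Credential as needed

# Find the request by its label afterwards via the dedicated SQL pool DMV.
$lookup = @"
SELECT request_id, total_elapsed_time, command
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'QL - Customer Sales Aggregation';
"@
Invoke-Sqlcmd -ServerInstance $server -Database "SalesDW" -Query $lookup   # add -Credential as needed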

Using dbatools to Manage Client Aliases

Chrissy LeMaire takes us through client aliases and how to manage them in dbatools:

SQL Client Aliases allow you to connect to a SQL Server instance using another name. This is especially useful during migrations. Want your servers to connect to the new SQL Server without modifying connection strings within your application? Or what if you could use easy-to-remember names for your docker containers? SQL Client Aliases can help.

Click through for the commands and a quick demonstration.
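
To give a flavor of what those commands look like, here is a minimal sketch; the instance and alias names are made up, and the cmdlets are the New/Get/Remove-DbaClientAlias trio from dbatools.

# Minimal sketch; instance and alias names are placeholders.
Import-Module dbatools

# Point a friendly name at the real instance.
New-DbaClientAlias -ServerName "sql2019-newbox\PROD" -Alias "sqlprod"

# See which aliases are defined on this machine.
Get-DbaClientAlias

# Remove the alias once the migration is complete.
Remove-DbaClientAlias -Alias "sqlprod"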

Working with R and the Windows Command Line

Tomaz Kastrun takes us through calling CMD commands from R:

From time to time when developing in R – working and wrangling data, preparing for machine learning projects – the time comes when one still needs to access operating system commands from within R.

In this blog post, let’s take a look at some of the most useful cmd commands when using R. Please note that the cmd commands apply only to the Windows environment; for Linux/macOS, the system commands would be slightly different, but the wrapper R code should remain the same.

The need does come up, so it’s good to have that knowledge at hand.

2020 Data Professional Salary Survey Results

Brent Ozar has another year of salary data for us:

A few things to know about it:

– The data is public domain. The license tab makes it clear that you can use this data for any purpose, and you don’t have to credit or mention anyone.
– The spreadsheet includes the results for all 4 years (2017-2020). We’ve gradually asked different questions over time, so if a question wasn’t asked in a year, the answers are populated with “Not Asked.”
– The postal code field was totally optional, and may be wildly unreliable. Folks asked to be able to put in small portions of their zip code, like the leading numbers.
– Frankly, anytime you let human beings enter data directly, the data can be pretty questionable – for example, there were 14 folks this year who entered annual salaries below $500. If you’re doing analysis on this, you’re going to want to discard some outliers.

It’s on my agenda (somewhere…probably a bit further back than I’d like) to dig into this year’s data and try to come up with something a little more comprehensive now that there are four years of data.
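
If you do pull the data down yourself, a minimal sketch of the outlier cleanup Brent suggests might look like the following; it assumes you have exported the survey tab to CSV and that the salary column is named SalaryUSD, which is a guess on my part.

# Minimal sketch; the CSV path, the SalaryUSD column name, and the thresholds are assumptions,
# and the cast expects plain numeric values with no thousands separators.
$survey = Import-Csv -Path "C:\Temp\data-professional-salary-survey.csv"

# Discard implausible salaries before doing any analysis.
$cleaned = $survey | Where-Object {
    [double]$_.SalaryUSD -ge 500 -and [double]$_.SalaryUSD -le 2000000
}

"Kept {0} of {1} rows" -f $cleaned.Count, $survey.Count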

Not All Cursors are Bad

Erik Darling doesn’t want to mess with your cursors (that much):

Read the code. Understand the requirements.

I tune queries all day long. The number of times someone has said THIS CURSOR IS A REAL BIG PROBLEM and been right is pretty small.

Often, there was a tweak to the cursor options, or a tweak to the query the cursor was calling (or the indexes available to it) that made things run in a more immediate fashion. I want to tune queries, not wrestle with logic that no one understands. Old code is full of that.

I’ll grant the premise (and add my own case where a cursor was necessary to solve the problem), though I did work at one company where the entire product logic was driven by nested cursors 5 or 6 levels deep. Those were really big problems. I think you’ll find the problem most frequently in shops with a heavy dose of Oracle, as Oracle cursors do perform well.
