March 2022 – Curated SQL

Automating Remote Execution via Powershell and SQL Server

Published 2022-03-31 by Kevin Feasel

As part of my journey to get better at Powershell, I gave myself a task: Create a project utilizing Powershell at its core, to solve a complicated problem as elegantly as possible.
A complicated problem indeed presented itself: How to remotely control multiple computers without having to log into them? Make them perform any task that could possibly be needed? Control them from one central location? And make the solution as easily scalable as possible?

Read on to see what Eitan has learned and applied.

Comments closed

Conditionally Formatting Multi-Stat Visuals in ADX

Published 2022-03-31 by Kevin Feasel

Hiram Fleitas looks at visual formatting in Azure Data Explorer:

Intro
Start with a free database at aka.ms/adx.free & run this demo query.
let mytable = datatable(key:string, number:int)
[
'one', 1,
'two', 2
];
mytable

Once you have that query, read on to see how you can visualize and format it.

Comments closed

Determining Access to Power BI Reports

Published 2022-03-31 by Kevin Feasel

Gilbert Quevauvilliers continues a series on determining who has access to what reports in Power BI:

This is the second part in my blog post series showing you how I created the Power BI Reports list.
In this blog post I am going to show you how I used PowerShell to get all the information of the App Names, reports and users that have permissions in the different Apps.
Users can get access to Power BI reports directly via the Share method, as well as via an App. I did this to ensure that I did not miss any reports that a user did have access to, and I could not show it!

Click through for the script, as well as an an explanation of how it all works.

Comments closed

Power BI Table Storage Modes and Model Types

Published 2022-03-31 by Kevin Feasel

Shabnam Watson puts together a compendium (and explanation) of the different table storage modes and model types in Power BI:

I still get a lot of questions from various Power BI developers about table storage modes and how table storage modes affect an entire model’s type. Here is a post to summarize all table storage modes / model types.
The following table storage options apply when creating a Power BI model.

There’s a brief summary in the tables, as well as additional notes below them.

Comments closed

Troubleshooting Out-of-Memory Errors in SQL Server’s Database Engine

Published 2022-03-31 by Kevin Feasel

Dimitri Furman shows off a DMV:

As part of our efforts to improve database engine supportability, we have added a new dynamic management view (DMV), sys.dm_os_out_of_memory_events. It is now available in Azure SQL Database and Azure SQL Managed Instance and will become available in a future version of SQL Server. If an out-of-memory (OOM) event occurs in the database engine, this view will provide details to help you troubleshoot the problem, including the OOM cause, memory consumption by the database engine components at the time of event, potential memory leaks (if any), and other relevant information.

Read on to learn more about it, as well as a corresponding Extended Event.

Comments closed

KQL Series

Published 2022-03-31 by Kevin Feasel

Hamish Watson does a document dump:

So what did we do here?
It searched our stored security events in the SecurityEvent table for all Accounts that had a successful login in the last 3 hours and we chose to display only the Account and number of log off events per Account in numerical order with the highest at the top.
So far I’ve introduced some new operators and things – but what is a really quick way to learn KQL?

Start with this post and just keep navigating forward. Hamish has ten posts in total.

Comments closed

Right to Be Forgotten in Delta Lake

Published 2022-03-30 by Kevin Feasel

Milos Colic, et al, tackle a tricky problem:

With Delta, we have one more tool at our disposal to address GDPR compliance and, in particular, “the right to be forgotten” – VACUUM. Vacuum operation removes the files that are no longer needed and that are older than a predefined retention period. The default retention period is 30 days to align with GDPR definition of undue delay. Our earlier blog on a similar topic explains in detail how you can find and delete personal information related to a consumer by running two commands:

The part I’m finding tricky here is, how does this handle “time travel” scenarios in which you’re looking at prior iterations of data? I haven’t run through all of the scenarios so this is just speculation, but it seems that even with all of these changes, you’d still have to worry about historical data containing that sensitive information.

Comments closed

Zero-Rename Writes in ElasticMapReduce Hive

Published 2022-03-30 by Kevin Feasel

Suthan Phillips, et al, show off some updates to the way Hive transactions commit in AWS’s ElasticMapReduce:

Our customers use Apache Hive on Amazon EMR for large-scale data analytics and extract, transform, and load (ETL) jobs. Amazon EMR Hive uses Apache Tez as the default job execution engine, which creates Directed Acyclic Graphs (DAGs) to process data. Each DAG can contain multiple vertices from which tasks are created to run the application in parallel. Their final output is written to Amazon Simple Storage Service (Amazon S3).
Hive initially writes data to staging directories and then move it to the final location after a series of rename operations. This design of Hive renames supports task failure recovery, such as rescheduling the failed task with another attempt, running speculative execution, and recovering from a failed job attempt. These move and rename operations don’t have a significant performance impact in HDFS because it’s only a metadata operation when compared to Amazon S3 where the performance can degrade significantly based on the number of files written.
This post discusses the new optimized committer for Hive in Amazon EMR and also highlights its impressive performance by running a TPCx-BB performance benchmark and comparing it with the Hive default commit logic.

Read on for a description of how commit operations work in general and how the updated Hive committer can help with certain types of queries.

Comments closed

Making Daily Standups Worthwhile

Published 2022-03-30 by Kevin Feasel

Amit Nair has some advice:

A Big No to technical discussions
This is the first reason why Daily scrum meetings take more than 15 mins. We can understand that Scrum Team needs to discuss the technical issue when they face one during the execution of a task. These technical issues need to be reported to Scrum Master as an obstacle or impediment, which need to be resolved later on not when the meeting is in progress.

This is generally sound advice, especially because the idea of a daily standup is to together, quickly discuss plans and activities relating to the sprint, and make the team aware of blockers. Going beyond that is typically unnecessary. As teams get smaller, you can be a bit more lax with the rules; as you get closer to that 8-10 person team, you have to be pretty ruthless about keeping on time and on topic. Save the more detailed discussions for relevant meetings afterward.

1 Comment

Grouping Sets and Groupings

Published 2022-03-30 by Kevin Feasel

Kevin Wilkie has fun with grouping sets:

Let’s look at our dbo.Person1 table that we worked with earlier. Today, I want to find a count of all of the persons in each of the following categories: ZipCode, Gender, and Email Domains. And just for fun, let’s add in there where each of those categories cross – for example, Zipcode and Gender, ZipCode and Email Domain, etc…
Most people would think all kinds of awful thoughts at this point about all of the GROUP BY statements you’ll have to write. For anyone wondering – this is one way to do it. Notice all kinds of UNION statements and I’m sure someone is wondering if that’s truly all of the combinations. And we don’t want to go into the maintenance on this if things do happen to change…

And don’t forget about the GROUPING() function:

Let’s say our business partner asks us to determine which fields are aggregated together. Since we only have 2 fields and a grand total of 15 rows, we could determine this by eye. But, like all good developers, we want to do this programmatically.
Here’s where our friend – the GROUPING() function – comes into play.

GROUPING SETS is an extremely useful operator in the ANSI SQL standard. Definitely worth learning how to use.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Month: March 2022