May 2023 – Page 13 – Curated SQL

Charts and Color Over-Use

Published 2023-05-04 by Kevin Feasel

Rita Fainshtein shows examples of how over-usage of color makes charts harder to read:

Both graphs convey a message of ranking and grouping into categories.
The categories are shown in both cases in a color-coded manner instead of in a hierarchical format. As graph creators, why do we tend to create graphs with color categories?
1. The fear of being boring, one color seems uninteresting, and here we have both colors and icons. This is an “excellent” attribute for a storyteller.
2. Visually representing a group with similar characteristics makes sense.
But can such graphs tell us anything about groups? Are they easy to understand?
Let’s discuss a few aspects of those cases together:

Click through for the full story, including an alternative to using color as a way to categorize data.

Deploying Azure SQL Edge

Published 2023-05-04 by Kevin Feasel

Kevin Chant takes us to the edge:

Azure SQL Edge is a version of the SQL database engine that is designed to be deployed on IoT (Internet of Things) devices.
It is based on SQL Server 2019. Which means that by default all new databases are created using the SQL Server 2019 compatibility level. You can lower the compatibility level all the way down to SQL Server 2008 if required.

There was some nice functionality in Azure SQL Edge, some of which (like DATE_BUCKET() and DATETRUNC()) made it into SQL Server 2022.
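
For anyone who has not used those two functions, here is a rough Python sketch of the idea behind them. It is an illustration only, not how the database engine implements either one: `date_bucket` and `date_trunc` are hypothetical helper names, the bucket width is passed as a `timedelta` rather than a datepart plus a number, and the default origin of 1900-01-01 follows what the T-SQL documentation gives for DATE_BUCKET().

```python
from datetime import datetime, timedelta

# Assumption: mirror the documented default origin of T-SQL's DATE_BUCKET().
DEFAULT_ORIGIN = datetime(1900, 1, 1)


def date_bucket(width: timedelta, value: datetime,
                origin: datetime = DEFAULT_ORIGIN) -> datetime:
    """Return the start of the fixed-width bucket that contains value."""
    # timedelta // timedelta floors to a whole number of buckets since origin.
    whole_buckets = (value - origin) // width
    return origin + whole_buckets * width


def date_trunc(part: str, value: datetime) -> datetime:
    """Truncate value to the start of the given part (year, month, day, hour)."""
    if part == "year":
        return value.replace(month=1, day=1, hour=0, minute=0,
                             second=0, microsecond=0)
    if part == "month":
        return value.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if part == "day":
        return value.replace(hour=0, minute=0, second=0, microsecond=0)
    if part == "hour":
        return value.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported part: {part}")


reading = datetime(2023, 5, 4, 10, 37, 12)
print(date_bucket(timedelta(minutes=15), reading))  # 15-minute bucket start
print(date_trunc("hour", reading))                  # start of the hour
```

The difference worth noticing: truncation always snaps to a calendar boundary, while bucketing snaps to a multiple of an arbitrary width counted from an origin, which is what makes it handy for grouping sensor readings into, say, 15-minute windows.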

Being a Better DBA with the SPIN Model

Published 2023-05-04 by Kevin Feasel

Eitan Blumin takes us to a seminar:

The SPIN sales strategy is a selling technique that was developed by Neil Rackham in the 1980s. SPIN is an acronym that stands for Situation, Problem, Implication, and Need-payoff. This strategy is based on the idea that asking the right questions can help you understand your customer’s needs and provide the best solution for them.
When I first heard about the SPIN sales strategy, I was attending a lecture that was delivered to us by a sales and marketing specialist during one of our company meetings several years ago. As a DBA, I initially assumed this strategy wouldn’t be relevant to my job. But as I listened to the presenter explain the SPIN model, I began to see its potential for use in my daily work:

It’s an interesting approach and I like the way Eitan ties it back to database administration. Of course, we could tie it to application development or any of a number of other fields. I, meanwhile, use the Columbo method, in which I ask a series of seemingly-dumb questions, but just before I leave, I say “Oh, just one more question,” and hit the person with the question proving I know that person committed the crime and have enough evidence to make an arrest.

Diffify Updates

Published 2023-05-03 by Kevin Feasel

Myles Mitchell celebrates a year of diffify:

We’ve just passed an important milestone for diffify: our app for tracking Python and R package releases has just turned 1 year old! To mark this exciting occasion we are delighted to announce an “anniversary update” featuring numerous quality of life improvements. This post will outline the latest changes and tease at some exciting developments in the works…

Check out these recent changes and a little bit of what’s on the horizon.

PayPal’s Data Contract Template Open Sourced

Published 2023-05-03 by Kevin Feasel

Jean-Georges Perrin makes an announcement:

A data contract is a binding agreement between the consumers and producers of data. You can see it as a data schema on steroids or data schema++. The goal of the contract is to set expectations between the parties. It can be built as fit-for-purpose where the consumers and producer agree on what it should contain or can serve as a brochure for any consumer willing to access the data offered by this (data) product.

Click through to learn more about data contracts and then check out the contract template itself on PayPal’s GitHub repo.

Documenting Group Policy Objects with PowerShell

Published 2023-05-03 by Kevin Feasel

Patrick Gruenauer builds a report:

Active Directory Group Policies (GPO) enable you to control user and computer settings. It is important to document them. In this blog post I am going to show you two PowerShell commands which create a GPO HTML Report. Let’s dive in.
To store all GPO Settings from all GPOs in one file run this command. Don’t forget to provide your domain name and the path of the report file.

Click through for that code snippet, as well as another one which builds an HTML report for each GPO.

Adding Help to Your PowerShell Code

Published 2023-05-03 by Kevin Feasel

Robert Cain helps those who help themselves:

Having good help is vital to the construction of a module. It explains not only how to use a function, but the purpose of the module and even more.
Naturally I’ve included good help text in the ArcaneBooks module, but as I was going over the construction of the ArcaneBooks module I realized I’d not written about how to write help in PowerShell. So in this post and the next I’ll address this very topic.

Read on for Robert’s thoughts on the topic, including standard ways to add content comments so PowerShell’s built-in Get-Help cmdlet works for you.

Azure Synapse Analytics April 2023 Updates

Published 2023-05-03 by Kevin Feasel

Ryan Majidimehr has an update for us:

Low Shuffle Merge optimization for Delta tables is now available in Apache Spark 3.2 and 3.3 pools. You can now update a Delta table with advanced conditions using the Delta Lake MERGE command. It can update data from a source table, view, or DataFrame into a target table. The current algorithm of the MERGE command is not optimized for handling unmodified rows. With Low Shuffle Merge optimization, unmodified rows are excluded from expensive shuffling execution and written separately.
To learn more about this new command, read Low Shuffle Merge optimization on Delta tables.

Looks like a bit of work on Data Explorer pools and a little bit on Spark pools and Synapse Link to Cosmos DB to round out the month.
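
To make the Low Shuffle Merge idea a little more concrete, here is a toy Python model of what the quoted passage describes. This is a sketch under loud assumptions, not Delta Lake's or Synapse's actual implementation: the "files" are plain lists of rows, `merge_files` is a hypothetical function, and "shuffled" just means "sent down the expensive path." The point is only to show which rows end up on which path when a file is touched by a MERGE.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def merge_files(files: List[List[Row]], updates: Dict[int, Row],
                low_shuffle: bool) -> Tuple[List[Row], List[Row]]:
    """Toy model: return (shuffled_rows, passthrough_rows) for a merge by id."""
    shuffled: List[Row] = []     # rows sent through the expensive path
    passthrough: List[Row] = []  # unmodified rows written separately
    for rows in files:
        # A file with no matching rows is not rewritten at all.
        if not any(row["id"] in updates for row in rows):
            continue
        for row in rows:
            if row["id"] in updates:
                shuffled.append({**row, **updates[row["id"]]})
            elif low_shuffle:
                passthrough.append(row)  # skipped by the expensive path
            else:
                shuffled.append(row)     # unmodified, but shuffled anyway
    return shuffled, passthrough


files = [
    [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}],
    [{"id": 4, "v": "d"}],
]
updates = {2: {"v": "B"}}

for flag in (False, True):
    shuffled, passthrough = merge_files(files, updates, low_shuffle=flag)
    print(flag, len(shuffled), len(passthrough))
```

In this toy run, one updated row drags its two unmodified file-mates through the expensive path unless the optimization is on, which is the waste the feature is described as avoiding.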

WaitTime in Power BI

Published 2023-05-03 by Kevin Feasel

Chris Webb explains what a new metric means:

What does WaitTime represent? Here’s the technical explanation: it’s the wait time on the query thread pool in the Analysis Services engine before the query starts to run. But what does this mean for you as someone trying to tune DAX queries in Power BI?

Chris provides an explanation of exactly that. This tracking of noisy neighbors is interesting, as it would provide insight if you’re noticing variance in dataset refresh times.
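
If the phrase "wait time on the query thread pool" feels abstract, the following Python sketch may help. It is an analogy only and has nothing to do with how the Analysis Services engine is built: `timed` and `run_queries` are hypothetical helpers, the "queries" are just sleeps, and the pool is Python's standard `ThreadPoolExecutor`. It records how long each task sat in the queue before a worker picked it up, separately from how long it ran.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(task, submitted_at, *args):
    """Run task(*args), recording queue wait separately from run time."""
    started_at = time.perf_counter()
    wait_s = started_at - submitted_at          # time spent waiting for a thread
    result = task(*args)
    duration_s = time.perf_counter() - started_at
    return {"wait_s": wait_s, "duration_s": duration_s, "result": result}


def run_queries(durations, workers=1):
    """Submit one sleeping 'query' per duration and collect its timings."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            # perf_counter() is evaluated here, at submission time.
            pool.submit(timed, time.sleep, time.perf_counter(), d)
            for d in durations
        ]
        return [f.result() for f in futures]


# One worker: the second "query" has to wait for the first to finish.
for stats in run_queries([0.2, 0.05], workers=1):
    print(f"wait={stats['wait_s']:.3f}s duration={stats['duration_s']:.3f}s")
```

With a single worker, the short second task shows a wait roughly equal to the first task's run time even though its own duration is tiny. Raising `workers` to 2 makes that wait all but disappear, which is the noisy-neighbor effect in miniature.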

Extending a tidyAML and shiny App

Published 2023-05-02 by Kevin Feasel

Steven Sanderson wraps up a series on shiny and tidyAML. Part 3 extends options for regression:

As data science continues to be a sought-after field, creating a reliable and accurate model is essential. While there are various machine learning algorithms available, the process of selecting the correct algorithm can be complex. The {tidyAML} package, part of the tidymodels suite, offers an easy-to-use, consistent interface for building machine learning models. In this post, we will explore a Shiny application that utilizes tidyAML to build a machine learning model.
Today I have updated the tidyAML shiny app to include the ability to set the .parsnip_fns parameter of the fast_regression() function, which takes things like linear_reg.

And part 4 includes classification:

This is a Shiny app for building models using the {tidyAML} which is based on the tidymodels package in R. The app allows you to upload your own data or choose from one of two built-in datasets (mtcars or iris) and select the type of model you want to build (regression or classification).
Let’s take a closer look at the code.

This was an interesting series, for sure.

