Curated SQL – Page 353 – A Fine Slice Of SQL Server

Trying Query Parameterization Settings in SQL Server

Published 2023-05-01 by Kevin Feasel

You have probably seen the recommendation to turn on the “optimize for ad-hoc workloads” setting. You might even have seen a more recent recommendation to set the database setting parameterization to forced (instead of the default which is simple). The aim of this post is to briefly describe each and then do some test with various settings.

Click through for that test. This is a good example of how we need to temper guidance with context. In Tibor’s scenario, forced parameterization is a no-brainer and optimize for ad hoc workloads gives a pretty nice reduction in plan cache utilization. But then, with optimize for ad hoc workloads on, you lose the ability to see the first run of a query in Query Store and lose the opportunity to tune the different variations of a query which only ran once. Pretty much every setting in SQL Server exists because there is a scenario in which that is the most appropriate setting. Except auto-shrink. Auto-shrink delenda est.

Comments closed

Installing SqlPackage for a Deployment Pipeline

Published 2023-05-01 by Kevin Feasel

Kevin Chant uses a deployment tool to install a deployment tool for his deployment tools:

I decided to do this post after some feedback I received about SqlPackage after a series of posts about deploying dacpacs to serverless SQL Pools. For example, my post about deploying a dacpac to a serverless SQL pool.

Because in order to deploy dacpacs to serverless SQL Pools you must update SqlPackage.

With this in mind, I thought I better go through various ways to update SqlPackage if intending to use it to deploy dacpacs to serverless SQL Pools.

Read on to see how you can do this.

Comments closed

Measuring Power BI Dataset Memory and CPU Utilization

Published 2023-05-01 by Kevin Feasel

Chris Webb checks resource utilization:

This post is a follow-up to my recent post on identifying CPU and memory-intensive Power Query queries in Power BI. In that post I pointed out that Profiler and Log Analytics now gives you information on the CPU and memory used by an individual Power Query query when importing data into Power BI. What I didn’t notice when I wrote that post is that there is also now information available in Profiler and Log Analytics that tells you about peak memory and CPU usage across all Power Query queries for a single refresh in the Power BI Service, as well as memory usage for the refresh as a whole.

Click through for a demonstration using Profiler.

Comments closed

Understanding the Kafka Partitioner

Published 2023-04-28 by Kevin Feasel

Bill Bejeck talks partitions:

Apache Kafka is the de facto standard for event streaming today. Part of what makes Kafka so successful is its ability to handle tremendous volumes of data, with a throughput of millions of records per second, not unheard of in production environments. One part of Kafka’s design that makes this possible is partitioning.

Kafka uses partitions to spread the load of data across brokers in a cluster, and it’s also the unit of parallelism; more partitions mean higher throughput. Since Kafka works with key-value pairs, getting records with the same key on the same partition is essential.

Read on to learn a bit about how that partitioning works and why it’s important for application design, especially across multiple programming languages.

Comments closed

Creating a Clickable Word Cloud with Shiny

Published 2023-04-28 by Kevin Feasel

Mandy Norrbo builds a word cloud:

Word clouds are a visual representation of text data where words are arranged in a cluster, with the size of each word reflecting its frequency or importance in the data set. Word clouds are a great way of displaying the most prominent topics or keywords in free text data obtained from websites, social media feeds, reviews, articles and more. If you want to learn more about working with unstructured text data, we recommend attending our Text Mining in R course

Usually, a word cloud will be used solely as an output. But what if you wanted to use a word cloud as an input? For example, let’s say we visualised the most common words in reviews for a hotel. Imagine we could then click on a specific word in the word cloud, and it would then show us only the reviews which mention that specific word. Useful, right?

Read on to see how you can create one of these.

Comments closed

Hybrid ML and Rules-Based Fraud Detection

Published 2023-04-28 by Kevin Feasel

Ayodeji Ogunlami mixes approaches:

In developing this hybrid system, sets of rules are required as well as a machine learning model. I would be making use of a vehicle insurance dataset from Kaggle in this demonstration.

The dataset can be downloaded from this link: https://www.kaggle.com/datasets/shivamb/vehicle-claim-fraud-detection

The ML model would be built using a random forest classifier on Azure Databricks using Pyspark.

This seems to be the most sensible approach, especially given how rare actual fraud incidents are and what that imbalance does to classification algorithms.

Comments closed

Knitting R-Markdown Files into Google Docs

Published 2023-04-28 by Kevin Feasel

Benjamin Smith makes a Google Doc:

RMarkdown is a powerful framework for writing a documents that contain a mixture of text, code, and the output of the code. Popular output formats for RMarkdown Documents (.Rmd) include HTML, PDF and Word Documents. It is also possible to output RMarkdown documents as part of a static website using blogdown package and is (still!) possible to publish RMarkdown documents to WordPress sites as well (like this one)!

Recently, I started to look into the possibility of outputting an .Rmd file as a Google Doc, but I was unable to locate any out-of-box solutions. After looking into the issue I developed a small function that makes it possible!

Click through for that function.

Comments closed

Portfolio Management for Creating a Technology Strategy

Published 2023-04-28 by Kevin Feasel

Kevin Sookocheff busts out the 2×2 matrix:

Application Portfolio Management (APM) draws inspiration from financial portfolio management, which has been around since at least the 1970s. By looking at all applications and services in the organization and analyzing their costs and benefits, you can determine the most effective way to manage them as part of a larger overall strategy. This allows the architect or engineering leader to take a more strategic approach to managing their application portfolio backed by data. Portfolio management is crucial for creating a holistic view of your team’s technology landscape and making sure that it aligns with business goals.

This is for C-levels and VPs rather than individual contributors, but acts as a good way of thinking about a portfolio of applications and what to do with each.

Comments closed

Converting JSON to a Relational Schema with KQL

Published 2023-04-28 by Kevin Feasel

Devang Shah does some flattening and moving:

In the world of IoT devices, industrial historians, infrastructure and application logs, and metrics, machine-generated or software-generated telemetry, there are often scenarios where the upstream data producer produces data in non-standard schemas, formats, and structures that often make it difficult to analyze the data contained in these at scale. Azure Data Explorer provides some useful features to run meaningful, fast, and interactive analytics on such heterogenous data structures and formats.

In this blog, we’re taking an example of a complex JSON file as shown in the screenshot below. You can access the JSON file from this GitHub page to try the steps below.

Click through for the example, which is definitely non-trivial.

Comments closed

Common Challenges Implementing PySpark Code

Published 2023-04-27 by Kevin Feasel

Amlan Patnaik looks at some common implementation problems:

Pyspark has become one of the most popular tools for data processing and data engineering applications. It is a fast and efficient tool that can handle large volumes of data and provide scalable data processing capabilities. However, Pyspark applications also come with their own set of challenges that data engineers face on a day-to-day basis. In this article, we will discuss some of the common challenges faced by data engineers in Pyspark applications and the possible solutions to overcome these challenges.

Read on for five such challenges.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Curated SQL Posts