Curated SQL – Page 154 – A Fine Slice Of SQL Server

Hot and Cold Partitions for Apache Kafka Data

Published 2024-07-01 by Kevin Feasel

At first, data tiering was a tactic used by storage systems to reduce data storage costs. This involved grouping data that was not accessed as often into more affordable, if less effective, storage array choices. Data that has been idle for a year or more, for example, may be moved from an expensive Flash tier to a more affordable SATA disk tier. Even though they are quite costly, SSDs and flash can be categorized as high-performance storage classes. Smaller datasets that are actively used and require the maximum performance are usually stored in Flash.

Cloud data tiering has gained popularity as customers seek alternative options for tiering or archiving data to a public cloud. Public clouds presently offer a mix of object and file storage options. Object storage classes such as Amazon S3 and Azure Blob (Azure Storage) deliver significant cost efficiency and all the benefits of object storage without the complexities of setup and management.

Read on for an architecture that uses hot and cold tiers, as well as how you can set it up on an existing Kafka topic.

Comments closed

SHAP and Additive Models

Published 2024-07-01 by Kevin Feasel

Michael Mayer answers a pair of related questions:

Within only a few years, SHAP (Shapley additive explanations) has emerged as the number 1 way to investigate black-box models. The basic idea is to decompose model predictions into additive contributions of the features in a fair way. Studying decompositions of many predictions allows to derive global properties of the model.

What happens if we apply SHAP algorithms to additive models? Why would this ever make sense?

Read on for the answers to these two questions.

Comments closed

Exporting XML Files from SQL Server via Powershell

Published 2024-07-01 by Kevin Feasel

Vlad Drumea extracts some data:

SSMS doesn’t really have a way to output XML data into files directly, and doing it manually starts to become a pain after the 3rd or 4th file.

You can do it via SSIS, but not everyone has SSIS installed or the time to figure it out.

Admittedly, I’d start with SSIS myself, but it’s good to have multiple ways of solving a problem.

Comments closed

SQL Server Security Series Wrap-Up

Published 2024-07-01 by Kevin Feasel

Mike Walsh puts a bow on it:

Thanks for tuning into our posts for the 30 SQL Server security checks in 30 days series this month. I want to recap the entire month of posts with a few homework assignments to get you started today.

Read on for those three assignments, including adopting a security mindset, remembering that humans tend to be the weak points of security, and trying out sp_CheckSecurity.

Comments closed

Connecting to Azure SQL DB using a Managed Identity

Published 2024-07-01 by Kevin Feasel

Jaime Garcia de Alba makes a connection:

In this guide I am going to show steps to connect Windows Azure VM to Azure SQL DB using Managed Identity covering create user in Azure SQL DB , connect using SSMS and connect using powershell

Click through to see how you can get this information from an Azure VM and connect to the SQL Server instance either via SSMS or Powershell.

Comments closed

DAX Memory Usage from Filtering on a Table

Published 2024-07-01 by Kevin Feasel

Chris Webb talks Power BI anti-patterns:

Following on from my last post on the Query Memory Limit in Power BI, in this post I want to look at one of the most common DAX antipatterns and its effect on query memory usage: filtering on a whole table, rather than an individual column, in the filter parameters of the Calculate() function. A lot has already been written on this particular antipattern from the point of view of query performance – for example see here and here – but it’s only with the recent addition of the Execution Metrics Profiler/Log Analytics event that you can see how bad it is for memory usage too.

Read on for an example. In Chris’s case, there’s a 3x memory difference between doing the right thing and the wrong thing, so this can easily add up.

Comments closed

Random Walks and Brownian Motion in healthyR.ts

Published 2024-06-28 by Kevin Feasel

Steven Sanderson goes for a walk on the stock exchange:

In the world of time series analysis, Random Walks, Brownian Motion, and Geometric Brownian Motion are fundamental concepts used in various fields, including finance, physics, and biology. Today, we’ll explore these concepts using functions from the healthyR.ts package.

Click through to learn about each of these concepts and some examples of how you can generate time series datasets following each of them.

Comments closed

Random Date Generation in Python

Published 2024-06-28 by Kevin Feasel

Chris LaGreca spits out some dates:

I often work with time series data and find it useful to have a variety of ways to randomly generate dates. This particular example is great for evenly distributed date partitions. Running the script below with the default arguments will output a list of random dates, one for each month of the year.

It looks like this is generating based off of a uniform distribution, which probably makes the most sense for “give me a day of the month” data generation.

Comments closed

Indexing for Deletion

Published 2024-06-28 by Kevin Feasel

Jared Westover needs to delete some rows:

In this article, we’ll examine the importance of indexes in improving DELETE statements. Also, many developers assume that adding a foreign key creates an index, which means they’re typically missing. How can you identify which foreign keys are missing indexes? Don’t worry. A handy DMV helps find them. Starting today, you’ll have the skills to make your DELETE statements faster.

Click through for the full post.

Comments closed

Remove Duplicate Records from CSV Files

Published 2024-06-28 by Kevin Feasel

Chad Callihan gets rid of dupes:

In a recent comment on a my post about combining CSV files using PowerShell, a reader named Oliver asked how duplicate values should be handled when combining multiple CSV records. I added a suggestion to that post but figured, “Why not elaborate more on a post of its own?”

Why not, indeed? Click through for the solution.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Curated SQL Posts