Curated SQL – Page 293 – A Fine Slice Of SQL Server

In data preprocessing and text manipulation tasks, the strsplit() function in R is incredibly useful for splitting strings based on specific delimiters. However, what if you need to split a string using multiple delimiters? This is where strsplit() can really shine by allowing you to specify a regular expression that defines these delimiters. In this blog post, we’ll dive into how you can use strsplit() effectively with multiple delimiters to parse strings in your data.

Read on for two examples of complex scenarios.
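
The same idea carries over outside R: collect every delimiter into one regular-expression character class and split on it. Below is a minimal Python sketch using the standard library's re.split(). The helper name split_multi and its default delimiter set are illustrative assumptions, not something taken from the original post.

```python
import re


def split_multi(text: str, delimiters: str = ",;|") -> list[str]:
    """Split text on any single character found in `delimiters`."""
    # re.escape() protects characters such as "|" or "-" that are special in regex,
    # and the square brackets turn them into a character class, e.g. [,;\|]
    pattern = "[" + re.escape(delimiters) + "]"
    return re.split(pattern, text)


print(split_multi("apple,banana;cherry|date"))
# ['apple', 'banana', 'cherry', 'date']
```

One difference to keep in mind: R's strsplit() drops the empty string produced by a delimiter at the very end of the input, while re.split() keeps it, so split_multi("a,b,") returns ['a', 'b', ''].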

A Primer on Transactional Replication

Published 2024-04-30 by Kevin Feasel

Steve Stedman talks transactional replication:

Ensuring that your databases are synchronized across different locations with minimal delay is not just a convenience—it’s a necessity. This is where transactional replication in SQL Server shines, making it a pivotal strategy for systems that require real-time data replication with high consistency. Our latest video, “Transactional Replication in SQL Server”, dives deep into this topic, offering insights and visual walkthroughs that are invaluable for database administrators and developers.

Click through for the video and how the pieces fit together for transactional replication at a high level.
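
At a high level, transactional replication has a Log Reader Agent pick up committed transactions from the publisher's transaction log and store them in the distribution database, and a Distribution Agent then applies them to subscribers in commit order. The toy Python model below sketches only that flow, under heavy simplifying assumptions: rows are key/value pairs, every change is an upsert, and the class and function names are hypothetical. It is not how SQL Server implements the agents.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    lsn: int                        # log sequence number: commit order on the publisher
    changes: list[tuple[str, int]]  # simplified: (key, value) upserts


@dataclass
class Database:
    rows: dict[str, int] = field(default_factory=dict)
    log: list[Transaction] = field(default_factory=list)

    def commit(self, txn: Transaction) -> None:
        # Apply the changes locally and record the committed transaction in the log
        for key, value in txn.changes:
            self.rows[key] = value
        self.log.append(txn)


def log_reader(publisher: Database, distribution: list[Transaction], last_lsn: int) -> int:
    """Copy transactions committed after last_lsn into the distribution queue."""
    new = [t for t in publisher.log if t.lsn > last_lsn]
    distribution.extend(new)
    # Return the new high-water mark so the next run skips what was already read
    return max((t.lsn for t in new), default=last_lsn)


def distribution_agent(distribution: list[Transaction], subscriber: Database) -> None:
    """Apply queued transactions to the subscriber in LSN order, then empty the queue."""
    for txn in sorted(distribution, key=lambda t: t.lsn):
        subscriber.commit(txn)
    distribution.clear()


publisher, subscriber, distribution = Database(), Database(), []
publisher.commit(Transaction(1, [("a", 1)]))
publisher.commit(Transaction(2, [("a", 2), ("b", 3)]))
high_water = log_reader(publisher, distribution, last_lsn=0)
distribution_agent(distribution, subscriber)
print(subscriber.rows)  # {'a': 2, 'b': 3}
```

Keeping the reader and the applier as separate steps mirrors why the distribution database exists: the publisher only has to hand changes off once, and subscribers can catch up independently.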

Understanding the Delta Lake Format

Published 2024-04-30 by Kevin Feasel

Reza Rad has a new post and video combo:

Please don’t get lost in the terminology pit regarding analytics. You have probably heard of Lake Structure, Data Lake, Lakehouse, Delta Tables, and Delta Lake. They all sound the same! Of course, I am not here to talk about all of them; I am here to explain what Delta Lake is.

Delta Lake is an open-source standard for Apache Spark workloads (and a few others). It is not specific to Microsoft; other vendors are using it, too. This open-source standard format stores table data in a way that can be beneficial for many purposes.

In other words, when you create a table in a Lakehouse in Fabric, the files and folders underlying that table are stored in a structure (or, we could say, a format) called Delta Lake.

Read on to learn more about this open standard and how it all fits together with Microsoft Fabric.
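
Concretely, a Delta table is a folder of Parquet data files plus a _delta_log subfolder of numbered JSON commit files, each holding one action per line (for example, add or remove entries for data files). Replaying those commits in order yields the set of files that make up the current version of the table. The Python sketch below assumes a heavily simplified log (only add and remove actions, no checkpoints) and uses made-up file names; it illustrates the idea rather than the full Delta protocol.

```python
import json


def active_files(commits: list[str]) -> set[str]:
    """Replay simplified Delta-style commits and return the current data files.

    commits[0] is version 0, commits[1] is version 1, and so on; each commit is
    newline-delimited JSON with one action per line.
    """
    files: set[str] = set()
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue  # skip blank lines
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
            # other action types (metaData, protocol, commitInfo) are ignored here
    return files


v0 = '{"add": {"path": "part-0000.parquet"}}\n{"add": {"path": "part-0001.parquet"}}'
v1 = '{"remove": {"path": "part-0000.parquet"}}\n{"add": {"path": "part-0002.parquet"}}'
print(sorted(active_files([v0, v1])))  # ['part-0001.parquet', 'part-0002.parquet']
```

Because older commits are never rewritten, replaying only a prefix of the list (say, just v0) reconstructs an earlier version of the table, which is the basis for Delta's time travel.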

Data Compression and Data Type Changes

Published 2024-04-30 by Kevin Feasel

Bob Pusateri asks the important questions:

A few different times I have been asked one or more forms of the following question:

Can datatypes be changed faster with data compression enabled?

I’ve always replied that I’m pretty sure compression will help in this situation, because based on my understanding, it should. But I’ve never had any actual data to back up this belief. Until now. I recently set up a demonstration to test this, and I’m very happy to share the results.

If you want to see the results, you’re going to have to read Bob’s article.

Power BI Model Size and Memory Usage

Published 2024-04-30 by Kevin Feasel

Chris Webb lays out the limitations:

You probably know that semantic models in Power BI can only use a fixed amount of memory. This is true of all types of semantic model – Import, Direct Lake and DirectQuery – but it's not something you usually need to worry about for DirectQuery mode. The amount of memory they can use depends on whether you're using Shared (aka Pro) or a Premium/Fabric capacity and, if you're using a capacity, how large that capacity is. In Shared/Pro the maximum amount of memory that a semantic model can use is 1GB; if you are using a capacity, then the amount of memory available for models in each SKU is documented in the Max Memory column of the table here.

Read on to learn more.

Evenly Spacing Month Charts in ggplot2

Published 2024-04-29 by Kevin Feasel

Jameson Marriott fixes a spacing issue:

I recently noticed that ggplot2 spaces date axes literally, even when grouped by month. I've been using ggplot2 extensively for years and I don't remember noticing before, so this is not really a big deal, but now that I know, it bugs me a lot.

I don’t think I had noticed this before either, though now that Jameson has pointed it out, it certainly is annoying. H/T R-Bloggers.
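
The root cause is plain date arithmetic: on a continuous date axis, the first of each month sits at its true day offset, so the gaps vary with month length, whereas treating months as discrete categories gives every month the same width. The Python sketch below (not R, and not Jameson's fix) simply computes both sets of positions for the first half of 2023.

```python
from datetime import date

# First day of each month, January through June 2023 (not a leap year)
months = [date(2023, m, 1) for m in range(1, 7)]


def literal_positions(dates: list[date]) -> list[int]:
    # Continuous date axis: position = days elapsed since the first date
    return [(d - dates[0]).days for d in dates]


def even_positions(dates: list[date]) -> list[int]:
    # Discrete month axis: each month is one unit from the previous one
    return list(range(len(dates)))


literal = literal_positions(months)
gaps = [b - a for a, b in zip(literal, literal[1:])]
print(literal)                 # [0, 31, 59, 90, 120, 151]
print(gaps)                    # [31, 28, 31, 30, 31]
print(even_positions(months))  # [0, 1, 2, 3, 4, 5]
```

In ggplot2 terms, the even spacing corresponds to mapping months to a discrete scale (for example, a factor) rather than a Date; check Jameson's post for the approach he actually uses.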

Monitoring ML Models in Production

Published 2024-04-29 by Kevin Feasel

Thomas Sobolik and Leopold Boudard talk model drift:

Regardless of how much effort teams put into developing, training, and evaluating ML models before they deploy, their functionality inevitably degrades over time due to several factors. Unlike with conventional applications, even subtle trends in the production environment a model operates in can radically alter its behavior. This is especially true of more advanced models that use deep learning and other non-deterministic techniques. It’s not enough to track the health and throughput of your deployed ML service alone. In order to maintain the accuracy and effectiveness of your model, you need to continuously evaluate its performance and identify regressions so that you can retrain, fine-tune, and redeploy at an optimal cadence.

In this post, we’ll discuss key metrics and strategies for monitoring the functional performance of your ML models in production […]

Click through for the article. There’s a Datadog pitch at the end, but the info is useful regardless of which tool you’re using for monitoring.
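
One common way to quantify the kind of input drift the article describes is the Population Stability Index (PSI), which compares how a feature's values fall into bins at training time versus in production. PSI is used here only as an example, not necessarily as a metric the article recommends, and the bin proportions below are made up for illustration.

```python
import math


def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index between baseline and production bin proportions.

    PSI = sum over bins of (actual - expected) * ln(actual / expected).
    """
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same number of bins")
    total = 0.0
    for e, a in zip(expected, actual):
        # Clamp empty bins to a tiny proportion so the logarithm stays finite
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


baseline = [0.25, 0.25, 0.25, 0.25]    # feature distribution at training time
production = [0.10, 0.20, 0.30, 0.40]  # same bins, observed in production
print(round(psi(baseline, baseline), 4))    # 0.0
print(round(psi(baseline, production), 4))  # 0.2282
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift, but thresholds are best tuned per model and per feature.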

Legacy Power BI Apps Going Away

Published 2024-04-29 by Kevin Feasel

Nicky van Vroenhoven shares a public service announcement:

In case you missed the official blog post 2 months ago, I suggest you read my blog post 🙂

Or, if you prefer, you can refer to the official blog post here: Announcing the retirement of legacy Power BI Apps (pre-audiences).

Power BI apps with multiple audiences went Generally Available as early as March 6, 2023(!).

Read on for more information, with the note that these things will disappear soon: May 1, 2024 is the retirement date.

Switching All SQL Server Databases to Simple Recovery Model

Published 2024-04-29 by Kevin Feasel

Vlad Drumea doesn’t need no steenkin’ transaction log backups:

This brief post contains a script that can help switch a whole SQL Server instance, model and all user databases, to SIMPLE recovery.

The script is useful for dev/test/QA/UAT instances that have accidentally been left on the default FULL recovery model, yet neither have nor need transaction log backups.

Read on for the script. It also shrinks the transaction log file after the switch-over.
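
Vlad's post has the actual T-SQL. As a rough illustration of the same idea (not his script), the Python sketch below only generates ALTER DATABASE ... SET RECOVERY SIMPLE statements from rows you would fetch from sys.databases (name and recovery_model_desc). It skips master, msdb, and tempdb, deliberately includes model, and leaves out the log shrink step. The function names are hypothetical, so review the generated statements before running them anywhere.

```python
SYSTEM_DATABASES = {"master", "msdb", "tempdb"}  # model is deliberately not listed


def quote_name(name: str) -> str:
    # Same bracket quoting as T-SQL QUOTENAME(): wrap in [] and double any "]"
    return "[" + name.replace("]", "]]") + "]"


def simple_recovery_statements(rows: list[tuple[str, str]]) -> list[str]:
    """Build ALTER DATABASE statements for model and user databases not yet SIMPLE.

    rows: (name, recovery_model_desc) pairs, e.g. from
          SELECT name, recovery_model_desc FROM sys.databases;
    """
    statements = []
    for name, recovery_model in rows:
        if name.lower() in SYSTEM_DATABASES or recovery_model == "SIMPLE":
            continue
        statements.append(f"ALTER DATABASE {quote_name(name)} SET RECOVERY SIMPLE;")
    return statements


rows = [("master", "SIMPLE"), ("model", "FULL"), ("Sales", "FULL"), ("QA", "SIMPLE")]
for statement in simple_recovery_statements(rows):
    print(statement)
# ALTER DATABASE [model] SET RECOVERY SIMPLE;
# ALTER DATABASE [Sales] SET RECOVERY SIMPLE;
```

Because the filter only excludes databases already in SIMPLE, anything in BULK_LOGGED is switched as well.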

Digging into Azure Elastic Jobs

Published 2024-04-29 by Kevin Feasel

Rod Edwards has a job to do:

After what seems like a lengthy period in Public Preview, the boffins at Microsoft have finally pushed Elastic Jobs for SQL Azure DB to general availability. Hooray!

But what are Elastic Jobs? And why would I want to use them in SQL Azure DB?

That’s one of the things you’ll learn in this post.
