June 2024 – Page 12 – Curated SQL

Auditing a SQL Server: Discovery and Documentation

Published 2024-06-06 by Kevin Feasel

Inheriting a server, whether as an inexperienced user or an experienced DBA, has many challenges. It’s very helpful to evaluate the servers, document issues, and record the current configuration. It can also be beneficial to evaluate the current state of servers you have owned since they were built or even in preparation for a formal audit. The discovery and documentation phase of an audit will set you up for later detailed audits, or it may serve as the complete scope of the audit.

This is the first part of a series on evaluating and auditing SQL Server and Azure SQL Database. Auditing SQL is a very broad topic, so I have broken it down into several sections. This section will cover the major categories that should happen in a basic SQL Server discovery audit. An initial examination of your environment is primarily documentation and looking for critical issues. This includes basic server and SQL engine configuration, physical configuration items such as disk and memory, critical items such as backup state, database configuration, basic code smells, application integration, and high-level security configuration.

Read on for some of the things Ben looks at.

Comments closed

UNISTR() and || in Azure SQL Database

Published 2024-06-06 by Kevin Feasel

Abhiman Tiwari announces a new function and a new operator:

We are excited to announce that the UNISTR intrinsic function and ANSI SQL concatenation operator (||) are now available in public preview in Azure SQL Database. The UNISTR function allows you to escape Unicode characters, making it easier to work with international text. The ANSI SQL concatenation operator (||) provides a simple and intuitive way to combine characters or binary strings. These new features will enhance your ability to manipulate and work with text data.

Click through to learn more about both. Honestly, I’d rather stick with CONCAT() versus using || because of how CONCAT() handles NULL without me having to check every operand first.

Comments closed

Getting the Top N Results in a PySpark Notebook

Published 2024-06-06 by Kevin Feasel

Gilbert Quevauvilliers only needs the top 1:

How to get the TopN rows using Python in Fabric Notebooks

When working with data there are sometimes weird and wonderful requirements which must be created in order to get to the desired solution.

In today’s blog post I had a situation where I wanted to get a single row with the highest duration.

Gilbert uses the Spark SQL version, specifically the Python function variant. You could also use Spark SQL and write a query using the LIMIT operator.

Comments closed

Environment Variables in SSIS

Published 2024-06-06 by Kevin Feasel

Andy Brownsword continues a series on SSIS:

Yep it’s more SSIS again this week. Here we’ll be looking at using Environment configuration within the SSIS catalog. This allows sets of parameters to be defined and used across multiple projects and packages which share common values.

This approach can either be used as a central point for configuration, or you could use multiple configurations for the same packages.

Read on for some examples of how you might use them, as well as the process to create one.

Comments closed

An Overview of Logistic Regression

Published 2024-06-05 by Kevin Feasel

I have a new video:

In this video, I provide a primer on logistic regression, including a demystification of the name. Is it regression? Is it classification? Find out!

I have a lot of fun with this “Is logistic regression actually a regression technique, or is it secretly a classification technique?” I think this video is the single clearest explanation I’ve given on that question, which probably says something about my prior explanations.

Comments closed

Dual-Write Issues and Kafka

Published 2024-06-05 by Kevin Feasel

Wade Waldron solves a common but difficult problem:

However, the dual-write problem isn’t unique to event-driven systems or Kafka. It occurs in many situations involving different technologies and architectures.

When I started building event-driven systems, I encountered the dual-write problem almost immediately. I eventually learned effective ways to solve it but tripped over some anti-patterns along the way.

I want to break down the details of the dual-write problem so you can understand how it occurs and avoid making the same mistakes I did. I’ll outline a few anti-patterns that might look promising, but don’t solve the problem. Finally, we’ll look at accepted solutions that eliminate the dual-write problem.

Read on for a few techniques that will not work (assuming you are using Apache Kafka to flow events into some external systems) and some that will.

Comments closed

Working with XML in SQL Server

Published 2024-06-05 by Kevin Feasel

Ed Pollack talks XML:

XML is a common storage format for data, metadata, parameters, or other semi-structured data. Because of this, it often finds its way into SQL Server databases and needs to be managed alongside other data types.

Even though a relational database is not the optimal place to store and manage XML data, it is often needed due to application requirements, convenience, or a need to maintain this information in close proximity to other app data.

This article dives into a variety of common XML challenges and the functionality included in SQL Server to help make managing them as simple as possible.

Ed does a good job of walking through what you can do. My general philosophy on XML and JSON in the database is simple: if you simply want a place to store some JSON or XML outputs and retrieve the results exactly as they are without performing any searches or transformations, write as JSON/XML. If you want to use the database to search through JSON/XML records for particular attributes and values, or if you want to reshape the JSON/XML data within the database, create a proper data model for this input.

Comments closed

Circular Foreign Key Dependencies in Postgres

Published 2024-06-05 by Kevin Feasel

Hans-Juergen Schoenig causes and fixes a problem:

In this case, we want to store departments and employees. Every department will need a leader, and every employee will need a department. We cannot have a department without a department leader – but we cannot have an employee without a department either.

Click through to see how you can resolve this kind of paradox with Postgres.

Comments closed

Working with Oracle OCI Object Storage

Published 2024-06-05 by Kevin Feasel

Brendan Tierney peruses buckets:

This blog post will walk you through how to access Oracle OCI Object Storage and explore what buckets and files you have there, using Python and the OCI Python library. There will be additional posts which will walk through some of the other typical tasks you’ll need to perform with moving files into and out of OCI Object Storage.

It looks like the interface for this is substantially similar to AWS’s S3.

Comments closed

Implementing Field Parameters in Power BI

Published 2024-06-05 by Kevin Feasel

Nikola Ilic shows how to set up field parameters in Power BI:

To be honest, I can’t remember that one Power BI feature caused so much hype as the Fields parameter, even though it was introduced exactly 2 years ago (May 2022)! I firmly believe that the Fields parameter is one of the things that will forever change the way we are building user experience in Power BI.

One important disclaimer before we jump into the action: Field params is still a preview feature. That means, don’t be surprised if you open Power BI Desktop and you don’t see an option to use Field params. You first need to enable this feature under Options & Settings -> Options -> Preview features.

Read on to see how you can use field parameters to make reports more dynamic.

Comments closed

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30

Month: June 2024