Press "Enter" to skip to content

Curated SQL Posts

Bulk Inserts and High Unused Space in SQL Server Tables

Vitaly Bruk works through an issue:

High allocated unused space is storage that is assigned to a SQL Server table but sits unused. This condition often indicates internal fragmentation: free space is present within allocated pages. Such fragmentation leads to inefficient storage and can degrade database performance.

Read on for an explanation of the issue, followed by a real-world situation whose ultimate cause was bulk insert operations.
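To see how much of this you have, here is a minimal sketch (assuming pyodbc and placeholder connection details) that reports allocated-but-unused space per table from sys.dm_db_partition_stats:

```python
import pyodbc

# Connection details are placeholders
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)

# Reserved vs. used pages per table; pages are 8 KB each
sql = """
SELECT s.name + '.' + t.name AS table_name,
       SUM(ps.reserved_page_count) * 8 / 1024.0 AS reserved_mb,
       SUM(ps.used_page_count) * 8 / 1024.0 AS used_mb,
       SUM(ps.reserved_page_count - ps.used_page_count) * 8 / 1024.0 AS unused_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables AS t ON t.object_id = ps.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY unused_mb DESC;
"""

for row in conn.cursor().execute(sql):
    print(f"{row.table_name}: {row.unused_mb:.1f} MB unused "
          f"of {row.reserved_mb:.1f} MB reserved")
```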


Prompt Flow in Azure AI

Tomaz Kastrun continues a series on Azure AI. First up is an introduction to Prompt Flow:

Prompt flow in Azure AI Foundry is a development tool for designing flows (streamlines) for the complete end-to-end development cycle of an LLM AI application. You can create, iterate, test, orchestrate, debug, and monitor your flows.

After that, we get a demonstration of Prompt Flow in Python:

Prompty gives you the ability to create an end-to-end solution, like RAG, where you can chat with an LLM over an article or document, or where you can ask it to classify the input data (a list of URLs, …)

Prompty is a markdown file, structured in YAML, that encapsulates a series of metadata fields pivotal for defining the model’s configuration and the inputs. After this front matter comes the prompt template, articulated in the Jinja format.
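Based on that description, a minimal .prompty file might look something like this (the deployment name and input are illustrative assumptions, not taken from the post):

```
---
name: url_classifier
model:
  api: chat
  configuration:
    type: azure_openai
    azure_deployment: gpt-4o   # hypothetical deployment name
inputs:
  urls:
    type: string
---
system:
You classify web content by topic.

user:
Classify each of these URLs: {{urls}}
```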


Session, DataFrameWriter, and Table Configurations in Spark

Miles Cole makes a configuration change:

With Spark and Delta Lake, just like with Hudi and Iceberg, there are several ways to enable or disable settings that impact how tables are created. These settings may affect data layout or table format features, but it can be confusing to understand why different methods exist, when each should be used, and how property inheritance works.

While platform defaults should account for most use cases, Spark provides flexibility to optimize various workloads, whether adjusting for read or write performance, or for hot or cold path data processing. Inevitably, the need to adjust configurations from the default will arise. So, how do we do this effectively?

Read on to learn how.
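As a rough sketch of the three scopes involved (the table name and property values here are illustrative, using standard Delta Lake settings):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)  # placeholder data

# 1. Session scope: a default inherited by Delta tables created in this session
spark.conf.set(
    "spark.databricks.delta.properties.defaults.logRetentionDuration",
    "interval 30 days",
)

# 2. DataFrameWriter scope: applies only to this write
(df.write.format("delta")
    .option("mergeSchema", "true")  # a writer option, not a table property
    .mode("append")
    .saveAsTable("sales"))

# 3. Table scope: persisted with the table, overrides session defaults
spark.sql("""
    ALTER TABLE sales
    SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days')
""")
```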


Securing a Kafka Ecosystem

Riya has a breakdown of how to protect your Apache Kafka installation and the resources around it:

Apache Kafka is the backbone of many real-time data pipelines, making security an essential aspect of its deployment. Protecting your Kafka ecosystem involves implementing encryption to safeguard data, authentication to verify user identities, and authorization to control access. This guide provides a comprehensive overview of these three pillars of securing Kafka, complete with code examples to help you implement best practices.

Click through for demonstrations of encryption, authentication, and authorization.
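As a flavor of what the client side looks like, here is a minimal sketch of a TLS plus SASL/SCRAM producer using confluent-kafka; the broker address, credentials, and CA path are placeholders:

```python
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "broker1:9093",
    "security.protocol": "SASL_SSL",          # encryption in transit (TLS)
    "ssl.ca.location": "/etc/kafka/ca.pem",   # CA that signed the broker certs
    "sasl.mechanism": "SCRAM-SHA-512",        # authentication
    "sasl.username": "app-user",
    "sasl.password": "app-secret",
})

# Authorization is enforced broker-side with ACLs, e.g.:
#   kafka-acls --bootstrap-server broker1:9093 --add \
#       --allow-principal User:app-user --operation Write --topic orders

producer.produce("orders", key="42", value='{"status":"created"}')
producer.flush()
```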


The Cost of Everything, Cloud Edition

Kevin Sookocheff noodles on a core concept:

At AWS re:Invent 2023, Amazon CTO Werner Vogels delivered a talk on the laws of frugal architecture. While I initially filed away those insights to review later, a year of cloud architecture experience crystallized a fundamental truth: in cloud computing, cost isn’t just a financial consideration — it is a first-class architectural concern through which we should design and optimize our systems.

Cloud providers charge for every conceivable resource: servers, API calls, data transfer, and computational milliseconds. But cost is more than just a line item on a monthly bill: it is a powerful forcing function that drives better architectural decisions.

Read on for more thoughts on the matter.


Writing Loops in PowerShell

Adam Bertram is spinning in circles:

One of PowerShell’s key features is its ability to loop through data collections, enabling you to execute a code block repeatedly.

In this tutorial, we will explore three fundamental loop structures in PowerShell: foreach, for, and do loops.

By the end of this tutorial, you will have a solid understanding of how to use these loops to handle repetitive tasks and process collections of data more effectively.

Click through for the tutorial.


Filter and Parameter Passing from Power BI to Power Apps

Jon Voege runs into a problem:

A simple comment app embedded in Power BI was supposed to show all the comments in a SharePoint list which matched the filters set in Power BI. But… not all comments were showing up, even though the SharePoint list only held about 10 comments at the time!

Why, you ask? Well, the issue stems from the fact that the table of data being sent from Power BI to Power Apps, the PowerBIIntegration.Data object, has a hard limit of 1,000 rows. And those 1,000 rows are all that we get to work with.

Read on to learn more, as well as a workaround to this issue.


Models and Endpoints in Azure AI Foundry

Tomaz Kastrun continues a series on Azure AI:

Models from the model catalog can be deployed using programming languages or using the Foundry studio.

Model deployment has two types: deploy from the base model or deploy from the fine-tuned model. The difference is that a fine-tuned model is a model taken from the model catalog and later tuned on an additional dataset, whereas the base model is the model as it is available in Azure AI Foundry.

Click through for a bit more information on the process.
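One way to do this programmatically is with the Azure ML v2 SDK; this is a sketch only, and the endpoint name, model ID, and instance SKU below are illustrative assumptions:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<ai-foundry-project>",
)

# Create the endpoint, then attach a deployment of a catalog (base) model
endpoint = ManagedOnlineEndpoint(name="my-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="my-endpoint",
    model="azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/1",
    instance_type="Standard_NC24ads_A100_v4",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```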


Switching between Python and PySpark Notebooks in Fabric

Sandeep Pawar wants to save some money:

File this under a test I have been wanting to do for some time. If I am exploring some data in a Fabric notebook using PySpark, can I switch between the Python and PySpark engines with minimal code changes in an interactive session? The goal is to use the Python notebook for some exploration, to use existing PySpark/Spark SQL, or to develop the logic in a low-compute environment (to save CUs) and then scale it in a distributed Spark environment. Understandably, there will be limitations with this approach given the differences in environments, configs, etc., but can it be done?

Read on for the answer, as well as plenty of notes around it.
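As one hypothetical illustration of the idea (not necessarily Sandeep’s approach), keeping the logic in SQL text lets it run on either engine with a one-line swap:

```python
query = """
SELECT region, SUM(sales) AS total_sales
FROM orders
GROUP BY region
"""

USE_SPARK = True  # flip to False in a pure Python notebook

if USE_SPARK:
    # PySpark notebook: 'spark' is the notebook's session and
    # 'orders' is registered as a temp view
    result = spark.sql(query).toPandas()
else:
    # Python notebook: run the same SQL on a local engine such as DuckDB,
    # assuming 'orders' is a registered table or in-scope DataFrame
    import duckdb
    result = duckdb.sql(query).df()

print(result.head())
```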


Partitioned Tables and Faster Performance

Henrietta Dombrovskaya provides a warning:

I am unsure whether others have this problem, but I find myself explaining the benefits (and what’s not) of partitioning tables in Postgres over and over again.

Most times, developers have unrealistic expectations that if they partition a gigantic table, “it will be faster to select from it.” I always have to explain that the goal and the purpose of partitioning is improving maintenance, not query optimization, and if the execution speed of short queries remains the same as before partitioning, they should consider it a success (yes, there are no rules without exceptions, and there are cases of performance improvement, but those are exceptions indeed).

Henrietta’s example is specifically around Postgres, but it applies to SQL Server all the same: partitioning is primarily a maintenance benefit rather than a performance benefit.
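To make the maintenance angle concrete, here is a hypothetical sketch (the table and partition names are invented) of what partitioning buys you on the cleanup side:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # connection string is a placeholder
with conn, conn.cursor() as cur:
    # Without partitioning, retiring old data is a massive DELETE that
    # scans, locks, and leaves bloat behind:
    #   DELETE FROM events WHERE created_at < '2024-01-01';

    # With monthly range partitions, the same cleanup is a quick
    # metadata operation:
    cur.execute("ALTER TABLE events DETACH PARTITION events_2023_12;")
    cur.execute("DROP TABLE events_2023_12;")
```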
