Press "Enter" to skip to content

Curated SQL Posts

PostgreSQL and Indexing on EXTRACT()

Henrietta Dombrovskaya troubleshoots a performance problem:

It’s Christmas time and relatively quiet in my day job, so let’s make it story time again! One more tale from the trenches: how wrong you can go with one table and one index?

Several weeks ago, a user asked me why one of the queries had an “inconsistent performance.” According to the user, “Sometimes it takes three minutes, sometimes thirty, or just never finishes.” After taking a look at the query, I could tell that the actual problem was not the 30+ minutes, but 3 minutes – when you have a several hundred million row table and your select yields just over a thousand rows, it’s a classical “short query,” so you should be able to get results in milliseconds.

Read on for the problem, as well as how Henrietta was able to coerce the PostgreSQL optimizer into choosing the correct path.

Comments closed

Microsoft Fabric and Power Platform Resources

Jon Voege has a collection of links for us:

This week, to round off the year, we try something different. I wanted to throw a shout out to all the community heroes out there, who also help make the most of Microsoft Fabric, through the use of Microsoft Power Platform (and vice versa).

Also, I wanted to highlight some of their contributions, and hopefully give you all a list of resources to peruse.

Click through for more than 20 links, showing how you can work with Power Automate, Power Apps, Power Pages, and data in Dataverse from Microsoft Fabric.

Comments closed

Mathematical Transformations of Data in R

Steven Sanderson does the math:

Data transformation is a fundamental technique in statistical analysis and data preprocessing. When working with R, understanding how to properly transform data can help meet statistical assumptions, normalize distributions, and improve the accuracy of your analyses. This comprehensive guide will walk you through implementing and visualizing the most common data transformations in R: logarithmic, square root, and cube root transformations, using only base R functions.

Click through for examples.

Comments closed

Azure AI Foundry Notes

Tomaz Kastrun wraps up a series on Azure AI. First up is tracing in Azure AI Foundry:

Tracing is a powerful tool that offers developers an in-depth understanding of the execution process of their generative AI applications. Though still in preview (in the time of writing this post), It provides a detailed view of the execution flow of the application and the essential information for debugging or optimisations.

After that, we can see how to evaluate model results:

With evaluation you performing iterative, systematic evaluations with the right evaluators and measure and address potential response quality, safety, or security concerns throughout the AI development lifecycle, from initial model selection through post-production monitoring.

With the Evaluation in Azure AI Foundry, you can evaluation the GenAI Ops Lifecycle production. In addition, it also gives you the ability to  assess the frequency and severity of content risks or undesirable behavior in AI responses.

Finally, Tomaz wraps up the series with some notes on documentation:

Documentation and material for Azure AI Foundry are plentiful and growing on a daily basis, since the topic on AI and GenAI is evermore so popular.

I appreciate the challenge that Tomaz has of putting together 25 blog posts in a month, especially when they’re all tied to a single theme.

Comments closed

Bulk Inserts and High Unused Space in SQL Server Tables

Vitaly Bruk works through an issue:

High allocated unused space is storage assigned to a SQL Server table that isn’t used. This condition often indicates internal fragmentation. Free space is present within allocated pages. Such fragmentation leads to inefficient storage and can degrade database performance.

Read on for an explanation of the issue, followed by a real-world situation whose ultimate cause was bulk insert operations.

Comments closed

Prompt Flow in Azure AI

Tomaz Kastrun continues a series on Azure AI. First up is an introduction to Prompt Flow:

Prompt flow in Azure AI Foundry is development tool for designing the flows (streamlines) for the complete end-to-end development cycle of LLM’s AI application. You can create, iterate, test, orchestrate, debug, and monitor your flows.

After that, we get a demonstration a Prompt Flow in Python:

Prompty gives you the ability to create an end-to-end solution, like RAG where you can chat with LLM over an article or document, where you can ask to classify the input data (list of URLs,…)

Prompty is a markdown file, structured in YAML and encapsulates a series of metadata fields pivotal for defining the model’s configuration and the inputs. After this front matter is the prompt template, articulated in the Jinja format.

Comments closed

Session, DataFrameWriter, and Table Configurations in Spark

Miles Cole makes a configuration change:

With Spark and Delta Lake, just like with Hudi and Iceberg, there are several ways to enable or disable settings that impact how tables are created. These settings may affect data layout or table format features, but it can be confusing to understand why different methods exist, when each should be used, and how property inheritance works.

While platform defaults should account for most use cases, Spark provides flexibility to optimize various workloads, whether adjusting for read or write performance, or for hot or cold path data processing. Inevitably, the need to adjust configurations from the default will arise. So, how do we do this effectively?

Read on to learn how.

Comments closed

Securing a Kafka Ecosystem

Riya has a breakdown of how to protect your Apache Kafka installation and resources around it:

Apache Kafka is the backbone of many real-time data pipelines, making security an essential aspect of its deployment. Protecting your Kafka ecosystem involves implementing encryption to safeguard data, authentication to verify user identities, and authorization to control access. This guide provides a comprehensive overview of these three pillars of securing Kafka, complete with code examples to help you implement best practices.

Click through for demonstrations of encryption, authentication, and authorization.

Comments closed

The Cost of Everything, Cloud Edition

Kevin Sookocheff noodles on a core concept:

At AWS re:Invent 2023, Amazon CTO Werner Vogels delivered a talk on the laws of frugal architecture. While I initially filed away those insights to review later, a year of cloud architecture experience crystallized a fundamental truth: in cloud computing, cost isn’t just a financial consideration — it is a first-class architectural concern through which we should design and optimize our systems.

Cloud providers charge for every conceivable resource: servers, API calls, data transfer, and computational milliseconds. But cost is more than just a line item on a monthly bill, it is a powerful forcing function that drives better architectural decisions.

Read on for more thoughts on the matter.

Comments closed