Press "Enter" to skip to content

Month: August 2024

Tips for Bringing a Streamlit App into Production

I have wrapped up another series:

In this video, I discuss some of the things you should consider as you transition a Streamlit application from development to production. We will cover four methods of bringing a Streamlit app to production and some thoughts on performance optimization.

This one doesn’t have much in the way of demos, but I do spend a lot of time at the virtual whiteboard, so it’s got that going for it.

Enumerating Causes of Dirty and Incomplete Data

Joe Celko builds a list and checks it twice:

Many years ago, my wife and I wrote an article for Datamation, a major trade publication at the time, under the title, “Don’t Warehouse Dirty Data!” It’s been referenced quite a few times over the decades but is nowhere to be found using Google these days. The point is, if you have written a report using data, you have no doubt felt the pain of dirty data and it is nothing new.

However, what we never got around to defining was exactly how data gets dirty. Let’s look at some of the ways data get messed up.

I am very slowly working up the nerve to build a longer talk (and YouTube series) on data engineering, and part of that involves understanding why our data tends to be such a mess. Joe has several examples and stories, and I’m sure we could come up with dozens of other reasons.

Eliminating Unexpected Whitespace or Null Characters in Columns

Kevin Wilkie has fun with an unexpected character:

The query ran just fine so I happily completed my work for the day. I sent it to my QA team and had them check it out before pushing it to Production. They sent it back with a note saying, “We expect a lot more ‘Insert Garbage Data Here’ to show.”

Being a good data developer – or someone who just wants to show QA up (take your pick) – I decided to dig deep into what FieldName really shows. I found a lot of the following:

In Kevin’s case, he tried to use TRIM() and it didn’t fix anything. That’s because TRIM(), by default, only removes leading and trailing space characters, not other whitespace and not the null character CHAR(0) that Kevin found.
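
As a quick sanity check before cleaning anything, you could look for rows that contain these hidden characters. This is just a sketch: the table and column names mirror the placeholders below, and the binary collation is there because CHAR(0) can behave oddly under some default collations:

-- Find rows whose value contains a null character or a tab anywhere in the string
SELECT FieldName
FROM dbo.SomeTable
WHERE CHARINDEX(CHAR(0), FieldName COLLATE Latin1_General_BIN2) > 0
   OR CHARINDEX(CHAR(9), FieldName COLLATE Latin1_General_BIN2) > 0;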

A version of TRIM() that would have worked in this case, extended to cover tab characters as well, would be:

-- Strip spaces, null characters (CHAR(0)), and tabs (CHAR(9)) from both ends of the value
SELECT
    TRIM(' ' + CHAR(0) + CHAR(9) FROM FieldName) AS TrimmedFieldName
FROM dbo.SomeTable;
GO

And you could also extend that to include carriage returns, line feeds, vertical tabs, the line separator character, and whatever else you need.
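
As an illustration, here is one way that extended version might look. This is a sketch rather than anything from Kevin's post, and the table and column names are still placeholders; CHAR(13) is the carriage return, CHAR(10) the line feed, CHAR(11) the vertical tab, and NCHAR(8232) the Unicode line separator:

-- Trim spaces, nulls, tabs, CR/LF, vertical tabs, and line separators from both ends
SELECT
    TRIM(' ' + CHAR(0) + CHAR(9) + CHAR(10) + CHAR(13) + CHAR(11) + NCHAR(8232) FROM FieldName) AS TrimmedFieldName
FROM dbo.SomeTable;
GO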

Loading Scanner API JSON Data from Fabric into Power BI Desktop

Gilbert Quevauvilliers hunts for some output files:

How to get data from a Fabric Lakehouse File into Power BI Desktop – Using Scanner API JSON

In this blog post I am going to show you how I connected to my Scanner API JSON file which is stored in the files section of my Microsoft Fabric Lakehouse.

Full credit on how to complete this comes from Marc’s blog post https://data-marc.com/2023/08/25/access-onelake-files-from-power-bi-desktop/

Click through for the instructions.

Finding Azure Region Pairs via Powershell

Mike Robbins is looking for a match:

In this article, you learn how to add a custom property to the Azure PowerShell Get-AzLocation cmdlet output to display the physical location for paired regions in Azure. This information isn’t available by default but can help you understand the resiliency and redundancy of your Azure resources like geo-redundant storage (GRS) and other Azure services that rely on Azure Storage for replication.

Read on to see how to do this.

Searching for Wildcard Characters in LIKE

Andy Brownsword is looking for a discount:

Performing a wildcard search by throwing a % into a LIKE expression is bread and butter. How do we handle this when we actually want to search for the wildcard though?

This was an issue I first saw early in my career before I was even building database solutions. The business had a back office solution where you could search through offers on the UI. Unfortunately it threw up some strange results.

Searching for ‘30%’ offers would return more than expected.

Read on for an example of the problem, as well as how you can resolve it.
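
As a rough sketch of the kind of fix Andy covers (the table and column names here are made up), you can either wrap the wildcard in square brackets or declare an escape character:

-- A naive LIKE '%30%%' matches anything containing '30'.
-- Wrapping the % in brackets treats it as a literal character...
SELECT OfferName
FROM dbo.Offer
WHERE OfferName LIKE '%30[%]%';

-- ...and the ESCAPE clause does the same with a character of your choosing.
SELECT OfferName
FROM dbo.Offer
WHERE OfferName LIKE '%30!%%' ESCAPE '!';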

A Primer on Vector Databases

Brendan Tierney gives us an overview of vector databases:

A Vector Database is a specialized database designed to efficiently store, search, and retrieve high-dimensional vectors, which are often used to represent complex data like images, text, or audio. Vector Databases handle the growing need for managing unstructured and semi-structured data generated by AI models, particularly in applications such as recommendation systems, similarity search, and natural language processing. By enabling fast and scalable operations on vector embeddings, vector databases play a crucial role in unlocking the power of modern AI and machine learning applications.

It’s interesting to see this pop up as a standalone database type (e.g., chromadb), though we’re also seeing some existing players like Postgres support vector database functionality via extension.
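
To make that concrete, here is a minimal sketch of what the extension route looks like in Postgres with pgvector, assuming the extension is available; the table name and vector dimensions are arbitrary:

-- Enable the extension and store three-dimensional embeddings
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
    id bigserial PRIMARY KEY,
    embedding vector(3)
);

INSERT INTO items (embedding) VALUES ('[1, 2, 3]'), ('[4, 5, 6]');

-- Nearest neighbors by Euclidean distance (<-> is pgvector's L2 distance operator)
SELECT id
FROM items
ORDER BY embedding <-> '[2, 3, 4]'
LIMIT 5;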

vCore-Based Subscription Limits for Azure SQL DB and Synapse Dedicated SQL Pools

Raj Tiwari announces a change in subscription limits:

New vCore based limits: The new limits will be based on vCores per Subscription per Region, which will be directly equivalent to DTU and DWU.

Default logical servers limit: The previous limits on Logical Server DTUs have been discontinued. All new and existing subscriptions will now have a default limit of 250 logical servers. 

Configurable vCore limits: Subscription vCore limits can now be easily managed through the support section on the Azure Portal, with approvals typically processed within minutes.

Read on to learn more about these limits and how you could extend them.

Migrating SQL Server Database Files between Storage Subsystems

Andy Yun does a bit of shuffling:

In my role at Pure Storage, I often engage with customers who wish to migrate their SQL Server databases off of their prior storage onto our hardware. And after some digging around for prior-published material, I was surprised to find that there really wasn’t much that was comprehensive. After all, one doesn’t change SANs too often. But when it does happen, it is nice to have some reference material from others who have. So I decided to try and give a good overview of how I’d approach the challenge.

This is meant to be a “food for thought” kind of post. I’m going to keep things somewhat high level, but will provide links to other blogs and material that can help you continue down whatever path you choose. And for simplicity, I’m going to limit this scope to a single SQL Server.

Read on for a few questions you should answer, followed by some notes and preferences. Andy’s filegroups tip is also a really good one.
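
For reference, one of the building blocks in this kind of migration is repointing database files with ALTER DATABASE. This is a generic sketch, not necessarily the approach Andy recommends, and the database name, logical file names, and paths are all placeholders:

-- Tell SQL Server where the files will live after the move
ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesDB_Data, FILENAME = 'E:\NewStorage\Data\SalesDB.mdf');
ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesDB_Log, FILENAME = 'F:\NewStorage\Log\SalesDB_log.ldf');

-- Take the database offline, copy the files to the new paths at the OS level,
-- then bring it back online against the new storage.
ALTER DATABASE SalesDB SET OFFLINE WITH ROLLBACK IMMEDIATE;
ALTER DATABASE SalesDB SET ONLINE;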
