Press "Enter" to skip to content

Month: September 2024

Discerning a Star Schema from an Existing Report

Kelly Broekstra describes a common flow for business intelligence projects:

I have worked as a business intelligence developer for several years, and I’m always asked: “How do you convert user requirements to a functioning data model?”

I follow the Kimball methodology. For more information, check out the official pages.

But, here are some specific tips on what works for me.

Click through for those tips.
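
To make the end state concrete, here is a minimal sketch of the kind of star schema a Kimball-style design tends to produce. The table and column names are hypothetical, assuming a sales report whose rows carry a date, a product, and a sales amount; it is not taken from Kelly's post.

```sql
-- Hypothetical example: one fact table at the grain of the report rows,
-- joined to conformed dimension tables via surrogate keys.
CREATE TABLE dbo.DimDate
(
    DateKey       int      NOT NULL PRIMARY KEY,  -- e.g. 20240915
    CalendarDate  date     NOT NULL,
    CalendarYear  smallint NOT NULL,
    CalendarMonth tinyint  NOT NULL
);

CREATE TABLE dbo.DimProduct
(
    ProductKey  int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    ProductCode nvarchar(20)  NOT NULL,
    ProductName nvarchar(100) NOT NULL
);

CREATE TABLE dbo.FactSales
(
    DateKey     int NOT NULL REFERENCES dbo.DimDate (DateKey),
    ProductKey  int NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    Quantity    int NOT NULL,
    SalesAmount decimal(19, 4) NOT NULL
);
```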

Microsoft Fabric Direct Lake and Reframing Operations

Reza Rad changes the frame:

Power BI offers a new type of connection to Microsoft Fabric Lakehouse or Warehouse, called Direct Lake. The Direct Lake connection acts like DirectQuery and won’t need the data to be refreshed. However, the Power BI semantic model has refresh settings that can be turned on or off. In this article and video, you will learn about the Refresh settings for the Power BI semantic model that is connected using a Direct Lake connection, what that is, and why it is called Reframe.

Read on to learn more, or to check out the video.

mssql-tools 18 and Two Common Errors

Vlad Drumea covers a pair of errors you might run into with mssql-tools version 18:

In this post I cover the 0A000086 and “command not found” errors that you might encounter with the new version of SQL Server command-line tools, namely sqlcmd and bcp, for Linux.

While the latest version of SQL Server command-line tools, based on Microsoft ODBC 18, brings improvements, it also brings some gotchas that can break your automations.

Read on to learn more about each.

The Brokenness of TABLESAMPLE

Paul White walks us through some issues:

Initial testing went well, which was a pleasant surprise. Soon enough though, errors started to appear in the tool’s output. That’s not entirely unexpected since ensuring consistent results under high concurrency tends to expose all sorts of niggly edge cases. It’s still an annoyance because debugging edge cases in trigger code can be tricky and laborious.

What was a surprise though was the nature of the error messages.

Read on for the full story. Paul has also created a feedback issue covering a problem with the function.
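
For readers who have not used it, TABLESAMPLE selects whole data pages rather than individual rows, so the returned sample is only approximately the requested size. Here is a quick illustration against a hypothetical table; the specific problems Paul ran into are in his post.

```sql
-- Hypothetical table name. SYSTEM sampling picks roughly 10 percent of the
-- table's pages and returns every row on those pages; REPEATABLE fixes the
-- page selection for a given seed so reruns return the same sample.
SELECT *
FROM dbo.Readings TABLESAMPLE SYSTEM (10 PERCENT) REPEATABLE (42);
```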

Microsoft Fabric Capacities and Reserved Instances

Marc Lelijveld shares an experience:

Last week, I had a situation in which a client wanted to purchase a reserved instance Fabric capacity. Me being me, I assumed it would be super straightforward to purchase through Azure. However, at some point I was lost in the process, where the official documentation confused me even more. In the end, I figured it out and managed to get a capacity running based on the Reserved Instance pricing. I didn’t find any other blogs or articles describing this confusion or specific case. Therefore, I decided to write down my thoughts and findings in a blog.

This blog is not only relevant if you work with Microsoft Fabric, but also for anyone currently working with Power BI Premium. Given the deprecation of Power BI Premium capacities, you have to switch to Fabric capacities sooner or later.

Read on to learn more about the differences between pay-as-you-go and reserved instance capacities, the process to make a reservation, and what comes after that before you have a Microsoft Fabric capacity ready to go.

From Pandas to Polars

Ari Lamstein explains why it might be worth a switch:

I recently decided to switch from Pandas to Polars for my Python projects that use dataframes. I came to this decision while taking a workshop on Polars last week: I found its syntax to be so intuitive that I couldn’t justify continuing to try to get “better” at Pandas, despite Pandas being the more established library. The fact that Polars is faster (its main selling point) was, surprisingly, not a factor in my decision.

A similar transformation recently happened in R. For most of the history of R there was only one way to interact with dataframes: Base R. Then the Tidyverse came along, and offered both performance improvements and easier syntax. Eventually the Tidyverse became the primary way that many people interact with dataframes. I believe that the Tidyverse’s easier syntax is what led to its widespread adoption, and I think that something similar is likely to happen with Polars.

Click through for Ari’s thoughts on the matter. H/T R-Bloggers.

Indexing Vector Databases

Brendan Tierney continues a series on vector databases:

In this post on Vector Databases, I’ll explore some of the commonly used indexing techniques available in Databases. I’ll also explore the Vector Indexes available in Oracle 23c. Be sure to check that section towards the end of the post, where I’ll also include links to other articles in this series.

As with most data in a database, indexes are used for fast access to data. They create an organised structure (typically a B+ tree) for storing the location of certain values within a table. When searching for data, if an index exists on that data, the index will be used for matching and the location of the records is used to quickly retrieve the data.

Read on to get an idea of what kinds of indexing techniques are useful in that space.
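
As a point of comparison for the B+ tree structure Brendan mentions, this is the conventional kind of index most relational engines build (table and column names are hypothetical); vector indexes such as HNSW or IVF instead organise embeddings for approximate nearest-neighbour search.

```sql
-- Hypothetical example of a conventional B+ tree (rowstore) index:
-- the key values are kept in sorted order, so equality and range
-- predicates on Title can avoid a full table scan.
CREATE INDEX IX_Documents_Title
    ON dbo.Documents (Title)
    INCLUDE (PublishedDate);
```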

Data Compression and CPU Utilization

Kendra Little shares some advice:

Every time I share a recommendation to use data compression in SQL Server to reduce physical IO and keep frequently accessed data pages in memory, I hear the same concern from multiple people: won’t this increase CPU usage for inserts, updates, and deletes?

DBAs have been trained to ask this question by many trainings and a lot of online content – I used to mention this as a tradeoff to think about, myself – but I’ve found this is simply the wrong question to ask.

In this post I’ll share the two questions that are valuable to ask for your workload.

Kendra’s advice is very good, and to add my own two cents to the mix: the last place I was at did, in fact, see a pretty reasonable reduction in CPU utilization by performing page-level compression on any index where it made sense—and this was a very busy OLTP environment. The exceptions would be indexes making prominent use of things like GUIDs or chunks of binary, which don’t compress very well at all. In all my FTE and consulting years, I’ve never run into a circumstance in which compression caused a significant increase in CPU utilization.
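
If you want to test this against your own workload, the two steps below are a reasonable starting point; the object and index names are placeholders, not anything from Kendra's post.

```sql
-- Estimate how much space PAGE compression would save for a table
-- (hypothetical name) before changing anything.
EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'OrderDetail',
    @index_id         = NULL,     -- all indexes
    @partition_number = NULL,     -- all partitions
    @data_compression = N'PAGE';

-- Apply PAGE compression to a single index. ONLINE = ON keeps the index
-- available during the rebuild (Enterprise edition / Azure SQL).
ALTER INDEX IX_OrderDetail_ProductID
    ON dbo.OrderDetail
    REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
```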

Finding Basic Table Information via T-SQL

Andy Brownsword has a script for us:

In Management Studio we can view object details by hitting F7 in Object Explorer. It gives us basic metrics but I find it very slow to load for the details I typically need.

For that reason, I thought I’d share a script to turn to for metrics I commonly need. This query returns:

  • The table details (schema, name, created date)
  • The primary storage (Heap, Clustered, or Columnstore)
  • The number of Nonclustered / Columnstore Indexes
  • The number of records and rough size for data / indexes

Click through for the script and an example of what it looks like.
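
As a rough sketch of the sort of query described (this is not Andy's script, and the sizing is approximate), something along these lines pulls the same categories of information from the system catalog views:

```sql
-- Approximate table inventory: schema/name/created date, primary storage,
-- index counts, row counts, and reserved size in MB.
SELECT
    s.name AS schema_name,
    t.name AS table_name,
    t.create_date,
    CASE
        WHEN EXISTS (SELECT 1 FROM sys.indexes AS i
                     WHERE i.object_id = t.object_id AND i.index_id = 1 AND i.type = 5)
            THEN 'Clustered Columnstore'
        WHEN EXISTS (SELECT 1 FROM sys.indexes AS i
                     WHERE i.object_id = t.object_id AND i.index_id = 1)
            THEN 'Clustered'
        ELSE 'Heap'
    END AS primary_storage,
    (SELECT COUNT(*) FROM sys.indexes AS i
     WHERE i.object_id = t.object_id AND i.type = 2) AS nonclustered_indexes,
    (SELECT COUNT(*) FROM sys.indexes AS i
     WHERE i.object_id = t.object_id AND i.type = 6) AS nonclustered_columnstore_indexes,
    SUM(CASE WHEN p.index_id IN (0, 1) THEN p.row_count ELSE 0 END) AS row_count,
    SUM(p.reserved_page_count) * 8 / 1024 AS reserved_mb
FROM sys.tables AS t
JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
JOIN sys.dm_db_partition_stats AS p
    ON p.object_id = t.object_id
GROUP BY s.name, t.name, t.create_date, t.object_id
ORDER BY reserved_mb DESC;
```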
