Curated SQL Posts

New Video: The Naive Bayes Set of Algorithms

I have a new video:

In this video, I cover a class of algorithms that is neither particularly naive nor particularly Bayesian: Naive Bayes.

I am a bit tongue in cheek with that description, as technically I’ll give you that the class of algorithms is “naive.” But I do still have some fun with the name and then show how we can use Naive Bayes to build a quick-and-dirty model that’s at least somewhat effective.
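If you want a feel for how quick "quick-and-dirty" can be, here is a minimal scikit-learn sketch. It uses a stock toy dataset rather than anything from the video, so treat it as an illustration of the idea, not a recreation of the demo:

```python
# Minimal quick-and-dirty Naive Bayes classifier with scikit-learn.
# The dataset here is a built-in toy set, not the data from the video.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# GaussianNB assumes features are conditionally independent and roughly Gaussian
# within each class -- the "naive" part of the name.
model = GaussianNB()
model.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

A handful of lines gets you a working baseline model, which is exactly the appeal of this class of algorithms.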

Orphaned Users in SQL Server

David Seis puts the orphans to work:

In SQL Server, a user becomes ‘orphaned’ when it exists within a database but lacks an associated login at the server level. This typically occurs when a database is either moved or restored to a different SQL Server instance. To understand why, it’s important to note that while logins are created at the server level, users are created at the database level. Each login is linked to a unique Security Identifier (SID). Therefore, during the process of moving or restoring a database, the SIDs may not align correctly, resulting in orphaned users.

Read on for a script to find and fix orphaned users.
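David's post has the full script; as a rough illustration of the underlying check, here is a small Python/pyodbc sketch (the connection string and database are placeholders, and the re-mapping step is left commented out) that flags database users whose SID has no matching server-level login:

```python
# Rough sketch (not David's script): find orphaned users from Python via pyodbc.
# Connection string and database name are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=MyDatabase;Trusted_Connection=yes;"
)

# SQL users whose SID has no matching server-level login are orphaned.
find_orphans = """
SELECT dp.name
FROM sys.database_principals AS dp
LEFT JOIN sys.server_principals AS sp ON dp.sid = sp.sid
WHERE sp.sid IS NULL
  AND dp.type = 'S'           -- SQL users only
  AND dp.principal_id > 4;    -- skip dbo, guest, INFORMATION_SCHEMA, sys
"""

for (user_name,) in conn.execute(find_orphans):
    print(f"Orphaned user: {user_name}")
    # One common fix is to re-map the user to a same-named login:
    # conn.execute(f"ALTER USER [{user_name}] WITH LOGIN = [{user_name}];")
```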

Microsoft Fabric: Lakehouse or Warehouse?

Koen Verbeeck helps us choose:

This doesn’t mean no code has to be written. On the contrary, in this article we’re going to focus on two services of Fabric: the lakehouse and the warehouse. The first one is part of the Data Engineering experience in Fabric, while the latter is part of the Data Warehousing experience. Both require code to be written to create any sort of artefact. In the warehouse we can use T-SQL to create tables, load data into them and do any kind of transformation. In the lakehouse, we use notebooks to work with data, typically in languages such as PySpark or Spark SQL.

Read on for the comparison. I tend to go more for the lakehouse experience rather than warehouse, but Koen provides a lot of the info you’d need in order to make the right decision for yourself.
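For a sense of what the lakehouse side of that looks like, here is an illustrative PySpark-style notebook cell. The file paths and table names are made up, and a Fabric notebook already hands you a `spark` session; the builder line is only there so the sketch stands on its own:

```python
# Illustrative lakehouse-style notebook cell (PySpark).
# Paths and table names are placeholders; Fabric notebooks provide `spark` already.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Load raw files into a DataFrame and save them as a managed Delta table.
sales = spark.read.option("header", True).csv("Files/raw/sales/*.csv")
sales.write.mode("overwrite").format("delta").saveAsTable("sales_bronze")

# Transformations can then be expressed in Spark SQL instead of T-SQL.
spark.sql("""
    SELECT CustomerId, SUM(CAST(Amount AS DECIMAL(18,2))) AS TotalAmount
    FROM sales_bronze
    GROUP BY CustomerId
""").show()
```

In the warehouse, the equivalent work would be plain T-SQL against tables; the difference is less about capability and more about which toolset and skills you want your team working in.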

A TDS Security Non-Issue

Sean Gallardy is not amused:

Security must be making the rounds again as I’ve recently had quite a few inquiries about a “vulnerability” in SQL Server connections. I am in the camp of, if it isn’t easy to exploit or is completely theoretical and the exploit is barely an inconvenience such as, “could allow a local admin user to rearrange the startup order”, I don’t get very excited. Enter in the old-fad-turned-new “vulnerability” of version information. Yes, you didn’t misread that. Version. Information. At the very worst it’s potentially information disclosure and even that is a stretch.

Read on for an eminently sensible take.

Stop and Start Fabric via Power Automate

Gilbert Quevauvilliers saves some money:

Stop and start your Fabric Capacity using Power Automate

With the Fabric Capacities trial coming to an end, you need to make sure to stop and start your Fabric Capacities.

In my blog post below, I am going to show you how I can start or stop my Fabric Capacity by simply sending an email to myself with the details in the Subject Line to start or stop the capacity.

That’s a pretty neat method, especially if you want to run the capacity at odd hours.

Dataverse and Microsoft Fabric Gotchas

Marc Lelijveld shares some advice:

Recently, I architected a solution for a client for their Microsoft Fabric data platform. The client works with Dynamics Finance & Operations as one of their main ERP systems. Fabric offers easy ways to bring data from various standard Microsoft services into the platform; however, it is not always as easy as it looks. In this blog I will elaborate on the gotchas encountered while architecting this solution.

Read on for the challenges that Marc ran into along the way.

Choosing between Data Warehouses, Lakes, and Lakehouses

Den Smyrnov talks architecture:

Historically, the two most popular approaches to storing and managing data are Data Warehouse and Data Lake. The choice between them usually depends on business objectives and needs. While Data Lakes are ideal for preserving large volumes of diverse data, warehouses are more favorable for business intelligence and reporting. Sometimes, organizations try to have the best of both worlds and mix Data Lake & Data Warehouse architectures. This, however, can be a time- and cost-consuming process.

Against this backdrop, a new hybrid approach—Data Lakehouse—has emerged. It combines features of a Data Lake and a Data Warehouse, allowing companies to store and analyze data in the same repository and eliminating the Data Warehouse vs. Data Lake dilemma. Data Lakehouse mixes the scalability and flexibility of a Data Lake with the ability to extract insights from data easily. Ever so compelling, this approach still has certain limitations. It should not be treated as a “one-size-fits-all” solution.

Read on for an explanation of each of these three styles, including their pros and cons.

Converting Numbers into Roman Numerals with C#

Adron Hall changes numbering schemes:

Today, I’ll dive into a fascinating challenge: converting modern numbers into their ancient Roman numeral counterparts. The task is straightforward but intricate, involving a programming challenge that takes any positive integer from 1 to 3999 and converts it into the corresponding Roman numeral.

To convert regular decimal numbers into Roman numerals, one must follow a set of rules based on the values and combinations of specific Roman numeral characters. Here’s a brief summary of the conversion process:

Click through for the translation rules and how to do this in C#. Now I’m thinking about how to do this in F# and thinking tail-call recursion. I might give that a try on my own and blog about it if I come up with something neat.
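In the meantime, here is a quick Python sketch of the greedy approach those rules boil down to (Adron's post does it in C#): work down the symbol values from largest to smallest, taking as many of each as still fit.

```python
# Greedy Roman numeral conversion, valid for 1 through 3999.
# Python sketch of the rules described in the post (the original uses C#).
ROMAN_VALUES = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def to_roman(n: int) -> str:
    if not 1 <= n <= 3999:
        raise ValueError("Standard Roman numerals cover 1 to 3999")
    parts = []
    for value, symbol in ROMAN_VALUES:
        count, n = divmod(n, value)   # take as many of this symbol as fit
        parts.append(symbol * count)
    return "".join(parts)

print(to_roman(1994))  # MCMXCIV
print(to_roman(2024))  # MMXXIV
```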

Bucket Operations in Oracle Object Storage

Brendan Tierney continues a series on working with Oracle Object Storage:

In a previous post, I showed what you need to do to setup your local PC/laptop to be able to connect to OCI. I also showed how to perform some simple queries on your Object Storage environment. Go check out that post before proceeding with the examples in this blog.

In this post, I’ll build upon my previous post by giving some Python functions to:

  • Check if Bucket exists
  • Create a Bucket
  • Delete a Bucket
  • Upload an individual file
  • Upload an entire directory

Read on for those examples.
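To give a flavor of what those functions can look like with the oci Python SDK, here is a rough sketch; the compartment OCID, bucket names, and paths are placeholders, and Brendan's post has the worked versions:

```python
# Rough sketches of the bucket operations listed above, using the oci Python SDK.
# Compartment OCID and names below are placeholders.
import os
import oci

config = oci.config.from_file()  # reads ~/.oci/config, DEFAULT profile
client = oci.object_storage.ObjectStorageClient(config)
namespace = client.get_namespace().data
compartment_id = "ocid1.compartment.oc1..example"   # placeholder OCID

def bucket_exists(name: str) -> bool:
    buckets = client.list_buckets(namespace, compartment_id).data  # first page
    return any(b.name == name for b in buckets)

def create_bucket(name: str):
    details = oci.object_storage.models.CreateBucketDetails(
        name=name, compartment_id=compartment_id)
    return client.create_bucket(namespace, details)

def delete_bucket(name: str):
    client.delete_bucket(namespace, name)   # the bucket must be empty first

def upload_file(bucket: str, path: str):
    with open(path, "rb") as f:
        client.put_object(namespace_name=namespace, bucket_name=bucket,
                          object_name=os.path.basename(path), put_object_body=f)

def upload_directory(bucket: str, directory: str):
    for entry in os.listdir(directory):
        full = os.path.join(directory, entry)
        if os.path.isfile(full):
            upload_file(bucket, full)
```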

Test Data Generation Concepts and Challenges

Phil Factor talks about data generation:

Data generation is the science and art of providing data for database development work that is as realistic and controllable as possible. The skills of generating realistic data are an essential part of being a database developer. It is important: you need plenty of data of exactly the right type, size and verisimilitude in order to develop and test a database. It has got to look right. If you are launching a corporate, data-driven application, you’ll need it for training as well as testing and development.

Where things get really tough is when you also want to get the distribution of data right. Is a uniform distribution (e.g., outputs from a random number generator) okay? Or do you have a few large accounts that take up the majority of your data? That difference could affect query plans, and if you have nice, uniform data, you might never see some of the cardinality estimation problems until they pop up in production.
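As a toy Python illustration of that difference (not Phil's approach), compare a uniform spread of orders across accounts with a Zipf-style skew where a handful of accounts dominate:

```python
# Toy comparison of uniform versus skewed account activity in generated test data.
import numpy as np

rng = np.random.default_rng(42)
n_orders, n_accounts = 1_000_000, 5_000

# Uniform: every account gets roughly the same number of orders.
uniform_accounts = rng.integers(0, n_accounts, size=n_orders)

# Skewed: a Zipf-style draw where a few accounts hold most of the rows.
skewed_accounts = rng.zipf(a=1.3, size=n_orders) % n_accounts

for label, accounts in [("uniform", uniform_accounts), ("skewed", skewed_accounts)]:
    counts = np.bincount(accounts, minlength=n_accounts)
    top_share = np.sort(counts)[::-1][:10].sum() / n_orders
    print(f"{label}: top 10 accounts hold {top_share:.1%} of rows")
```

The skewed version is the one that surfaces the parameter-sniffing and cardinality estimation surprises you would rather find in testing than in production.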
