Kevin Feasel – Page 262

Domain Lineage in Microsoft Fabric

Published 2024-09-19 by Kevin Feasel

Sandeep Pawar creates 1000 words of value:

In Fabric, you can use the Domains to create a data mesh architecture. It allows you to organize the data and items by specific business domains within the organization and make the overall data architecture decentralized. You can create domains within domains and assign workspaces to each domain. As it grows, you may find it challenging to understand how the domains & workspaces have been organized. Below code will help you trace the domains, subdomains and the workspaces assigned to them.

Click through to see how you can use the graphviz library in Python to generate a simple domain chart.
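
As a rough illustration of the idea (not Sandeep's actual code), here is a minimal sketch that turns a domain hierarchy into Graphviz DOT text using only the standard library. The domain names, workspace names, and the `build_dot` helper are all hypothetical; in a real tenant you would pull the domain and workspace assignments from Fabric rather than hard-coding them.

```python
# Hypothetical domain layout: domain -> parent domain (None marks a top-level domain).
DOMAIN_PARENTS = {
    "Finance": None,
    "Sales": None,
    "Accounts Payable": "Finance",
}

# Hypothetical workspace assignments: workspace -> domain it belongs to.
WORKSPACE_DOMAINS = {
    "AP Reporting": "Accounts Payable",
    "Sales Analytics": "Sales",
}


def build_dot(domain_parents, workspace_domains):
    """Return DOT source describing domains, subdomains, and workspaces."""
    lines = ["digraph domains {", "  rankdir=LR;"]

    # Domains are drawn as boxes, workspaces as ellipses.
    for domain in sorted(domain_parents):
        lines.append(f'  "{domain}" [shape=box];')
    for workspace in sorted(workspace_domains):
        lines.append(f'  "{workspace}" [shape=ellipse];')

    # Edges run from parent domain to subdomain...
    for domain in sorted(domain_parents):
        parent = domain_parents[domain]
        if parent is not None:
            lines.append(f'  "{parent}" -> "{domain}";')

    # ...and from each domain to the workspaces assigned to it.
    for workspace in sorted(workspace_domains):
        lines.append(f'  "{workspace_domains[workspace]}" -> "{workspace}";')

    lines.append("}")
    return "\n".join(lines)


dot_source = build_dot(DOMAIN_PARENTS, WORKSPACE_DOMAINS)
print(dot_source)
```

The resulting text is ordinary DOT, so it should render with the Graphviz tooling (for example, the `dot` command-line program).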

Comments closed

When to Think about Scalability

Published 2024-09-19 by Kevin Feasel

Steve Jones hits upon a dilemma:

There may not be a large workload in production either, at least not at first.

So, what do you worry about first: your code being used or performing well? That’s a similar question to this one: Worry about Scalability or Popularity First? While most of us don’t work for a startup and our organizations have some sort of financial stability, does popularity matter?

I don’t always have a solid answer for this. The closest I have is to try to make my baseline:

Easy for maintainers (including myself) to read
Reasonably efficient
Capable of some level of scale but not necessarily the most scalable

If you want practical terms, I make a somewhat-educated guess at how many rows there will be in a steady state after launch and then multiply that by a factor of 2-3 when generating test data. If you can dodge 2-3x expectations, you can dodge a ball.

And if you suddenly balloon to 10x, you grouse and grumble and spend a couple of sprints digging out from the mess of success.
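
The arithmetic behind that rule of thumb is simple enough to sketch. The function names and the one-million-row estimate below are hypothetical, chosen purely for illustration.

```python
def row_targets_for_testing(expected_rows, factors=(2, 3)):
    """Map each safety factor to the number of test rows to generate."""
    return {factor: expected_rows * factor for factor in factors}


def exceeds_tested_range(actual_rows, expected_rows, max_factor=3):
    """True when production volume has outgrown what the test data covered."""
    return actual_rows > expected_rows * max_factor


# Hypothetical steady-state guess of one million rows.
targets = row_targets_for_testing(1_000_000)
print(targets)

# A sudden 10x balloon falls well outside the 2-3x test range.
print(exceeds_tested_range(10_000_000, 1_000_000))
```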

Compression Tuple Filtering in TimescaleDB

Published 2024-09-19 by Kevin Feasel

Sven Klemm talks compression:

However, it also created a problem. While we had originally intended mutating compressed chunks to be a rare event, people were now pushing its limits with frequent inserts, updates, and deletes. Seeing our customers go all in on this feature confirmed that we were on the right track, but we had to double down on performance.

Today, we’re proud to announce significant improvements as of TimescaleDB 2.16.0, delivering up to 500x faster updates and deletes and 10x faster upserts on compressed data. These optimizations make compressed data behave even more like uncompressed data—without sacrificing performance or flexibility.

Read on to learn a bit more about compression in Postgres and TimescaleDB, as well as how compression tuple filtering works.

A Quick Reference Guide for Power BI

Published 2024-09-19 by Kevin Feasel

Hristo Hristov tabulates:

I need a structured reference guide to help me get started or expand on my Power BI knowledge. I want to be able to bookmark a resource and use it daily when needed as I build my data sets, reports, and dashboards. Can you please enumerate some common and helpful resources as a Power BI Quick Reference guide?

Click through for plenty of links to prior MSSQLTips articles.

Self-Joins and Halloween Protection

Published 2024-09-19 by Kevin Feasel

Paul White has an explanation:

I was asked recently why Halloween Protection was needed for data modification statements that include a self-join of the target table. This gives me a chance to explain, while also covering some interesting product bug history from the SQL Server 7 and 2000 days.

Read on for that explanation.

Paul’s explanation of the bug reminded me of the “quirky update” approach to building a running total, except that, instead of fixing a bug that eliminated it, the process always remained on a knife’s edge of “unsupported but works…at least until we change something and it doesn’t work anymore.”

Natural Language Pre-Processing with Python

Published 2024-09-18 by Kevin Feasel

Harris Amjad does some text cleanup:

Natural Language Processing (NLP) is currently all the rage in the current machine learning landscape. With technologies like ChatGPT, Gemini, Llama, and so many other state-of-the-art text generators getting popular with the mainstream public, many newcomers are pouring into the field of NLP. Unfortunately, before we delve into how these fancy chatbots work, we must understand how we are engineering and treating our data before we feed it to our model. In this tip, we will introduce and implement some basic text preprocessing and cleaning techniques with Python.

Click through for some common operations. Some of these are very important for certain tasks but likely unhelpful for others. That could include things like lower-casing all words or removing stopwords. There are also some operations like spell checking and jargon expansion (or replacement) that you will likely want to include in a real-life project with actual people entering the data, versus a tidy sample dataset.
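
To make the "helpful for some tasks, unhelpful for others" point concrete, here is a minimal standard-library sketch (not the code from the linked tip) in which lower-casing and stopword removal are optional switches. The `preprocess` function and the tiny stopword list are assumptions made for illustration; a real project would likely lean on a fuller list from an NLP library.

```python
import re

# Tiny illustrative stopword list; real lists are far longer.
STOPWORDS = frozenset({"the", "a", "an", "is", "of", "and", "to", "in"})


def preprocess(text, lowercase=True, remove_stopwords=True):
    """Tokenize text, optionally lower-casing and dropping stopwords."""
    if lowercase:
        text = text.lower()
    # Keep runs of letters, digits, and apostrophes; punctuation is discarded.
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    if remove_stopwords:
        tokens = [token for token in tokens if token.lower() not in STOPWORDS]
    return tokens


print(preprocess("The cat is in the hat."))
# With both switches off, case is preserved, which matters when "US" and "us" differ.
print(preprocess("The US is big", lowercase=False, remove_stopwords=False))
```

Keeping each step behind a flag makes it easy to turn one off when it hurts a particular task, such as named-entity work where capitalization carries meaning.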

So You Dropped a Table–Snowflake Edition

Published 2024-09-18 by Kevin Feasel

Kevin Wilkie would never presume that you dropped the table, no no:

This week, I want to talk about something we’ve all done at least once – especially before our first cup of coffee in the morning. Yes, that’s right – dropping tables and databases.

Read on to see how you can rectify this sort of mistake.

Creating Profiles in Visual Studio Code and Azure Data Studio

Published 2024-09-18 by Kevin Feasel

I have a new video:

In this video, I show off a not-so-well-known capability in Visual Studio Code and Azure Data Studio: creating profiles.

Profiles are very useful in Visual Studio Code, though probably less useful for Azure Data Studio. I think the primary benefit there would be handling things like zoom levels and menu layouts when you switch from a laptop on the go to something plugged into a larger monitor.

Loading Entra ID Group Membership into JSON via Python Notebook

Published 2024-09-18 by Kevin Feasel

Gilbert Quevauvilliers wants to know who’s in your group:

Using a Service Principal to get all Entra ID Group Members into JSON File using a Python Notebook

Sometimes it is useful to get all Group Members into a JSON file so that this could be used for reporting purposes.

Click through for the instructions.

Data Analysis with Window Functions in Postgres

Published 2024-09-18 by Kevin Feasel

Elizabeth Christensen dives into window functions:

SQL makes sense when it’s working on a single row, or even when it’s aggregating across multiple rows. But what happens when you want to compare between rows of something you’ve already calculated? Or make groups of data and query those? Enter window functions.

Window functions tend to confuse people – but they’re a pretty awesome tool in SQL for data analytics. The best part is that you don’t need charts, fancy BI tools or AI to get some actionable and useful data for your stakeholders.

Read on for several demonstrations. Most of this you can also do with SQL Server 2012 or later, though the DATE_TRUNC() example needs the equivalent DATETRUNC() function, which is only available in SQL Server 2022 or Azure SQL DB / Managed Instance. Prior to that, you’d need to use a different mechanism, such as CAST(o.order_date AS DATE), to get it working.
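
For a quick feel of what a window function buys you, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Postgres (it assumes the bundled SQLite is version 3.25 or later, which added window functions). The `orders` table, its `amount` column, and the sample rows are hypothetical, and SQLite's `strftime` plays the role of date truncation here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("2024-01-05", 100.0),
        ("2024-01-20", 50.0),
        ("2024-02-03", 75.0),
        ("2024-03-15", 25.0),
    ],
)

# Step 1: aggregate to one row per month.
# Step 2: use a window function to compare across those already-aggregated rows.
query = """
WITH monthly AS (
    SELECT strftime('%Y-%m', order_date) AS order_month,
           SUM(amount) AS monthly_total
    FROM orders
    GROUP BY strftime('%Y-%m', order_date)
)
SELECT order_month,
       monthly_total,
       SUM(monthly_total) OVER (ORDER BY order_month) AS running_total
FROM monthly
ORDER BY order_month
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The running total is exactly the "compare between rows of something you've already calculated" case from the quote: the aggregate runs first, and the window function then looks across its output.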

Author: Kevin Feasel