Press "Enter" to skip to content

Month: December 2024

Ways to Land Data into Microsoft Fabric OneLake

James Serra puts on a cape and takes on an iconic laugh:

Microsoft Fabric is rapidly gaining popularity as a unified data platform, leveraging OneLake as its central data storage hub for all Fabric-integrated products. A variety of tools and methods are available for copying data into OneLake, catering to diverse data ingestion needs. Below is an overview of what I believe are the key options:

Read on for a baker’s dozen methods.

Comments closed

Using the Azure AI Language and Translation Python SDK

Tomaz Kastrun continues a series on Azure AI:

Using SDK options for “Language + Translation” service is

pip install azure-ai-textanalytics==5.2.0

and adding your endpoint in format like: https://yyyyy_azurehub_xxxxxxx.cognitiveservices.azure.com/

and secret to your endpoint. And you will also need the region name (e.g.: west-europe).

Once you’ve set up the necessary credentials, Tomaz shows how easy it is to call the service.

Comments closed

A Gentle Introduction to dbt

Koen Verbeeck makes a change:

In a typical data warehouse project, data is loaded, transformed and stored into a data store using an ETL (extract – transform – load) process. Recently, there has been a shift to ELT processing, where data is first loaded into a data store (this can be a database, but also a data lake or something similar) and then transformed in-place.

Over the past years, the tool dbt – short for data build tool – has become quite popular in the data engineering world for handling such an ELT process. dbt takes on the role of the “T”, meaning it’s responsible for transforming the data in a certain data store. dbt is not meant for transferring data from one store to another, you’ll need another tool for this, such as Azure Data Factory for example.

Read on to see how it works.

Comments closed

Updating Spark Pool Runtime Versions in Microsoft Fabric

Sandeep Pawar keeps things up to date:

It’s always a good idea to use the latest GA runtime for the default Spark pool in Fabric workspaces. Unless you change it manually, the workspace will always use the previously set runtime even if a new version is available. To help identify the runtime that workspaces are using and to upgrade multiple workspaces at once, use the code below, powered by Semantic Link.

Read on to see how you can do it using a bit of Python scripting.

Comments closed

Database Nostalgia

Louis Davidson looks back on the past:

I will admit that the feature I will mention probably won’t be relevant to you, as it has been nearly 30 years since this feature was introduced. But since this is the most nostalgic time of the year, I want to go back, through the mists of times, to a time where we hadn’t even quite begun to fear the changing of the centuries. To a time when we a single CPU was all we had even thought of, computer screens were often colorless, a few megabytes of RAM was all that was needed. It was a simpler time, though to be fair, it was also a time before we had sports on our phones.

Louis goes on to reminisce about an implementation of database quality guarantees in a world without foreign key constraints. Which is usually about the time I rag on MySQL for its insistence that foreign keys weren’t important to the product until relatively late in its existence (version 3.23, released in 2000).

Comments closed

Against Using SQL Server Database Mail

Brent Ozar lays out an argument:

So you’re writing T-SQL code and you’ve decided that you want to send emails to customers, employees, or the general public. Perhaps you need to send order updates or low stock notifications.

Stop right there. You don’t really wanna do that with SQL Server. Let’s talk about why.

Read on for good reasons why. The bit of fuzziness here is “employees,” as Database Mail is quite useful for DBAs or individuals responsible for the upkeep of database-related operations. But the further you move away from “People whose job it is to keep the SQL Server instance up and running,” the less viable Database Mail is as a product, for the exact reasons Brent mentions.

Of course, you could always do what I did at one consulting engagement and set up sending Slack messages via CLR. I’m not saying it was a good idea, but it was pretty cool that it worked as well as it did.

Comments closed

High CPU with SQL Server Database Mail

Vitaly Bruk deals with a CPU issue:

Today, I investigated an interesting issue.

One of my clients called me and complained about high CPU on his server. Server with 0 load at this time frame.

Using my favorite “Activity” script and the sys.dm_os_ring_buffers DMV, I clearly saw that the CPU is being used by a non-SQL server service. Next, I’ve asked to connect SQL Server machine and opened a task manager.

Surprise, surprise! The CPU was being used by the SQL Server process! Well, kind of…

The moral of the story is to keep your systems patched.

Comments closed

The Value of Datasets in SSRS

Scott Murray writes a query:

SQL Server Reporting Services (SSRS) uses the object type of a data source as the basic unit of connecting a data source to a report. (See this tip – SQL Server Reporting Services SSRS 2017 Data Sources) A data source is the connection details that allow a report designer and report consumer to define where the data lives and includes items like a table in SQL Server or an API. Continuing this concept, the dataset is the basic object that defines the structure of the data to be loaded and establishes a collection of fields.

Read on to dive into the properties of an SSRS dataset.

Comments closed

The Data Streaming Landscape Entering 2025

Kai Waehner lays out the state of things:

Data streaming is a new software category. It has grown from niche adoption to becoming a fundamental part of modern data architecture, leveraging open source technologies like Apache Kafka and Flink. With real-time data processing transforming industries, the ecosystem of tools, platforms, and cloud services has evolved significantly. This blog post explores the data streaming landscape of 2025, analyzing key players, trends, and market dynamics shaping this space.

It’s always important to keep the writer’s bias in mind when reading these articles (and we all have biases, whether or not we admit to them). With that preparatory throat-clearing out of the way, Kai does an excellent job laying out the players, the criteria he uses for analysis, and the current state of the field.

Comments closed