Press "Enter" to skip to content

Author: Kevin Feasel

Using Spark Connect from .NET

Ed Elliott keeps the hope alive:

Over the past couple of decades working in IT, I have found a particular interest in protocols. When I was learning how MSSQL worked, I spent a while figuring out how to read data from disk via backups rather than via the database server (MS Tape Format, if anyone cared). I spent more time than anyone should learning how to parse TDS (before the [MS-TDS] documentation was a thing)—having my head buried in a set of network traces and a pencil and pen has given me more pleasure than I can tell you.

This intersection of protocols and Spark piqued my interest in using Spark Connect to connect to Spark and run jobs from .NET rather than Python or Scala.

There’s a whole lot more ceremony involved than the Microsoft .NET for Apache Spark project, but read on to see how it all works. Also, I hereby officially chastise Ed for having examples in C# and VB.NET but not the greatest .NET language of them all: F#. Chastisement aside, I appreciate the work Ed put into this to bring Spark Connect to the .NET masses.

Comments closed

Trying out Data Wrangler

Ginger Grant tries out a feature in Microsoft Fabric:

The second element in my series on new Fabric Features is Data Wrangler. Data Wrangler is an entirely new feature found inside of the Data Engineering and Machine Learning Experience of Fabric. It was created to help analyze data in a lakehouse using Spark and generated code. You may find that there’s a lot of data in the data lake that you need to evaluate to determine how you might incorporate the data into a data model. It’s important to examine the data to evaluate what the data contains. Is there anything missing? Incorrectly data typed? Bad Data? There is an easy method to discover what is missing with your data which uses some techniques commonly used by data scientists. Data Wrangler is used inside of notebooks in the Data Engineering or Machine Learning Environments, as the functionality does not exist within the Power BI experience.

Click through to see how it works. I liken it to Power Query for people who don’t like Python.

Comments closed

The Problem with Automatic Plan Correction

Kendra Little lays it out:

I’ve written a bit about SQL Server’s Automatic Plan Correction feature before– I have an hour long free course with demos on Automatic Plan Correction here on the site.

Today I’m updating that course with a note: after using Automatic Plan Correction in anger for a good amount of time, I do not recommend enabling the feature. I’ve had it cause too many performance problems, and there are not a ton of options for an administrator when it’s causing those problems.

Kendra is still bullish about the potential of this but has some major issues with the current implementation. Read on to learn more about it.

Comments closed

Handling Orphaned Database Files with dbatools

Rod Edwards rounds up the orphans:

This may be an edge case issue, it may not. Or some may not know this is a potentially a thing. For any of the above questions, i’m not sure of the answer. I do know however, that it doesn’t involve morally suspicious fairy tales of any kind, flutes, or pastry products for that matter.

I also know that it’s something that could potentially be robbing disk space across SQL Estates so i’ll talk about it anyway and supply a simple way to fix this in one sweep using the magnificent DBATools.

Rod’s claim is no pastry products, but my counter-argument is that the command probably runs better if someone brings donuts in.

Comments closed

TidyDensity and data.table

Steven Sanderson makes use of data.table:

I’m thrilled to announce a major upgrade to the TidyDensity package that’s sure to accelerate your data analysis workflows. We’ve integrated the lightning-fast data.table package for generating tidy distribution data, resulting in a jaw-dropping 30% speed boost.

The data.table package is so much faster than its competition in so many cases, yet I really don’t like its syntax.

Comments closed

NEXT VALUE FOR and ROWCOUNT

Jose Manuel Jurado Diaz diagnoses a problem:

In the realm of SQL Server, certain combinations of commands and functions can lead to unexpected conflicts and errors. A notable example is the conflict between the NEXT VALUE FOR function and the ROWCOUNT setting. This article aims to dissect the nature of this error, explaining why it occurs, its implications, and how to effectively capture and analyze it using Extended Events in Azure SQL Database. For example, we got the following error message: Msg 11739, Level 15, State 1, Line 11 – NEXT VALUE FOR function cannot be used if ROWCOUNT option has been set, or the query contains TOP or OFFSET. – NEXT VALUE FOR (Transact-SQL) – SQL Server | Microsoft Learn

Read on for more information, including what the error is telling us and how we can capture cases of that error using extended events.

Comments closed

Granting Users Access to Create Fabric Items

Gilbert Quevauvilliers is in a giving mood:

I was recently working with a customer where I was showing them the awesome new features of Microsoft fabric. I then created a workspace and attempted to grant the individual users access to the workspace to create fabric items or workloads.

When the users went into the app workspace with the fabric settings, enabled, the users could not create any workloads.

Read on to see what the problem was and how you can resolve it.

Comments closed