
Curated SQL Posts

Finding the Column with Max Value in R

Steven Sanderson finds the column with the maximum value for each row in an R data frame:

Finding the column with the maximum value for each row is a useful operation when you want to identify the dominant category, highest measurement, or most significant feature in your dataset. This can provide valuable insights and help in decision-making processes.

R offers several ways to accomplish this task, ranging from base R functions to powerful packages like dplyr and data.table. We’ll explore each approach in detail, providing code examples and explanations along the way.

Click through for several examples.
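The linked post works in R. As a language-neutral illustration of the operation itself (not Steven's code), here is a minimal Python sketch that assumes each row is a dictionary mapping column names to numeric values. The sample data and the `max_column` helper are hypothetical.

```python
# Hypothetical data: each row holds scores for three columns.
rows = [
    {"a": 3, "b": 9, "c": 4},
    {"a": 7, "b": 2, "c": 5},
    {"a": 1, "b": 1, "c": 8},
]


def max_column(row):
    """Return the name of the column holding the row's largest value.

    Ties go to the first column in insertion order, because max()
    returns the first maximal item it encounters.
    """
    return max(row, key=row.get)


# One column name per row.
dominant = [max_column(row) for row in rows]
print(dominant)
```

Tie handling is the design choice to watch: this sketch always picks the first maximal column, which may differ from the defaults of the R approaches covered in the post.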


Using the Azure AI Speech Python SDK

Tomaz Kastrun writes some code:

Besides the Python Speech SDK, the Speech SDK supports multiple other languages. The Python SDK exposes many of the Speech service capabilities for developing speech-enabled applications. It is ideal for (near) real-time and non-real-time scenarios, using other Azure services for storage, streams, and analytics.

Click through for a demonstration.


Data Transformation with Dataflows Gen2

Boniface Muchendu provides an overview of Dataflows Gen2 in Microsoft Fabric:

Welcome to a journey into the world of data automation! Imagine working in an organization bustling with data scientists and analysts. In such an environment, you often need to gather and combine data from various sources for further analysis. You could do this manually, but why not leverage automation? In this blog, we’ll explore how to apply automation on data transformations using Dataflows Gen2 in Microsoft Fabric.

Admitting that I am not the primary audience for Dataflows Gen2, I’d still much rather write a Spark notebook and call it a day.


Fabric Studio 1.0

Gerhard Brueckl makes an announcement:

I am very proud to announce the first public release of Fabric Studio v1.0 – a VSCode extension that allows you to manage and develop your Fabric workspace(s). Similar to Power BI Studio, it seamlessly integrates into VSCode for increased productivity for professional developers and admins alike.

Click through for some of the functionality available in Fabric Studio. You can download the extension from the VS Code marketplace and Gerhard includes a link to the GitHub repo in the blog post.


Geospatial Data Exploration in Microsoft Fabric

Sandeep Pawar goes on a journey:

Simon Willison is one of my favorite bloggers. In fact, what I blog, how I blog & test, is inspired by him. He wrote a blog a couple of weeks ago about FourSquare Places data that has been open-sourced. I was exploring this dataset and ended up creating a few maps. I love OrgApps in Fabric and I truly believe as it matures, it will be THE way for analysts & data scientists to provide rich insights + traditional reports to business users. Notebooks can augment the Power BI reports to provide insights that are otherwise not possible. I have submitted a session on this topic to FabCon ‘25, let’s see. If it is selected, I hope to show how transformational it is and how businesses can use it.

Click through for a video and the notebook that Sandeep demonstrated.


Metadata-Driven Spark Clusters in Azure Databricks

Matt Collins ties the room together with a bit of metadata:

In this article, we will discuss some options for improving interoperability between Azure Orchestration tools, like Data Factory, and Databricks Spark Compute. By using some simple metadata, we will show how to dynamically configure pipelines with appropriately sized clusters for all your orchestration and transformation needs as part of a data analytics platform.

Click through for an explanation of the challenge, followed by the how-to.
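To make the idea concrete, here is a minimal Python sketch of metadata-driven cluster sizing. It is not Matt's implementation: the pipeline names, the size tiers, and the `build_cluster_spec` helper are all hypothetical, and the node types and runtime version are only illustrative values. The output keys (`spark_version`, `node_type_id`, `num_workers`) follow the field names used in Databricks cluster definitions.

```python
# Hypothetical metadata: one entry per pipeline, tagged with a workload size.
PIPELINE_METADATA = {
    "daily_sales_load": {"workload": "small"},
    "customer_360_rebuild": {"workload": "large"},
}

# Illustrative size tiers; real node types and worker counts would be
# chosen for your own workloads and budget.
CLUSTER_SIZES = {
    "small": {"node_type_id": "Standard_DS3_v2", "num_workers": 2},
    "medium": {"node_type_id": "Standard_DS4_v2", "num_workers": 4},
    "large": {"node_type_id": "Standard_DS5_v2", "num_workers": 8},
}


def build_cluster_spec(pipeline_name, spark_version="13.3.x-scala2.12"):
    """Look up a pipeline's workload tag and return a cluster definition."""
    workload = PIPELINE_METADATA[pipeline_name]["workload"]
    size = CLUSTER_SIZES[workload]
    return {"spark_version": spark_version, **size}


spec = build_cluster_spec("customer_360_rebuild")
print(spec)
```

An orchestrator could pass a dictionary like this along when it requests compute for each pipeline run, so resizing a workload becomes a metadata change rather than a pipeline change.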


An Overview of the Azure AI Services Speech Service

Tomaz Kastrun has been busy with the Azure AI series. First up is an overview of Azure AI Services (nee Cognitive Services) available in the Azure AI Foundry:

In Azure AI Foundry, you can always go to Azure AI Services, where you can create intelligent apps with different AI models. These services are simple and ready to use, with relatively low costs.

Then Tomaz drills into the Speech service:

In Azure AI Foundry, you will find the speech playground, with a wide variety of solutions for enhancing your applications and adding functionality to them.

The Speech service gives you capabilities such as speech-to-text conversion, real-time translation, fast transcription, voice assistants, and others.

After that, we get an intro of Speech Studio:

Speech Studio (available at https://speech.microsoft.com/portal) is a set of UI-based tools for building and integrating features from the Azure AI Speech service (available in the Azure portal) into your applications using a no-code approach. You can also create projects by using and referencing the assets and services through the Speech SDK, the Speech CLI, or the REST APIs.

The Speech service is by no means perfect, but it’s interesting just how well it can do at detecting languages (one set of functionality) and translating arbitrary audio from one language to another (via a different call).


API Testing with pytest

Xuan Nguyen Truong writes some tests:

API testing is an essential aspect of software development, ensuring that your application’s endpoints are functioning correctly and reliably. In this guide, we’ll show you how to implement API testing in Python with Pytest and the Requests library.

I’m a big fan of pytest, as it makes testing in Python so much easier. There’s not a lot of ceremony involved in writing tests and it’s easy to see what’s failing during tests.
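As a small illustration of how little ceremony is involved, here is a sketch of a pytest-style test. It is not taken from the linked guide: the `fetch_user` helper, the URL, and the response payload are hypothetical, and a `Mock` stands in for a `requests.Session` so the example runs without a network connection.

```python
from unittest.mock import Mock


def fetch_user(session, base_url, user_id):
    """Fetch one user as a dict; `session` is expected to behave like requests.Session."""
    response = session.get(f"{base_url}/users/{user_id}", timeout=5)
    response.raise_for_status()  # surface HTTP errors instead of parsing bad bodies
    return response.json()


def test_fetch_user_returns_parsed_json():
    # Stand-in for a real HTTP response (hypothetical payload).
    fake_response = Mock()
    fake_response.json.return_value = {"id": 1, "name": "Ada"}

    # Stand-in for requests.Session; get() hands back the fake response.
    fake_session = Mock()
    fake_session.get.return_value = fake_response

    user = fetch_user(fake_session, "https://api.example.test", 1)

    assert user == {"id": 1, "name": "Ada"}
    fake_session.get.assert_called_once_with(
        "https://api.example.test/users/1", timeout=5
    )
    fake_response.raise_for_status.assert_called_once()
```

pytest collects any function whose name starts with `test_` and reports plain `assert` failures with the compared values. Swapping the mock for a real `requests.Session()` would turn the same test into a live check against an actual endpoint.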


Postgres Synchronous Replication Guarantees

Kaarel Moppel has a public service announcement:

At last week’s local Postgres user group meetup here in Estonia, one of the topics was HA and recent Patroni (the most popular cluster manager for Postgres) improvements in supporting quorum commit, which by the way on its own has been possible to use for years. Things went deep quickly and we learned quite a bit of course. Including a good reminder that you shouldn’t build your bank on Patroni’s default synchronous mode 🙂

Anyway, during the hallway track (which is sometimes as valuable as the real ones), I got an interesting question: with some 3+ quorum nodes, is Postgres then 100% bulletproof against all kinds of failures? Excluding meteorites, rogue DBAs and such, of course. One could think so, right? Nope.

Read on to learn what might cause failure in that scenario. Guaranteeing synchronous replication between machines over a network is a surprisingly difficult challenge.
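For readers new to quorum commit, the basic acknowledgment rule can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Postgres or Patroni code: the standby names and the `commit_is_acknowledged` helper are hypothetical, and it deliberately does not model the failure scenario Kaarel describes. It mirrors a setting along the lines of `synchronous_standby_names = 'ANY 2 (s1, s2, s3)'`.

```python
def commit_is_acknowledged(standby_acks, quorum):
    """Return True once at least `quorum` standbys have confirmed the commit.

    `standby_acks` maps a standby name to whether it has confirmed.
    """
    confirmed = sum(1 for acked in standby_acks.values() if acked)
    return confirmed >= quorum


# Hypothetical three-standby cluster that requires any two confirmations.
acks = {"s1": True, "s2": False, "s3": True}
print(commit_is_acknowledged(acks, quorum=2))
```

The rule only says when the primary may report success to the client. It says nothing about what each node has durably stored when connections drop or a transaction is cancelled mid-wait, which is where the guarantees get subtle.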
