Press "Enter" to skip to content

Day: December 10, 2024

Finding the Column with Max Value in R

Steven Sanderson finds the column with the maximum value for each row in an R data frame:

Finding the column with the maximum value for each row is a useful operation when you want to identify the dominant category, highest measurement, or most significant feature in your dataset. This can provide valuable insights and help in decision-making processes.

R offers several ways to accomplish this task, ranging from base R functions to powerful packages like dplyr and data.table. We’ll explore each approach in detail, providing code examples and explanations along the way.

Click through for several examples.

Leave a Comment

Data Transformation with Dataflows Gen2

Boniface Muchendu provides an overview of Dataflows Gen2 in Microsoft Fabric:

Welcome to a journey into the world of data automation! Imagine working in an organization bustling with data scientists and analysts. In such an environment, you often need to gather and combine data from various sources for further analysis. You could do this manually, but why not leverage automation? In this blog, we’ll explore how to apply automation on data transformations using Dataflows Gen2 in Microsoft Fabric.

Admitting that I am not the primary audience for Dataflows Gen2, I’d still much rather write a Spark notebook and call it a day.

Leave a Comment

Fabric Studio 1.0

Gerhard Brueckl makes an announcement:

I am very proud to announce the first public release of Fabric Studio v1.0 – a VSCode extension that allows you to manage and develop your Fabric workspace(s). Similar to Power BI Studio, it seamlessly integrates into VSCode for increased productivity for professional developers and admins alike.

Click through for some of the functionality available in Fabric Studio. You can download the extension from the VS Code marketplace and Gerhard includes a link to the GitHub repo in the blog post.

Leave a Comment

Geospatial Data Exploration in Microsoft Fabric

Sandeep Pawar goes on a journey:

Simon Willison is one of my favorite bloggers. In fact, what I blog, how I blog & test, is inspired by him. He wrote a blog a couple of weeks ago about FourSquare Places data that has been open-sourced. I was exploring this dataset and ended up creating a few maps. I love OrgApps in Fabric and I truly believe as it matures, it will be THE way for analysts & data scientists to provide rich insights + traditional reports to business users. Notebooks can augment the Power BI reports to provide insights that are otherwise not possible. I have submitted a session on this topic to FabCon ‘25, let’s see. If it is selected, I hope to show how transformational it is and how businesses can use it.

Click through for a video and the notebook that Sandeep demonstrated.

Leave a Comment

Metadata-Driven Spark Clusters in Azure Databricks

Matt Collins ties the room together with a bit of metadata:

In this article, we will discuss some options for improving interoperability between Azure Orchestration tools, like Data Factory, and Databricks Spark Compute. By using some simple metadata, we will show how to dynamically configure pipelines with appropriately sized clusters for all your orchestration and transformation needs as part of a data analytics platform.

Click through for an explanation of the challenge, followed by the how-to.

Leave a Comment