
Author: Kevin Feasel

Plotting SVM Decision Boundaries in R

Steven Sanderson goes right up to the edge:

Support Vector Machines (SVM) are a powerful tool in the world of machine learning and classification. They excel in finding the optimal decision boundary between different classes of data. However, understanding and visualizing these decision boundaries can be a bit tricky. In this blog post, we’ll explore how to plot an SVM object using the e1071 library in R, making it easier to grasp the magic happening under the hood.

Read on to see how you can perform this analysis as well.


Finding Object Counts for S3 Buckets

The Big Data in Real World team sees a problem:

There is no separate command in AWS CLI to find the number of objects in an S3 bucket but there is a workaround.

Read on for the solution to this. The way that S3 and Azure Blob Storage (without hierarchical namespaces) store files as tags and treat folders as cosmetic is neat from a technical standpoint, though it goes counter to how we’d expect a file system to behave.
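One way to get a count programmatically is to page through the bucket listing and tally the keys. The sketch below is an assumption on my part, not necessarily the workaround from the linked post: it relies on the boto3 `list_objects_v2` paginator, and `count_objects`, the bucket name, and the prefix are hypothetical names chosen for illustration.

```python
def count_objects(client, bucket, prefix=None):
    """Count the objects in a bucket by paging through ListObjectsV2 results.

    `client` is expected to behave like boto3.client("s3").
    """
    kwargs = {"Bucket": bucket}
    if prefix:
        # A "folder" is just a shared key prefix, so filtering by prefix
        # is how you count what looks like a directory.
        kwargs["Prefix"] = prefix

    paginator = client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(**kwargs):
        # "Contents" is missing from a page when nothing matches.
        total += len(page.get("Contents", []))
    return total


# Hypothetical usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   print(count_objects(boto3.client("s3"), "my-example-bucket", prefix="logs/"))
```

Because every object has to be listed, this gets slow and incurs request charges on very large buckets.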


TINYINT Casts in Spark SQL vs T-SQL

Bill Fellows runs into an interesting oddity:

Yet another thing that has bitten me working in SparkSQL in Databricks—this time it’s data types.

In SQL Server, a tinyint ranges from 0 to 255 but both of them allow for 256 total values. If you attempt to cast a value that doesn’t fit in that range, you’re going to raise an error.

SQL Server’s TINYINT data type is an unsigned one-byte number (0 to 255), whereas TINYINT in Spark SQL is a signed one-byte number (-128 to 127). But that’s not the biggest difference Bill finds, so check out the post to learn more.
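To make the range difference concrete, here is a minimal sketch in plain Python. The constants and the `fits` helper are hypothetical names for illustration; the sketch only checks whether a value falls inside each one-byte range and does not model what either engine does when a cast is out of range.

```python
# One-byte integer ranges: unsigned for SQL Server, signed for Spark SQL.
SQL_SERVER_TINYINT = (0, 255)
SPARK_SQL_TINYINT = (-128, 127)


def fits(value, bounds):
    """Return True when value lies inside the inclusive (low, high) range."""
    low, high = bounds
    return low <= value <= high


def range_size(bounds):
    """Number of distinct values in the inclusive range."""
    low, high = bounds
    return high - low + 1


for value in (-1, 127, 200, 256):
    print(
        value,
        "SQL Server:", fits(value, SQL_SERVER_TINYINT),
        "Spark SQL:", fits(value, SPARK_SQL_TINYINT),
    )
```

Both ranges hold 256 values, but a number like 200 is valid only on the SQL Server side and a number like -1 only on the Spark SQL side.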


Documenting Power BI Workspaces with Fabric Notebooks

Prathy Kamasami shares a use case for notebooks in Microsoft Fabric:

If you are a consultant like me, you know how hard it can be to access Power BI Admin API or Service Principal. Sometimes, you need to see all the workspaces you have permission for and what’s inside them. Well, I found with MS Fabric, we can use notebooks and achieve it with a few steps:

Read on for an enumeration of those four steps, as well as detailed instructions for each.


Controlling Power BI Chart Ranges with DAX

Marco Russo and Alberto Ferrari control the horizontal, Marco Russo and Alberto Ferrari control the vertical:

DAX is a powerful tool in the hands of a Power BI developer. Using simple DAX formulas, you can not only compute interesting metrics but also customize the behavior of Power BI visuals. In this article, we use DAX to control the range of charts to obtain more coherent visualizations.

Read on to see how.


Thoughts on NOLOCK

Erik Darling has some thoughts:

And generally, the more NOLOCK hints I see, the more money I know I’m going to make.

It shows me four things right off the bat:

  • The developers need a lot of training
  • The code needs a lot of tuning
  • The indexes need a lot of adjusting
  • There are probably some serious bugs in the software

Perhaps the only other thing that signals just how badly someone needs a lot of help is hearing “we’re an Entity Framework only shop”.

Cha-ching.

I have to admit, even being a consultant doesn’t soften the pain of walking into a place and seeing people use NOLOCK like they picked up a fresh pallet of it from Costco and need to use it up before it goes bad.


Sending Azure Cost Management Data to Azure Data Explorer

Brad Watts writes out some cost data:

Understanding your Azure Spend is one of the most important things you do as an Azure customer. Azure Cost Management is built into the platform to provide you insights. But we live in a world of data and looking at the Azure Cost Management data in a silo may not meet your organization’s needs. In those situations, we can solve that need by putting your Cost Management data into an analytical platform like Azure Data Explorer or Microsoft Fabric KQL Database. Here we can bring in or join additional data that’s useful, run ad-hoc queries and build visualization tying it all together.

Using the below repository, you’ll be able to utilize Azure Cost Management exports to setup an automated process that ingests the cost data into ADX or Fabric KQL Database.

There are several steps involved, but as Brad points out, you can do this either with Microsoft Fabric or with classic Azure Data Factory + Azure Data Explorer. I’d also throw in Azure Synapse Analytics, but that’s not as in vogue anymore.

Werner Zirkel also has a great comment showing how you can cut out most of the steps with Event Grid.


Plotting a Subset of Data in R

Steven Sanderson doesn’t need all of those data points:

Data visualization is a powerful tool for gaining insights from your data. In R, you have a plethora of libraries and functions at your disposal to create stunning and informative plots. One common task is to plot a subset of your data, which allows you to focus on specific aspects or trends within your dataset. In this blog post, we’ll explore various techniques to plot subsets of data in R, and I’ll explain each step in simple terms. Don’t worry if you’re new to R – by the end of this post, you’ll be equipped to create customized plots with ease!

Click through for several techniques for subsetting data, as well as reasons why you might want to do it.


Statistical Tests in R

Adrian Tam tries out a couple of tests:

R as a data analytics platform is expected to have a lot of support for various statistical tests. In this post, you are going to see how you can run statistical tests using the built-in functions in R. Specifically, you are going to learn:

  • What is t-test and how to do it in R
  • What is F-test and how to do it in R

Statistical testing is one of the things R does better than any other language. R has support for an enormous number of statistical functions, either built into the base language or available as packages.
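For a sense of what the two tests measure, here is a rough sketch of the underlying statistics using only the Python standard library. It is not the code from the linked post: the function names are hypothetical, the t statistic is the unequal-variance (Welch) form, and no p-values are computed, whereas R's built-in `t.test` and `var.test` report those as well.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(a, b):
    """Two-sample t statistic that does not assume equal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def f_statistic(a, b):
    """Ratio of the two sample variances, the statistic behind an F-test."""
    return variance(a) / variance(b)


# Made-up sample data, purely for illustration.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]

print("t statistic:", welch_t_statistic(group_a, group_b))
print("F statistic:", f_statistic(group_a, group_b))
```

A t statistic far from zero suggests the group means differ, and an F statistic far from one suggests the variances differ. Turning either into a p-value requires the matching distribution, which is where a statistics package earns its keep.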


Microsoft Fabric Presentations

Wolfgang Strasser opens a vault:

Are you searching for Microsoft Fabric Presentations? You want to learn more about the new unified analytics solution?

There are plenty of presentations available around the internet – some only as recordings, some as PDFs only.

BUT – last week, I found a (no longer) hidden gem of Microsoft Fabric content on the internet – the Microsoft Fabric Readiness repository

Click through for the link to those presentations.
