Press "Enter" to skip to content

Author: Kevin Feasel

Creating an Image Classification Model in Oracle OCI Vision

Brendan Tierney separates the cats and the dogs:

In this post, I’ll build on the previous work on preparing data, to using this dataset as input to building a Custom AI Vision model. In the previous post, the dataset was labelled into images containing Cats and Dogs. The following steps takes you through creating the Customer AI Vision model and to test this model using some different images of Cats.

This post is part four of a series (first part, second part, third part) on custom image classification in Oracle.

Comments closed

Setting a Spark Compute Pool Size in Microsoft Fabric

Reitse Eskens manages compute pools:

This next blog won’t be a long one and will probably serve most as a reminder for myself where to find the settings for the Spark compute pool.

When you create a workspace, you get the default starter pool and it has taken me way longer than I care to admit to find where to find the setting and, more importantly, how to change it.

Read on to learn more about how to create a Spark pool of the size you desire. The sizing method is essentially the same as with Azure Synapse Analytics.

Comments closed

WITHIN GROUP in STRING_AGG()

Chad Callihan messes with groups:

When was the last time you wrote a SQL query and knew something was possible but just couldn’t remember how? I had one of those moments this week with STRING_AGG and ordering data, and although it was frustrating, I knew it would make a worthwhile blog post. Let’s look at some examples using STRING_AGG and WITHIN GROUP (aka the clause that slipped my mind).

There’s a perfectly good reason why WITHIN GROUP might slip your mind: STRING_AGG() is known as an ordered set function (versus a window function which uses an OVER() clause). It’s also the only ordered set function SQL Server supports, so you don’t get too many opportunities to use the key phrase.

Comments closed

Generating Reproducible Reports with Jupyter and Quarto

Parisa Gregg and Myles Mitchell don’t need to copy and paste for their TPS reports:

Quarto is a free-to-use, open-source software based on Pandoc that enables users to convert plain text files into a range of formats, including PDF, HTML and powerpoint presentations. These documents can contain a mixture of narrative text, Python code, and figures that are dynamically generated by the embedded code.

This has many use-cases:

  • Your company may have a weekly board meeting to go over the latest sales figures. By having a Quarto presentation that pulls in the latest company sales data, you can regenerate the presentation slides each week at the click of a button.
  • As a researcher you may be preparing a report for publication. By having the code that generates data tables and figures embedded within the report, regenerating the draft as the experimental data floods in is a breeze!

Read on for a fun example of how you could automated a research-driven report.

Comments closed

Creating Confidence Intervals on a Linear Model in R

Steven Sanderson goes frequentist on us:

Linear regression is a fundamental statistical technique used to model the relationship between a dependent variable and one or more independent variables. While fitting a linear model is relatively straightforward in R, it’s also essential to understand the uncertainty associated with our model’s predictions. One way to visualize this uncertainty is by creating confidence intervals around the regression line. In this blog post, we’ll walk through how to perform linear regression and plot confidence intervals using base R with the popular Iris dataset.

Click through to see how, even if you’re a Bayesian who considers confidence intervals to overstate precision in reality.

Comments closed

Maintaining Existing Power BI Data while Loading More with Fabric

Chris Webb looks back on an older post:

To be honest I’m slightly ashamed of this fact because, as I say in the post, the solution I describe is a bit of a hack – but at the same time, the post is popular because a lot of people have the problem of needing to add new data to the data that’s already there in their Power BI dataset and there’s no obvious way of doing that. As I also say in that post, the best solution is to stage the data in a relational database or some other store outside Power BI so you have a copy of all the data if you ever need to do a full refresh of your Power BI dataset.

Why revisit this subject? Well, with Fabric it’s now much easier for you as a Power BI developer to build that place to store a full copy of your data outside your Power BI dataset and solve this problem properly.

Read on for an example of the new solution.

Comments closed

Dynamic Highlighting of Data Points Based on Slicer Selection

Nikola Ilic shines a light on the data:

To quickly explain: when a user selects, for example, Contoso in the slicer, the Contoso bar should be highlighted by using a different color. As much as this sounds like a very basic and common business request, there is no straightforward solution in Power BI (or, at least, I’m not aware of it:))

However, the client’s wish is (almost) always our command – so, let’s see how this feature can be implemented with a little bit of data model tweaking and leveraging some DAX code.

Spoilers: there is a solution, though it does involve quite a few steps.

Comments closed

Using DAX to Find Products Missing Sales

Marco Russo and Alberto Ferrari observe the dog that didn’t bark:

What products did not sell in a specific area, store, or time period? This may be an important analysis for several businesses. There are multiple ways to obtain the desired result. Some specific implementations might be needed because of the user or model requirements, whereas developers can choose any formula in several cases. Or you might just find a solution on the web and blindly implement it without questioning whether there is a better way to achieve what you want.

It turns out that different formulas perform very differently. Choosing the right one in your scenario can make a slow report fast. This article analyzes the performance of different formulations of one same algorithm. 

It’s interesting to see the performance profile here: most are reasonably close together, although you can still get a 2x gain from using the fastest approach versus the second-slowest. And then there’s the slowpoke.

Comments closed