Day: September 26, 2023

Data Exploration in R with dplyr

Adrian Tam continues a series on R:

When you are working on a data science project, the data often has a tabular structure. You can use the built-in data frame to handle such data in R. You can also use the famous library dplyr instead to benefit from its rich toolset. In this post, you will learn how dplyr can help you explore and manipulate tabular data. In particular, you will learn:

  • How to handle a data frame
  • How to perform some common operations on a data frame

I like dplyr a lot for its “functional flow”—you pipe outputs of one function to be inputs of the next function, so the chain makes a lot of sense. If you want high performance, though, it’s often not the best choice—that’s usually data.table.
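That flow is easiest to see in a short pipeline. A minimal sketch using the built-in mtcars data frame (my own illustration, not from Adrian's post):

```r
library(dplyr)

# Filter rows, derive a column, then summarize by group --
# each step's output is piped in as the next step's input.
mtcars %>%
  filter(cyl %in% c(4, 6)) %>%
  mutate(kpl = mpg * 0.425144) %>%   # miles per gallon to km per litre
  group_by(cyl) %>%
  summarise(mean_kpl = mean(kpl), n = n())
```

Reading the chain top to bottom matches the order the operations actually happen, which is a big part of why it reads so naturally.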

Comments closed

Pairs Plots in Base R

Steven Sanderson shows how we can create a pairs plot using the pairs() function in R:

A pairs plot, also known as a scatterplot matrix, is a grid of scatterplots that displays pairwise relationships between multiple variables in a dataset. Each cell in the grid represents the relationship between two variables, and the diagonal cells display histograms or kernel density plots of individual variables. Pairs plots are incredibly versatile, helping us to identify patterns, correlations, and potential outliers in our data.

Click through for one example, how to interpret it, and how to customize the outputs.
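For reference, a minimal sketch with the built-in iris data (assuming you just want the four numeric columns, with points colored by species):

```r
# Scatterplots for every pair of the four numeric iris columns;
# the diagonal shows the variable names by default.
pairs(iris[, 1:4],
      main = "Iris pairwise relationships",
      pch  = 19,
      col  = c("red", "green3", "blue")[iris$Species])
```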

Comments closed

ggplot2 in Python Notebooks

John Mount runs R in Python with rpy2:

For an article on A/B testing that I am preparing, I asked my partner Dr. Nina Zumel if she could do me a favor and write some code to produce the diagrams. She prepared an excellent parameterized diagram generator. However being the author of the book Practical Data Science with R, she built it in R using ggplot2. This would be great, except the A/B testing article is being developed in Python, as it targets programmers familiar with Python.

As the production of the diagrams is not part of the proposed article, I decided to use the rpy2 package to integrate the R diagrams directly into the new worksheet. Alternatively, I could translate her code into Python using one of: Seaborn objects, plotnine, ggpy, or others. The large number of options is evidence of how influential Leland Wilkinson’s grammar of graphics (gg) is.

Click through to see how you can execute R code within the context of Python, similar to how you can use the reticulate package to execute Python code in the context of R.
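The reticulate direction looks like this, sketched minimally (assuming a Python interpreter is available to R):

```r
library(reticulate)

# Evaluate Python source from R, then read the result back
# as an ordinary R value via the `py` object.
py_run_string("total = sum(range(10))")
py$total
```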

Comments closed

Creating an Image Classification Model in Oracle OCI Vision

Brendan Tierney separates the cats and the dogs:

In this post, I’ll build on the previous work on preparing data, using this dataset as input to building a Custom AI Vision model. In the previous post, the dataset was labelled into images containing Cats and Dogs. The following steps take you through creating the Custom AI Vision model and testing this model using some different images of Cats.

This post is part four of a series (first part, second part, third part) on custom image classification in Oracle.

Comments closed

WITHIN GROUP in STRING_AGG()

Chad Callihan messes with groups:

When was the last time you wrote a SQL query and knew something was possible but just couldn’t remember how? I had one of those moments this week with STRING_AGG and ordering data, and although it was frustrating, I knew it would make a worthwhile blog post. Let’s look at some examples using STRING_AGG and WITHIN GROUP (aka the clause that slipped my mind).

There’s a perfectly good reason why WITHIN GROUP might slip your mind: STRING_AGG() is known as an ordered set function (versus a window function which uses an OVER() clause). It’s also the only ordered set function SQL Server supports, so you don’t get too many opportunities to use the key phrase.

Comments closed

Setting a Spark Compute Pool Size in Microsoft Fabric

Reitse Eskens manages compute pools:

This next blog won’t be a long one and will probably serve mostly as a reminder for myself of where to find the settings for the Spark compute pool.

When you create a workspace, you get the default starter pool, and it has taken me way longer than I care to admit to find that setting and, more importantly, to figure out how to change it.

Read on to learn more about how to create a Spark pool of the size you desire. The sizing method is essentially the same as with Azure Synapse Analytics.

Comments closed