Day: June 27, 2023

Bootstrap Resampling in R

Steven Sanderson makes a list and checks it twice (or n number of times with replacement):

Bootstrap resampling is a powerful technique used in statistics and data analysis to estimate the uncertainty of a statistic by repeatedly sampling from the original data. In R, we can easily implement a bootstrap function using the lapply, rep, and sample functions. In this blog post, we will explore how to write a bootstrap function in R and provide an example using the “mpg” column from the popular “mtcars” dataset.

Read on for the process.
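
As a rough illustration of the idea (not Steven's R code, which builds on lapply, rep, and sample), here is a minimal sketch in Python using only the standard library. The function names `bootstrap` and `percentile_interval`, the resample count, and the seed are assumptions made for this example; the data are the 32 mpg values from mtcars.

```python
import random
import statistics

# The 32 values of the mpg column from the mtcars dataset.
MPG = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,
       16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5,
       15.2, 13.3, 19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4]


def bootstrap(data, n_resamples=1000, statistic=statistics.mean, seed=None):
    """Hypothetical helper: compute `statistic` on n_resamples resamples.

    Each resample has the same size as `data` and is drawn with replacement.
    """
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_interval(estimates, level=0.95):
    """Hypothetical helper: simple percentile interval from the estimates."""
    ordered = sorted(estimates)
    alpha = (1 - level) / 2
    lower_idx = int(alpha * len(ordered))
    upper_idx = min(len(ordered) - 1, int((1 - alpha) * len(ordered)))
    return ordered[lower_idx], ordered[upper_idx]


boot_means = bootstrap(MPG, n_resamples=2000, seed=42)
low, high = percentile_interval(boot_means)
print(f"Sample mean: {statistics.mean(MPG):.2f}")
print(f"Approximate 95% interval for the mean: {low:.2f} to {high:.2f}")
```

The spread of the resampled means is what gives the estimate of uncertainty; swapping `statistics.mean` for `statistics.median` would bootstrap a different statistic with no other changes.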

Detecting AI-Generated Profile Photos

Shivansh Mundra, et al., report on some research:

With the rise of AI-generated synthetic media and text-to-image generated media, fake profiles have grown more sophisticated. And we’ve found that most members are generally unable to visually distinguish real from synthetically-generated faces, and future iterations of synthetic media are likely to contain fewer obvious artifacts, which might show up as slightly distorted facial features. To protect members from inauthentic interactions online, it is important that the forensic community develop reliable techniques to distinguish real from synthetic faces that can operate on large networks with hundreds of millions of daily users, like LinkedIn. 

There are some interesting findings here.

Cache Recommendations for Azure Data Explorer

Guy Reginiano notes an update:

A new generation of cache recommendations for Azure Data Explorer is now available in the Azure portal! 
This update introduces significant improvements, including enhanced logic, additional statistics for end users, an improved user interface, and a streamlined process for reviewing and applying recommendations. In this blog post, we will explore the new features and benefits offered by this latest update. 

Read on to see where you can find these cache recommendations, as well as the types of recommendations you’re liable to receive.

Indexing Multiple Columns in Oracle with DBMS_SEARCH

Brendan Tierney rounds up the usual suspects:

This type of index is a little different to your traditional index. With DBMS_SEARCH we can create an index across multiple schema objects using just a single index. This gives us greater indexing capabilities for scenarios where we need to search data across multiple objects. You can create a ubiquitous search index on multiple columns of a table or multiple columns from different tables in a given schema. All done using one index, rather than having to use multiples. Because of this wider search capability, you will see this (DBMS_SEARCH) being referred to as a Ubiquitous Search Index. A ubiquitous search index is a JSON search index and can be used for full-text and range-based searches.

This is an interesting approach to the problem, though as I think about it, it makes me wonder, if you’re constantly searching in A+B+C+D, is that really four separate attributes or has something gone wrong in the design? It’s early enough in the morning for me that I’m willing to accede to there being use cases in a well-designed database.

Using Subqueries in a SELECT Statement

Greg Larsen builds a subquery:

Did you know you can include a SELECT statement within another SELECT statement? When a SELECT statement is embedded within another statement it is known as a subquery. There are two types of subqueries: basic subquery and correlated subquery.

In this article I will be discussing both types of subqueries and will be providing examples of how to use a subquery in different places within a SELECT statement.

Greg has a good write-up on the topic of subqueries and does well to separate correlated from non-correlated subqueries. If you want to learn more about those, as well as common table expressions, I put out a video on the topic just last week.
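
To make the distinction concrete, here is a small sketch that runs both kinds of subquery against an in-memory SQLite database from Python. The `orders` table and its rows are invented for illustration and are not taken from Greg's article, which targets SQL Server; the queries use only standard SQL that should behave the same way there.

```python
import sqlite3

# Hypothetical sample data, made up for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT,
        amount   REAL
    );
    INSERT INTO orders VALUES
        (1, 'Ann', 50), (2, 'Ann', 150),
        (3, 'Bob', 20), (4, 'Bob', 40),
        (5, 'Cy', 300);
""")

# Basic (non-correlated) subquery: the inner query does not reference the
# outer query, so it can be evaluated once on its own.
basic = conn.execute("""
    SELECT order_id, amount
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY order_id
""").fetchall()

# Correlated subquery: the inner query references o.customer from the outer
# query, so it is logically evaluated once per outer row.
correlated = conn.execute("""
    SELECT o.order_id, o.customer, o.amount
    FROM orders AS o
    WHERE o.amount > (SELECT AVG(i.amount)
                      FROM orders AS i
                      WHERE i.customer = o.customer)
    ORDER BY o.order_id
""").fetchall()

print(basic)       # orders above the overall average
print(correlated)  # orders above that customer's own average
conn.close()
```

The first query compares every order against a single overall average, while the second compares each order against the average for its own customer, which is why the two result sets differ.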

A Complex Example of ADF Pipeline Return Value

Andy Leonard goes beyond the simple example:

In this post, I demonstrate one way to create a child pipeline that returns the SubscriptionId for a data factory. I then call the child pipeline from a parent package.

To build this demonstration, please follow the instructions that follow.

This is definitely more complicated than Andy’s simple example, but there are plenty of screenshots to take you through the process.

Loading Multiple Extended Events Files in SQL Server

Jose Manuel Jurado Diaz reviews the tapes:

As the volume of data grows, SQL Server creates multiple extended event files to store the captured information efficiently. These files are usually saved in a designated target folder. However, when it comes to loading and analyzing these files, administrators often face the challenge of dealing with multiple files individually. Manually loading each file can be time-consuming and inefficient, especially when dealing with a large number of extended event files.

Read on to see which function you can use to read multiple Extended Events files and how it works.