Press "Enter" to skip to content

Curated SQL Posts

Trying the sample() Function in R

Steven Sanderson gathers a sample:

Sampling is a fundamental technique in data analysis and statistical modeling. It allows us to draw meaningful insights and make inferences about a larger population based on a representative subset. In the world of R programming, the sample() function stands as a versatile tool that enables us to create random samples efficiently. In this post, we will explore the sample() function and its various applications through a series of plain English examples.

Click through for those examples.
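
For a quick taste of the function, here is a minimal sketch of sample() in action; these are illustrative calls, not Steven's exact examples.

```r
# Make the random draws reproducible
set.seed(42)

sample(1:10, size = 3)                               # 3 values from 1..10, without replacement
sample(c("A", "B", "C"), size = 5, replace = TRUE)   # sampling with replacement
sample(letters)                                      # no size argument: a random permutation

# Sampling rows from a data frame, e.g. 5 random rows of mtcars
mtcars[sample(nrow(mtcars), 5), ]
```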

Data Visualization Technology Landscape

Andy Kirk has a catalog:

My long-running catalogue of Data Visualisation Resources has for many years been the most-popular, most-visited, and most-referenced content on my website. For the last couple of years, though, it has been a little stagnant with my limited time preventing the frequent updates it needed.

Having recently completed the migration of my website to a new host and undertaken a widespread redesign and restructure, it felt like an opportune moment to roll up my sleeves and belatedly spend some time pruning the catalogue of outdated references and introducing all the new ones that I’d encountered, and bookmarked, but not yet added.

Click through for that, as well as the Chartmaker Directory, which gives you an idea of which visuals are available in which products, as well as examples to see them in action.

Choosing a Load Balancing Option in Azure

Santosh Hari looks at the options:

Azure docs have a great page on the various load balancing options in Azure that even has an awesome flowchart summing up the choices. However, not being from a networking background, combined with Microsoft’s “special” naming and some trouble recalling these names, meant that even when relying on rote memory in conversations with customers, I would often mix up the names; for instance, I would confuse Traffic Manager and Load Balancer. So, I decided to understand some of the basics behind cloud load balancers to help become a more interesting conversationalist on this topic: “well actually, you should be using an app gateway there, John”.

This often isn’t in the database administrator’s purview, but Santosh does a good job of explaining the concepts and, if you’re hosted in Azure, it is good to know what’s sitting in front of your database.

Execution Plans of Graph Tables in SQL Server

Hugo Kornelis looks at the execution plan:

Welcome to part twenty-one of the plansplaining series, where we will continue our look at execution plans for graph queries. In the previous post, we looked at the internal structure of node and edge tables, and discovered that they have a few hidden columns. Now let's look at how those columns are used in graph queries.

Read on for the example and a deeper dive into how graph tables actually work.
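
For context, this is the kind of query the post dissects; a minimal sketch follows, with node and edge table names that are purely illustrative and not taken from Hugo's post.

```sql
-- Illustrative graph tables (SQL Server 2017+); names are placeholders
CREATE TABLE dbo.Person (PersonID INT PRIMARY KEY, PersonName NVARCHAR(100)) AS NODE;
CREATE TABLE dbo.Follows AS EDGE;

-- A MATCH query; the execution plan shows how the hidden
-- $node_id / $from_id / $to_id columns drive the joins
SELECT p1.PersonName, p2.PersonName
FROM dbo.Person AS p1,
     dbo.Follows AS f,
     dbo.Person AS p2
WHERE MATCH(p1-(f)->p2);
```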

Finding the Actual Error Line in sp_executesql

Thom Andrews solves a problem:

Notice that the error line states line 2, not line 8, which is the line sys.sp_executesql was called on. Knowing the line the error occurred on within the dynamic batch is certainly important, but if you’re working with multiple dynamic batches you have no way of knowing which dynamic batch might have produced the error; was it the one executed on line 100? Line 200? Line 350?

Solving this problem wasn’t exactly simple, and it came with a couple of caveats.

Click through for the approach. As Thom mentions, it isn’t perfect, but it is reasonable and interesting.
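
To see the underlying behavior Thom describes, here is a simplified repro sketch (not his exact setup or his solution): the divide-by-zero sits on line 2 of the inner dynamic batch, so ERROR_LINE() reports 2 regardless of where sys.sp_executesql was called in the outer batch.

```sql
DECLARE @sql NVARCHAR(MAX) = N'SELECT 1;
SELECT 1 / 0;';   -- the error occurs on line 2 of this dynamic batch

BEGIN TRY
    EXEC sys.sp_executesql @sql;
END TRY
BEGIN CATCH
    -- Reports ErrorLine = 2 (the line inside the dynamic batch),
    -- not the line on which sp_executesql was invoked
    SELECT ERROR_LINE() AS ErrorLine, ERROR_MESSAGE() AS ErrorMessage;
END CATCH;
```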

Read and Write Data with PySpark

Dustin Vannoy has two of the three R's down:

Every Spark pipeline involves reading data from a data source or table. As data engineers, we usually end the pipelines by writing the transformed data. In this tutorial we walk through some of the most common formats and cloud storage locations for reading and writing with Spark. We’ll save some of the advanced Delta Lake capabilities for another tutorial.

Click through to see how to read from and write to CSV, JSON, and Parquet formats. Dustin has examples of working with Azure Blob Storage, S3, and Google Cloud Storage, and even some database examples with JDBC.
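
As a taste of what the tutorial covers, here is a minimal PySpark sketch of a CSV read and a Parquet write; the paths are placeholders, not Dustin's examples.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-write-demo").getOrCreate()

# Read a CSV file with a header row, letting Spark infer column types
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/input/sales.csv"))

# Write the same data back out as Parquet, overwriting any prior output
(df.write
   .mode("overwrite")
   .parquet("/data/output/sales_parquet"))

# JSON works the same way via spark.read.json() and df.write.json()
```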

Creating Crosstabs in R

Steven Sanderson shows off the ability to perform crosstabs in R:

As a programmer, you’re constantly faced with the task of organizing and analyzing data. One powerful tool in your R arsenal is the xtabs() function. In this blog post, we’ll explore the versatility and simplicity of xtabs() for aggregating data. We’ll use the mtcars dataset and the healthyR.data::healthyR_data dataset to illustrate its functionality. Get ready to dive into the world of data aggregation with xtabs()!

Click through for two examples of it creating a crosstab (or, in SSRS/Power BI terms, a matrix) from your data.
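
For a quick flavor, here is a minimal sketch of xtabs() against mtcars; these are illustrative calls rather than Steven's exact examples.

```r
# Count of cars by number of cylinders and transmission type
xtabs(~ cyl + am, data = mtcars)

# Aggregate a measure instead of counting: total horsepower per cylinder/gear combination
xtabs(hp ~ cyl + gear, data = mtcars)
```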

Measuring Bandwidth with PowerShell

Patrick Gruenauer checks download speed:

Short explanation: A file will be downloaded from my homepage. It’s a video about PowerShell profiles, in German. During the download of the file, the process is measured with the Measure-Command cmdlet. At the end we get the final result and the time it took to download the file.

You can, of course, change the file location to whatever makes sense, but it’s nice to have something handy and not need to go to any websites for a speed test.
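
Here is a minimal sketch of the idea, timing a download with Measure-Command and deriving a throughput figure; the URL is a placeholder rather than Patrick's file.

```powershell
# Placeholder URL: substitute any reasonably large file you control
$url  = 'https://example.com/testfile.zip'
$dest = "$env:TEMP\testfile.zip"

# Time the download
$elapsed = Measure-Command {
    Invoke-WebRequest -Uri $url -OutFile $dest
}

# Report size, duration, and approximate throughput in MB/s
$sizeMB = (Get-Item $dest).Length / 1MB
[PSCustomObject]@{
    Seconds = [math]::Round($elapsed.TotalSeconds, 2)
    SizeMB  = [math]::Round($sizeMB, 2)
    MBps    = [math]::Round($sizeMB / $elapsed.TotalSeconds, 2)
}
```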

The Cost and Difficulty Level of Changes

Richard Swinbank has an image for us:

I spent some time working at a property portal, where users could look at online listings of homes for sale or rent, then go on to book a viewing appointment with an estate agent. On one occasion we were asked to build a Power BI report showing:

  • the number of appointments booked by portal users
  • the percentage of appointments where the user had viewed the online listing more than once.

Sounds easy enough, right?

Click through for the image. It makes intuitive sense, but is a good visual depiction of why some data requests are more challenging than others.
