Press "Enter" to skip to content

Day: November 9, 2023

Plotting a Cumulative Distribution Function in R

Steven Sanderson builds a plot:

Before delving into the world of R programming, let’s first grasp the fundamental concept of a CDF. Imagine a group of students eagerly awaiting their exam results. The CDF for their scores would depict the probability of encountering a student with a score less than or equal to a specific value. For instance, if the CDF indicates a value of 0.7 at 80%, it implies that there’s a 70% chance of finding a student with a score of 80 or lower.

Read on to see how you can calculate this in a dataset and then plot the CDF.

Comments closed

Functions to Deal with Object Names in R

Maelle Salmon has a sticky note:

Interestingly the docs for setNames() sound as if it were created just for this use case!

“This is a convenience function that sets the names on an object and returns the object. It is most useful at the end of a function definition where one is creating the object to be returned and would prefer not to store it under a name just so the names can be assigned.”

For the opposite operation, removing the names of an object, we can use unname().

Read on for an overview of several of these functions. H/T R-Bloggers.

Comments closed

Using VS Code Extensions in Azure Data Studio

I have a quick tip blog post:

Something that a lot of people don’t know is, you can import some Visual Studio Code extensions into Azure Data Studio and have them work.

I emphasize the word “some” here because I have seen cases of extensions failing to install. I don’t know exactly why certain extensions fail or what percentage of them will work, so give it a whirl and see what you get.

Comments closed

A Look at UTF-8 in SQL Server

Daniel Hutmacher takes a peek:

A client asked me about SQL Server collations, and if they should consider the new UTF8 collations (new since SQL Server 2019). I tried to hide my blank stare of ignorance, and promised them I’d look it up and get back to them.

Not gonna lie, I think UTF and Unicode can be pretty confusing at times, so I did some googling and some testing, and here’s what I found.

Read on to see what Daniel found, and check the comments for one person’s unfortunate experience. For a deep dive into UTF-8 in SQL Server, I also recommend Solomon Rutzky’s blog post on the topic, noting that it was last updated in 2020 so some of the incompatibility section may not be relevant any longer.

Comments closed

Unpivoting Data via CROSS APPLY

Aaron Bertrand’s speaking my language:

I’ve written about UNPIVOT before – see Use SQL Server’s UNPIVOT operator to help normalize output and Use SQL Server’s UNPIVOT operator to dynamically normalize output. It’s a powerful language feature that lets you flip results sideways, sort of like the opposite of PIVOT. This is great if you have columns like Phone1 and Phone2 but want to collapse them into one column. One of the challenges is that if you have other columns you also want to collapse similarly, like Email1 and Email2, you need to add an additional UNPIVOT operator or use a different approach. Is there a way to do this in a less complicated way?

Click through for the answer. And a free Curated SQL protip: if you ever want linked here, write an article about a good use case for the APPLY operator. I’m a sucker for those.

Comments closed

Data Warehouse Updates for Microsoft Fabric

Dennes Torres brings the news:

We have a specific statement to clone a table. But what exactly does it mean?

The Clone Table feature promises to create an image of the table on a specific point in time or with the current information. The documentation is not precise, because at some points it says it’s only a clone of the structure, but we can see the data on the table.

What’s the advantage of this over a simple SELECT INTO statement?

Read on for that comparison, as well as several other things recently added to Microsoft Fabric data warehouses.

Comments closed

Using the NTILE() Window Function

Joe Celko explains how NTILE() works:

The SQL NTILE() is a window function that allows you to break a table into a specified number of approximately equal groups, or <bucket count>. For each row in a grouping, the NTILE() function assigns a bucket number representing the group to which the row belong starting at one.

Click through for the explanation, including what it is and isn’t, as well as several examples.

Comments closed

Azure Data Studio 1.47 Now Available

David Levy announces a new release of Azure Data Studio:

Another change that is sure to be noticed is that we have enabled parallel message processing by default. This change will improve the performance of the application when connected to SQL Server data sources. We have introduced two new settings to manage this new behavior. The first setting, Mssql: Parallel Message Processing is set to true by default. The other setting, Mssql: Parallel Message Processing Limit, has a default of 100. This setting controls the number of threads used for parallel processing. We are excited to hear your impressions on the impact of this change.

Read on for the full list of changes.

Comments closed