Day: August 5, 2024

Cluster Sampling in R

Steven Sanderson shows us one sampling technique:

Cluster sampling is a useful technique when dealing with large datasets spread across different groups or clusters. It involves dividing the population into clusters, randomly selecting some clusters, and then sampling all or some members from these selected clusters. This method can save time and resources compared to simple random sampling.

In this post, we’ll walk through how to perform cluster sampling in R. We’ll use a sample dataset and break down the code step-by-step. By the end, you’ll have a clear understanding of how to implement cluster sampling in your projects.

Read on for the scenario and sample code.
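
Steven's post works in R. For a rough feel of the idea in another language, here is a minimal Python sketch of one-stage cluster sampling: group rows by cluster, randomly pick a few whole clusters, and keep every row in them. This is not Steven's code. The `cluster_sample` function, the `school` key, and the toy data are all hypothetical names made up for illustration.

```python
import random
from collections import defaultdict


def cluster_sample(rows, cluster_key, n_clusters, seed=None):
    """One-stage cluster sampling: choose whole clusters at random
    and return every row that belongs to a chosen cluster."""
    # Group rows by their cluster label.
    clusters = defaultdict(list)
    for row in rows:
        clusters[row[cluster_key]].append(row)

    # Randomly choose n_clusters distinct cluster labels.
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)

    # Keep all members of the chosen clusters.
    return [row for key in chosen for row in clusters[key]]


# Hypothetical data: 5 schools (clusters) with 4 students each.
students = [
    {"school": f"S{s}", "student_id": s * 10 + i}
    for s in range(1, 6)
    for i in range(4)
]

sample = cluster_sample(students, "school", n_clusters=2, seed=42)
print(sorted({row["school"] for row in sample}), len(sample))
```

Sampling only some members of each chosen cluster (two-stage sampling) would add a second random draw inside the final list comprehension.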

Defining a OneLake Filesystem using fsspec

Sandeep Pawar looks at fsspec:

I mentioned on X the other day that, like other filesystem backends such as S3 and GCS, you can use fsspec to define the OneLake filesystem too. In this blog, I will explain how to define it and why it’s important to know about it.

Click through for the details on what fsspec is, why it’s important, and what benefits you can get in Microsoft Fabric as a result of its support of fsspec.

SSMS Layout and Tips

Tom Zika talks about SQL Server Management Studio:

I’d like to pin my Object Explorer and Registered Servers to the right and set them to auto-hide for these reasons:

  • I spend most of my time in the Query window and I like that it’s left-aligned
  • I have to resize the Object Explorer based on the level of nesting or length of the object name, sql jobs, etc
  • The Solution Explorer in Visual Studio is also on the right side

Read on for quite a few tips. Tom has a good tip regarding tabs and I’d bring up a separate tool for consideration: Tabs Studio. I paid for a license of it out of my own pocket when I was a database engineer and absolutely loved the control it gave me over displaying SSMS tabs, including having literally dozens of them on the screen at once.

Deconstructing Bitwise Days of the Week

Aaron Bertrand says, four days of the week, she thinks I’m the enemy:

I recently had to help support synchronization and distribution of workloads between multiple servers. Some of this work involves Task Scheduler and, be honest, who knew Task Scheduler was this complicated? On different servers, we wanted jobs to trigger on different days of the week. The way that Task Scheduler handles this programmatically is through a numeric property called WeeklyTrigger.DaysOfWeek, where the following coefficients are packed together into a single value:

Read on for the kind of pain that you can find within Task Scheduler. Any time I see bitwise storage like that in SQL Server, it hurts my head and not in a fun way. Aaron does help bring some sanity to the process, at least.
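
To make the packing concrete, here is a small Python sketch of encoding and decoding a days-of-week bitmask. It assumes the power-of-two values that Task Scheduler's weekly trigger uses, as I understand them: Sunday = 1, Monday = 2, and so on up to Saturday = 64. Aaron's post handles this in T-SQL; the `encode_days` and `decode_days` helpers below are hypothetical names of my own and not his code.

```python
# Assumed bit order: Sunday is bit 0 (value 1) through Saturday is bit 6 (value 64).
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]


def encode_days(names):
    """Pack a collection of day names into a single integer bitmask."""
    value = 0
    for name in names:
        value |= 1 << DAYS.index(name)
    return value


def decode_days(value):
    """Unpack an integer bitmask into the list of day names it contains."""
    return [name for bit, name in enumerate(DAYS) if value & (1 << bit)]


mask = encode_days(["Monday", "Wednesday", "Friday"])
print(mask)               # 2 + 8 + 32 = 42
print(decode_days(mask))  # ['Monday', 'Wednesday', 'Friday']
```

The single integer is compact, but you cannot tell which days it covers without testing each bit, which is where the head-hurting comes from.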

Spooling in DirectQuery when Moving through On-Premises Gateway

Chris Webb diagnoses and resolves an issue:

Recently I was working with a customer using DirectQuery mode and where all traffic to the data source had to go through an on-premises data gateway for security reasons. They noticed that report performance got worse when traffic went through the gateway and this was particularly true when Power BI generated SQL queries that returned hundreds of thousands of rows. 

Click through to learn more about what Chris found and how to fix the issue.

Quick Thoughts on Filestream

Chad Callihan shares some thoughts:

A few weeks ago, I had a post about running into some issues due to a database using the FILESTREAM feature. To be fair, the problem wasn’t with FILESTREAM itself, just that I hadn’t accounted for it being in use (or previously in use) while configuring Availability Groups. FILESTREAM has been around for years but isn’t something I’ve seen in use too often. In case you’re not familiar with it, let’s take a brief look at what it is and some gotchas to watch out for.

Filestream is usually not worth it. I did like FileTable, the SQL Server 2012 extension to Filestream that smoothed out some of the rough edges. But even that is Windows-only and rarely seen in real environments.