Press "Enter" to skip to content

Curated SQL Posts

Types of Fragmentation on Index Pages in SQL Server

Deepthi Goguri explains what sorts of fragmentation can occur on an index in SQL Server:

Logical Fragmentation occurs when the logical order of the leaf level pages (logical order meaning the next key values in order) no longer the continuous page to the next physical data file page. Because of these pages which are out of order will affect the read ahead mechanism and the scan performance. Because of this logical fragmentation, read ahead have to do smaller read ahead reads.

If the logical fragmentation pages are already in the memory than the read ahead mechanism will not be affected in that case. Logical fragmentation will cause the problem for bigger indexes and not for the smaller ones usually (smaller indexes having pages 1000-5000 pages). You can monitor the amount of the index logical fragmentation by using the DMV sys.dm_db_index_physical_stats.

Read on to learn more about logical fragmentation, as well peers extent fragmentation and low page density.

Comments closed

Creating XML from SQL Server

Barney Lawrence shows off how to build XML from data in SQL Server:

In my experience creating XML is a less common task than reading it with one semi-exception (that being the old trick of using FOR XML PATH(”) to concatenate strings) but it can be an important one. The biggest piece of XML I’ve built was for a government mandated return, it was the only XML I built on that job but a sizeable proportion of the entire department’s work culminated in it. The switch from a set of Excel macros which took over eight hours to run to something that could rebuilt in minutes had a huge impact on our efficiency, particularly when there would be multiple rounds of validation failures against initial attempts at submission.

There are a few variants when it comes to converting queries into XML but in my opinion the only one you really need to know is the XML PATH approach, the others being either fiddly or not providing enough control.

Click through to learn more about the process. FOR XML PATH(): it’s not just for concatenating strings…

Comments closed

Range Filtering in Elasticsearch

The Hadoop in Real World team shows off some Elasticsearch skills:

Filtering based on a range like greater than, less than, greater than equal etc. are pretty common requirements when you work with data. In this post we will see how to perform range based filtering with Elasticsearch.

Knowing the specific syntax makes it easy to follow along. And it does help that Powershell has similar comparison flags with -gt, -gte, and the like.

Comments closed

Functional Data Analysis in R

Joseph Ricker gives us a gentle introduction to a not-so-gentle topic:

This plot might depict 80 measurements for a participant in a clinical trial where each data point represents the change in the level of some protein level. Or it could represent any series of longitudinal data where the measurements are take at irregular intervals. The curve looks like a time series with obvious correlations among the points, but there are not enough measurements to model the data with the usual time series methods. In a scenario like this, you might find Functional Data Analysis (FDA) to be a viable alternative to the usual multi-level, mixed model approach.

This post is meant to be a “gentle” introduction to doing FDA with R for someone who is totally new to the subject. I’ll show some “first steps” code, but most of the post will be about providing background and motivation for looking into FDA. I will also point out some of the available resources that a newcommer to FDA should find helpful.

Read on to learn more.

Comments closed

Selecting Columns Based on Data Types in Power Query

Sandeep Pawar solves an interesting problem:

It’s not uncommon to have a table with hundreds of columns with different column types such as numerical, text, date, percentage etc. You can select columns by using the UI option “Choose Columns” and then selecting which columns to keep. But if you have have tens of columns of the same type, you will have to manually go through the list of columns and select the columns to keep. As far as I know, there is no direct way to only keep, for example, numerical columns. In Pandas, you can use select_dtypes method and pass which columns to choose based on the data types. No such option or function is available in Power Query.

In this example, I will show how I achieved it. You can customize it or turn it into a function based on your use case.

Read on for two classes of technique which do the trick.

Comments closed

Power BI Announcements at Microsoft Business Applications Summit

Gilbert Quevauvilliers has a long list of Power BI announcements:

As I have done each and every year I go through and give an overview of all the Power BI Announcements at the Microsoft Business Applications Summit 2021.

This year once again they have announced some incredible features either available now or coming soon so, please read below.

There are quite a few interesting features here. One of the ones which caught my eye was automatic aggregations for DirectQuery calculations, as that reminded me of MDX pre-calculations.

Comments closed

Multi-Select Slicers in Power BI

Reza Rad simplifies multiple selection:

This is a very short, simple article about how to have a multi-select slicer in Power BI. Power BI slicer is in fact multi-select by default, however, there is a very small option that if you set it, makes it even easier to use, let’s talk about it. If you want to learn more about Power BI, read the Power BI book from Rookie to Rock Star.

Click through to see how you can perform multi-selection by default, as well as an alternative setting.

Comments closed

Doodles about the Storage Engine

Forrest McDaniel explains via image:

Paul Randal is a SQL Server legend with loads of informative articles. But when I was a baby DBA first reading Inside the Storage Engine, I got a little stuck. It took many passes before, eventually, finally, it clicked. I wish I had a lightweight introduction, so in the practice of paying it forward…

Here’s the starting point: sometimes it’s easier to manage lots of small things (say, the 1s and 0s of data) by grouping them into larger things. It’s the same reason you don’t buy rice by the grain.

Read on for that introduction to the storage engine.

Comments closed