Press "Enter" to skip to content

Day: May 6, 2021

Range Filtering in Elasticsearch

The Hadoop in Real World team shows off some Elasticsearch skills:

Filtering based on a range like greater than, less than, greater than equal etc. are pretty common requirements when you work with data. In this post we will see how to perform range based filtering with Elasticsearch.

Knowing the specific syntax makes it easy to follow along. And it does help that Powershell has similar comparison flags with -gt, -gte, and the like.

Comments closed

Functional Data Analysis in R

Joseph Ricker gives us a gentle introduction to a not-so-gentle topic:

This plot might depict 80 measurements for a participant in a clinical trial where each data point represents the change in the level of some protein level. Or it could represent any series of longitudinal data where the measurements are take at irregular intervals. The curve looks like a time series with obvious correlations among the points, but there are not enough measurements to model the data with the usual time series methods. In a scenario like this, you might find Functional Data Analysis (FDA) to be a viable alternative to the usual multi-level, mixed model approach.

This post is meant to be a “gentle” introduction to doing FDA with R for someone who is totally new to the subject. I’ll show some “first steps” code, but most of the post will be about providing background and motivation for looking into FDA. I will also point out some of the available resources that a newcommer to FDA should find helpful.

Read on to learn more.

Comments closed

Selecting Columns Based on Data Types in Power Query

Sandeep Pawar solves an interesting problem:

It’s not uncommon to have a table with hundreds of columns with different column types such as numerical, text, date, percentage etc. You can select columns by using the UI option “Choose Columns” and then selecting which columns to keep. But if you have have tens of columns of the same type, you will have to manually go through the list of columns and select the columns to keep. As far as I know, there is no direct way to only keep, for example, numerical columns. In Pandas, you can use select_dtypes method and pass which columns to choose based on the data types. No such option or function is available in Power Query.

In this example, I will show how I achieved it. You can customize it or turn it into a function based on your use case.

Read on for two classes of technique which do the trick.

Comments closed

Power BI Announcements at Microsoft Business Applications Summit

Gilbert Quevauvilliers has a long list of Power BI announcements:

As I have done each and every year I go through and give an overview of all the Power BI Announcements at the Microsoft Business Applications Summit 2021.

This year once again they have announced some incredible features either available now or coming soon so, please read below.

There are quite a few interesting features here. One of the ones which caught my eye was automatic aggregations for DirectQuery calculations, as that reminded me of MDX pre-calculations.

Comments closed

Multi-Select Slicers in Power BI

Reza Rad simplifies multiple selection:

This is a very short, simple article about how to have a multi-select slicer in Power BI. Power BI slicer is in fact multi-select by default, however, there is a very small option that if you set it, makes it even easier to use, let’s talk about it. If you want to learn more about Power BI, read the Power BI book from Rookie to Rock Star.

Click through to see how you can perform multi-selection by default, as well as an alternative setting.

Comments closed

Doodles about the Storage Engine

Forrest McDaniel explains via image:

Paul Randal is a SQL Server legend with loads of informative articles. But when I was a baby DBA first reading Inside the Storage Engine, I got a little stuck. It took many passes before, eventually, finally, it clicked. I wish I had a lightweight introduction, so in the practice of paying it forward…

Here’s the starting point: sometimes it’s easier to manage lots of small things (say, the 1s and 0s of data) by grouping them into larger things. It’s the same reason you don’t buy rice by the grain.

Read on for that introduction to the storage engine.

Comments closed