Press "Enter" to skip to content

April 7, 2021

Ordering and Sorting Data in Spark

Landon Robinson shows how to sort data in Spark RDDs and DataFrames:

In the analysis section of Spark Starter Guide 4.6: How to Aggregate Data, we asked these questions: “Who is the youngest cat in the data? Who is the oldest?”

Let’s use ordering in Spark as an alternative method to answer those same questions, and achieve the same result. Specifically, let’s again find the youngest and oldest cats in the data.

Click through for plenty of examples.
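
For a sense of the approach, here is a minimal PySpark sketch (the cat data and column names are illustrative, not taken from the guide):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("ordering-example").getOrCreate()

    # Hypothetical cat data: name and age in years.
    cats = spark.createDataFrame(
        [("Whiskers", 2), ("Tiger", 11), ("Mittens", 5)],
        ["name", "age"],
    )

    # An ascending sort puts the youngest cat first...
    youngest = cats.orderBy(F.col("age").asc()).first()
    # ...and a descending sort puts the oldest cat first.
    oldest = cats.orderBy(F.col("age").desc()).first()

    print(youngest["name"], oldest["name"])  # Whiskers Tiger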


Time Series Data in PostgreSQL

Michael Grogan has a few examples of working with time series data in PostgreSQL:

Tools such as Python or R are most often used to conduct deep time series analysis.

However, knowledge of how to work with time series data using SQL is essential, particularly when working with very large datasets or data that is constantly being updated.

Here are some useful commands that can be invoked in SQL to better work with time series data within the data table itself.

Click through for examples like using a window function to calculate moving averages and using time zones. H/T Mark Hutchinson.
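
As a rough illustration of the window-function idea (the readings table and its columns are hypothetical, not from the article):

    -- Assumes a table readings(reading_time timestamptz, value numeric).
    SELECT
        reading_time,
        value,
        -- Moving average over the current row and the six preceding readings.
        AVG(value) OVER (
            ORDER BY reading_time
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS moving_avg,
        -- Display the timestamp in a named time zone.
        reading_time AT TIME ZONE 'America/New_York' AS local_time
    FROM readings
    ORDER BY reading_time;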


THROW and Linked Servers

Chad Baldwin hits on an interesting result when using THROW across a linked server:

The THROW command is non-terminating if it is used in a stored procedure over a linked server.

I don’t know the details of why it works this way. The THROW command returns an error message with a severity level of 16, which, according to my RAISERROR Cheatsheet, does not stop execution.

There’s something special about the THROW command beyond raising an error message. Behind the scenes, there is likely some extra information being passed to tell SQL Server that execution needs to stop in that moment, and that extra bit of information does not appear to be passed between linked servers.

Click through for a demo.
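
The shape of the behavior is something like the following simplified sketch (the linked server and procedure names are hypothetical, it assumes RPC Out is enabled on the linked server, and it is not Chad's exact demo):

    -- On the remote server: a procedure that throws.
    CREATE OR ALTER PROCEDURE dbo.ThrowSomething
    AS
    BEGIN
        THROW 50000, 'Something went wrong.', 1;
    END;
    GO

    -- Called locally, THROW terminates the batch, so this SELECT never runs.
    EXEC dbo.ThrowSomething;
    SELECT 'after local call';
    GO

    -- Called over a linked server, the severity 16 error is raised
    -- but execution continues, so this SELECT does run.
    EXEC LinkedSrv.SomeDb.dbo.ThrowSomething;
    SELECT 'after linked-server call';
    GO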


Power BI Content Certification and Promotion

Melissa Coates talks about certification (though not certifications):

The number of people who are allowed to certify content should be pretty limited, and should only include people who are knowledgeable about both the data and the certification requirements. Ideally, certification should only be done by domain/subject matter experts — and this isn’t usually your Power BI administrator. In larger organizations, you might reference multiple Azure AD security groups in the tenant setting – but it should remain a pretty limited number of people to ensure that the certification designation is truly meaningful. (Note that we currently cannot specify who can endorse content on a per-workspace level.)

Read on for more information about content certification, as well as content promotion, in Power BI.


Power BI Exporter

Soheil Bakhshi announces a new community tool:

Apart from my website statistics showing many people want to export data from Power BI Desktop, some of my customers asked the same question. They wanted to export the curated data from their data model within Power BI Desktop to CSV format and make the curated data available for their other platforms. While all the methods mentioned in my previous blog posts work, some users still find them complex. So I thought, we can make it better. We can make a straightforward tool that exports the data with only two clicks. So we started building the Power BI Exporter as a micro-project.

We added some more ideas to the original idea of only exporting the data. We thought it would be good to export the data along with the table names, column names, and relationships. Having that information handy, we can quickly build the same data model as the one whose data we exported, but using the CSV files as the data sources. The other idea was to pack everything into a ZIP file on the fly, so we have a single ZIP file including the tables, columns, and relationships. As a result, the first version of the Power BI Exporter is born. In this post I explain how it works.

Read on to see how to use it. Download is free, though you do have to provide them with an e-mail address.


Rolling 12-Month Averages in DAX

Alberto Ferrari shows how to calculate a rolling 12-month average in DAX:

The measure we want to compute is Rolling Avg 12M, which computes the rolling average of the Sales Amount measure over the last 12 months. When you project the rolling average on a chart, the resulting line is much smoother; it removes the spikes and drops that would make it difficult to recognize a trend in sales.

Click through to see two ways to do this: via a DAX measure and then as a calculation group.
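
The measure version follows a common DAX pattern along these lines (a sketch: the 'Date'[Year Month] column is assumed, and the article's exact formula and edge-case handling may differ):

    Rolling Avg 12M =
    VAR LastVisibleDate = MAX ( 'Date'[Date] )
    VAR Period =
        DATESINPERIOD ( 'Date'[Date], LastVisibleDate, -12, MONTH )
    RETURN
        CALCULATE (
            -- Average the monthly Sales Amount values over the 12-month window.
            AVERAGEX ( VALUES ( 'Date'[Year Month] ), [Sales Amount] ),
            Period
        )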


Temporal Table Data Retrieval Execution Plans

Hugo Kornelis digs into execution plans when retrieving data from temporal tables:

The query above will simply grab the requested data from the Products table as if it were a normal table. The corresponding history table is not used at all. To understand why this happens it is important to recall that system-versioned temporal tables always store the currently valid version of the row in the table itself; all older, no longer valid versions are stored in the accompanying history table. So the query above, which does not use any specific temporal logic, means “I don’t care about the history, I want the data as it is now”. Due to how temporal tables are designed, SQL Server only has to query the actual table for this.

Since there is nothing special or interesting about this, let’s move on to queries that do use special logic in the query to retrieve older versions of the data.

Click through for quite a bit more detail on what the plans look like for less-common temporal queries.
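
For context, the two kinds of query look roughly like this (the Products table comes from the post, but its columns here are assumed):

    -- Current rows only: SQL Server reads just dbo.Products,
    -- because current versions live in the table itself.
    SELECT ProductID, Name, Price
    FROM dbo.Products;

    -- Point-in-time query: the history table now comes into play,
    -- filtered on the period (ValidFrom/ValidTo) columns.
    SELECT ProductID, Name, Price
    FROM dbo.Products
        FOR SYSTEM_TIME AS OF '2021-01-01T00:00:00';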


Granular Deployment of Power BI Changes with ALM Toolkit

Gilbert Quevauvilliers shows off an interesting scenario:

In this blog post I am going to demonstrate how to make a granular deployment where I will create a new column in my City table, and only deploy those changes.

What this means is that by deploying only the column change to my PPU dataset, I am only updating the column in the table.

This now saves me from the following tasks I previously had to do:

– Refreshing the PBIX file so that the data is up to date.
– Re-uploading my PBIX.
– If configured, re-creating the incremental refresh.
– Uploading and waiting for the dataset refresh.

The result is quick updates to my dataset: I no longer have to worry about saving my PBIX file, re-uploading it, and, if configured, re-creating the incremental refresh. This saves me a lot of time and effort.

Click through to see those steps in action.
