IDENTITY Overflow in SSIS

Alex Stuart hits a weird error:

Conversion/overflow errors aren’t that unusual – normally a data flow broken by some unexpected data (“no, there’s no chance that field would ever have a character in it”), or perhaps a column hitting max size (“INT will be enough for years, like, 5 years. I’ll have left the company by then”)

But that wasn’t the case here – the package and user tables involved were checked by the dev team and there was no possible overflow. I’d checked system databases for maxed-out identity columns and found nothing. Heads were scratched.
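
If you want to run that kind of check yourself, something along these lines (a minimal sketch, not Alex's actual script) compares each integer identity column's last value against the ceiling of its underlying type:

```sql
-- A minimal sketch: integer identity columns and how close each one is to its type's maximum.
-- Run it per database; decimal/numeric identity columns are ignored here for simplicity.
SELECT
    OBJECT_SCHEMA_NAME(ic.object_id)  AS schema_name,
    OBJECT_NAME(ic.object_id)         AS table_name,
    ic.name                           AS column_name,
    ty.name                           AS type_name,
    CONVERT(bigint, ic.last_value)    AS last_value,
    x.max_value,
    CONVERT(decimal(5, 2), 100.0 * CONVERT(bigint, ic.last_value) / x.max_value) AS pct_used
FROM sys.identity_columns AS ic
JOIN sys.types AS ty
    ON ty.user_type_id = ic.user_type_id
CROSS APPLY (SELECT CASE ty.name
                        WHEN 'tinyint'  THEN CONVERT(bigint, 255)
                        WHEN 'smallint' THEN 32767
                        WHEN 'int'      THEN 2147483647
                        WHEN 'bigint'   THEN 9223372036854775807
                    END AS max_value) AS x
WHERE ty.name IN ('tinyint', 'smallint', 'int', 'bigint')
  AND ic.last_value IS NOT NULL
ORDER BY pct_used DESC;
```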

Read on for the post-head-scratch answer.

Automating Monitoring

Thomas Williams has a multi-part series. Part 1 is a big picture primer:

It would be nice if every problem, now and in the future, was automatically monitored by an intuitive, cheap, flexible tool that only raised the alarm when there was a real problem, at the right time, to the right people.

However, in reality, monitoring is a game of whack-a-mole, built on an ad-hoc collection of tools and technologies, requiring know-how to interpret the often puzzling situations which may be high priority – or just background noise.

Part 2 continues the big picture look:

Monitoring is here to stay. There’s never enough disk space, memory, CPU, network throughput. If they ever existed, perfect conditions don’t persist, whether because of a downstream system failing, newly-discovered security issue, change in process, extraordinary amount of load etc. And, systems grow and change to meet new requirements, so yesterday’s monitoring may not meet tomorrow’s uptime goals.

Following on from part 1, here are some further “big picture” considerations for automated monitoring for the DBA:

Part 3 gets to ideas on instrumentation:

With some of the broader ideas out of the way, if there’s no monitoring in place and I was the “accidental”/”default” DBA, there are some metrics I’d definitely want to monitor. However, there’s a ton of real-time metrics I haven’t included below like CPU & memory use, number of current connections, disk I/O as at right now. They’re probably not good candidates for “roll-your-own”-type monitoring.

I consider “alerts” in the table below to include an e-mail, SMS or dashboard item that aims to trigger a response to fix. The collection frequency especially is just a guide – closer to real-time might be desirable, but the trade-off is impact on servers and connected apps.
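
To make the "roll-your-own" idea concrete, the collection side can be as simple as a scheduled query that snapshots a metric into a logging table, with alerting layered on top of that history. A hypothetical sketch, not taken from the series (the table, metric, and schedule are all placeholders):

```sql
-- Hypothetical collector: snapshot data/log file sizes into a logging table.
-- Schedule the INSERT (e.g. via a SQL Agent job) at whatever frequency fits the trade-offs above.
CREATE TABLE dbo.FileSizeLog
(
    CollectedAt  datetime2(0) NOT NULL DEFAULT SYSDATETIME(),
    DatabaseName sysname      NOT NULL,
    LogicalName  sysname      NOT NULL,
    SizeMB       bigint       NOT NULL
);
GO

INSERT INTO dbo.FileSizeLog (DatabaseName, LogicalName, SizeMB)
SELECT DB_NAME(mf.database_id),
       mf.name,
       CONVERT(bigint, mf.size) * 8 / 1024   -- sys.master_files reports size in 8 KB pages
FROM sys.master_files AS mf;
```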

Check out all three posts so far in the series.

Ordering and Sorting Data in Spark

Landon Robinson shows how to sort data in Spark RDDs and DataFrames:

In the analysis section of Spark Starter Guide 4.6: How to Aggregate Data, we asked these questions: “Who is the youngest cat in the data? Who is the oldest?”

Let’s use ordering in Spark as an alternative method to answer those same questions, and achieve the same result. Specifically, let’s again find the youngest and oldest cats in the data.
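
The guide itself works through the DataFrame and RDD APIs (orderBy/sort and sortBy, respectively); if you registered the data as a temporary view, the same question expressed in Spark SQL would look roughly like this (the view and column names are assumptions, not taken from the guide):

```sql
-- Assumes something like df.createOrReplaceTempView("pets") has been run,
-- and that the view has name, type, and age columns (illustrative names).
SELECT name, age
FROM pets
WHERE type = 'cat'
ORDER BY age ASC    -- youngest cat first; flip to DESC for the oldest
LIMIT 1;
```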

Click through for plenty of examples.

Time Series Data in PostgreSQL

Michael Grogan has a few examples of working with time series data in PostgreSQL:

Tools such as Python or R are most often used to conduct deep time series analysis.

However, knowledge of how to work with time series data using SQL is essential, particularly when working with very large datasets or data that is constantly being updated.

Here are some useful commands that can be invoked in SQL to better work with time series data within the data table itself.

Click through for examples like using a window function to calculate moving averages and using time zones. H/T Mark Hutchinson.
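
As a taste of the window function approach, a rolling average in PostgreSQL can look something like this (the table and column names are made up for illustration and aren't from the article):

```sql
-- Hypothetical table: readings(reading_time timestamptz, value numeric).
-- Seven-row moving average, with timestamps shifted into a local time zone.
SELECT
    reading_time AT TIME ZONE 'Europe/London' AS local_time,
    value,
    AVG(value) OVER (
        ORDER BY reading_time
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7
FROM readings
ORDER BY reading_time;
```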

THROW and Linked Servers

Chad Baldwin hits on an interesting result when using THROW across a linked server:

The THROW command is non-terminating if it is used in a stored procedure over a linked-server.

I don’t know the details to why it works this way. The THROW command returns an error message with a severity level of 16, which, according to my RAISERROR Cheatsheet, does not stop execution.

There’s something special about the THROW command beyond raising an error message. Behind the scenes, there is likely some extra information being passed to tell SQL Server that execution needs to stop in that moment, and that extra bit of information does not appear to be passed between linked servers.
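
The shape of a repro is roughly the following (server, database, and procedure names are placeholders; Chad's actual demo is in the post):

```sql
-- On the remote server (placeholder names throughout):
CREATE OR ALTER PROCEDURE dbo.ThrowTest
AS
BEGIN
    SELECT 'before the error' AS step;
    THROW 50000, 'Something went wrong.', 1;
    SELECT 'after the error' AS step;   -- not reached when the procedure runs locally
END;
GO

-- Run locally, execution stops at the THROW:
EXEC dbo.ThrowTest;

-- Run across a linked server, the error is still raised, but per Chad's
-- findings execution continues on to the statement after the THROW:
EXEC [RemoteServer].[RemoteDatabase].dbo.ThrowTest;
```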

Click through for a demo.

Power BI Content Certification and Promotion

Melissa Coates talks about certification (though not certifications):

The number of people who are allowed to certify content should be pretty limited, and should only include people who are knowledgeable about both the data and the certification requirements. Ideally, certification should only be done by domain/subject matter experts — and this isn’t usually your Power BI administrator. In larger organizations, you might reference multiple Azure AD security groups in the tenant setting – but it should remain a pretty limited number of people to ensure that the certification designation is truly meaningful. (Note that we currently cannot specify who can endorse content on a per-workspace level.)

Read on for more information about content certification, as well as content promotion, in Power BI.

Power BI Exporter

Soheil Bakhshi announces a new community tool:

Apart from my website statistics showing many people want to export data from Power BI Desktop, some of my customers asked the same question. They wanted to export the curated data from their data model within Power BI Desktop to CSV format and make the curated data available for their other platforms. While all the methods mentioned in my previous blog posts are working, some users still find them complex. So I thought, we can make it better. We can make a straightforward tool that exports the data with only two clicks. So we started building the Power BI Exporter as a micro-project.

We added some more ideas to the original idea of only exporting the data. We thought it is good to export the data along with the table names, column names and relationships. Having that information handy, we can quickly build the same data model as the one we exported its data but using the CSV files as the data sources. The other idea was to pack everything in a ZIP file on the fly, so we have a single ZIP file, including the tables, columns, and relationships.

As a result, the first version of the Power BI Exporter is born. In this post I explain how it works.

Read on to see how to use it. Download is free, though you do have to provide them with an e-mail address.

Rolling 12-Month Averages in DAX

Alberto Ferrari shows how to calculate a rolling 12-month average in DAX:

The measure we want to compute is Rolling Avg 12M, which computes the rolling average of the Sales Amount measure over the last 12 months. When you project the rolling average on a chart, the resulting line is much smoother; it removes the spikes and drops that would make it difficult to recognize a trend in sales.
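
Alberto builds this in DAX; purely for comparison, the same rolling-average shape expressed with a SQL window function over a hypothetical monthly summary table would be:

```sql
-- Not the DAX from the article: an analogous rolling 12-month average in SQL,
-- over a hypothetical dbo.MonthlySales(sales_month date, sales_amount money) table.
-- The first eleven rows average over however many months exist so far.
SELECT
    sales_month,
    sales_amount,
    AVG(sales_amount) OVER (
        ORDER BY sales_month
        ROWS BETWEEN 11 PRECEDING AND CURRENT ROW   -- current month plus the prior 11
    ) AS rolling_avg_12m
FROM dbo.MonthlySales
ORDER BY sales_month;
```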

Click through to see two ways to do this: via a DAX measure and then as a calculation group.

Temporal Table Data Retrieval Execution Plans

Hugo Kornelis digs into execution plans when retrieving data from temporal tables:

The query above will simply grab the requested data from the Products table as if it were a normal table. The corresponding history table is not used at all. To understand why this happens it is important to recall that system-versioned temporal tables always store the currently valid version of the row in the table itself; all older, no longer valid versions are stored in the accompanying history table. So the query above, which does not use any specific temporal logic, means “I don’t care about the history, I want the data as it is now”. Due to how temporal tables are designed, SQL Server only has to query the actual table for this.

Since there is nothing special or interesting about this, let’s move on to queries that do use special logic in the query to retrieve older versions of the data.
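
For reference, the two kinds of query Hugo contrasts look roughly like this (his demo table is Products; the column names and the date are placeholders of mine):

```sql
-- Current data only: satisfied entirely from the base temporal table.
SELECT ProductID, ProductName, ListPrice
FROM dbo.Products;

-- Point-in-time query: the FOR SYSTEM_TIME clause is what brings the
-- history table into the execution plan alongside the base table.
SELECT ProductID, ProductName, ListPrice
FROM dbo.Products
    FOR SYSTEM_TIME AS OF '2021-01-01T00:00:00';
```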

Click through for quite a bit more detail on what it looks like for less-regular queries.
