August 2019 – Page 3

CPU Usage DMV in SQL Server on Linux

Published 2019-08-28 by Kevin Feasel

Tejas Shah announces an improvement to sys.dm_os_ring_buffers in SQL Server 2019 RC1 on Linux:

Sys.dm_os_ring_buffers DMV has been a key DMV used for monitoring SQL Server by built-in tools as well as third party monitoring utilities. When SQL Server 2017 was released on Linux, unfortunately this DMV did not return correct CPU usage information by SQL Server process. SQL Server team is glad to announce that the starting with SQL Server 2019 release candidate, the sys.dm_os_ring_buffers DMV returns SQL CPU utilization correctly. This improvement should benefit the SQL Server monitoring ecosystem on Linux by providing a way to monitor SQL Server CPU usage and enable decision making to engage corrective action if required.

This brings it in line with what we have on Windows.

Comments closed

The Structure of Graph Data

Published 2019-08-28 by Kevin Feasel

Mala Mahadevan begins a series on graph data in SQL Server:

The simplest way to understand a graph data model is that there are just two entities – Nodes, which is what we call Entities in the relational world, and Edges, which are what we call relationships. They are typically represented like below, with the circles standing for nodes, and the arrows for relationships. The emphasis, as we can see is on the bold arrows – because relationships are what graph data is about, with less emphasis on entities/nodes.

Read the whole thing.

Comments closed

PARSE, CAST, and CONVERT

Published 2019-08-28 by Kevin Feasel

Max Vernon gives us three ways to change data types:

PARSE provides a mechanism to convert a wide variety of character based dates into a datetime data type. From the Docs:
Returns the result of an expression, translated to the requested data type in SQL Server.
Use PARSE only for converting from string to date/time and number types. For general type conversions, continue to use CAST or CONVERT. Keep in mind that there is a certain performance overhead in parsing the string value.

PARSE has its value, but comes at a cost, as Max shows.

Comments closed

Two Takes on Power BI Time Dimensions

Published 2019-08-28 by Kevin Feasel

Brett Powell gives us a version of a time dimension in Power Query:

Like most dimensions in a data model, users will generally begin their analysis of time at a high level (e.g. hour) and then filter and drill their way into greater levels of detail, often leveraging hierarchies built into the model and/or drilling functionality in tools like Power BI. To support analysis by multiple grains and hierarchies, the query produces columns which group the seconds into hourly quartiles, hours, and minutes.

Gilbert Quevauvilliers has a script to generate a time dimension as well:

Below is the syntax to create a Time Dimension Table in Power Query
I had a requirement where I needed to create a Time Dimension for a customer. Most of the time I only need the date. Upon searching I could not find a resource where they had created the time dimension only using Power Query.

Gilbert’s gives you a key insight into the value of time dimensions: breaking everything out into periods. 5-minute intervals, 15-minute intervals, or however the company looks at data.

Comments closed

Apache Flink 1.9 Released

Published 2019-08-27 by Kevin Feasel

The Apache Flink crew announces version 1.9.0:

The Apache Flink project’s goal is to develop a stream processing system to unify and power many forms of real-time and offline data processing applications as well as event-driven applications. In this release, we have made a huge step forward in that effort, by integrating Flink’s stream and batch processing capabilities under a single, unified runtime.
Significant features on this path are batch-style recovery for batch jobs and a preview of the new Blink-based query engine for Table API and SQL queries. We are also excited to announce the availability of the State Processor API, which is one of the most frequently requested features and enables users to read and write savepoints with Flink DataSet jobs. Finally, Flink 1.9 includes a reworked WebUI and previews of Flink’s new Python Table API and its integration with the Apache Hive ecosystem.

Click through for the major changes.

Comments closed

Working with Tables in Databricks

Published 2019-08-27 by Kevin Feasel

Brad Llewellyn shows us how to build tables (temporary and permanent) and views in Azure Databricks using each of the main languages:

Simply put, an External Table is a table built directly on top of a folder within a data source. This means that the data is not hidden away in some proprietary SQL format. Instead, the data is completely accessible to outside systems in its native format. The main reason for this is that it gives us the ability to create “live” queries on top of text data sources. Every time a query is executed against the table, the query is run against the live data in the folder. This means that we don’t have to run ETL jobs to load data into the table. Instead, all we need to do is put the structured files in the folder and the queries will automatically surface the new data.

Each language has its own way of doing things, but they all use the Hive metastore under the covers.

Comments closed

“Big” Data

Published 2019-08-27 by Kevin Feasel

Buck Woody explains that “Big Data” is just data:

A few years ago it was all the rage to talk about “Big Data”. Lots of descriptions of “Big Data” popped up, including the “V’s” (Variety, Velocity, Volume, etc.) that proved very helpful. I even have my own definition:
Big Data is any data you can’t process
in the time you want
with the systems you have

This post is quite reasonable in its depiction of the problem. I extend it a bit further than that and talk about difficulty of processing the data. Nonetheless, read Buck’s full thoughts and check out the Big Data Clusters workshop.

Comments closed

Asymmetric Crosstabs in Power BI

Published 2019-08-27 by Kevin Feasel

Teo Lachev shows how we can implement asymmetric crosstabs in Power BI:

The Internet column shows the sales amount from FactInternetSales. Then, the matrix pivots on the BusinessType column in the FactResellerSales. Because, Internet sales don’t relate to BusinessType, it doesn’t make sense to pivot it. Instead, we want to show Internet sales in a single static column before the crosstab portion starts.
Implementing such a report in SSRS is easy thanks to its support of adjacent groups and static columns but not so much in Power BI. The issue is that Matrix would happily pivot both measures and the InternetSalesAmount would be repeated for each business type.

The solution isn’t awful, but it does involve knowledge of DAX.

Comments closed

Power BI Practices: Good and Best

Published 2019-08-27 by Kevin Feasel

Paul Turley has a great document plus checklist on Power BI practices:

I find there there are so many things to remember when starting a project that a checklist is handy. I’ve been collecting the following as notes for some time. Bare with me as I work on consolidating this article into a concise checklist.
The purpose of this article is to outline a set of guidelines and recommended practices for managing Microsoft Power BI projects. This guide is primarily focused on the work performed by the IT-managed BI Solution Developers for formally-managed BI projects.

If you work heavily with Power BI, you’ll really want to read this and review Paul’s checklist at the end.

Comments closed

On the Certification Debate

Published 2019-08-27 by Kevin Feasel

Grant Fritchey has thoughts on certifications:

However, you frequently see people, usually the ones with 47 certificates, going on and on about how, just one more cert, just one more, this time, I’ll get the job I want.
Nope.
Let me break the news. Experience, a proven track record, and knowledge are what get you jobs. And yes, I understand, how do you get experience without first getting a job? That is indeed the hurdle. I’m just telling you that certifications are not the rocket in your bottom that will throw you over that hurdle.

Also connections (which Grant also points out in the post which you should read). Connections land many more jobs than certifications. Most certifications are as much noise as signal, which greatly dilutes the value of the thing. Once again I lament the loss of the MCM, one of the few certifications with a near-zero percent noise rate due to how difficult it was and how many things you needed to understand to get past it. But even with it, experience and networking will get you much further.

Comments closed

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Month: August 2019