Kevin Feasel – Page 782

Measure Filters in Power BI

Published 2021-09-14 by Kevin Feasel

Marco Russo and Alberto Ferrari dive into a topic:

The first paragraph of this article needs to be a warning: the article itself is here for DAX and Power BI enthusiasts only. We are going to show a report that does not work, and then we explore how to fix the problem by performing a deep analysis of the queries generated by Power BI, finding the problem, and finally fixing it. The article contains a lot of references to advanced DAX concepts and the final solution is NOT a best practice. The value of the article is not in the specific solution. Rather, the important part is that a deep understanding of DAX and Power BI can help you obtain the right results, specifically when you have the feeling that you are faced with a bug because Power BI is acting strange. If you do not like DAX before reading this article, you will like it even less at the end. But if you love DAX, then chances are you will really enjoy the reading, even though it requires quite a lot of brain bandwidth. For sure, it took all of mine when I first encountered this behavior.

Break out the propeller hats before you dive in.

Comments closed

Active Directory Authentication for SQL Containers on AKS

Published 2021-09-14 by Kevin Feasel

Amit Khandelwal has a walkthrough for us:

In this blog today, let’s configure AD (Active Directory) authentication for SQL Server containers running on Azure Kubernetes Service (AKS). Same steps can be followed for SQL Server containers deployed on other kubernetes environments as well.

Click through for step-by-step instructions on the process.

Comments closed

Building a Progress Bar in R

Published 2021-09-13 by Kevin Feasel

Tomaz Kastrun is already counting down to next year:

Progress bar showing the number of days until end of the year. Yes, only based on the days and not hour. And needless to say, the progress is soothing using ASCII charts.
So the function is fair to say, simple and useless.

In fairness, the function isn’t that useful but the technique is, particularly if you’re installing something or running a long process.

Comments closed

Order, Sort, Cluster, and Distribute in Hive

Published 2021-09-13 by Kevin Feasel

The Hadoop in Real World team give us three methods (and one synonym) to organize results in Hive:

Hive provides 3 options to order or sort the result of records – order by, sort by, cluster by and distribute by. Which option you choose has performance implications. So it is important to understand the difference between the options and choose the right one for the use case at hand.

Click through for a high-level overview of the techniques.

Comments closed

Scaling Hadoop Beyond 10,000 Nodes

Published 2021-09-13 by Kevin Feasel

Keqiu Hu, et al, take us through a problem of scale:

At LinkedIn, we use Hadoop as our backbone for big data analytics and machine learning. With an exponentially growing data volume, and the company heavily investing in machine learning and data science, we have been doubling our cluster size year over year to match the compute workload growth. Our largest cluster now has ~10,000 nodes, one of the largest (if not the largest) Hadoop clusters on the planet. Scaling Hadoop YARN has emerged as one of the most challenging tasks for our infrastructure over the years.
In this blog post, we will first discuss the YARN cluster slowdowns we observed as we approached 10,000 nodes and the fixes we developed for these slowdowns. Then, we will share the ways we proactively monitored for future performance degradations, including a now open-sourced tool we wrote called DynoYARN, which reliably forecasts performance for YARN clusters of arbitrary size. Finally, we will describe Robin, an in-house service which enables us to horizontally scale our clusters beyond 10,000 nodes.

Read on to learn about the problems they experienced and how they resolved them.

Comments closed

Creating and Using Variables in DAX

Published 2021-09-13 by Kevin Feasel

Jeet Kainth takes us through the process of working with variables in DAX:

Variables can simplify your DAX code, help with debugging and help improve performance. To use variables in your DAX measure you need to declare a variable using the VAR keyword, give the variable a name, and then assign an expression to the variable. You can use multiple variables in a measure but when using variables you must use a RETURN statement to specify the final output to be returned.

Read on for a demonstration, as well as several examples of how you can use variables to make your DAX-writing life easier.

Comments closed

AWS EC2 I3 Instance Types and Storage Persistence

Published 2021-09-13 by Kevin Feasel

Steve REzhener has a warning for us:

Amazon Web Services Elastic Cloud Computing (a.k.a. EC2) is a service that lets anyone with a credit card rent a virtualized server from Amazon. To cater to different clients’ needs, AWS provides various instance types that are either general instance or specific-purpose instances (focused on CPU, RAM, IO). You can see the different types in Fig 1. This blog post is going to talk about a storage optimized instance. the I3 instance type family, its little-known problem, and the solution in the form of Elastic Block Storage (a.k.a. EBS).

Click through for the warning, more explanation, and what you can do about it. H/T the SQLServerCentral newsletter.

Comments closed

Preserving SQL Data Types when Query Folding

Published 2021-09-13 by Kevin Feasel

Chris Webb has a follow-up to a prior post:

My post earlier this year on enabling query folding when using SQL queries as a data source in Power Query provoked a lot of interest. This post adds one more useful detail: how to preserve the original data types of the columns in your query when using this technique with SQL Server-related sources.

Read on for an example using the DATE datatype.

Comments closed

Exchange Spill Wait Stats

Published 2021-09-13 by Kevin Feasel

Erik Darling looks at exchange spills:

There are quite high waits on PAGEIOLATCH_EX, SLEEP_TASK, and SLEEP_BPOOL_STEAL rounding out the top five. This is quite interesting, because I’ve never explicitly thought of PAGEIOLATCH_EX waits in the context of exchange spills. Normally, I think of them when queries read pages from disk into memory for modification.
Going down the line, SLEEP_TASK is familiar from our time spent with hash spills, but SLEEP_BPOOL_STEAL is so far undocumented anywhere.

Erik also does the math on this query and recommends that you not write a query like this one.

Comments closed

Data Lakehouse Point-of-Sale Analytics Demo

Published 2021-09-10 by Kevin Feasel

Bryan Smith and Rob Saker share a pattern:

Disruptions in the supply chain – from reduced product supply and diminished warehouse capacity – coupled with rapidly shifting consumer expectations for seamless omnichannel experiences are driving retailers to rethink how they use data to manage their operations. Prior to the pandemic, 71% of retailers named lack of real-time visibility into inventory as a top obstacle to achieving their omnichannel goals. The pandemic only increased demand for integrated online and in-store experiences, placing even more pressure on retailers to present accurate product availability and manage order changes on the fly. Better access to real-time information is the key to meeting consumer demands in the new normal.
In this blog, we’ll address the need for real-time data in retail, and how to overcome the challenges of moving real-time streaming of point-of-sale data at scale with a data lakehouse.

It’s a cool scenario, at the least.

Comments closed

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30
31

Author: Kevin Feasel