Month: May 2020

Spark Application Execution Modes

Kundan Kumarr explains how the two execution modes differ with Apache Spark:

Whenever we submit a Spark application to the cluster, the Driver or the Spark App Master gets started, and the Driver starts N workers. The Spark driver manages the SparkContext object to share data and coordinate with the workers and the cluster manager across the cluster. The cluster manager can be Spark Standalone, Hadoop YARN, or Mesos. Workers are assigned tasks, and the results are consolidated and collected back to the driver. A Spark application runs within the cluster in one of two modes: cluster mode or client mode.
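
As a rough sketch of how the mode gets chosen in practice: spark-submit accepts a --deploy-mode flag set to either client or cluster. In client mode the driver runs on the machine that submitted the job; in cluster mode it runs on a node inside the cluster. The Python helper below only assembles the command line. The function names, the app.py path, and the YARN default are illustrative assumptions, not something taken from the linked post.

```python
from typing import List

VALID_MODES = ("client", "cluster")


def build_spark_submit(app_path: str, deploy_mode: str, master: str = "yarn") -> List[str]:
    """Assemble (but do not run) a spark-submit command for the given deploy mode."""
    if deploy_mode not in VALID_MODES:
        raise ValueError(f"deploy_mode must be one of {VALID_MODES}, got {deploy_mode!r}")
    return ["spark-submit", "--master", master, "--deploy-mode", deploy_mode, app_path]


def driver_location(deploy_mode: str) -> str:
    """Describe where the driver process runs for each mode."""
    if deploy_mode not in VALID_MODES:
        raise ValueError(f"deploy_mode must be one of {VALID_MODES}, got {deploy_mode!r}")
    return "submitting machine" if deploy_mode == "client" else "node inside the cluster"


if __name__ == "__main__":
    for mode in VALID_MODES:
        # app.py is a hypothetical application file used only for illustration.
        print(" ".join(build_spark_submit("app.py", mode)), "->", driver_location(mode))
```

The resulting list could be handed to subprocess.run on a machine with Spark installed.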

Click through for a comparison.

Big-O Notation in .NET

Camilo Reyes takes us through a useful concept in computer science as applied to .NET Core:

Performance sensitive code is often overlooked in business apps. This is because high-performance code might not affect outcomes. Concerns with execution times are ignorable if the code finishes in a reasonable time. Apps either meet expectations or not, and performance issues can go undetected. Devs, for the most part, care about business outcomes and performance is the outlier. When response times cross an arbitrary line, everything flips to less than desirable or unacceptable.

Luckily, the Big-O notation attempts to approach this problem in a general way. This focuses both on outcomes and the algorithm. Big-O notation attempts to conceptualize algorithm complexity without laborious performance tuning.

This is a rather high-level take on the idea, as it doesn’t cover any of the O(NlogN) or O(logN) algorithms out there. But if you are not familiar with the concept, it is good to know.
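
To make the gap between complexity classes concrete, here is a minimal Python sketch (not from the linked .NET article) that counts comparisons for a linear scan, which is O(N), and a binary search over sorted data, which is O(log N). The function names and the one-million-element list are assumptions chosen for illustration.

```python
from typing import Sequence


def linear_search_steps(items: Sequence[int], target: int) -> int:
    """Count the comparisons a front-to-back scan makes before it stops."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_items: Sequence[int], target: int) -> int:
    """Count the probes binary search makes on sorted input before it stops."""
    steps = 0
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


if __name__ == "__main__":
    data = list(range(1_000_000))
    worst_case = data[-1]
    # The linear scan touches every element; binary search needs at most 20 probes here.
    print("linear:", linear_search_steps(data, worst_case))
    print("binary:", binary_search_steps(data, worst_case))
```

Doubling the input doubles the linear count but adds only one probe to the binary search.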

Standardized DAX Separators in Power BI Desktop

Marco Russo goes over the ramifications of a recent change to Power BI Desktop:

Starting from the May 2020 version of Power BI Desktop, regardless of the Windows locale settings DAX always uses standard separators by default. This change does not affect most Power BI users around the world; but if you use the comma as a decimal separator and 10,000 is just ten and not ten thousand, then you are affected by this change.

First of all, you can restore the previous behavior as I describe in this post, but the recommendation and the default are now to use the standard DAX separators. I want to describe why this (I think good) choice was made and how I contributed to making this happen.
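
The ambiguity is easy to demonstrate outside of DAX. The following is a small Python sketch, assuming a hypothetical parse_number helper that is not part of Power BI or DAX, showing how the same text reads as ten thousand or as ten depending on which separator convention is applied.

```python
def parse_number(text: str, decimal_sep: str, group_sep: str) -> float:
    """Parse a numeric string under an explicit separator convention.

    Hypothetical helper for illustration only: it strips the grouping
    separator and normalizes the decimal separator to a period.
    """
    if decimal_sep == group_sep:
        raise ValueError("decimal and grouping separators must differ")
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


if __name__ == "__main__":
    text = "10,000"
    # Period as decimal separator: the comma is only a grouping mark.
    print(parse_number(text, decimal_sep=".", group_sep=","))  # 10000.0
    # Comma as decimal separator: the same text is just ten.
    print(parse_number(text, decimal_sep=",", group_sep="."))  # 10.0
```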

Read the whole thing.

May 2020 Release of Azure Data Studio

Alan Yu has some goodies for us:

The key highlights to cover this month include:

– Announcing Redgate SQL Prompt extension
– Announcing the new machine learning extension
– Added new Python dependencies wizard
– Added support for parameterization for Always Encrypted
– Improvements to the notebook markdown toolbar
– Bug fixes

For a list of complete updates, refer to the Azure Data Studio release notes.

I’ll have to check out the ML extension.

AMD Processor Recommendations for SQL Server

Glenn Berry has some thoughts on AMD’s EPYC line of processors:

Over the years, I have written many articles about the fine art of processor selection for SQL Server. This is an important topic, because it has a direct relationship to your SQL Server license costs. It also affects your performance and scalability. As new processor families are introduced, I do the required analytical work and update my recommendations. In this post, I will list my recommended AMD Processors for SQL Server.

I’m just happy that the answer isn’t a null set anymore.

Filtering Power BI Dimensions with List.Contains

Ed Hansberry gives us a second option for filtering dimension values:

I don’t like loading up a slicer with dozens or hundreds of items that have no corresponding records. The same would apply if there was no slicer, but the consumer wanted to filter using the Filter pane. So I’ll filter the customer table so it only includes what I would call “active customers” that are shown in the sales table.

The most straightforward way to do this is by doing an Inner Join between the tables, but there is another way, using the powerful List.Contains() feature of Power Query. And what makes it so powerful is not just its utility, but that when you run it against data in a SQL Server or similar server, Power Query will fold the statement.

Let me walk you through both methods so it is clear.
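
The two approaches can be sketched outside of Power Query. The Python below is an analogy only, with made-up customers and sales rows and hypothetical function names: one function mimics the inner join (followed by de-duplication), the other mimics a List.Contains-style membership filter. It does not model query folding, which happens inside Power Query.

```python
from typing import Dict, List

Row = Dict[str, object]

# Made-up sample data for illustration.
customers: List[Row] = [
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Ben"},
    {"id": 3, "name": "Cho"},
]
sales: List[Row] = [
    {"customer_id": 1, "amount": 50.0},
    {"customer_id": 3, "amount": 20.0},
    {"customer_id": 1, "amount": 75.0},
]


def active_via_inner_join(customers: List[Row], sales: List[Row]) -> List[Row]:
    """Join customers to sales, then keep each matched customer once."""
    seen = set()
    result = []
    for customer in customers:
        for sale in sales:
            if customer["id"] == sale["customer_id"] and customer["id"] not in seen:
                seen.add(customer["id"])
                result.append(customer)
    return result


def active_via_contains(customers: List[Row], sales: List[Row]) -> List[Row]:
    """Keep customers whose id appears in the list of ids found in sales."""
    active_ids = {sale["customer_id"] for sale in sales}
    return [customer for customer in customers if customer["id"] in active_ids]


if __name__ == "__main__":
    print(active_via_inner_join(customers, sales))
    print(active_via_contains(customers, sales))
```

Both functions return the same customers; the membership filter avoids producing one row per sale and then collapsing the duplicates.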

Read on for the walkthrough.

Power BI Incremental Refresh Against Web API

Dustin Ryan shows how you can have Power BI perform incremental refresh against a .NET Web API source:

The customer is using Power BI to report on data from ServiceNow via APIs, so the customer was able to connect Power BI to ServiceNow data and begin reporting on relevant datasets very quickly. The challenge, however, is that querying multiple years of data via the API was less than desirable for a variety of reasons.

The customer was interested in exploring the incremental refresh capabilities of Power BI, but was worried about using Power BI’s native incremental refresh capability since query folding (if you’re new to query folding, read this here) is not supported by Power BI’s web connector. So the customer team reached out to me with the challenge and I worked up an example that I think will help them meet their requirement.
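
The general idea behind incremental refresh can be sketched independently of the linked solution: Power BI supplies a RangeStart and RangeEnd value per partition, and the query asks the source only for rows in that window. The Python below is a hedged illustration of that pattern, not Dustin's implementation; the endpoint URL and the updated_from / updated_before parameter names are hypothetical and do not come from any real API.

```python
from datetime import datetime
from typing import List, Tuple
from urllib.parse import urlencode


def month_windows(start: datetime, end: datetime) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into consecutive windows that break at month boundaries."""
    windows = []
    current = start
    while current < end:
        if current.month == 12:
            next_month = datetime(current.year + 1, 1, 1)
        else:
            next_month = datetime(current.year, current.month + 1, 1)
        windows.append((current, min(next_month, end)))
        current = next_month
    return windows


def build_window_url(base_url: str, range_start: datetime, range_end: datetime) -> str:
    """Build a request URL limited to one window (parameter names are hypothetical)."""
    params = {
        "updated_from": range_start.isoformat(),
        "updated_before": range_end.isoformat(),
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint, used only to show the shape of the requests.
    base = "https://example.invalid/api/incidents"
    for window_start, window_end in month_windows(datetime(2020, 3, 1), datetime(2020, 6, 1)):
        print(build_window_url(base, window_start, window_end))
```

With one request per window, only the most recent windows need to be fetched again on each refresh, rather than multiple years of history.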

Click through for the solution.

R 4.0 Improvements: stopifnot()

Bob Rudis looks at one of the R 4.0 changes hidden in the changelog:

R 4.0.0 has been out for a while, now, and — apart from a case where merge() was slower than dirt — it’s been really stable for at least me (I use it daily on macOS, Linux, and Windows). Sure, it came with some headline-grabbing features/upgrades, but I’ve started looking at what other useful nuggets might be in the changelog and decided to blog them as I find them.

Today’s nugget is the venerable stopifnot() function which was significantly enhanced by this PR by Neil Fultz.

Read on for a quality of life improvement with error handling in R.

Mongo Shell Preview for Azure Cosmos DB

Hasan Savran takes a look at the preview for a native Mongo shell in Cosmos DB:

Native Mongo Shell became available in preview in Azure Cosmos DB in March. I had a chance to check it out this week and decided to write about it. Mongo Shell lets you execute Mongo database commands in the Cosmos DB Data Explorer! Currently, it is not available in all Azure regions. If you don’t see this option, your database might be in a region that does not support it yet.
Click on Data Explorer to see the Mongo Shell button. If you have never used it before, you will need to activate the Mongo Shell by clicking the Complete Setup button. This box will open up when you click on Open Mongo Shell.

It sounds like it’s a little bit limited at the moment, but Hasan takes you through the things you can do today.

Azure Synapse Analytics in Preview

Simon Whiteley clarifies a Build announcement:

Today’s the day! There’s much buzz & excitement as we FINALLY get to see Azure Synapse Analytics in public preview, ready for us all to get our hands on it. There’s a raft of other announcements that come hand in hand with it too.

What’s that? You thought Azure Synapse Analytics was already available? You’ve been using it all year and don’t see what the fuss is about??

I’m expecting this to be the common reaction. The marketing story for Synapse has been… interesting… to say the least. I’ve been asked several times in the last week exactly what the new story is and, given today’s news, I thought I’d clarify.

The big picture is the version of Azure Synapse Analytics I’ve been interested in for a bit, so it’s nice to see the movement here.
