Day: September 30, 2021

Why 200 Tasks for a Spark Execution?

The Hadoop in Real World team explains why you might see 200 tasks when running a Spark job:

It is quite common to see 200 tasks in one of your stages, and more specifically in a stage that requires a wide transformation. The reason is that wide transformations in Spark require a shuffle. Operations like join and group by are wide transformations, and they trigger a shuffle.

Read on to learn why 200, and whether 200 is the right number for you.
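
As a rough illustration of where the number comes from: `spark.sql.shuffle.partitions` defaults to 200, and a shuffle stage runs one task per shuffle partition. The sketch below is plain Python rather than Spark code. The function names are hypothetical, and CRC32 stands in for Spark's real partitioning hash, so treat it as a simplified model that also ignores adaptive query execution coalescing partitions.

```python
import zlib
from collections import Counter

# Default value of the Spark setting spark.sql.shuffle.partitions.
DEFAULT_SHUFFLE_PARTITIONS = 200


def shuffle_partition(key, num_partitions=DEFAULT_SHUFFLE_PARTITIONS):
    """Assign a key to a shuffle partition (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def rows_per_task(keys, num_partitions=DEFAULT_SHUFFLE_PARTITIONS):
    """Return the row count for each post-shuffle task.

    There is one task per partition, even when the partition is empty,
    which is why a tiny dataset can still show 200 tasks.
    """
    counts = Counter(shuffle_partition(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


tasks = rows_per_task(["apples", "pears", "plums"])
print(len(tasks), "tasks,", sum(1 for n in tasks if n == 0), "of them empty")
```

With only three keys, nearly all of the 200 tasks have nothing to do, which is the usual argument for lowering the setting on small data and raising it on very large data.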

Switching Connections from AAS to Power BI

Marc Lelijveld wants to swap a connection from using Azure Analysis Services to Power BI Premium:

When an Azure Analysis Services dataset is migrated to Power BI Premium, you might have to rebind many reports, especially if the dataset is positioned as a managed dataset that is also used for self-service purposes and has many related reports.

In this blog I will elaborate on how you can easily rebind all these reports to the new Power BI dataset, without downloading every report and rebinding it manually.

It’s not a trivial operation, but it is a lot easier than updating each entry individually.

Power BI 101

Soheil Bakhshi is starting some 101-level training on Power BI:

Many people talk about Power BI, its benefits and common challenges, and many more want to learn Power BI, which is excellent indeed. But there are many misconceptions and misunderstandings amongst the people who think they know Power BI. In my opinion, there is a significant risk in using tools without knowing them, and using this technology is no different. The situation is even worse when people who must know the technology well don’t know it but think they do. These people are potential risks to the businesses that want to adopt Power BI as their primary analytical solution across the organisation. As a part of my day-to-day job, I communicate with many people interacting with Power BI. Amongst the many knowledgeable users are some who confuse things pretty frequently, which indicates a lack of understanding of the basic concepts.
So I decided to write a series of Power BI 101 to explain the basics of the technology that we all love in simple language. Regardless of your usage of Power BI, I endeavour to help you know what to expect from Power BI. This is the first part of this series.

Read on for the start of this series, asking the question “What is Power BI?”

Partition Switching of Staging Data

Aaron Bertrand shares a technique to make table refreshes easier for end users:

So, what is a staging table in SQL? A staging table can be more easily understood using a real-world example: Let’s say you have a table full of vegetables you’re selling at the local farmer’s market. As your vegetables sell and you bring in new inventory:

– When you bring a load of new vegetables, it’s going to take you 20 minutes to clear off the table and replace the remaining stock with the newer product.

– You don’t want customers to sit there and wait 20 minutes for the switch to happen, since most will get their vegetables elsewhere.

Now, what if you had a second empty table where you load the new vegetables, and while you’re doing that, customers can still buy the older vegetables from the first table? (Let’s pretend it’s not because the older vegetables went bad or are otherwise less desirable.)

Read on for some techniques Aaron used for a long time and why he switched to partition switching.
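
The two-table idea in the analogy can be sketched in a few lines. This is a minimal illustration using Python's built-in `sqlite3` module and a rename-based swap, not SQL Server partition switching and not necessarily any of the techniques Aaron describes. The table names and the `refresh_via_swap` helper are hypothetical.

```python
import sqlite3


def refresh_via_swap(conn, new_rows):
    """Load new rows into a staging table, then swap it in for the live table."""
    # Slow part: build the second table while readers still use `vegetables`.
    conn.execute("DROP TABLE IF EXISTS vegetables_staging")
    conn.execute("CREATE TABLE vegetables_staging (name TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO vegetables_staging VALUES (?, ?)", new_rows)

    # Fast part: swap the tables inside one short transaction.
    conn.execute("BEGIN")
    conn.execute("ALTER TABLE vegetables RENAME TO vegetables_old")
    conn.execute("ALTER TABLE vegetables_staging RENAME TO vegetables")
    conn.execute("DROP TABLE vegetables_old")
    conn.execute("COMMIT")


# isolation_level=None leaves transaction control to the explicit BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE vegetables (name TEXT, quantity INTEGER)")
conn.execute("INSERT INTO vegetables VALUES ('old carrots', 3)")

refresh_via_swap(conn, [("fresh carrots", 40), ("fresh kale", 25)])
print(conn.execute("SELECT name, quantity FROM vegetables ORDER BY name").fetchall())
```

The point of the pattern is that the expensive load happens off to the side, and readers are only affected during the brief swap.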

Database Deployment with External References

Sebastian Meine and Liz Baron try to untangle the Gordian knot:

Most database developers are dealing with databases that contain external references. Even if the database code is in source control, these external references can make it very difficult to deploy to new environments. In these multi-database environments, tools like SQLCompare and SQL Change Automation do not automatically resolve object-order across databases, resulting in errors during deployment.

One way to tackle this, which works especially well for CI pipelines, is to create facades for all externally referenced databases. A facade in this context is a database with the expected name, with the expected objects, but those objects are hollowed out and do not contain any dependencies. You can compare this concept to an interface in an object-oriented language. Once you have these facades, they can be used in a pre-deployment step, simplifying the rest of the deployment by effectively removing object-order dependencies with these external databases.

This is one of the most painful parts of converting existing databases into model-driven database development, especially once you start having to deal with cross-dependencies and rapidly changing databases.

Calculating Lead Time from Jira and GitHub

Maria Zakourdaev wants to measure agility:

Do you want to visualize your RnD team performance to drive business value? Is there anything that is slowing down your development pipeline? How agile is your team? How long are your customers waiting for the features?

There are many things that can hold you back: backlog management, code review delays, resource provisioning, manual testing, and deployment automation efficiency. In this article I will show you my method of measuring LeadTime, one of the metrics described in this book.

Read on to see how you can do this.
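
To give a feel for the arithmetic, here is a minimal sketch that is not Maria's method. It assumes lead time means the hours between a Jira issue being created and the last linked GitHub pull request being merged, and that pull request titles contain the Jira issue key. The data shapes, field names, and the `lead_times_hours` function are all hypothetical.

```python
import re
from datetime import datetime

# Matches Jira-style issue keys such as PROJ-1 (assumed naming convention).
JIRA_KEY = re.compile(r"[A-Z][A-Z0-9]+-\d+")


def lead_times_hours(issues, pull_requests):
    """Map each issue key to hours from issue creation to its latest PR merge.

    issues: dict of issue key -> ISO 8601 creation timestamp (hypothetical shape)
    pull_requests: list of dicts with "title" and "merged_at" (hypothetical shape)
    """
    result = {}
    for pr in pull_requests:
        match = JIRA_KEY.search(pr["title"])
        if match is None or match.group() not in issues:
            continue  # PR is not linked to a known issue
        key = match.group()
        created = datetime.fromisoformat(issues[key])
        merged = datetime.fromisoformat(pr["merged_at"])
        hours = (merged - created).total_seconds() / 3600
        # Several PRs may reference one issue; keep the latest merge.
        result[key] = max(result.get(key, 0.0), hours)
    return result


issues = {"PROJ-1": "2021-09-01T09:00:00"}
pull_requests = [
    {"title": "PROJ-1 add export button", "merged_at": "2021-09-03T15:00:00"},
    {"title": "Fix typo in README", "merged_at": "2021-09-02T10:00:00"},
]
print(lead_times_hours(issues, pull_requests))  # {'PROJ-1': 54.0}
```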
