Author: Kevin Feasel

Spark Transformations and Actions

Divyansh Jain differentiates the key sets of functions in Spark:

Note that when you apply a transformation to an RDD, Spark does not perform the operation immediately. Instead, it creates a DAG (Directed Acyclic Graph) from the applied operation, the source RDD, and the function used for the transformation. It keeps building this graph using the references until you apply an action on the last RDD in the chain. That is why transformations in Spark are lazy.

Read on for more details.
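
The laziness described in the quote can be illustrated without Spark at all. The sketch below is a plain-Python analogy, not the Spark API: `LazyDataset`, `double`, and `calls` are hypothetical names, and the "plan" is a simple tuple of steps rather than a real DAG. Transformations only record a step; nothing runs until the action `collect` is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD: it records a plan instead of running it."""

    def __init__(self, source, plan=()):
        self.source = source  # the original data
        self.plan = plan      # (kind, function) steps recorded so far

    def map(self, fn):
        # Transformation: extend the plan, touch no data.
        return LazyDataset(self.source, self.plan + (("map", fn),))

    def filter(self, fn):
        # Transformation: extend the plan, touch no data.
        return LazyDataset(self.source, self.plan + (("filter", fn),))

    def collect(self):
        # Action: only now is the recorded plan executed, step by step.
        rows = list(self.source)
        for kind, fn in self.plan:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows


calls = []  # records every time real work happens


def double(x):
    calls.append(x)
    return x * 2


pipeline = LazyDataset(range(5)).map(double).filter(lambda x: x > 2)
calls_before_action = len(calls)  # no work has been done yet
result = pipeline.collect()       # the action triggers the whole plan
print(calls_before_action, result)
```

In real Spark the plan is a graph of RDD lineage and the engine can optimize it before running, but the shape of the idea is the same: transformations describe work, actions perform it.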

Making SQL Agent Jobs AG-Aware

Stuart Moore shows how you can use dbatools to make SQL Agent jobs Availability Group-aware:

What do I mean by Availability Group aware? When running on an Availability Group, one SQL Server instance ‘owns’ the database at any point in time, but the SQL Agent jobs have to be replicated across all of the instances in the cluster. So you want to make sure that your SQL Server Agent jobs only do work on the instance that currently owns the Availability Group.

Doing this is pretty simple. Below is a piece of T-SQL that checks if the current SQL Server Instance is the primary instance in the AG. If it isn’t then we exit with an error.

Read on to see how, and how you can use dbatools to automate this work.
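
For a rough idea of what such a guard might look like, here is a minimal sketch that builds the T-SQL text in Python. It is not Stuart Moore's script and does not use dbatools: `ag_guard_sql` and the message text are hypothetical, and the check assumes SQL Server 2014 or later, where the built-in `sys.fn_hadr_is_primary_replica` function is available.

```python
def ag_guard_sql(database_name):
    """Return T-SQL for a first job step that fails unless this instance
    currently holds the primary replica of the given database."""
    # Double any single quotes so the name is a valid T-SQL string literal.
    safe_name = database_name.replace("'", "''")
    return (
        # ISNULL treats a NULL result as "not primary" (a design choice here).
        f"IF ISNULL(sys.fn_hadr_is_primary_replica(N'{safe_name}'), 0) <> 1\n"
        "BEGIN\n"
        "    RAISERROR(N'Not the primary replica; stopping job.', 16, 1);\n"
        "END"
    )


print(ag_guard_sql("Sales"))
```

Raising an error with severity 16 makes the job step fail, so the step's on-failure action decides what happens next on a secondary.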

When Extended Event Loss Occurs

Jonathan Kehayias explains when Extended Events will discard an event:

There are three specific session options that determine how large of an event an event session can actually collect, and one that controls how events are dropped when the buffer memory for the event session is full or under pressure. All four of these matter when we are talking about collecting events that could generate a large event payload and we want to minimize the chance that we could potentially drop an event.

Read on to see these settings in action.

PowerApps Security

Jason Bonello gives us some tips on PowerApps security:

Depending on how the backend is set up, the tables holding this sensitive data might be in the same database. For example, ERP solutions can have Company Accounts data, Customer-related data and Inventory-related data all in the same database, maybe under different schemas – but still part of the same database.

Now let’s say we are about to create a PowerApps solution to maintain Customer information. However, as part of the organization policy, this information should not be shared across other departments apart from the intended users.

Read on for some ideas of how to limit the risk of data exposure.

Azure Data Factory Switch Activity

Rayis Imayev explains what the Switch activity does in Azure Data Factory:

Developing conditional logic for your Azure Data Factory control flow has been simplified with the introduction of the Switch activity – https://docs.microsoft.com/en-us/azure/data-factory/control-flow-switch-activity. The official documentation states that this new data factory activity “provides the same functionality that a switch statement provides in programming languages”. I would also add a more simplified definition of the Switch activity in Azure Data Factory: it is a container (or wrapper) for multiple IF conditions.

Click through for an example.
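
The "container for multiple IF conditions" idea can be sketched in plain Python. This is an analogy only, not Data Factory pipeline JSON: the case values and activity names (`copy_csv`, `log_unknown`, and so on) are hypothetical. One expression is evaluated once, matched against named cases, and falls back to a default when nothing matches.

```python
def route_with_ifs(file_type):
    # The chained-IF version: one condition per branch.
    if file_type == "csv":
        return "copy_csv"
    elif file_type == "json":
        return "copy_json"
    elif file_type == "parquet":
        return "copy_parquet"
    else:
        return "log_unknown"


# The switch version: cases listed side by side, plus a default.
CASES = {
    "csv": "copy_csv",
    "json": "copy_json",
    "parquet": "copy_parquet",
}


def route_with_switch(file_type):
    # Look up the matching case; use the default branch when none matches.
    return CASES.get(file_type, "log_unknown")


for value in ["csv", "json", "parquet", "xml"]:
    print(value, route_with_ifs(value), route_with_switch(value))
```

Both functions return the same activity for every input; the switch form just keeps the branches in one place.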

Accelerated Database Recovery

Andy Mallon explains the concept of Accelerated Database Recovery:

Accelerated Database Recovery (ADR) is a new feature intended to speed up the recovery process, which could be very slow, particularly when there are long-running, large transactions. ADR is not just for recovery after a crash, but also helps in other scenarios where the transaction log needs to be recovered, including Availability Group secondary redo and Failover Cluster Instance failovers.

This is one of the most interesting new features in SQL Server 2019.

Using Power BI Cards to Display Notes

Prathy Kamasani takes us through one use of the card visual in Power BI:

This is a long-overdue blog post. A couple of months ago, I worked with a client in Amsterdam; one of the use cases was to show key metrics and flags that need attention. The user also wanted to click on a warning symbol to confirm where the issues were. However, the user didn’t want a drill-through; it had to be a left-click.

As of yet, except for buttons/actions, we cannot do left clicks in Power BI. The user also didn’t want the report to open in another browser tab, so I was thinking about other options and in the end decided to go for tooltips and symbols to show flags, like below:

Click through for an example and an explanation of how it works.

Building Custom R Packages

Brad Lindblad takes us through building a custom package in R:

Don’t repeat yourself (DRY) is a well-known maxim in software development, and most R programmers follow this rule and build functions to avoid duplicating code. But how often do you:
– Reference the same dataset in different analyses
– Create the same ODBC connection to a database
– Tinker with the same colors and themes in ggplot
– Produce markdown docs from the same template

and so on? Notice a pattern? The word “same” is sprinkled in each bullet point. I smell an opportunity to apply DRY!

This is a good point: packages don’t have to go out to the broader world. They’re useful even if they just help you (or your team) out. H/T R-bloggers

Evaluating a Classification Model with a Spam Filter

John Mount shares an extract from Mount and Nina Zumel’s Practical Data Science with R, 2nd Edition:

This section reflects an important design decision in the book: teach model evaluation first, and as a step separate from model construction.

It is funny, but it takes some effort to teach in this way. New data scientists want to dive into the details of model construction first, and statisticians are used to getting model diagnostics as a side-effect of model fitting. However, to compare different modeling approaches one really needs good model evaluation that is independent of the model construction techniques.

Click through for that extract. I liked the first edition of the book, so I’m looking forward to the 2nd.
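
To make "evaluation independent of model construction" concrete, here is a minimal sketch in Python rather than the book's R. The labels, scores, and the 0.5 threshold are made-up illustration data, and `confusion_counts` and `evaluate` are hypothetical helper names. The point is that the evaluation code sees only actual labels and predicted scores, never the model that produced them.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for boolean labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fp, fn, tn


def evaluate(actual, scores, threshold=0.5):
    """Score any classifier from its outputs alone."""
    predicted = [s >= threshold for s in scores]
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up example: True means "spam"; scores could come from any model.
actual = [True, True, True, False, False, False, False, True]
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.3, 0.6]
metrics = evaluate(actual, scores)
print(metrics)
```

Because `evaluate` takes only labels and scores, the same function can compare a logistic regression, a tree, or a hand-written rule on equal terms.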

Top 5 and All Others in Power BI

Marco Russo and Alberto Ferrari show how you can display the top N rows and add an “Others” aggregate at the end:

Power BI offers the ability to apply a Top N constraint in a visual level filter, so that only a certain number of items are visible based on the evaluation of a measure. A common requirement is to show an additional row that accumulates the “other” items, which are those that are not visible in the report like in the following figure.

In order to solve this scenario you cannot use the Top N filter of Power BI. Instead, you apply the filter in a special measure (TopN Sales) and you use a calculated table to accommodate for the additional row named Others. Moreover, you need an additional column to let the Others row appear at the bottom of the table.

Read on to see how you can solve the problem.
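
The shape of the result is easy to see outside of DAX. The sketch below is plain Python, not the measure and calculated table from the article: `top_n_with_others` and the sales figures are hypothetical. It keeps the top N items, rolls everything else into one "Others" row, and adds a sort key so that row lands at the bottom, which is the role the extra column plays in the quoted approach.

```python
def top_n_with_others(totals, n):
    """Return (name, value, sort_order) rows: top n items, then 'Others'."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    rows = [
        (name, value, rank)
        for rank, (name, value) in enumerate(ranked[:n], start=1)
    ]
    if len(ranked) > n:
        # Everything outside the top n collapses into a single row whose
        # sort key is larger than any ranked row, so it sorts last.
        others_total = sum(value for _, value in ranked[n:])
        rows.append(("Others", others_total, n + 1))
    return rows


# Made-up sales totals per product.
sales = {"A": 50, "B": 30, "C": 10, "D": 5, "E": 3, "F": 2}
rows = top_n_with_others(sales, 3)
for row in rows:
    print(row)
```

Sorting by the third element rather than by value keeps "Others" last even when its total is larger than some of the top N rows.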
