Press "Enter" to skip to content

Curated SQL Posts

Spark RDD Transformations

Meenakshi Goyal walks us through the transformation functions available to you when using a Spark RDD:

The role of a transformation in Spark is to create a new dataset from an existing one. Lazy transformations are those that are computed only when an action requires a result to be returned to the driver program.

When we call an action, the transformations are executed, since they are inherently lazy; they are not carried out right away. Two of the most common transformations are map() and filter().
The resulting RDD is always distinct from the parent RDD after the transformation. It could be smaller (filter(), distinct(), or sample(), for example), bigger (flatMap(), union(), cartesian()), or the same size (e.g. map()).

Read on to learn more about transformations, including examples of how each works. Even if you’re using the DataFrames API for Spark, it’s still important to understand that transformations are lazy.
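
To make the laziness concrete, here is a minimal PySpark sketch (assuming a local pyspark installation): the map() and filter() calls only build up a lineage, and nothing actually runs until the collect() action fires.

    # Minimal sketch of lazy RDD transformations; assumes pyspark is installed locally.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("rdd-transformations").getOrCreate()
    sc = spark.sparkContext

    numbers = sc.parallelize(range(1, 11))          # source RDD
    squares = numbers.map(lambda x: x * x)          # transformation: same size as the parent
    evens = squares.filter(lambda x: x % 2 == 0)    # transformation: smaller than the parent

    # Nothing has executed yet; collect() is the action that triggers the whole lineage.
    print(evens.collect())  # [4, 16, 36, 64, 100]

    spark.stop()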

Comments closed

GENERATE_SERIES and Data Types

Bill Fellows runs into an issue:

Perfect, now I have a row for each second from midnight to approximately 5.5 hours later. What if my duration needs to vary because I'm going to compute these ranges for a number of different scenarios? I should make that 19565 into a variable and let's overengineer this by making it a bigint.

Things don’t work out quite the way you might have expected there. Read on and see what Bill found and how you can circumvent the problem.
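
If you want to experiment along at home, GENERATE_SERIES expects its start and stop arguments to share a data type, so mixing an int literal with a bigint variable is exactly the kind of thing worth testing. A rough sketch in Python via pyodbc (the connection string is a placeholder):

    # Hypothetical sketch: probing GENERATE_SERIES argument types on SQL Server 2022+.
    # The connection string is made up; adjust for your environment.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
        "DATABASE=tempdb;Trusted_Connection=yes;TrustServerCertificate=yes;"
    )
    cursor = conn.cursor()

    # Mixing an int literal with a bigint variable is the suspect combination...
    failing_sql = "DECLARE @duration bigint = 19565; SELECT value FROM GENERATE_SERIES(0, @duration);"
    try:
        cursor.execute(failing_sql).fetchall()
    except pyodbc.Error as err:
        print("GENERATE_SERIES complained:", err)

    # ...while casting the arguments to a common type keeps everything aligned.
    working_sql = """
    DECLARE @duration bigint = 19565;
    SELECT value FROM GENERATE_SERIES(CAST(0 AS bigint), @duration);
    """
    rows = cursor.execute(working_sql).fetchall()
    print(len(rows))  # 19566 values: 0 through 19565
    conn.close()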

Comments closed

SQLErrorCodes

Sean Gallardy takes a number:

I am often asked about all kinds of various errors, of course with absolutely no context. I also get asked what error X is or means or says… I don’t remember that stuff off the top of my head. The thing is, you kind of need SQL Server to go look it up and there have been a plethora of times when this wasn’t possible. I’ve also noticed that people tend to give you just the error number and not anything else.

Read on to learn more about what Sean has created, akin to the SQLskills wait stats compendium.
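
When you do have an instance handy, the lookup Sean describes usually means a trip to sys.messages. A quick sketch of that round trip from Python (pyodbc, placeholder connection string), using error 8134 as the example:

    # Sketch: looking up an error number's text in sys.messages.
    # The connection string is a placeholder; 8134 (divide by zero) is just an example.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
        "DATABASE=master;Trusted_Connection=yes;TrustServerCertificate=yes;"
    )
    row = conn.cursor().execute(
        "SELECT message_id, severity, text "
        "FROM sys.messages "
        "WHERE message_id = ? AND language_id = 1033;",  # 1033 = English
        8134,
    ).fetchone()
    print(row.message_id, row.severity, row.text)
    conn.close()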

Comments closed

Deploying Database Project Changes

Olivier Van Steenlandt continues a series on database projects:

In a previous blog post (Database Projects – Merging changes), we successfully merged our feature branch into our development branch. Now, as a final step in our development process, we want to get our changes deployed to our development environment.

In this blog post, we will go through the process of executing a manual deployment step by step. We will look at what happens behind the scenes and how deployment works, and we will also take a look at Publishing Profiles.

Check out that process.
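
If you have not seen a manual deployment before, the gist is that the project builds to a .dacpac and something (Visual Studio, Azure Data Studio, or SqlPackage) publishes it against the target, optionally driven by a publish profile. Here is a hedged sketch of scripting that with SqlPackage from Python, with the file names and profile as placeholders:

    # Hypothetical sketch: publishing a database project's built .dacpac with SqlPackage.
    # File names and the publish profile are placeholders; assumes SqlPackage is on the PATH.
    import subprocess

    result = subprocess.run(
        [
            "SqlPackage",
            "/Action:Publish",
            "/SourceFile:bin/Debug/MyDatabaseProject.dacpac",
            "/Profile:Dev.publish.xml",  # the publish profile carries the target connection and options
        ],
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    result.check_returncode()  # raise if the deployment failed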

Comments closed

Testing Power BI REST APIs

Gilbert Quevauvilliers tries it:

Did you know that there is an easy way to run Power BI REST APIs and extract the data they return?

The good news is that you can do this directly in your web browser. You don’t have to install or configure anything!

The method below works well whether you want to test the API to see what it returns or run it to extract some data.

Read on for the process.
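
The browser route needs no code at all, which is the whole appeal. If you later want to script the same call, a minimal Python sketch against the workspaces (groups) endpoint looks something like this; the access token is a placeholder you would have to acquire yourself (for example via MSAL).

    # Minimal sketch of calling a Power BI REST API endpoint outside the browser.
    # The bearer token is a placeholder; acquiring one is out of scope here.
    import requests

    ACCESS_TOKEN = "<paste a valid Power BI access token here>"

    response = requests.get(
        "https://api.powerbi.com/v1.0/myorg/groups",  # lists the workspaces you can access
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()

    for workspace in response.json().get("value", []):
        print(workspace["id"], workspace["name"])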

Comments closed

Reviewing an Existing Data Model with Power BI Model Documenter

Marc Lelijveld wants to see what’s out there in the wild:

In some scenarios, it can happen that you do not even have a Power BI Desktop data model. For example, when you have migrated from Analysis Services to Power BI Premium, or when you deal with large datasets and the model is developed directly in Visual Studio, Tabular Editor, or another tool of your preference and deployed over the XMLA endpoint. A similar setup could be one where you once enriched your data model using Tabular Editor or ALM Toolkit, with the result that your Power BI Desktop file is no longer the golden version of your data model.

Another scenario could be gaining an overview of partitioning when using incremental refresh. The partitions of Incremental Refresh are only generated in the Power BI Service. So, including this information in your generated documentation is only possible when you connect directly to the Power BI Service.

But what if you still want to show a complete view of your Power BI data model and extract insights using the Power BI Model Documenter? I can tell you: it is possible!

Read on to see what you can do in that case.

Comments closed

Making a Newsletter Template in R

Benjamin Smith’s ideas are intriguing to me and I wish to subscribe to his newsletter:

Jinja is a powerful templating engine that is useful in a variety of contexts. Recently, I discovered that it's possible to use the power of Jinja syntax in R with the jinjar package written by David C Hall. With jinjar and the tidyRSS package by Robert Myles, it is possible to make an email template that can provide short and informative updates. In this blog, I'm going to share how the jinjar and tidyRSS packages work and show how to combine them to make a simple daily email newsletter.

Read on to learn how.
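
The post itself is in R, but the underlying idea carries over to any language with a Jinja-style engine. As a rough analogue (not the post's code), here is a Python sketch using jinja2 and feedparser in place of jinjar and tidyRSS; the feed URL is a stand-in.

    # Rough Python analogue of the approach: pull an RSS feed and render it through a
    # Jinja template. The post uses R's jinjar and tidyRSS; jinja2 and feedparser play
    # the equivalent roles here, and the feed URL is a stand-in.
    import feedparser
    from jinja2 import Template

    feed = feedparser.parse("https://example.com/rss.xml")

    template = Template(
        "Today's reading:\n"
        "{% for entry in entries %}"
        "- {{ entry.title }} ({{ entry.link }})\n"
        "{% endfor %}"
    )

    print(template.render(entries=feed.entries[:5]))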

Comments closed

MySQL Database Backups with mydumper

Lukas Vileikis continues a series on MySQL backup options:

There are many tools we can use to back up our MySQL databases. Some are well-known and used by the best technology companies out there (mysqldump comes to mind), and some are a little less famous but still have their place in the MySQL world. Enter mydumper – a tool built by the engineering team over at Percona, supposedly created to address performance issues caused by mysqldump.

Read on to see what it is and how it works.
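
To give a flavor of it, here is a hedged sketch of a parallel backup (and the matching myloader restore) driven from Python; the credentials, host, and paths are placeholders, and the flags shown are the commonly documented ones.

    # Hypothetical sketch: a parallel mydumper backup plus a myloader restore, driven
    # from Python. Credentials, host, and paths are placeholders.
    import subprocess

    backup = [
        "mydumper",
        "--host", "127.0.0.1",
        "--user", "backup_user",
        "--password", "change_me",
        "--database", "app_db",
        "--threads", "4",          # parallelism is the main draw over mysqldump
        "--compress",
        "--outputdir", "/backups/app_db",
    ]
    subprocess.run(backup, check=True)

    restore = [
        "myloader",
        "--host", "127.0.0.1",
        "--user", "backup_user",
        "--password", "change_me",
        "--directory", "/backups/app_db",
        "--overwrite-tables",
    ]
    # subprocess.run(restore, check=True)  # uncomment when you actually need to restore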

Comments closed