
Author: Kevin Feasel

Restoring a Database in an Availability Group

Rajendra Gupta walks us through the process of restoring a database which is currently in an Availability Group:

You might ask a question here: we can take a production database backup and restore it on the development server, so what difference does it make whether it is a standalone database restore or an availability group database restore?

A database restore works directly for a standalone database, but if the database is configured in an availability group, we cannot restore it directly. It requires additional steps because of the AG configuration, and our database should be in the same state (synchronized in the AG) after the restore as well.

In this article, let’s cover the steps to restore an existing availability group database in the SQL Server Always On Availability Group.

Read on for the answer.


Generating Scripts in SSMS and ADS

Rob Farley walks us through automation options in SQL Server Management Studio and Azure Data Studio:

User interfaces are great, but I simply don’t want to have to remember to do everything the same way each time.

To that end, I want to wax lyrical for a moment about the Script button on most dialog boxes in SQL Server Management Studio (SSMS), and make a quick mention of what things (currently) look like in Azure Data Studio (ADS). (I say “currently” because ADS is still under very active development and could have changed even by the time I publish this.)

The Script button is one of the best features of Management Studio. The UI lets you know what can be done, and the Script button lets you know the process. It’s also a nice reminder that this functionality is not unique to the UI. Management Studio’s implementation of the Script button isn’t perfect, but it is good.


Filtering with dplyr Using Strings as Expressions

Kenneth Tay shows how to build arbitrary expressions to execute in dplyr’s filter function:

This took me a while to figure out, and so I thought I would post this for future reference. Let’s say I have the mtcars data and I want to filter for just the rows with cyl == 6. I would do something like this:

library(tidyverse)
data(mtcars)
mtcars %>% filter(cyl == 6)

What if I had the filter condition as a string instead?

Read on to see how you can do this. Given the answer, I wonder if there’s any chance that could be turned into an injection vulnerability. H/T R-Bloggers
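
For the impatient, one plausible shape of the answer (a sketch, not necessarily the article’s exact approach) is to parse the string into an expression with rlang and splice it into filter():

library(tidyverse)
library(rlang)

# Turn the string into an unevaluated R expression...
condition <- parse_expr("cyl == 6")

# ...then splice it into filter() with the bang-bang operator
mtcars %>% filter(!!condition)

That is also why the injection worry is fair: parse_expr() accepts any single R expression, so a hostile string like "cyl == 6 | file.remove('important.txt') > 0" would run its side effect during the filter. Never feed it untrusted input.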


The State of R Packages Locally

Maelle Salmon and Gabor Csardi walk us through some details about installed packages on a system:

Now how do you know where any of your installed packages was installed? You can use find.package() and path.package()!

To check whether a package is installed, it is better to use find.package() than installed.packages() because the latter, as its docs state, can be slow on some systems. In either case, finding the package does not mean it is usable; for that, you’d need to use library() or require().

Read on for several tips around where packages are located, what their contents look like, and learning a bit more about the actual code in packages.
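
A quick sketch of those functions in action (paths will vary by system):

# Where is an installed package? Works whether or not it is loaded.
find.package("stats")

# path.package() only answers for packages that are currently attached
library(stats)
path.package("stats")

# find.package() errors on missing packages unless quiet = TRUE,
# which makes for a cheap installed-or-not check
is_installed <- function(pkg) length(find.package(pkg, quiet = TRUE)) > 0
is_installed("utils")        # TRUE
is_installed("notapackage")  # FALSE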


Triggers Should Handle Multi-Row Scenarios

Kenneth Fisher has a public service announcement:

While I was helping him with part of it I noticed that the trigger was only set up to handle single row updates. When I told him it needed to be able to handle multiple row updates he asked me in all seriousness, “Why? Since only one value can be changed at a time why should I worry about multiple rows being changed at once?”

Read on for Kenneth’s response, which approximates to Au contraire mon frère. Unhandled assumptions often become bugs.


Messy Code and Reasonable Expectations

Rachel by the Bay has a doozy of a story:

One day not so long ago, I was in a meeting listening to a team explain why their service had gone down and taken out a big chunk of a business. They were one of those things that has to exist and work in order for the actual “thing that makes money” to go. Think of delivering pizzas, connecting dog walkers with dogs who need to be walked, that kind of thing.

It turned out they had been crashing every time a request came through for a certain part of the country. That is, not all pizzas, dog walkers, or whatever it was were handled identically, so they had their own city or region configurations. Think of differences in pricing, taxes, features, or whatever. Trying to process a request for this one particular region had caused the entire process to die when it hit a new config that was “bad” somehow.

Read on for the story. This sounds like a boundary issue. Boundaries are messy and need thorough examination to handle as many possible points of failure as is reasonable. Taking seriously the point that it makes the code messy, the answer is not “Don’t do the checks,” but rather “Put the checks in a place where their messiness has a minimal impact on the rest of my beautiful code but still does the important work we need them to do.” Failing that, live with the mess and have a working process.


Building dacpac Files on Non-Windows Machines

Erik Ejlskov Jensen provides another advantage for Azure Data Studio’s database projects:

For a while now, it has been possible to publish a .dacpac file (meaning apply it to a new or existing database) using the cross-platform version of sqlpackage.

But authoring and building a database project (sqlproj) was only possible on Windows, as the .sqlproj project type is based on the classic .NET Framework .csproj project type.

Now, thanks to the new Database Project extension in the Azure Data Studio Insiders build, it is possible to author, build, and manually publish a SQL Server Database Project.

And by using the new MsBuild.Sdk.SqlProj SDK and project type, it is also possible to build and publish a Database Project from a build agent (CI pipeline), without having to install the sqlpackage tool. Read on!

You heard Erik.


Data Lineage and SSIS

Aveek Das has a two-parter. First up is a discussion of data lineage:

In this article, I am going to explain what data lineage in ETL is and how to implement it. In this modern world, where companies deal with humongous amounts of data every day, there is also a challenge to manage and monitor that data efficiently. There are systems that generate data every second, and that data is processed into a final reporting or monitoring tool for analysis. In order to process this data, we use a variety of ETL tools, which in turn make the data transformation possible in a managed way.

While being transformed in the ETL pipeline, the data goes through multiple steps of transformation in order to achieve the final result. For example, when the ETL receives the raw data from the source, operations such as filtering, sorting, merging, or splitting columns may be applied to it. There can also be aggregations or other calculations made on this raw data before it finally moves into a data warehouse or is prepared for reporting. In order to be able to detect the source of a particular record, we need to implement something known as data lineage: simple metadata that helps us detect gaps in the data processing pipeline and enables us to fix issues later.

Part two covers data lineage with SQL Server Integration Services:

In this article, I am going to discuss SSIS data lineage concepts, which are often used while designing ETL workloads on a data warehouse. Although this article focuses on implementing data lineage using SSIS, the concept is not confined to SSIS; it applies to any ETL tool on the market that moves data from a source to a destination. In my previous article, Understanding Data Lineage in ETL, I have already discussed the general importance of data lineage concepts for any ETL tool. I would definitely suggest having a look at it if you want to understand how data lineage helps track the source of a single record in the warehouse.

If you’re fairly new to this world, it’s a good introduction to an important topic.
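
The core idea is simply stamping each record with its origin as it moves through the pipeline. Here is a minimal sketch of that in R; the file, column, and batch names are all hypothetical:

library(dplyr)

raw <- read.csv("sales_2020.csv")   # hypothetical source extract

staged <- raw %>%
  mutate(
    lineage_source    = "sales_2020.csv",  # which extract produced the row
    lineage_batch_id  = 42L,               # identifier for this ETL run
    lineage_loaded_at = Sys.time()         # when the row entered the pipeline
  )

With those columns riding along through every transformation, any record in the warehouse can be traced back to its originating file and load.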


Join Operations in Spark

Swantika Gupta compares hash and merge join operations in Apache Spark:

One of the most frequently used transformations in Apache Spark is the join operation. Joins in Apache Spark allow the developer to combine two or more data frames based on certain (sortable) keys. The syntax for writing a join operation is simple, but sometimes what goes on behind the curtain is lost. Internally, Apache Spark proposes a couple of algorithms for a join and then chooses one of them. Not knowing what these internal algorithms are, and which one Spark chooses, can make a simple join operation expensive.

While opting for a join algorithm, Spark looks at the size of the data frames involved. It considers the join type and condition specified, and the hint (if any), to finally decide upon the algorithm to use. In most cases, sort-merge join and shuffle hash join are the two major workhorses that drive Spark SQL joins. But if Spark finds the size of one of the data frames to be less than a certain threshold, it puts up the broadcast join as its top contender.

Click through for the comparison, though do note that this comparison doesn’t include nested loop joins, which are possible in Spark as well.
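
If you want to poke at the planner’s choice yourself, here is a minimal sparklyr sketch (assuming a local Spark installation; the tables and sizes are invented). sdf_broadcast() marks the small table as a broadcast candidate:

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# A larger fact table and a tiny dimension table
big   <- copy_to(sc, data.frame(key = sample(1:5, 10000, replace = TRUE),
                                x   = rnorm(10000)), "big")
small <- copy_to(sc, data.frame(key = 1:5, label = letters[1:5]), "small")

# Left alone, Spark weighs sizes, join type, and hints to pick an algorithm
plain  <- left_join(big, small, by = "key")

# The broadcast hint tells Spark that `small` fits in memory on every
# executor, nudging the planner toward a broadcast hash join
hinted <- left_join(big, sdf_broadcast(small), by = "key")

spark_disconnect(sc)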


Automatic Retention Periods with Power Query

Gilbert Quevauvilliers has an easy method of constraining data sizes in Power Query:

How cool would it be not to have to manually update your dataset to keep data for the last two years (last year and this year to the current date)?

In this blog post I will show you how you can easily filter dates in Power Query to show last year and year to date by using the GUI, without having to hardcode anything.

I must admit the first time I saw this was when I was watching my good friend Reid Havens presenting. Thanks Reid!

Read on for the technique.
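
For reference, the same rolling-window rule is easy to state in code as well; here is a rough R sketch on toy data (nothing from the article itself):

library(dplyr)
library(lubridate)

dates <- seq(as.Date("2020-01-01"), Sys.Date(), by = "day")
sales <- tibble(Date = dates, Amount = runif(length(dates)))

# Keep last year plus this year to date, relative to whenever this runs,
# so the filter never needs a hardcoded year
cutoff <- make_date(year(today()) - 1, 1, 1)
sales %>% filter(Date >= cutoff, Date <= today())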
