Press "Enter" to skip to content

Author: Kevin Feasel

Using Azure Storage Explorer

Arun Sirpal takes us through Azure Storage Explorer:

I only ever use the storage explorer when managing my blobs, files, queues within storage accounts. It is your single view access point for all your storage needs and I totally recommend downloading it and using it (https://azure.microsoft.com/en-gb/features/storage-explorer/).

Why do I like using it? I am sure there are more reasons, but these are personal to me.

Click through for Arun’s reasons as well as installation basics.

Comments closed

Dealing with Thousands of Databases

Andy Levy has some Q&A about dealing with large numbers of databases on a single server. Part one:

What was the most difficult challenge faced initially with a large environment and how does that challenge relate to now?

For me personally, it was just getting a handle on how to deal with this many databases because I didn’t “grow up” with the system. I walked into an environment with a lot of established tools and procedures for performing tasks and had to learn how those all fit together while also not breaking anything. You don’t want to be the person who walks in the door, says “why are you doing things like this, you should be doing it this other way” and then falls victim to hubris. If something seems unusual, there’s probably a reason for that and you need to understand the “why” before trying to change anything.

Part 2 is also up:

How large is the team that manages the databases? Is the knowledge shared and everyone can work on everything or do these people fill niches?

There are two of us. We each have a few specialties but we aren’t “territorial” and we try to share as much as possible. If we aren’t both directly involved in a given project, we keep each other in the loop as it progresses.

Stay tuned for part 3.

Comments closed

PolyBase and Dockerized Hadoop

I have a solution to a problem which vexed me for quite some time:

Quite some time ago, I posted about PolyBase and the Hortonworks Data Platform 2.5 (and later) sandbox.

The summary of the problem is that data nodes in HDP 2.5 and later are on a Docker private network. For most cases, this works fine, but PolyBase expects publicly accessible data nodes by default—one of its performance enhancements with Hadoop was to have PolyBase scale-out group members interact directly with the Hadoop data nodes rather than having everything go through the NameNode and PolyBase control node.

Click through for the solution.

Comments closed

Null Checks in Spark DataFrames

Bipin Patwardhan gives us four techniques for validating whether data in Spark exists:

The task at hand was pretty simple — we wanted to create a flexible and reusable library of classes that would make the task of data validation (over Spark DataFrames) a breeze. In this article, I will cover a couple of techniques/idioms used for data validation. In particular, I am using the null check (are the contents of a column ‘null’). In order to keep things simple, I will be assuming that the data to be validated has been loaded into a Spark DataFrame named “df.”

Click through for those techniques.

Comments closed

strace and SQL Server Containers

Anthony Nocentino tries using strace to diagnose SQL Server process activity in a container:

We’re attaching to an already running docker container running SQL. But what we get is an idle SQL Server process this is great if we have a running workload we want to analyze but my goal for all of this is to see how SQL Server starts up and this isn’t going to cut it.
 
My next attempt was to stop the sql19 container and quickly start the strace container but the strace container still missed events at the startup of the sql19 container. So I needed a better way.

Don’t worry—Anthony finds a better way.

Comments closed

dbatools: the Book

Chrissy LeMaire has an exciting announcement:

After nearly 10 months of work, early access to Learn dbatools in a Month of Lunches is now available from our favorite publisher, Manning Publications!

For years, people have asked if any dbatools books are available and the answer now can finally be yes, mostly. Learn dbatools in a Month of Lunches, written by me and Rob Sewell (the DBA with the beard), is now available for purchase, even as we’re still writing it. And as of today, you can even use the code bldbatools50 to get a whopping 50% off.

They’re in active book development, so buy a copy now and watch as the book evolves.

Comments closed

Backup to URL in Azure

Joey D’Antoni recommends backing up database to Blob Storage via URL in Azure:

Unlike in your on-premises environment, where you might have up to a 32 Gbps fibre channel connection to your storage array and then a separate 10 Gbps connection to the file share where you write your SQL Server backups, in the cloud you have a single connection to both storage and the rest of the network. That single connection is metered and correlates to the size (and $$$) of your VM. So bandwidth is somewhat sacred, since backups and normal storage traffic go over the same limited tunnel. This doesn’t mean you can’t have good storage performance, it just means you have to think about things. In the case of the customer I mentioned, they were saturating their network pipe, by writing backups to the file system, and then having the Azure backup service backup their VM, they were saturating their pipe and making regular SQL Server I/Os wait.

Read on to see what the alternative is.

Comments closed

Exploratory Data Analysis with ExPanDaR

Joachim Gassen walks us through the ExPanDaR package in R:

The ‘ExPanDaR’ package offers a toolbox for interactive exploratory data analysis (EDA). You can read more about it here. The ‘ExPanD’ shiny app allows you to customize your analysis to some extent but often you might want to continue and extend your analysis with additional models and visualizations that are not part of the ‘ExPanDaR’ package.

Thus, I am currently developing an option to export the ‘ExPanD’ data and analysis to an R Notebook. While it is not ready for CRAN yet, it seems to work reasonably well and I would love to see some people trying it and letting me know about any bugs or other issues that they encounter. Hence, this blog post.

Looks like an interesting package. H/T R-bloggers

Comments closed

Paired RDDs in Spark

Ramandeep Kaur explains how Paired Resilient Distributed Datasets (PairRDDs) differ from regular RDDs:

So, assuming that you have a fair idea about what Spark is and the basics of RDDs. Paired RDD is one of the kinds of RDDs. These RDDs contain the key/value pairs of data. Pair RDDs are a useful building block in many programs, as they expose operations that allow you to act on each key in parallel or regroup data across the network. For example, pair RDDs have a reduceByKey() method that can aggregate data separately for each key, and a join() method that can merge two RDDs together by grouping elements with the same key.

When datasets are described in terms of key/value pairs, it is common to want to aggregate statistics across all elements with the same key.

Paired RDDs bring us back to that key-value pair paradigm which Hadoop’s version of MapReduce brought to the forefront.

Comments closed