
Curated SQL Posts

Azure SQL Database Maintenance

Jeffrey Verheul mentions that databases in Azure SQL Database need regular maintenance, too:

Before I go into detail, I want to give full kudos to Ola Hallengren (Website | @olahallengren). He has spent a lot of his time building a SQL Server Maintenance Solution that is completely free for everyone to use. And he did such an excellent job that a lot of companies (including huge companies) use his solution to run maintenance tasks on their databases.

None of the scripts below were written by me; only small changes have been made, in order to make things clearer when the solution is deployed to an environment. The original scripts can be downloaded via the download page on Ola’s website.

Most of the to-dos are the same between on-premises and Azure SQL DB, but some of the implementation steps are a bit different.  This is worth checking out if you have any Azure SQL Database instances.
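For a sense of what this looks like once the modified scripts are deployed into the Azure SQL Database itself (there is no msdb or SQL Server Agent there, so the objects live alongside your data and scheduling has to come from something like Azure Automation), here is a hedged sketch of an IndexOptimize call; the database name and threshold values are illustrative, not Jeffrey’s or Ola’s recommendations:

    -- A sketch only: run Ola's IndexOptimize inside the Azure SQL Database
    -- where the maintenance objects were deployed.
    EXECUTE dbo.IndexOptimize
        @Databases = N'MyAzureDB',                -- hypothetical database name
        @FragmentationLow = NULL,                 -- leave lightly fragmented indexes alone
        @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE',
        @FragmentationHigh = 'INDEX_REBUILD_ONLINE',
        @FragmentationLevel1 = 5,                 -- "medium" starts at 5% fragmentation
        @FragmentationLevel2 = 30,                -- "high" starts at 30% fragmentation
        @UpdateStatistics = 'ALL';                -- refresh statistics as well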


Visualizations In Five Lines Of Code

David Smith highlights a Sharon Machlis article showing visualizations in up to five lines of R code:

I’ve reproduced Sharon’s code and charts below. I did make a couple of tweaks to the code, though. I added a call to checkpoint(“2016-08-22”), which, if you’ve saved the code to a file, will install all the necessary packages for you. (I also verified that the code runs with the package versions as of today’s date; thanks to checkpoint, it will continue to do so if you try this code at a later time.) I also modified the data download code to make it work more easily on Windows. Here are the charts and code.

It’s really easy to get basic visualizations within R, and these are better than basic visualizations.


Multi-Database Schema Comparison

Kevin Hill shows how to perform schema comparison across multiple databases concurrently:

I recently had the need to compare a “Gold” or “Master” copy of our database to the 300 client copies of that database that exist in our Production environment.  I’m not alone…many fellow DBAs have had the same need.   Google searches for this confirm it.   This is for an upcoming upgrade to the application that will need post-upgrade comparison/verification.

There are 3rd-party tools that do SQL compares…my particular favorite is aptly named SQL Compare, from Red Gate. I’ve been using it off and on for 10 years. I don’t know if it can be set up to hit more than one database at a time. The other issue is that I don’t have a copy here.

Microsoft’s SQL Server Data Tools will also do this within Visual Studio.   Still one database at a time.  I forget where, but someone pointed me to the fact that SSDT uses SQLPackage.exe under the hood to do the work.  I figure if I can run it at a command line I can script out all of the databases.  I’m not much of a DOS scripting guy, so everything that follows is just my hack version…but it works, and not just on my machine!

This is very useful if you work in an environment with multiple copies of databases—or even if you have dev, test, and prod versions of the same database.
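If, like Kevin, you would rather not write much DOS, one hedged alternative is to let T-SQL generate the per-database command lines from sys.databases and paste the output into a .cmd file. The dacpac path, server name, and LIKE filter below are hypothetical placeholders:

    -- Generate one SqlPackage.exe schema-compare (DeployReport) command per client database.
    -- Gold.dacpac is assumed to be an extract of the "Gold"/"Master" database.
    SELECT 'SqlPackage.exe /Action:DeployReport'
         + ' /SourceFile:"C:\Compare\Gold.dacpac"'
         + ' /TargetServerName:"PRODSQL01"'
         + ' /TargetDatabaseName:"' + name + '"'
         + ' /OutputPath:"C:\Compare\' + name + '_report.xml"'
    FROM sys.databases
    WHERE database_id > 4              -- skip the system databases
      AND name LIKE N'Client%';        -- hypothetical naming pattern for the client copies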


Reprocessing Kafka Stream Data

Matthias J. Sax shows how to reprocess input data using Kafka Streams:

In this blog post we describe how to tell a Kafka Streams application to reprocess its input data from scratch. This is actually a very common situation when you are implementing stream processing applications in practice, and it might be required for a number of reasons, including but not limited to: during development and testing, when addressing bugs in production, when doing A/B testing of algorithms and campaigns, when giving demos to customers or internal stakeholders, and so on.

The quick answer is that you can do this either manually (cumbersome and error-prone) or by using the new application reset tool for Kafka Streams, which is an easy-to-use solution for the problem. The application reset tool is available starting with the upcoming Confluent Platform 3.0.1 and Apache Kafka 0.10.0.1.

In the first part of this post we explain how to use the new application reset tool. In the second part we discuss what is required for a proper (manual) reset of a Kafka Streams application. This part includes a deep dive into relevant Kafka Streams internals, namely internal topics, operator state, and offset commits. As you will see, these details make manually resetting an application a bit complex, hence the motivation to create an easy-to-use application reset tool.
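For reference, invoking the reset tool is a one-liner from the Kafka installation directory. This is a minimal sketch; the application id and topic names are placeholders, and (as the post goes on to explain) each application instance also needs a KafkaStreams#cleanUp() call, or a wipe of its local state directory, before restarting:

    # Reset committed offsets and internal topics for one application (names are hypothetical).
    bin/kafka-streams-application-reset.sh --application-id my-streams-app \
                                           --input-topics my-input-topic \
                                           --intermediate-topics rekeyed-topic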

Being able to reprocess streams is a critical part of the Kappa architecture, and this article is a nice overview of how to do that if you’re using Kafka Streams.


Rules And Exceptions

Shane O’Neill had to change a mental process regarding granting permissions:

With CREATE permissions this isn’t the case; there is a piece of the above template that isn’t needed, and it was quite easy to see why once I sat down and thought about it.

Specifically, it’s this bit:

<On What>

I’m granting CREATE permissions; since I haven’t created anything, I can’t grant the permission on anything.

I like this post for the direct reason (granting certain permissions doesn’t require specifying an object), but also for the implicit point: we build up internal systems of rules and processes as we act on things. This inductive reasoning tends to serve us well in most scenarios, but at some point our systems break down, and we find either that we need to incorporate edge cases into our system or that we were actually focusing on an edge case the entire time.
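To make the contrast concrete, here is a minimal T-SQL sketch (the user and table names are hypothetical): object-level grants fill in the <On What> slot, while CREATE permissions are scoped to the database itself, so there is nothing to put there.

    -- An object-level permission: the ON clause names the securable being granted.
    GRANT SELECT ON OBJECT::dbo.SomeTable TO [SomeUser];

    -- A database-level CREATE permission: no ON clause at all.
    -- (Actually creating a table afterward also requires ALTER permission on the target schema.)
    GRANT CREATE TABLE TO [SomeUser];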


Getting Started With Azure SQL Data Warehouse

Warner Chaves looks at Azure SQL Data Warehouse:

The first thing to keep in mind is that ASDW was designed to be a cloud-based system. As such, it aims to be very flexible in resource allocation and very efficient to scale up or down. To meet those goals, the system allows you to:

  • Increase or decrease compute power, represented by Data Warehousing Units (DWUs).

  • Grow storage independently of compute; storage is charged separately from compute power.

  • Pause compute entirely, paying only for storage while it is paused.

Warner also has a brand new Pluralsight course on the topic.
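As a hedged illustration of the first two bullets above: scaling DWUs is a single T-SQL statement (the database name and target service objective below are placeholders), while pausing compute is done from the portal, PowerShell, or the REST API rather than from T-SQL.

    -- Scale an Azure SQL Data Warehouse by changing its service objective (DWU level).
    -- Typically run while connected to the master database of the logical server.
    ALTER DATABASE MySampleDataWarehouse
    MODIFY (SERVICE_OBJECTIVE = 'DW400');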


Hadoop: DAS Or NAS?

Jagdish Mirani asks whether you should prefer Direct Attached Storage (DAS) or Network Attached Storage (NAS) for your Hadoop cluster:

If you want to spin up an Apache Hadoop cluster, you need to grapple with the question of how to attach your disks. Historically, this decision has favored direct attached storage (DAS). This approach is in keeping with the fundamental Hadoop principle of moving processing to where the data lives, thereby taking advantage of disk locality to optimize performance. Disk locality is so core to Hadoop that virtually any description of Hadoop starts with this.

The alternative is to use network attached storage (NAS). In contrast to DAS, NAS separates the compute and storage layers so that storage can be shared across a number of servers by shipping data over the network. Historically, this heavy dependence on the network made NAS an order of magnitude slower. Remember, the state of the art was 1GbE networks, and switches were slower and more expensive. I/O requirements for demanding Hadoop-based applications could only be met by DAS.

This is a very interesting discussion.  In my limited experience, I’ve had trouble selling operations teams on DAS, given the increased ops effort required to keep a bunch of attached disks going.  Hat tip Ari Amster.


VMware Configuration Reports

Allen McGuire has a few Reporting Services reports that he created against the VMware vCenter database:

So you are a DBA and you are in a virtual environment – VMware in particular. You are curious to know the health of the VMware hosts in terms of CPU and RAM, but you really don’t know how to get the data you need and you’re not certain if the information you are asking for is entirely accurate. Well, chances are you have access to the VMware databases themselves – if that is the case, you can create these reports based on a blog post from Jonathan Kehayias: “Querying the VMware vCenter Database (VCDB) for Performance and Configuration Information”.

I have created five reports that are based on Jonathan’s queries and you can download the RDL for the SSRS reports below – enjoy!

Click through for the reports.


Calling Azure ML Web Services Using Data Factory

Ginger Grant shows how to call an Azure Machine Learning web service from within Azure Data Factory:

The Linked Service for ML is going to need some information from the web service: the URL and the API key. Chances are neither of these has been committed to memory; instead, open up Azure ML, go to Web Services, and copy them. For the URL, look under the API Help Page grid; there are two options, Request/Response and Batch Execution. Clicking on Batch Execution loads a new page, Batch Execution API Document, and the URL can be found under Request URI. When copying the URL, you do not need to include any text after the word “jobs” (the rest of the URL is “?api-version=2.0”); copying the entire URL will cause an error.

Going back to the Web Services page, the API key appears on the dashboard section of Azure ML, and there is a convenient button for copying it. Using these two pieces of information, it is now possible to create the Data Factory Linked Service to make the connection to the web service, which here I called AzureMLLinkedService.
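For orientation, a Data Factory (v1) linked service of type AzureML, like the AzureMLLinkedService described above, is a small piece of JSON. This is a sketch with placeholder values standing in for the endpoint and key copied from the Azure ML portal:

    {
        "name": "AzureMLLinkedService",
        "properties": {
            "type": "AzureML",
            "typeProperties": {
                "mlEndpoint": "https://<region>.services.azureml.net/workspaces/<workspace-id>/services/<service-id>/jobs",
                "apiKey": "<API key from the Azure ML dashboard>"
            }
        }
    }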

Read the whole thing.
