Press "Enter" to skip to content

Author: Kevin Feasel

Getting Ahead Of Corruption

Mike Walsh has some recommendations to follow before you end up with corrupt databases:

So. Always. Always. ALWAYS choose to be proactive and prepared. Don’t wait for corruption to catch you! When we do our SQL Server health assessments, seeing the findings that together mean you aren’t prepared for corruption is a huge red flag. Partially it is because, as a consultant, I end up seeing corruption a lot – and it is always “after the fact” and usually from clients who either chose, or more likely didn’t realize they were choosing, the option with less preparation.

So this post won’t really talk about recovering from corruption. It will focus on prevention and preparedness. A follow-on post will talk about some initial steps to take if you get a report of corruption.

If you already know how you’ll solve the problem (and ideally, have a step-by-step runbook so you don’t miss anything), corruption is more of an annoyance than a catastrophe.
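
As a rough sketch of what being prepared can look like on the SQL Server side (the database name and backup path below are placeholders), the basics come down to page verification, regularly scheduled consistency checks, and backups that validate checksums:

-- [YourDatabase] and the backup path are placeholders.
-- Make sure the database writes and verifies page checksums.
ALTER DATABASE [YourDatabase] SET PAGE_VERIFY CHECKSUM;

-- Run consistency checks regularly (for example, from a SQL Agent job).
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Back up with CHECKSUM so page-level corruption surfaces at backup time.
BACKUP DATABASE [YourDatabase]
    TO DISK = N'D:\Backups\YourDatabase.bak'
    WITH CHECKSUM, COMPRESSION, INIT;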

Restoring A Database To A Different Location

Mike Fal shows how to restore a database to a different location using PowerShell:

The cmdlet is straightforward in its use. Fundamentally, all we need to declare is an instance, database name, and backup file. However, if we don’t declare anything else, the cmdlet will try to restore the database files to their original locations. Keep in mind this is no different from how a normal RESTORE DATABASE command works.

This is where we make our lives easier with PowerShell. First off, to move files using Restore-SqlDatabase, we need to create a collection of RelocateFile objects. Don’t let the .Net-ness of this freak you out. All we’re doing is creating something that has the logical file name and the new physical file name. In other words, it’s just an abstraction of the MOVE statement in RESTORE DATABASE.

Read the whole thing.
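
If the PowerShell side is unfamiliar, it can help to keep the underlying T-SQL in mind: the collection of RelocateFile objects maps to the MOVE clauses of RESTORE DATABASE. A minimal sketch, with made-up logical names and paths:

-- Logical file names and target paths here are hypothetical.
RESTORE DATABASE [SampleDB]
    FROM DISK = N'D:\Backups\SampleDB.bak'
    WITH MOVE N'SampleDB_Data' TO N'E:\Data\SampleDB.mdf',
         MOVE N'SampleDB_Log' TO N'F:\Logs\SampleDB_log.ldf',
         RECOVERY;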

Backing Up To Azure Storage

Neil Gelder shows how to back up directly to Azure blob storage:

The URL is the one from the container we made a note of and the credential is the one we created in the last step.

Now if we return to the container screen in the Azure Console and refresh the screen, you’ll see your backup file like below.

My personal preference here would be to back up locally and then have a job migrate backups to Azure or S3. That storage is 1-3 cents per GB per month (and even cheaper if you’re willing to store the data in Glacier), so for most small to mid-sized organizations running databases in the tens of gigs, it’s a great way of getting around only being able to store a week or two’s worth of backups on-site.
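
For reference, the T-SQL form of a backup straight to blob storage looks roughly like this, assuming the older access-key style of credential (the storage account, container, and credential name are placeholders):

-- Storage account URL, container, and credential name below are placeholders.
BACKUP DATABASE [SampleDB]
    TO URL = N'https://mystorageaccount.blob.core.windows.net/backups/SampleDB.bak'
    WITH CREDENTIAL = N'AzureBackupCredential',
         CHECKSUM, COMPRESSION;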

Understanding DATEADD And DATEDIFF

Matan Yungman and Guy Glantser take a hack at DATEDIFF versus DATEADD for date calculations.  First up is Matan:

Pretty simple right?

Well, it is, and since this problem is pretty common, I used this solution in many performance tuning sessions I performed over the years.

There’s a slight problem though: This solution isn’t 100% accurate.

When carefully looking at the results, I find out that for the first query, I get 5859 rows, and for the second query, I get 5988 rows. Where does this difference come from?

Then, Guy gives his take on the problem:

I tested both queries on a sample table, which has millions of rows, and only around 500 rows in the last 90 days. The first query produced a table scan, while the second query produced an index seek. Of course, the execution time of the second query was much lower than the first query.

Both queries were supposed to return the orders in the last 90 days, but the first query returned 523 rows, and the second query returned 497 rows. So what’s going on?

The answer has to do with the way DATEDIFF works. This function returns the number of date parts (days, years, seconds, etc.) between two date & time values. It does that by first rounding down each one of the date & time values to the nearest date part value, and then counting the number of date parts between them.

They both start from the same base problem, but end up with slightly different formulations of a solution.
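
The general shape of the comparison, as Guy describes it, is roughly the following (table and column names here are made up). The first predicate wraps the column in DATEDIFF, which counts the day boundaries crossed between OrderDate and now, so any time on the 90th calendar day back still qualifies, and the function call on the column rules out an index seek. The second predicate leaves the column alone and goes back exactly 90 days from the current moment, so it can seek but covers a slightly different range:

-- dbo.Orders and OrderDate are hypothetical names.
-- Counts whole-day boundaries crossed; not sargable, so expect a scan.
SELECT COUNT(*)
FROM dbo.Orders
WHERE DATEDIFF(DAY, OrderDate, GETDATE()) <= 90;

-- Exactly 90 days back from the current date and time; sargable, so an
-- index on OrderDate can be used for a seek.
SELECT COUNT(*)
FROM dbo.Orders
WHERE OrderDate >= DATEADD(DAY, -90, GETDATE());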

Structured Streaming With Spark

Tathagata Das, et al., discuss enterprise-grade streaming of structured data using Spark:

Fortunately, Structured Streaming makes it easy to convert these periodic batch jobs to a real-time data pipeline. Streaming jobs are expressed using the same APIs as batch data. Additionally, the engine provides the same fault-tolerance and data consistency guarantees as periodic batch jobs, while providing much lower end-to-end latency.

In the rest of this post, we dive into the details of how we transform AWS CloudTrail audit logs into an efficient, partitioned, parquet data warehouse. AWS CloudTrail allows us to track all actions performed in a variety of AWS accounts, by delivering gzipped JSON log files to an S3 bucket. These files enable a variety of business and mission critical intelligence, such as cost attribution and security monitoring. However, in their original form, they are very costly to query, even with the capabilities of Apache Spark. To enable rapid insight, we run a Continuous Application that transforms the raw JSON log files into an optimized Parquet table. Let’s dive in and look at how to write this pipeline. If you want to see the full code, here are the Scala and Python notebooks. Import them into Databricks and run them yourselves.

This introductory post discusses some of the architecture and setup, and they promise additional posts getting into finer details.

Searching For A Query

Kendra Little shows how to search the plan cache and query store for a particular query:

If I’m looking in SQL Server’s Execution Plan Cache, I like to use the sys.dm_exec_text_query_plan dynamic management view. This stores those XML query plans as text.

I learned about using this DMV from Grant Fritchey in his post, “Querying the Plan Cache, Simplified.” Grant points out that while doing wildcard searches in the text version of a query plan isn’t fast, querying it as XML is often even slower.

This is definitely worth a read.
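
A sketch of the kind of plan cache search Kendra describes, built around the DMV she names (the search term in the LIKE pattern is a placeholder for whatever table, index, or literal you are hunting):

-- Wildcard search over the text form of each cached plan.
SELECT TOP (50)
       st.text AS query_text,
       qs.execution_count,
       tqp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_text_query_plan(qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) AS tqp
WHERE tqp.query_plan LIKE N'%MySearchTerm%'  -- placeholder search term
ORDER BY qs.execution_count DESC;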

Thinking About Availability Group Outages

Brent Ozar reminds us to think about graceful degradation of applications:

There’s a gray bar across the top that says, “This site is currently in read-only mode; we’ll return with full functionality soon.”

That’s not a hidden feature of Always On Availability Groups. Rather, it’s a hidden feature of really dedicated developers whose application:

This is where a bit of foresight and hard work can really pay off.  Read the whole thing.
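
One small building block for that kind of degradation is having the application check whether the database it connected to is currently writable. A minimal sketch of the idea (the database name is a placeholder):

-- Should report READ_ONLY on a read-only database or readable secondary,
-- READ_WRITE otherwise; the application can switch modes accordingly.
SELECT DATABASEPROPERTYEX(N'SampleDB', 'Updateability') AS updateability;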

Beginning With Amazon Athena

Jen Underwood looks at the basics behind Amazon Athena:

Today early adopters of Amazon Athena are using it for big data analytics pipeline projects along with Kinesis streaming data and other Amazon data sources.

Athena is a serverless, pay-per-use parallel query service. There is no infrastructure to set up or manage. It scales automatically and can handle large datasets or complex distributed queries.

The easy way of thinking about Athena is that it’s ElasticMapReduce (a pay-as-you-go Hadoop cluster) without the ceremony of administering or spinning up the cluster.
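
Since Athena runs ordinary SQL directly against files sitting in S3, using it is mostly a matter of defining a table over a bucket and querying it. A rough sketch (the bucket, table, and column names below are made up):

-- Define a table over existing files in S3; nothing is loaded or moved.
CREATE EXTERNAL TABLE logs (
    event_time string,
    user_id    string,
    action     string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-example-bucket/logs/';

-- You pay for the data each query scans.
SELECT action, COUNT(*) AS events
FROM logs
GROUP BY action;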

R Visuals In Power BI

Ryan Wade ties ggplot2 visuals into Power BI:

The package that we are going to use to develop our custom visualization is ggplot2. The ggplot2 package is arguably the most popular data visualization package in R. It is based on the “grammar of graphics” concept that was created by the statistician Leland Wilkinson. The ggplot2 package allows you to approach creating charts and graphs in the same manner that Bob Ross approached painting trees in the forest. With ggplot2 you are able to start with a blank canvas and add layer upon layer via short code snippets that build on each other until you end up with the desired visualization.

The pbix file that is being used in this blog can be found here: http://bit.ly/2jwoCyP. The GentleIntroToR_ChartExample.pbix file contains an example of using R to create a box plot chart that shows the distribution of player scores for the L.A. Lakers. Chiclet slicers were added that allow you to filter by division and/or opponent. The R visualization was created in four steps.

Check out the PBIX file.
