S3Guard

Mingliang Liu and Rajesh Balamohan explain why you shouldn’t use S3 as your primary Hadoop data store, as well as a tool which helps mitigate those problems:

Some of the real world use cases which can be impacted due to the S3 eventual consistency model are:

  1. Listing Files. Newly created files might not be visible for data processing. In Hive, Spark and MapReduce, this can lead to erroneous results from incomplete source data or failure to commit all intermediate results.

  2. ETL Workflow. Systems like Oozie rely on marker files to trigger the subsequent workflows. Any delay in the visibility of these files can lead to delays in the subsequent workflows.

  3. Existence-guarded path operations. Any action which fails if the destination path is present may see a deleted file in a listing, and so fail — even though the file has already been deleted.

Read on to see how S3Guard works and how to enable it in HDP 2.6.

Finding Last DBCC Command Runs

Andrew Kelly has a script to find the last time somebody ran a DBCC command like DBCC FREEPROCCACHE:

Let me explain a few things about the script. I am getting the path of the current trace file and placing it into a variable. The current file name will almost certainly have a suffix of _nn just before the .trc extension.  If I were to run the script as is I would only be reading the current log file and not the other 4 that preceded it. If all you care about is the current log file then fine but most will want to search all the existing log files. One way to do this is to simply replace the current file name with just log.trc and use default as the 2nd parameter as I did above in the fn_trace_gettable function. The default parameter value tells the function to read all files from that one onward. even though log.trc doesn’t actually exist it knows how to handle it and reads all of the existing trace files in order.

So if the string that we search on (here we use ‘dbcc free%’) is in any of the files it will return the matching rows. You may have to adjust the wildcards and such but I think you get the idea. Again remember that the data is transient so always look at the StartTime column in the logs to ensure you know which Date and Time range you are looking at. You can do something like this but I will leave that up to you.

SELECT MIN(StartTime) AS [Begin], MAX(StartTime) AS [End]  FROM ::fn_trace_gettable(@Path,default)

A word of caution in that I never bothered to see just how resource intensive this function is. while I don’t expect any issues with normal use it is not something you want to be searching on every second. Be sensible and you should have no problems.

Click through for more details, including the script Andy uses to do this search.

What You Need To Know About DTC

Allan Hirt gives some important information regarding the Distributed Transaction Coordinator:

What exactly is a distributed transaction? It’s one where the work needs to be completed in more than one database so data is kept in sync everywhere. For example, if you need to update data in Database A and in Database B, and they need to be kept in sync, that’s a distributed transaction. Database A and Database B can be in the same SQL Server instance, or they could be in different instances … or even in different data sources, such as Oracle or DB2. This whole shebang is often referred to as a cross-database transaction. DTC is based on the principle of a two phase commit – for the whole thing to get done, all the little bits need to be committed everywhere before claiming complete success. Otherwise stuff needs to be rolled back so that things stay in sync and all is right in the world.

Read on for Allan’s thoughts and guidance.

Using Event Notifications To E-Mail Deadlock Graphs

Dave Mason captures details whenever a deadlock occurs and uses Event Notifications to e-mail them to himself:

As noted, there are other ways to handle deadlocks in SQL Server. The approach presented here may have some drawbacks compared to others. There is an authorization issue for msdb.dbo.sp_send_dbmail that will need to be addressed for logins without elevated permissions. Additionally, you might get hit with an unexpected deluge of emails. (The first time I got deadlock alerts, there were more than 500 of them waiting for me in my Inbox.) Lastly, there’s the XML issue: it’s not everyone’s cup of tea. On the plus side, I really like the proactive nature: an event occurs, I get an email. I think most would agree it’s better to know something (bad) happened before the customers start calling. The automated generation of Deadlock Graph (*.xdl) files is convenient. And event notifications have been available since SQL Server 2005. As far as I know, the feature is available in all editions, including Express Edition.

Click through for all of the code Dave used to set this up.

EXISTS Is Self-Contained

Shane O’Neill ponders an existential problem:

So, drinking my first (of many) coffee of the day, I asked him what was wrong with it.

I have two tables. 1 with values 1,2,3 & the other with values 1,2,3,4,5. When I use delete exists, it should just delete 1,2,3 but table1 is always empty.

Hmmm, not an unreasonable assumption I suppose so I asked him for his code.

Read on for Shane’s explanation, though he doesn’t like the verbosity.  My version is, what happens in EXISTS stays in EXISTS.  It just returns a signal to the outer query saying yea or nay and the outer query does its thing accordingly.  In this case, if you want to tie results back to the delete operation, use IN (the ANSI standard way) or JOIN (typically my preferred way, given that IN can get dicey with more complex criteria).

Generating Database Restore Scripts

David Fowler announces sp_RestoreScript:

I’m sure we’ve all been there, we need to restore a database to 11:34am, four days ago. What’s the first step?  To go off hunting down the relevant full and differential backups and then figure out what transaction logs we need to play in and in what order.  Well here at SQL Undercover we’ve got the solution to all your restoration woes, the latest addition to the Undercover Toolbox, sp_RestoreScript.

sp_RestoreScript will do all the hard work for you, it’ll hunt down all the backup files that you need to restore your database to a specific time and present you with a nice, simple script to run based on the parameters that you give it.

Click through for the script, which includes great ASCII art.

Creating A Powershell Module

Rob Sewell has a two-parter.  First, he looks at the SQL Server Diagnostics API:

The Diagnostic Analysis API allows you to upload memory dumps to be able to debug and self-resolve memory dump issues from their SQL Server instances and receive recommended Knowledge Base (KB) article(s) from Microsoft, which may be applicable for the fix.

There is also the Recommendations API to view the latest Cumulative Updates (CU) and the underlying hotfixes addressed in the CU which can be filtered by product version or by feature area (e.g. Always On, Backup/Restore, Column Store, etc).

I have written a module to work with this API. It is not complete. It only has one command as of now but I can see lots of possibilities for improvement and further commands to interact with the API fully and enable SQL Server professionals to use PowerShell for this.

This alone is quite interesting.  But then Rob shows how to turn this into a module, complete with tests:

I have been asked a few times what the process is for creating a module, using Github and developing with Pester and whilst this is not a comprehensive how-to I hope it will give some food for thought when you decide to write a PowerShell module or start using Pester for code development. I also hope it will encourage you to give it a try and to blog about your experience.

This is my experience from nothing to a module with a function using Test Driven Development with Pester. There are some details missing in some places but if something doesn’t make sense then ask a question. If something is incorrect then point it out. I plan on never stopping learning!

There are many links to further reading and I urge you to not only read the posts linked but also to read further and deeper. That’s a generic point for anyone in the IT field and not specific to PowerShell. Never stop learning. Also, say thank you to those that have taken their time to write content that you find useful. They will really appreciate that.

If you’re interested in developing professional-grade Powershell modules, this is a great starting point.

sp_AllNightLog

Brent Ozar announces an open source project to perform log shipping at scale:

On the primary server, multiple Agent jobs take backups simultaneously across lots of databases using Ola Hallengren’s proven DatabaseBackup proc. You control the backup frequency and the number of Agent jobs to balance recoverability against performance overhead.

On the restoring server, same thing, but with sp_DatabaseRestore – multiple Agent jobs watch folders for incoming files, and restore ’em. Just like with regular log shipping, you can control the job schedules, so you can use the servers for read-only reporting workloads during the day, then catch ’em back up to current overnight.

You can install the restore side of it on multiple servers, in multiple locations, too.

This looks like a very interesting project for automating DR solutions.

Open Query Store

William Durkin announces Open Query Store, a Query Store-like solution for pre-2016 SQL Server instances:

The first release was published at the end of June 2017 and provides a background collection of query execution statistics in a database of your choice. The v1.0 release supports SQL Server from 2008 to 2014 and all editions (Express, Standard, Enterprise). There is also a PowerShell installer for those that are so inclined, which will grab the code from GitHub and install OQS into a specified database.

There is also a custom report which can be installed in SSMS (2016 and 2017), which will display information that OQS has collected.

This is a really cool community solution if you’re stuck on an older version of SQL Server for a while.

Bar Plot Alternatives

Alboukadel Kassambara shows off a couple alternatives to bar charts:

Cleveland’s dot plot

Color y text by groups. Use y.text.col = TRUE.

ggdotchart(dfm, x = "name", y = "mpg", color = "cyl", # Color by groups palette = c("#00AFBB", "#E7B800", "#FC4E07"), # Custom color palette sorting = "descending", # Sort value in descending order rotate = TRUE, # Rotate vertically dot.size = 2, # Large dot size y.text.col = TRUE, # Color y text by groups ggtheme = theme_pubr() # ggplot2 theme )+ theme_cleveland() # Add dashed grids

I like the lollipop chart example.

Categories

June 2017
MTWTFSS
« May Jul »
 1234
567891011
12131415161718
19202122232425
2627282930