Tests Are Hard

Cody Konior explains how testing is harder than it seems in the demos:

We should have 4 backup jobs

Definitely 4, not 3, and 5 is right out.

Unless of course someone builds a 500GB database with SQL CLR UDTs which slow down DBCC to a crawl, so you move the backups and maintenance for that database into their own jobs or steps with different flags, and possibly start staggering some stuff so it finishes on the weekend without dripping into Monday.

Then you’ve got more. But just kind of sweep them under the carpet. We have file system backup checks so that covers most of it…

These are just administrative test examples; when you have to start testing queries and procedures, it’s a whole new level of fun, as it feels like you’re building a castle on sand.

Elasticsearch Write Operations

Kunal Kapoor has a presentation on Elasticsearch write operations (inserts, updates, and deletes) and explains what’s going on:

In this presentation, we are going to discuss how Elasticsearch handles the various operations like insert, update, delete. We would also cover what is an inverted index and how segment merging works.

Click through for the slides; they helped me firm up a few thoughts I had about Elasticsearch.

R Graph Gallery

David Smith points out the new R Graph Gallery:

Once upon a time, there was the original R Graph Gallery, by Romain François. Sadly, it’s been unavailable for several years. Now there’s a new R Graph Gallery to fill the void, created by Yan Holtz. It contains more than 200 data visualizations categorized by type, along with the R code that created them.

You can browse the gallery by types of chart (boxplots, maps, histograms, interactive charts, 3-D charts, etc), or search the chart descriptions. Once you’ve found a chart you like, you can admire it in the gallery (and interact with it, if possible), and also find the R code which you can adapt for your own use. Some entries even include mini-tutorials describing how the chart was made. You can even submit your own graph, if you’d like to have it displayed in the gallery as well.

Looks like a good place to go to get some inspiration.

Accidental DBAs

Charity Majors on the Accidental DBA phenomenon:

(OH RIGHT, WE WROTE A BOOK ABOUT THIS!!!)

My friend Laine and I are writing a book for people on the data side, called “Database Reliability Engineering“, which is aimed at generalist engineers who want to learn how to deal with data responsibly and effectively.

(Actually that’s a good point, I am supposed to be pitching this book! — which is really mostly Laine with a smidgen of me but it’s going to be super awesome.  Consider this your sales pitch.)

So first, as an accidental DBA, you should obviously buy this book  :).  Second: stateful services require a different mindset[*].  It’s cool that you are running your own databases!  But reading post mortems like this where the conclusion is “MongoDB sucks” makes me fucking grind my teeth.

The theme of the story is a Mongo upgrade gone south, but this is a post about principles.  And rainbows.

Extended Properties

Phil Factor has a detailed article on extended properties:

Extended properties are easy to read, thanks to a useful system view. Sys.extended_properties. However, they are a pain to create, update and delete; they rely on special stored procedures that have a syntax that isn’t at all intuitive for those of us without mutant mental powers. They have a limit of 7,500 characters but are actually stored in a SQL_variant so that DateTime, Approximate numeric, exact numeric, character, Unicode and binary information can be stored in it. Most of us use some sort of tool such as SSMS to maintain this documentation rather than to do it via SQL. The SQL is cumbersome.

Extended properties was an interesting idea but there was so little tooling available to make them really useful.  I don’t see that changing.

Bulk Load Tools

Kevin Feasel

2016-10-21

ETL

Erland Sommarskog has a brand new essay:

The bulk-load tools have been in the product for a long time and they are showing their age. When they work for you, they are powerful. But you need to understand that these tools are binary to their heart, and they have no built-in rule that says that each line a file is a record – they don’t even think in lines. You also need to understand that there are file formats they are not able to handle.

I have tried to arrange the material in this article so that if you have a simple problem, you only need to read the first two chapters after the introduction. I first introduce you them to their mindset, which is likely to be different from yours. Next I cover the basic options to use for every-day work. If you have a more complex file, you will need to use a format file and the next three chapters are for you. I first describe how format files work as such, and the next two chapters show how to use format files for common cases for import and export respectively. This is followed by a chapter about Unicode files, including files encoded in UTF‑8. Then comes a chapter about “advanced” options, including how to load explicit values into an IDENTITY column. A short chapter covers permissions. The last chapter discusses XML format files, and I am not sorry at all if you give this chapter a blind eye – I find XML format files to be of dubious value.

I haven’t had a chance to read this yet, but because I have never had good luck with bcp and BULK INSERT, it’s on my to-read list.

Deploy SQL Server R Services Without Internet Access

Arvind Shyamsundar shows how to install SQL Server R Services on a machine without internet access:

When deploying SQL Server R Services, it is important to note that the setup components for SQL Server do not include the Microsoft R Open and Microsoft R Server components. Those ‘R Components’ (as we will refer to them later in this post) are provided as separate downloadable components. SQL Server will automatically download these when executed on computer which is connected to the Internet. But in cases where setup is done on a computer without Internet access (quite typical of many SQL Server deployments) we need to handle things differently. There is a documented process for doing this. But even with the documentation, we still had some customers with questions on the process.

Inspired by those customer engagements, this blog post walks through the process of setting up SQL Server R Services in environments without Internet access. We walk through a number of scenarios, right from the very basic scenario to the more complex ones involving unattended and ‘smart setup’.

This is a nice walkthrough.  I wanted to highlight a link at the end showing how to create a local repository so you can install packages as well.

SELECT INTO

Kevin Feasel

2016-10-21

T-SQL

Daniel Janik is not a fan of SELECT INTO:

This query for AdventureWorks will dump all of its results into a table named #MyDuplicateCities. Note that there is no CREATE TABLE statement. The INTO [tablename] will create the table for you.

Running this query a second time will result in failure if you haven’t dropped the #MyDuplicateCities table.

Using this syntax can be really helpful if you just need to do some quick and dirty cleanup; however, it should be avoided for stored procedures. Here’s why…

There are some trade-offs here and good arguments either way.  The comments tend to take the pro approach, so they’re worth reading as well.

Database Snapshot Creation History

Paul Randal shows how to read the master transaction log to find when database snapshots were created:

Earlier today someone asked on the #sqlhelp Twitter alias if there is a history of database snapshot creation anywhere, apart from scouring the error logs.

There isn’t, unfortunately, but you can dig around the transaction log of the master database to find some information.

When a database snapshot is created, a bunch of entries are made in the system tables in master and they are all logged, under a transaction named DBMgr::CreateSnapshotDatabase. So that’s where we can begin looking.

Click through for the script and some explanation around it.

Max Data Types In Queries

Erik Darling shows how variable definition can be important, even without implicit conversion:

SQL Server makes many good and successful attempts at something called predicate pushdown, or predicate pushing. This is where certain filter conditions are applied directly to the data access operation. It can sometimes prevent reading all the rows in a table, depending on index structure and if you’re searching on an equality vs. a range, or something else.

What it’s really good for is limiting data movement. When rows are filtered at access time, you avoid needing to pass them all to a separate operator in order to reduce them to the rows you’re actually interested in. Fun for you! Be extra cautious of filter operators happening really late in your execution plans.

Click through for Erik’s demo.

Categories

October 2016
MTWTFSS
« Sep Nov »
 12
3456789
10111213141516
17181920212223
24252627282930
31