
Month: October 2016

R Graph Gallery

David Smith points out the new R Graph Gallery:

Once upon a time, there was the original R Graph Gallery, by Romain François. Sadly, it’s been unavailable for several years. Now there’s a new R Graph Gallery to fill the void, created by Yan Holtz. It contains more than 200 data visualizations categorized by type, along with the R code that created them.

You can browse the gallery by types of chart (boxplots, maps, histograms, interactive charts, 3-D charts, etc.), or search the chart descriptions. Once you’ve found a chart you like, you can admire it in the gallery (and interact with it, if possible), and also find the R code which you can adapt for your own use. Some entries even include mini-tutorials describing how the chart was made. You can even submit your own graph, if you’d like to have it displayed in the gallery as well.

Looks like a good place to go to get some inspiration.

Comments closed

Accidental DBAs

Charity Majors on the Accidental DBA phenomenon:

(OH RIGHT, WE WROTE A BOOK ABOUT THIS!!!)

My friend Laine and I are writing a book for people on the data side, called “Database Reliability Engineering”, which is aimed at generalist engineers who want to learn how to deal with data responsibly and effectively.

(Actually that’s a good point, I am supposed to be pitching this book! — which is really mostly Laine with a smidgen of me but it’s going to be super awesome.  Consider this your sales pitch.)

So first, as an accidental DBA, you should obviously buy this book  :).  Second: stateful services require a different mindset[*].  It’s cool that you are running your own databases!  But reading post mortems like this where the conclusion is “MongoDB sucks” makes me fucking grind my teeth.

The theme of the story is a Mongo upgrade gone south, but this is a post about principles.  And rainbows.


Extended Properties

Phil Factor has a detailed article on extended properties:

Extended properties are easy to read, thanks to a useful system view, sys.extended_properties. However, they are a pain to create, update and delete; they rely on special stored procedures that have a syntax that isn’t at all intuitive for those of us without mutant mental powers. They have a limit of 7,500 characters but are actually stored in a sql_variant so that DateTime, approximate numeric, exact numeric, character, Unicode and binary information can be stored in it. Most of us use some sort of tool such as SSMS to maintain this documentation rather than to do it via SQL. The SQL is cumbersome.

Extended properties were an interesting idea, but there has been so little tooling available that they have never been really useful.  I don’t see that changing.


Bulk Load Tools

Erland Sommarskog has a brand new essay:

The bulk-load tools have been in the product for a long time and they are showing their age. When they work for you, they are powerful. But you need to understand that these tools are binary to their heart, and they have no built-in rule that says that each line in a file is a record – they don’t even think in lines. You also need to understand that there are file formats they are not able to handle.

I have tried to arrange the material in this article so that if you have a simple problem, you only need to read the first two chapters after the introduction. I first introduce you to their mindset, which is likely to be different from yours. Next I cover the basic options to use for every-day work. If you have a more complex file, you will need to use a format file and the next three chapters are for you. I first describe how format files work as such, and the next two chapters show how to use format files for common cases for import and export respectively. This is followed by a chapter about Unicode files, including files encoded in UTF‑8. Then comes a chapter about “advanced” options, including how to load explicit values into an IDENTITY column. A short chapter covers permissions. The last chapter discusses XML format files, and I am not sorry at all if you give this chapter a blind eye – I find XML format files to be of dubious value.

I haven’t had a chance to read this yet, but because I have never had good luck with bcp and BULK INSERT, it’s on my to-read list.
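To make the point about the tools not thinking in lines a bit more concrete, here is a toy Python sketch of a reader that consumes a byte stream one field at a time, with each field ending at its own terminator. This is only an illustration of the idea, not how bcp or BULK INSERT are actually implemented; the function name read_records and the sample data are made up for the example.

```python
def read_records(data: bytes, terminators: list) -> list:
    """Split a byte stream into records by consuming one field at a time.

    Each field ends at the next occurrence of its own terminator. The
    reader never looks for line breaks as such; a newline only matters
    when it happens to be the terminator of the field being read.
    """
    if not terminators or any(len(t) == 0 for t in terminators):
        raise ValueError("need at least one non-empty terminator")

    records = []
    pos = 0
    while pos < len(data):
        record = []
        for term in terminators:
            end = data.find(term, pos)
            if end == -1:
                # No terminator left: the remainder becomes the final field.
                record.append(data[pos:])
                pos = len(data)
                break
            record.append(data[pos:end])
            pos = end + len(term)
        records.append(record)
    return records


# Hypothetical three-column file; the second line is one field short.
data = b"1,Alice,Stockholm\n2,Bob\n3,Carol,Oslo\n"
records = read_records(data, [b",", b",", b"\n"])
for record in records:
    print(record)
```

Because the second line is a field short, the reader keeps scanning for the next comma and swallows the line break: `Bob\n3` ends up as a single field, and three lines of text come out as two records.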


Deploy SQL Server R Services Without Internet Access

Arvind Shyamsundar shows how to install SQL Server R Services on a machine without internet access:

When deploying SQL Server R Services, it is important to note that the setup components for SQL Server do not include the Microsoft R Open and Microsoft R Server components. Those ‘R Components’ (as we will refer to them later in this post) are provided as separate downloadable components. SQL Server will automatically download these when executed on a computer that is connected to the Internet. But in cases where setup is done on a computer without Internet access (quite typical of many SQL Server deployments) we need to handle things differently. There is a documented process for doing this. But even with the documentation, we still had some customers with questions on the process.

Inspired by those customer engagements, this blog post walks through the process of setting up SQL Server R Services in environments without Internet access. We walk through a number of scenarios, right from the very basic scenario to the more complex ones involving unattended and ‘smart setup’.

This is a nice walkthrough.  I wanted to highlight a link at the end showing how to create a local repository so you can install packages as well.


SELECT INTO

Daniel Janik is not a fan of SELECT INTO:

This query for AdventureWorks will dump all of its results into a table named #MyDuplicateCities. Note that there is no CREATE TABLE statement. The INTO [tablename] will create the table for you.

Running this query a second time will result in failure if you haven’t dropped the #MyDuplicateCities table.

Using this syntax can be really helpful if you just need to do some quick and dirty cleanup; however, it should be avoided for stored procedures. Here’s why…

There are some trade-offs here and good arguments either way.  The comments tend to argue in favor of SELECT INTO, so they’re worth reading as well.
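The behavior described in the excerpt (a table created straight from a query, then a failure on the second run) can be sketched in a self-contained way with Python's built-in sqlite3 module. Note the swap: SQLite has no SELECT ... INTO, so this uses its closest analogue, CREATE TABLE ... AS SELECT, and an ordinary table rather than a temp table. The address table and its rows are hypothetical stand-ins, not the AdventureWorks query from the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source data standing in for the AdventureWorks table.
conn.execute("CREATE TABLE address (city TEXT, postal_code TEXT)")
conn.executemany(
    "INSERT INTO address VALUES (?, ?)",
    [("Bothell", "98011"), ("Bothell", "98011"), ("Portland", "97205")],
)

# SQLite's analogue of SELECT ... INTO: the new table's columns come
# from the query itself, with no separate column declaration.
make_copy = """
    CREATE TABLE my_duplicate_cities AS
    SELECT city, postal_code, COUNT(*) AS occurrences
    FROM address
    GROUP BY city, postal_code
    HAVING COUNT(*) > 1
"""
conn.execute(make_copy)
rows = conn.execute("SELECT * FROM my_duplicate_cities").fetchall()

# Running the same statement a second time fails: the table already exists.
try:
    conn.execute(make_copy)
    second_run_error = None
except sqlite3.OperationalError as exc:
    second_run_error = str(exc)

print(rows)
print(second_run_error)
```

The second run only succeeds if the table is dropped first, which is the same housekeeping the excerpt mentions for #MyDuplicateCities.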


Database Snapshot Creation History

Paul Randal shows how to read the master transaction log to find when database snapshots were created:

Earlier today someone asked on the #sqlhelp Twitter alias if there is a history of database snapshot creation anywhere, apart from scouring the error logs.

There isn’t, unfortunately, but you can dig around the transaction log of the master database to find some information.

When a database snapshot is created, a bunch of entries are made in the system tables in master and they are all logged, under a transaction named DBMgr::CreateSnapshotDatabase. So that’s where we can begin looking.

Click through for the script and some explanation around it.


Max Data Types In Queries

Erik Darling shows how variable definition can be important, even without implicit conversion:

SQL Server makes many good and successful attempts at something called predicate pushdown, or predicate pushing. This is where certain filter conditions are applied directly to the data access operation. It can sometimes prevent reading all the rows in a table, depending on index structure and if you’re searching on an equality vs. a range, or something else.

What it’s really good for is limiting data movement. When rows are filtered at access time, you avoid needing to pass them all to a separate operator in order to reduce them to the rows you’re actually interested in. Fun for you! Be extra cautious of filter operators happening really late in your execution plans.

Click through for Erik’s demo.
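As a rough mental model of the data movement point, here is a toy Python sketch with two "plans": one where the scan hands every row to a separate filter operator, and one where the predicate is applied inside the scan. This is not how SQL Server's engine works internally; the scan and filter_rows functions, the stats counters, and the sample table are all invented for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def scan(table: Iterable[Row], stats: Dict[str, int],
         predicate: Optional[Predicate] = None) -> Iterator[Row]:
    """Data access operator; a pushed-down predicate is applied while reading."""
    for row in table:
        stats["rows_read"] += 1
        if predicate is None or predicate(row):
            stats["rows_passed_on"] += 1
            yield row


def filter_rows(rows: Iterable[Row], predicate: Predicate) -> Iterator[Row]:
    """Separate filter operator sitting above the data access operator."""
    return (row for row in rows if predicate(row))


def is_open(row: Row) -> bool:
    return row["status"] == "open"


# Hypothetical table: 1,000 rows, 10 of which have status "open".
table = [{"id": i, "status": "open" if i % 100 == 0 else "closed"}
         for i in range(1000)]

# Late filter: the scan passes every row up to a separate operator.
late_stats = {"rows_read": 0, "rows_passed_on": 0}
late_result = list(filter_rows(scan(table, late_stats), is_open))

# Pushed predicate: the scan applies the filter itself.
pushed_stats = {"rows_read": 0, "rows_passed_on": 0}
pushed_result = list(scan(table, pushed_stats, predicate=is_open))

print(late_stats["rows_passed_on"], pushed_stats["rows_passed_on"])
```

Both plans return the same 10 rows and, in this toy without an index, both read all 1,000 rows. The difference is in what leaves the scan: 1,000 rows with the late filter versus 10 with the pushed predicate.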


Flushing Change Tracking Internal Tables

Amit Banerjee mentions a new stored procedure for change tracking cleanup:

In SQL Server 2014 Service Pack 2 and above, we provided a new Stored Procedure, sp_flush_CT_internal_table_on_demand, to assist with Change Tracking cleanup. KB3173157 has more details. This stored procedure accepts a table name as parameter and will attempt to clean up records from the corresponding change tracking internal table.  During the course of the deletion, it will print verbose messages in the output window about the progress of the deletion.

If automated change tracking cleanup works well enough for you, there’s no change; but if you’re struggling with that cleanup, this procedure might help.


What Is DevOps?

Julia Evans looks at what DevOps means in practice:

I enjoyed reading this article about devops at Etsy. One of the really key things about this article is – there is no devops organization at Etsy. It’s about how developers and operations people work productively together! Also, it was a slow incremental migration towards different practices. They did not wake up one day and become devops. I think this is the first talk that used the term ‘devops’?

It’s also not about “everyone is a software developer” – one of the authors of this book, Katherine Daniels, is a senior operations engineer at Etsy. I don’t know any of the details of her job, but my impression is that she has a lot of expertise in operations. It’s not like “make operations so easy that nobody has to be an expert at it”. Of course you need people who know a ton about operations! Probably those people write software as part of their job?

One of the scariest realizations that I’m slowly coming to (other than “Information Technology is people!”) is the sheer number of overlapping dependencies in the tech world.  A bit earlier in my career, I felt like I could be “a SQL Server guy” and focus on that while not caring too much about the outside world.  It seems like saying that you want to be “just an X” has become more difficult at the margin, and DevOps is just one example of this:  keeping an edge means going broader about more things while still trying to dig deeper in relevant areas.  That’s a tough balancing act.
