Press "Enter" to skip to content

Month: February 2018

Navigating A SQL Server Instance With Powershell

Andy Mallon shows off the drive-based navigation available in the SQL Server Powershell module:

I recently noticed that the system objects were missing from my results when I do a Get-ChildItem. I noticed it with views, but then realized that none of the system objects showed up. What gives? I floundered through a quick Google search, where I knew I wasn’t searching for the right thing, and was not surprised when I didn’t see the answer.

I said to myself, “Andy, hold on a second & think. If something doesn’t want to open up, sometimes you just have to -force it open.”

I don’t tend to use this much, as I have recollections of it being slow.  Nonetheless, it is good to know about.

Comments closed

Find And Replace Stored Procedure Code

Jana Sattainathan has to find and replace a lot of code:

The database has 100’s of stored procedures (700+ to be somewhat precise)

Qualifying stored procedures have these characteristics

  • Procedure name ends with “_del”
  • Procedure has the string “exec” in the code
  • Procedure has the string “sp_execute” in the code

This is what needs to be done:

  • Replace CREATE PROC with ALTER PROC

  • Replace SYSTEM_USER with “ORIGINAL_LOGIN()”

  • Replace AS at the beginning of CREATE PROC with “WITH EXECUTE AS OWNER AS”

  • Comment out some SET statements

  • …in fact, there could be any number of other changes

Read on to see how Jana did it.

Comments closed

The Downside Of Nested Views

Randolph West doesn’t mince words:

Nested views are bad. Let’s get that out of the way.

What is a nested view anyway? Imagine that you have a SELECT statement you tend to use all over the place (a very common practice when checking user permissions). There are five base tables in the join, but it’s fast enough.

Instead of copying and pasting the code wherever you need it, or using a stored procedure (because for some reason you’re allergic to stored procedures), you decide to simplify your code and re-use that SELECT in the form of a database view. Now whenever you need to run that complicated query, you can instead query the view directly.

What harm could this do?

Spoilers:  a lot.

Comments closed

Diagramming Databases With Power BI

Philip Seamark shows how to visualize the relationships between tables using Power BI:

The network navigator was another good visual, and if you have an R instance installed on your local machine, you can play with some of the custom R visuals.

The catalog views could be used in a similar way to generate power bi visuals showing other object dependencies inside an MS SQL Database.  Additional columns could be added to the base query to be used in tool-tips etc.

If your database tables do not have foreign keys, here is another query that can be used to help guess relationships between tables based on column name and datatype.

Most of these techniques look like they wouldn’t work well on databases with a very large number of tables, but for an average-sized database, it can serve as an avant-garde ERD.

Comments closed

Batch Mode Memory Fractions

Joe Obbish explains what memory fractions are and how incorrect calculations can lead to tempdb spills:

There’s very little information out there about memory fractions. I would define them as information in the query plan that can give you clues about each operator’s share of the total query memory grant. This is naturally more complicated for query plans that insert into tables with columnstore indexes but that won’t be covered here. Most references will tell you not to worry about memory fractions or that they aren’t useful most of the time. Out of thousands of queries that I’ve tuned I can only think of a few for which memory fractions were relevant. Sometimes queries spill to tempdb even though SQL Server reports that a lot of query memory was unused. In these situations I generally hope for a poor cardinality estimate which leads to a memory fraction which is too low for the spilling operator. If fixing the cardinality estimate doesn’t prevent the spill then things can get a lot more complicated, assuming that you don’t just give up.

Extremely interesting post.

Comments closed

Optimal Image Colorization With Python

Sandipan Dey walks through a paper on colorization and shows some examples:

Colorization is a computer-assisted process of adding color to a monochrome image or movie. In the paper the authors presented an optimization-based colorization method that is based on a simple premise: neighboring pixels in space-time that have similar intensities should have similar colors.

This premise is formulated using a quadratic cost function  and as an optimization problem. In this approach an artist only needs to annotate the image with a few color scribbles, and the indicated colors are automatically propagated in both space and time to produce a fully colorized image or sequence.

In this article the optimization problem formulation and the way to solve it to obtain the automatically colored image will be described for the images only.

It’s an interesting approach.

Comments closed

Radar Charts With ggplot2

I have wrapped up my ggplot2 series, with the last post being on radar charts:

First, we need to install ggradar and load our relevant libraries. Then, I create a quick standardization function which divides our variable by the max value of that variable in the vector. It doesn’t handle niceties like divide by 0, but we won’t have any zero values in our data frames.

The radar_data data frame starts out simple: build up some stats by continent. Then I call the mutate_each_ function to call standardize for each variable in the vars set. mutate_each_is deprecated and I should use something different like mutate_at, but this does work in the current version of ggplot2 at least.

Finally, I call the ggradar() function. This function has a large number of parameters, but the only one you absolutely need is plot.data. I decided to change the sizes because by default, it doesn’t display well at all on Windows.

It was a lot of fun putting this series together. I think the most important part of the series was learning just how easy ggplot2 is once you sit down and think about it in a systemic manner.

Comments closed

Using The Power Query SDK

Chris Webb shows how to build M queries in Visual Studio:

Writing M in the Advanced Editor in Excel or Power BI can be a frustrating experience unless you’re the kind of masochist who loves writing code in Notepad. There are some options for writing M code outside Excel and Power BI, for example Lars Schreiber’s M extension for Notepad++ (see here for details) or the M extension for Visual Studio Code (available from the Visual Studio Marketplace here; more details on Brett Powell’s blog here), but the trouble with them is that you have to copy the code back into Excel or Power BI to run it. What many people don’t realise, however, is that it is possible to write M code and have IntelliSense, formatting, keyword highlighting and also the ability to execute your own M queries, using the Power Query SDK in Visual Studio.

The Power Query SDK (which you can download here) supports Visual Studio 2015 and 2017 and is intended for people who are writing custom Data Connectors for Power BI. To let you test your Data Connector you can create a .pq file containing M code, and this in fact allows you to run any M query you want whether you’re building a Data Connector or not.

And then, once you get comfortable with M, start learning F#.  That will allow you to laugh haughtily at those poor object-oriented sods out there.

Comments closed

Parallel Execution With SSIS Framework Community Edition

Andy Leonard has a great post on parallel execution with SSIS:

Sit a spell and let Grandpa Andy tell yall a story about some data integratin’.

Suppose for a minute that you’ve read and taken my advice about writing small, unit-of-work SSIS packages. I wouldn’t blame you for taking this advice. It’s not only online, it’s written in a couple books (I know, I wrote that part of those books). One reason for building small, function-y SSIS packages is that it promotes code re-use. For example, SSIS packages that perform daily incremental loads can be re-used to perform monthly incremental loads to a database that serves as a data mart by simply changing a few parameters.

Change the parameter values and the monthly incremental load can load both quarterly and yearly data marts.

You want better performance out of the daily process, so you read and implement the parallel execution advice* you’ve found online. For our purposes let’s assume you’ve designed a star schema instead of one of those pesky data vaults (with their inherent many-to-many relationships and the ability to withstand isolated and independent loads and refreshes…).

You have dependencies. The dimensions must be loaded before the facts. You decide to manage parallelism by examining historical execution times. Since you load data in chronological order and use a brute-force change detection pattern, the daily dimension loads always complete before the fact loads reach the latest data. You decide to fire all packages at the same time and your daily execution time drops by half, monthly executions time drops to 40% of its former execution time, and everyone is ecstatic…

…until the quarterly loads.

This is a great post.

Comments closed