Optimal Image Colorization With Python

Sandipan Dey walks through a paper on colorization and shows some examples:

Colorization is a computer-assisted process of adding color to a monochrome image or movie. In the paper, the authors present an optimization-based colorization method built on a simple premise: neighboring pixels in space-time that have similar intensities should have similar colors.

This premise is formulated as an optimization problem with a quadratic cost function. In this approach, an artist only needs to annotate the image with a few color scribbles, and the indicated colors are automatically propagated in both space and time to produce a fully colorized image or sequence.

This article describes the optimization problem formulation and how to solve it to obtain the automatically colorized output, covering still images only.
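For reference, the quadratic cost from the Levin, Lischinski, and Weiss paper takes roughly this form (notation adapted; treat this as a sketch of the idea rather than the paper’s exact statement):

J(U) = \sum_{r} \Big( U(r) - \sum_{s \in N(r)} w_{rs}\, U(s) \Big)^{2}, \qquad w_{rs} \propto e^{-(Y(r) - Y(s))^{2} / 2\sigma_{r}^{2}}

Here Y is the known intensity channel, U is a chrominance channel to solve for, N(r) is the neighborhood of pixel r, and the weights w_{rs} sum to one and are large when the intensities of r and s are similar. Minimizing J(U) with the scribbled pixels held fixed reduces to solving a large, sparse linear system.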

It’s an interesting approach.

Radar Charts With ggplot2

I have wrapped up my ggplot2 series, with the last post being on radar charts:

First, we need to install ggradar and load our relevant libraries. Then, I create a quick standardization function which divides each value by the maximum value of that variable in the vector. It doesn’t handle niceties like divide-by-zero, but we won’t have any zero values in our data frames.

The radar_data data frame starts out simple: build up some stats by continent. Then I call the mutate_each_ function to call standardize for each variable in the vars set. mutate_each_ is deprecated and I should use something different like mutate_at, but it does work in the current version of dplyr, at least.

Finally, I call the ggradar() function. This function has a large number of parameters, but the only one you absolutely need is plot.data. I decided to change the sizes because, by default, it doesn’t display well at all on Windows.
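Condensed, the flow looks something like this (I’ve used the gapminder data set as a stand-in here, so the summary columns are illustrative rather than the series’ actual data):

```r
library(dplyr)
library(ggradar)    # devtools::install_github("ricardo-bion/ggradar")
library(gapminder)  # stand-in data for this sketch

# Scale each value by the column maximum; no divide-by-zero guard needed here
standardize <- function(x) { x / max(x) }

radar_data <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarize(AvgLifeExp = mean(lifeExp),
            AvgGdpPercap = mean(gdpPercap),
            TotalPop = sum(as.numeric(pop))) %>%
  mutate_at(vars(-continent), standardize)  # the non-deprecated alternative

# plot.data is the only required parameter; the size tweaks help on Windows
ggradar(radar_data, grid.label.size = 4, axis.label.size = 3, group.point.size = 3)
```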

It was a lot of fun putting this series together. I think the most important part of the series was learning just how easy ggplot2 is once you sit down and think about it in a systematic manner.

Using The Power Query SDK

Chris Webb shows how to build M queries in Visual Studio:

Writing M in the Advanced Editor in Excel or Power BI can be a frustrating experience unless you’re the kind of masochist who loves writing code in Notepad. There are some options for writing M code outside Excel and Power BI, for example Lars Schreiber’s M extension for Notepad++ (see here for details) or the M extension for Visual Studio Code (available from the Visual Studio Marketplace here; more details on Brett Powell’s blog here), but the trouble with them is that you have to copy the code back into Excel or Power BI to run it. What many people don’t realise, however, is that it is possible to write M code and have IntelliSense, formatting, keyword highlighting and also the ability to execute your own M queries, using the Power Query SDK in Visual Studio.

The Power Query SDK (which you can download here) supports Visual Studio 2015 and 2017 and is intended for people who are writing custom Data Connectors for Power BI. To let you test your Data Connector you can create a .pq file containing M code, and this in fact allows you to run any M query you want whether you’re building a Data Connector or not.
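At its simplest, a .pq file is a single M expression that the SDK evaluates when you run the project. A trivial, invented example:

```m
// Test.pq – the SDK evaluates whatever expression this file contains
let
    Source = List.Numbers(1, 10),
    Squares = List.Transform(Source, each _ * _)
in
    Squares
```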

And then, once you get comfortable with M, start learning F#.  That will allow you to laugh haughtily at those poor object-oriented sods out there.

Parallel Execution With SSIS Framework Community Edition

Andy Leonard has a great post on parallel execution with SSIS:

Sit a spell and let Grandpa Andy tell yall a story about some data integratin’.

Suppose for a minute that you’ve read and taken my advice about writing small, unit-of-work SSIS packages. I wouldn’t blame you for taking this advice. It’s not only online, it’s written in a couple books (I know, I wrote that part of those books). One reason for building small, function-y SSIS packages is that it promotes code re-use. For example, SSIS packages that perform daily incremental loads can be re-used to perform monthly incremental loads to a database that serves as a data mart by simply changing a few parameters.

Change the parameter values and the monthly incremental load can load both quarterly and yearly data marts.

You want better performance out of the daily process, so you read and implement the parallel execution advice* you’ve found online. For our purposes let’s assume you’ve designed a star schema instead of one of those pesky data vaults (with their inherent many-to-many relationships and the ability to withstand isolated and independent loads and refreshes…).

You have dependencies. The dimensions must be loaded before the facts. You decide to manage parallelism by examining historical execution times. Since you load data in chronological order and use a brute-force change detection pattern, the daily dimension loads always complete before the fact loads reach the latest data. You decide to fire all packages at the same time, and your daily execution time drops by half, monthly execution time drops to 40% of its former value, and everyone is ecstatic…

…until the quarterly loads.
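Andy’s framework manages this orchestration for you, but for a sense of the mechanics: SSISDB’s start_execution is asynchronous by default, so firing several packages in a loop runs them in parallel. A rough sketch, with invented folder, project, and package names:

```sql
DECLARE @execution_id BIGINT;

EXEC SSISDB.catalog.create_execution
    @folder_name = N'DataMart',
    @project_name = N'DailyLoads',
    @package_name = N'LoadDimCustomer.dtsx',
    @execution_id = @execution_id OUTPUT;

-- Returns immediately; the package keeps running in the catalog,
-- so the caller is free to start the next package right away.
EXEC SSISDB.catalog.start_execution @execution_id;
```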

This is a great post.

The Value Of Schemabinding

Vitaly Bruk explains what schemabinding is and why we sometimes need WITH SCHEMABINDING in our code:

In SQL Server, when we use the “WITH SCHEMABINDING” clause in the definition of an object (view or function), we bind the object to the schema of all the underlying tables and views. This means that the underlying tables and views cannot be modified in a way that would affect the definition of the schema-bound object. It also means that the underlying objects cannot be dropped. We can still modify those tables or views, as long as we don’t affect the definition of the schema-bound object (view or function).

If we reference a view or function in a schema-bound object (view or function), then the underlying view or function must also be schema-bound. And we cannot change the collation of a database if it contains schema-bound objects.
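A minimal sketch of the behavior (object names invented):

```sql
CREATE TABLE dbo.Sales (SaleID INT NOT NULL, Amount DECIMAL(10, 2) NOT NULL);
GO
CREATE VIEW dbo.SalesAmounts
WITH SCHEMABINDING
AS
    SELECT SaleID, Amount
    FROM dbo.Sales;  -- two-part names are required in schema-bound objects
GO
-- Both of these now fail because the schema-bound view depends on the table:
ALTER TABLE dbo.Sales DROP COLUMN Amount;
DROP TABLE dbo.Sales;
```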

I’ve only used schemabinding when mandated (e.g., using row-level security or creating an indexed view), but I can see the value behind using it with normal development.

Loading Excel Files With PowerShell

Garry Bargsley walks us through a solution he implemented to load Excel file data into SQL Server with PowerShell:

Recently, one of our development teams has increased its requests to import an Excel file with 20 sheets into 20 tables in a database, from about once a quarter to multiple times a week; this past Monday it was three times in one day. I have been the lucky DBA to get these requests as of late, and after Monday I was determined to fix the process. The current procedure was to use the good ol’ Import/Export Wizard, since this was a rare request (this involved a lot of pointing and clicking, with plenty of opportunity for manual error). With increased requests and increased table counts, I knew there had to be a better way to get this accomplished without grimacing each time I see the request.
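I don’t know the exact modules Garry used, but the shape of such a script, using the ImportExcel and dbatools modules (instance, database, and file path invented), is something like:

```powershell
Import-Module ImportExcel, dbatools

$file = 'C:\Imports\MonthlyData.xlsx'   # hypothetical path

# One staging table per worksheet, named after the sheet
foreach ($sheet in (Get-ExcelSheetInfo -Path $file)) {
    $data = Import-Excel -Path $file -WorksheetName $sheet.Name
    Write-DbaDataTable -SqlInstance 'SQLPROD01' -Database 'Staging' `
        -Table $sheet.Name -InputObject $data -AutoCreateTable
}
```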

Garry has a script which he uses, but which can be tailored for other uses pretty easily.

Executing PowerShell Against Multiple Servers

Stuart Moore shows an easy way to execute a PowerShell script against multiple servers:

We set up new PSSessions using New-PSSession. I set ErrorAction to SilentlyContinue just in case a host isn’t available for some reason (if I were being good, I’d try/catch here).

As we’re just using standard PowerShell functionality here with Get-Service, there’s no need to build a new function; we can just call it directly. By calling Invoke-Command against a session pointed at numerous hosts, we let PowerShell handle all the connection management and just assume the command will be run against each host. If we were running against a lot of hosts, we would want to look into using the -ThrottleLimit parameter to limit the number of concurrent hosts we’re hitting. The one little trick here is the using scope modifier, so PS pulls in the variable defined in our main scope (gory details on scoping here).
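The pattern Stuart describes boils down to something like this (server and service names invented):

```powershell
$servers = 'SQL01', 'SQL02', 'SQL03'
$serviceName = 'MSSQLSERVER'

# One session per host; unreachable hosts are skipped rather than failing
$sessions = New-PSSession -ComputerName $servers -ErrorAction SilentlyContinue

# $using: pulls the local variable into the remote scope
Invoke-Command -Session $sessions -ScriptBlock {
    Get-Service -Name $using:serviceName
} | Select-Object PSComputerName, Name, Status

$sessions | Remove-PSSession
```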

Click through for the script, and do check out the comments, where Stuart gives a bit of advice when you’re trying to execute against a large number of servers.

GROUP BY vs DISTINCT

Rob Farley looks at how GROUP BY and DISTINCT can lead you down different execution plan paths:

What I want to explore in this post is the particular example that we both used… to bring out an important point that could be missed because of the similarity of our examples.

You see, we both happened to use a FOR XML concatenation query, looking back at the same table. We did this to simulate a practical GROUP BY – somewhere that you might feel like GROUP BY is useful, but where the aggregation you want isn’t a function like SUM or MAX, and there isn’t one available. Ok, Aaron could’ve used the really new STRING_AGG, but for an old-timer like me, having to use SQL Server 2005, that wasn’t available.
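For context, the pre-STRING_AGG concatenation pattern they’re both simulating looks roughly like this (table and column names invented):

```sql
SELECT p.ProductID,
       STUFF((SELECT ', ' + t.TagName
              FROM dbo.ProductTags AS t
              WHERE t.ProductID = p.ProductID
              FOR XML PATH('')), 1, 2, '') AS Tags
FROM dbo.ProductTags AS p
GROUP BY p.ProductID;  -- swap in SELECT DISTINCT (dropping the GROUP BY) to compare plans
```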

In this post, Rob looks at a different sort of example and sees a more complicated scenario unfold.
