Press "Enter" to skip to content

Author: Kevin Feasel

Optimizing SSIS Catalog Cleanup

Tim Mitchell has a script which replaces [internal].[cleanup_server_retention_window] in the SSISDB database:

Earlier this week, I blogged about the automatic cleanup process that purges old data from the SSIS catalog logging tables. This nightly process removes data for operations that are older than 365 days. While this is useful, many SSIS admins have complained that this process is very slow and contentious on large or busy SSISDB databases.
In this post, I’ll show to you one of the main reasons this purge process is slow, and will share a more efficient way of performing this delete operation.

Click through for the script and explanation.

Comments closed

Preventing Execution With PARSEONLY And NOEXEC

Solomon Rutzky shows us a way to prevent accidental full script execution:

There are times when I am working on a SQL script that really shouldn’t be executed all at once. Sometimes it’s a series of examples / demos for a presentation or forum answer. Other times it’s just a temporary need while I’m in the process of creating a complex script, but once the script is completed and tested then it should run all at once. In either case, I have accidentally hit F5 too many times when I thought that a certain section of code was highlighted (so only that section would execute) but in fact nothing was highlighted so the script started executing from the very top, and either ran until completion or until I was able to cancel it (if it ran long enough for me to have time to understand what was happening and hit the “cancel” button).
So I needed some way of ensuring that a script would not execute if no section was highlighted.

Read on to learn about PARSEONLY and NOEXEC.

Comments closed

Combining M and Python To Export Power BI Data To CSVs

Imke Feldmann combines Python and M to export Power BI table data to CSV format written out to a directory of your choosing:

Why Python?
I prefer it to R mostly because I don’t have to create the csv-file(names) in advance before I import data to it. This is particularly important for scenarios where I want to append data to an existing file. The key for this task is NOT to use the append-option that Python offers, because M-scripts will be executed multiple times and this would create a total mess in my file. Instead I create a new file with the context to append and use the Import-from-folder method instead to stitch all csvs back together. Therefore I have to dynamically create new filenames for each import. So when the M-Python-scripts are executed repetitively here, the newly created file will just be overwritten – which doesn’t do any harm.

Click through for the code as well as a few caveats.

Comments closed

CSV Cleanup With Powershell

Andy Mallon has a quick fix when some rows in a CSV are missing values:

I was just goofing around with the data, so I didn’t really need anything perfect…but I did want something that was good enough to be repeatable, in case I wanted to do it again.
Fixing thousands of rows by hand sounded like torture. Heck. No.
The data was from a publicly available data set, so getting the file format fixed seemed like it would probably be neither quick nor easy. Depending on others could be a dead end, and while this would be the “rightest” solution to ensure a stable future fix, it was overkill for my casual playtime.

Andy has shown the easy way. Now we lock him in a room with sed and a book on regular expressions to learn the other way. The correct answer to that, of course, is to fashion a pick kit out of the book (and whatever else you might be able to acquire) to get out.

Comments closed

SQL Server R Services 2016 Bug

Gonzalo Bissio takes us through a bug in SQL Server R Services 2016 (which has already been fixed in 2017):

2 Weeks ago I was working in a very interesting case in SQL 2016. I received an email from one of my customers saying that they were having intermitent issues within their app that was executing some SP_Execute_External_script SP calls to the database.

[…]


We also restarted the launchpad service but with no luck….. The biggest challenge was that sometimes the service responded fine and sometimes it showed the issue that I paste above (And this for me was absolutelly new…). From SQL Side we executed an extended events session with all R services counters but nothing appeared… From sys.dm_Exec_session_Wait_stats we just observed that the session was waiting for SATELLITE_SERVICE_SETUP wait which points that SQL Was waiting an answer from the R service itself.

Click through for the solution.

Comments closed

Using Hive For Data Virtualization

Gunther Hagleitner, et al, walk us through some reasons why we might want to use Apache Hive for data virtualization:

Assume you want to execute a Hive query that accesses data from an external RDBMS behind a JDBC connection. A possible naïve way of doing this would treat the JDBC source as a “dumb” storage system, reading all the raw data over JDBC and processing it in Hive. In this case you would ignore the query capabilities of the RDBMS and pull too much data over the JDBC link, thus ending up with poor performance and an overloaded system.
For that reason, Hive implements smart push-down to other systems by relying on its storage handler interfaces and cost-based optimizer (CBO) powered by Apache Calcite. In particular, Calcite provides rules that match a subset of operators in the logical representation of the query and generates a new equivalent representation with more operations executed in the external system. Hive includes those rules that push computation to the external systems in its query planner, and then relies on Calcite to generate a valid query in the language that those systems support. The storage handler implementations are responsible to send the generated query to the external system, retrieve its results, and transform the incoming data into Hive internal representation so it can be processed further if needed.

A lot of platforms are moving toward data virtualization (e.g., SQL Server with its Big Data Clusters). That appears to be the next product battleground.

Comments closed

$null In Powershell

Kevin Marquette goes into great detail on Powershell’s $nullconcept:

When a $null value is used in a numeric equation then your results will be invalid if they don’t give an error. Sometimes the $null will evaluate to 0 and other times it will make the whole result $null. Here is an example with multiplication that gives 0 or $nulldepending on the order of the values.

Nulls are tricky to handle in any language, making their nuances important to understand.

Comments closed

Becoming An Expert

Adrian Colyer wraps up The Morning Paper for the year by reviewing a big picture paper on developer expertise:

You’ll know an expert programmer by the quality of the code that they write. Experts have good communication skills, both sharing their own knowledge and soliciting input from others. They are self-aware, understanding the kinds of mistakes they can make, and reflective. They are also fast (but not at the expense of quality).
Experience should be measured not just on its quantity (i.e., number of years in the role), but on its quality. For example, working on a variety of different code bases, shipping significant amounts of code to production, and working on shared code bases. The knowledge of an expert is T-shaped with depth in the programming language and domain at hand, and a broad knowledge of algorithms, data structures, and programming paradigms.

Click through for the full review.

Comments closed

The Ultimate Powershell Telemetry Prompt

Jeffery Hicks might have taken things a bit too far:

Well, I knew I wouldn’t be satisfied. The other day I shared a PowerShell prompt function that could display telemetry like information for a few remote servers. One of the drawbacks was the limited amount of information I could display. I’ve revised that function and have a new version that displays additional information via a few performance counters. I’ve also reorganized the function to make it a bit more efficient. Want to see it?

My jokey lede aside, this is really cool. Click through for details and to get a link to the code.

Comments closed

Displaying Human-Readable Month Sets With DAX

Alberto Ferrari wants to show sets of contiguous months using DAX:

Today I woke up with an interesting question, about how to show a selection of months in a nice way, detecting contiguous selection. You can easily understand the desired solution from the following figure:

I enjoyed writing a quick solution, which is worth sharing. The code is somewhat verbose, but this is mainly for educational purposes (meaning I did not want to spend time optimizing it). I will likely write a full article on it, for now, just enjoy some DAX code:

I removed the image, but to get the gist (and get you to click through to see it in its beauty), it reads “January, March-April, August-December”

Click through for Alberto’s quick-and-dirty solution and then Chris Webb’s improvement.

Comments closed