Press "Enter" to skip to content

Category: Integration Services

Test Connection With HDInsight

I have a post trying to test a connection using HDInsight:

WebHCat is a web-based REST API for HCatalog, a management layer for dealing with files in HDFS.  If you’re looking for configuration settings for WebHCat, you’ll want generally to look for “templeton” in config files, as Templeton was the project name before WebHCat.  In Ambari, you can go to the Hive configs and look at webhcat-site.xml for configuration settings.  For WebHCat, the default port in HDInsight is 30111, which you should find in the templeton.port configuration setting.

I don’t like the fact that WebHDFS is blocked, but at least WebHCat is functional.

Comments closed

Alert On SQL Jobs Missing Schedules

Brian Hansen wraps up a three-part series on scheduled job alerts:

The first two parts of this series addressed the general approach that I use in an SSIS script task to discover and alert on missed SQL Agent jobs. With apologies for the delay in producing this final post in the series, here I bring these approaches together and present the complete package.

To create the SSIS, start with an empty SSIS package and add a data flow task. In the task, add the following transformations.

Regardless of how you do it, knowing when jobs fail is important enough to build some infrastructure around answering this question.

Comments closed

Finding Packages Which Use Configurations

Bill Fellows explains how to find SSIS packages still using the Configuration option in the classic deployment model:

Create an SSIS package. Add a Variable to your package called FolderSource and assign it the path to your SSIS packages. Add a Script Task to the package and then add @[User::FolderSource] to the ReadOnly parameters.

Double click the script, assuming C#, and when it opens up, use the following script as your Main

Bill continues on with the contents of his script task, so click through for more.

Comments closed

SSIS And SSRS Practices

Chris Seferlis has a list of practices which he’s learned over the years:

Use Source Control

  1. For anyone who was a developer in their past life, or is one now, this is a no-brainer, no-alternative best practice.  In my case, because I come from a management and systems background, I’ve had to learn this the hardway.  If this is your first foray into development, get ready, because you’re in for some mistakes, and you’re going to delete or change some code you really wish you didn’t.  Whether it be for reference purposes on something you want to change, or something you do by accident, you’re going to need that code you just got rid of yesterday, and we both know you didn’t back up your Visual Studio jobs… Hence, source control.  Github and Microsoftoffer great solutions for Visual Studio, and Redgate offers a great solution for SSMS.  I highly recommend checking them out and using the tools!  There are some other options out there that are free, or will save your code to local storage locations, but the cloud is there for a reason, and many of us are on the go, so having it available from all locations is very helpful.

Regarding source control for Integration Services packages, that’s a good reason to learn Biml—it works much better for source control than the native packages (which change every time you open the package and contain a lot of noise).

Comments closed

Control Flow Package Parts

Todd McDermid explains a feature new to Integration Services 2016:

The basic idea behind package parts makes complete sense to a coder – they’re macros.  You take code you’ve used in several places, put it in a separate file that you then include and “expand” in multiple other files.
If you have multiple packages with parts of the Control Flow that are identical – setting up a database in a certain way, sending emails, calling a set of stored procedures, … – then Control Flow Package Parts can help.
The assistance isn’t just limited to the initial coding, either.  Yes – creating a new package with the “duplicate” code is much easier.  But the real gain of Control Flow Package Parts is when your “standard” code needs changes.  Instead of having to edit multiple packages to address the modifications – you only have to alter the package part.  Deploying the project(s) that depend on this part automatically incorporates those improvements.

I’d be a lot more interested in this if Biml weren’t already a better option.  Read on for Todd’s rundown.

Comments closed

Processing 2016 Tabular From SSIS 2014

Meagan Longoria shows how to process a Tabular Model with a compatibility level of 1200 in SQL Server Integration Services 2014:

Attempting to use the AS Processing Task results in the following error: “[Analysis Services Execute DDL Task] Error: This command cannot be executed on database ‘MySSASDB’ because it has been defined with StorageEngineUsed set to TabularMetadata. For databases in this mode, you must use Tabular APIs to administer the database”

The reason for keeping SSAS processing in an SSIS package was because it kept consistent logging throughout their data refresh process. So we set out to find another solution.

Read on for the explanation and the solution.

Comments closed

SSISDB Maintenance

Jesse Seymour shows how to trim the SSIS catalog size:

The options we are interested in are OPERATION_CLEANUP_ENABLED and RETENTION_WINDOW.  By default, RETENTION_WINDOW is 365. and OPERATION_CLEANUP_ENABLED is TRUE.

Since we want to set our retention window to 10 days, we need to update RETENTION_WINDOW to 10.  We could do this with a simple update statement, but Microsoft provides us with a stored procedure that will do that for us.  The benefit of the stored procedure over the UPDATE statement is that a vendor-provided stored procedure will typically encapsulate any additional steps required.

I do not at all like the idea of running SHRINKDATABASE and definitely wouldn’t have that plus a backup in the deletion loop, but if you get caught in a nasty situation with SSISDB, this can serve as the starting point for digging yourself out.

Comments closed

Against Visual Programming Languages

Ian Hellstrom has a critique of visual programming languages for data engineers:

Anyone with a software development background who has ever dealt with visual ETL tools may have marvelled at the lack of proper version control and diff tools that go with it. Some tools come with their own built-in VCS, while others allow you to use any or no VCS at all. The difficulty lies in the fact that the visual representation is often stored as an XML (or JSON) file. So, if a box is moved by 1 pixel, the file is different. You could argue that it’s indeed different because the layout is different, but you could equally make the case that the logic has not changed. This argument is moot though: it is technically possible to ensure that the tool auto-aligns blocks and routes/colours arrows, very much like yEd does (via menu items). Some users may not be happy with the reduced control over the way the flow looks, but others may rejoice that version control has become usable.

ETL (and ORM) tools often auto-generate code that is not particularly tuned for the data source in question. I have encountered many odd nested loops where simple hash joins would have been more appropriate if only the predicates had been pushed down properly (and if only the tool had evaluated blocks lazily). Aggregations and timestamp-based filters are also often a cause for performance issues. Again, performance is technically solvable, so this may be a valid argument against visual tools in data engineering now but perhaps not tomorrow.

This is a good argument against VPLs, although there are a couple of good arguments for VPLs, including how it’s easier to see if the overall architecture of a flow looks correct.  In the end, I like the compromise that Biml offers Integration Services developers:  write code but visualize results.

Comments closed

SSISDB Management

Andy Leondard describes steps you can take to maintain the SSIS Catalog:

Back It Up

As with all SQL Server database, please back up SSISDB. What follows is a (very) basic guide describing one simple method to backup your SSISDB database. Please, please, please learn more about SQL Server backup and restore options and their implications before backing up an SSISDB database in your enterprise. Feel free to use the steps I describe on your laptop or a virtual machine. And please remember…

Backups are useless. Restores are priceless. Conduct practice Disaster Recovery exercises in which you restore databases and then test functionality. You’ll be glad you did. Here is a link containing Microsoft’s advice on restoring the SSISDB database in SQL Server 2016.

The advice is pretty similar to what you’d expect for any other database, but there are a couple twists around SSISDB functionality, so do read on.

Comments closed

Finding Destinations In SSISDB

Bill Fellows has a script to figure out the name of that table throwing errors upon insertion:

There is a rich set of tables and views available in the SSISDB that operate as a flight recorder for SSIS packages as they execute. Markus Ehrenmüller (t) had a great question in Slack. In short, can you figure out what table is being used as a destination and I took a few minutes to slice through the tables to see if I could find it.

If it’s going to be anywhere, it looks like you can find it in catalog.event_message_context

If someone is using an OLE DB Destination and uses “Table or view” or “Table or View – fast load” settings, the name of the table will be the event message_context table. If they are using a variable name, then it’s going to be trickier.

Read on for the script.

Comments closed