One of the first Data Integration Lifecycle Management (DILM) Suite solutions I built was Catalog Reports. Catalog Reports is a relatively simple and straightforward version of some of the SSIS Catalog Reports embedded in SSMS. The main difference is that Catalog Reports is a SQL Server Reporting Services (SSRS) solution.
And it’s open source. Here’s a screenshot of the Overview Report for the same execution above.
Check it out.
Data Integration Lifecycle Management (DILM) is not about data integration development.
DILM is about everything else:
- Configurations Management
- Version Management
Although DILM is not about development, implementing DILM will impact the design of SSIS solutions.
This is the first part in a multi-part series, and it covers some of the conceptual basics behind DILM.
Which I looked up every... single... time I used it. “?” means THEN… not IF? “:” means ELSE? Huh? I know this comes from one of those cool languages that I have never mastered, but as I was searching for the syntax again a few days ago, I found REPLACENULL. I had never seen this function before, so I figured I might not be the only one. And if a commenter feels like telling me how dumb I am for not knowing about other new expression features, I will not be offended. REPLACENULL won’t replace every use of these and the other symbols one must use in SSIS expressions, but it does replace one of the more common ones.
Click through for usage. It’s a bit easier to understand than the ternary operator. To answer Louis’s question, a ? b : c is the conditional (ternary) operator found in C-family languages such as C#.
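To make the equivalence concrete, here is a small Python model of the two SSIS forms (it is not SSIS code). It assumes Python's `None` stands in for an SSIS NULL. The function names `conditional` and `replace_null` are hypothetical and only mirror the SSIS expressions `ISNULL(x) ? "Unknown" : x` and `REPLACENULL(x, "Unknown")`.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def conditional(condition: bool, when_true: T, when_false: T) -> T:
    """Model of the SSIS conditional operator: condition ? when_true : when_false."""
    # "?" introduces the THEN branch and ":" introduces the ELSE branch.
    return when_true if condition else when_false


def replace_null(value: Optional[T], replacement: T) -> T:
    """Model of REPLACENULL(value, replacement): use replacement only when value is NULL."""
    return replacement if value is None else value


if __name__ == "__main__":
    # ISNULL(x) ? "Unknown" : x  and  REPLACENULL(x, "Unknown") agree for each input.
    for x in ["Seattle", None]:
        long_form = conditional(x is None, "Unknown", x)
        short_form = replace_null(x, "Unknown")
        print(repr(x), "->", long_form, short_form)
```

The point of the comparison is that the NULL-defaulting pattern needs the test, the THEN value and the ELSE value spelled out with `?` and `:`, while REPLACENULL collapses it into two arguments.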
If the reference was broken after the SSIS package execution was scheduled, we may see an error similar to that shown below in the SQL Agent log for the job step that attempted to execute the SSIS package:
Failed to execute IS server package because of error 0x80131904. Server: vmSql16\Test, Package path: \SSISDB\Test\ParametersTest\SensitiveTest.dtsx, Environment reference Id: 35. Description: The environment ‘env2’ does not exist or you have not been granted the appropriate permissions to access it.
Andy has an explanation of what these are, how you might find them, and how to fix them.
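One way to spot broken references ahead of time is to compare each environment reference with the environments that actually exist. The sketch below is plain Python over rows you are assumed to have already queried from the SSISDB catalog views (`catalog.environment_references`, `catalog.projects`, `catalog.folders` and `catalog.environments`). The dataclass, the `find_broken_references` helper and the sample data are hypothetical. It relies on the convention that a relative reference (`reference_type = 'R'`) looks for the environment in the project's own folder, while an absolute reference (`'A'`) names the folder explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnvironmentReference:
    reference_id: int
    project_folder: str                      # folder containing the referencing project
    reference_type: str                      # 'R' = relative, 'A' = absolute
    environment_folder_name: Optional[str]   # populated for absolute references
    environment_name: str


def find_broken_references(
    references: list[EnvironmentReference],
    existing_environments: set[tuple[str, str]],
) -> list[EnvironmentReference]:
    """Return references whose (folder, environment) pair does not exist."""
    broken = []
    for ref in references:
        # Relative references resolve against the project's own folder.
        if ref.reference_type == "R":
            folder = ref.project_folder
        else:
            folder = ref.environment_folder_name
        if (folder, ref.environment_name) not in existing_environments:
            broken.append(ref)
    return broken


if __name__ == "__main__":
    # Hypothetical data: 'env2' was deleted or renamed after the reference was created.
    environments = {("Test", "env1")}
    refs = [
        EnvironmentReference(34, "Test", "R", None, "env1"),
        EnvironmentReference(35, "Test", "R", None, "env2"),
    ]
    for ref in find_broken_references(refs, environments):
        print(f"Reference {ref.reference_id} points at missing environment '{ref.environment_name}'")
```

This only covers the "does not exist" half of the error message. A permissions problem would have to be checked under the account that actually runs the job, because catalog views only return what the caller is allowed to see.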
Data Flow Groups
Data Flow Groups is what they’re calling it, and it’s deceptively simple to use. I’m sure one of the reasons I (and the SSIS people I talk to who DID NOT LET ME KNOW IT WAS THERE) missed it is that I was expecting it to be a component in the toolbox. Not so. Code up your Data Flow as you normally would. Then select the components that you want to group together, either by clicking and dragging a selection window or by click-selecting components. Any combination of components works. Then right-click and select Group.
I admit that I didn’t know it existed either. This does seem rather useful.
What I do want to bring to your attention are the differences between the two when it comes to redirecting error rows, specifically rows that are truncated. One of the beauties of SSIS is the ability to send rows that fail to import through the error output and push them into, for example, an error table. With fast load there is a downside: the whole batch is redirected even if only one row fails. There are ways to handle this, and a tried-and-tested method is to push those rows into another OLE DB Destination, where you can run them in progressively smaller batches, or simply run that batch row by row, to eventually output the one error row you want. Take a look at Marco Schreuder’s blog for how this can be done.
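The "keep getting smaller" approach amounts to recursively splitting a failed batch until the bad rows are isolated. Here is a language-neutral sketch of that idea in Python, not the SSIS implementation Marco describes. `insert_batch` is a hypothetical stand-in for an all-or-nothing fast-load insert that either commits the whole batch or raises. `min_size` is an assumed cutoff at or below which rows are simply tried one at a time.

```python
from typing import Callable, Sequence, TypeVar

Row = TypeVar("Row")


def _row_fails(insert_batch: Callable[[Sequence[Row]], None], row: Row) -> bool:
    """Try a single row; report True if the insert raised."""
    try:
        insert_batch([row])
        return False
    except Exception:
        return True


def load_with_isolation(
    rows: Sequence[Row],
    insert_batch: Callable[[Sequence[Row]], None],
    min_size: int = 1,
) -> list[Row]:
    """Load rows, halving any failed batch; return only the rows that fail on their own."""
    if not rows:
        return []
    try:
        insert_batch(rows)  # all-or-nothing, like a fast-load batch
        return []
    except Exception:
        if len(rows) == 1:
            return list(rows)  # a single failing row is an error row
        if len(rows) <= min_size:
            # Small enough: fall back to row-by-row to pinpoint the bad rows.
            return [row for row in rows if _row_fails(insert_batch, row)]
        middle = len(rows) // 2
        return (load_with_isolation(rows[:middle], insert_batch, min_size)
                + load_with_isolation(rows[middle:], insert_batch, min_size))


if __name__ == "__main__":
    target: list[str] = []

    def insert_batch(batch: Sequence[str]) -> None:
        # Hypothetical atomic insert: reject the whole batch if any value is too long.
        if any(len(value) > 5 for value in batch):
            raise ValueError("truncation")
        target.extend(batch)

    errors = load_with_isolation(["a", "bb", "toolong", "ccc", "dd"], insert_batch)
    print("loaded:", target, "errors:", errors)
```

Halving keeps the number of extra round trips roughly logarithmic when bad rows are rare. A larger `min_size` trades a few more single-row inserts for less recursion.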
One of the issues we have experienced in the past is that any truncation of a column’s data in fast load will not force the package to fail. What? So a package can succeed when in fact the data itself could be incomplete!?! Yes, this is certainly the case. Let’s take a quick look with an example.
Read on for details and potential workarounds.
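Because a fast-load truncation can slip through silently, one possible guard is to compare string lengths with the destination column sizes before loading. This is my own assumption and not necessarily one of the workarounds in the post. The sketch below compares character counts. The column names and limits are hypothetical; in practice you might read them from `INFORMATION_SCHEMA.COLUMNS.CHARACTER_MAXIMUM_LENGTH`, where SQL Server reports `-1` for `(n)varchar(max)`.

```python
from typing import Mapping, Optional


def find_truncations(
    rows: list[Mapping[str, Optional[str]]],
    max_lengths: Mapping[str, int],
) -> list[tuple[int, str, int]]:
    """Return (row_index, column, actual_length) for each value longer than its column allows.

    A limit of -1 is treated as unlimited. Columns with no known limit are skipped.
    """
    problems = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            limit = max_lengths.get(column)
            if value is None or limit is None or limit == -1:
                continue
            if len(value) > limit:  # character count, not byte count
                problems.append((index, column, len(value)))
    return problems


if __name__ == "__main__":
    rows = [{"City": "Seattle", "State": "WA"}, {"City": "ABCDEFGHIJKL", "State": None}]
    limits = {"City": 10, "State": 2}  # hypothetical destination column sizes
    for row_index, column, length in find_truncations(rows, limits):
        print(f"Row {row_index}: {column} has {length} characters")
```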
WebHCat is a web-based REST API for HCatalog, a management layer for dealing with files in HDFS. If you’re looking for configuration settings for WebHCat, you’ll generally want to look for “templeton” in config files, as Templeton was the project name before WebHCat. In Ambari, you can go to the Hive configs and look at webhcat-site.xml for configuration settings. For WebHCat, the default port in HDInsight is 30111, which you should find in the templeton.port configuration setting.
I don’t like the fact that WebHDFS is blocked, but at least WebHCat is functional.
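To check that setting and build the endpoint programmatically, here is a small Python sketch. It assumes you have exported `webhcat-site.xml` in the usual Hadoop layout of `<property>` elements with `<name>` and `<value>` children, and it builds the URL for WebHCat's `GET /templeton/v1/status` endpoint. The host name is a placeholder, and the internal port is not necessarily reachable from outside the cluster.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_hadoop_property(config_xml: str, name: str,
                         default: Optional[str] = None) -> Optional[str]:
    """Return the <value> of the named <property> in a Hadoop-style *-site.xml document."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return default


def webhcat_status_url(host: str, config_xml: str) -> str:
    """Build the WebHCat (Templeton) status URL, falling back to the HDInsight port 30111."""
    port = read_hadoop_property(config_xml, "templeton.port", "30111")
    return f"http://{host}:{port}/templeton/v1/status"


SAMPLE = """<configuration>
  <property>
    <name>templeton.port</name>
    <value>30111</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # "headnodehost" is a placeholder; substitute a host that can reach WebHCat.
    print(webhcat_status_url("headnodehost", SAMPLE))
```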
The first two parts of this series addressed the general approach that I use in an SSIS script task to discover and alert on missed SQL Agent jobs. With apologies for the delay in producing this final post in the series, here I bring these approaches together and present the complete package.
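The core comparison behind missed-job detection can be sketched independently of SSIS. The Python below is not the script task from the series. It assumes you already have a job's expected run times (derived from its schedule) and its actual start times (from job history), and it flags any expected run with no actual start inside a tolerance window. The function name and the 15-minute default are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable


def find_missed_runs(
    expected: Iterable[datetime],
    actual_starts: Iterable[datetime],
    tolerance: timedelta = timedelta(minutes=15),
) -> list[datetime]:
    """Return expected run times with no actual start within +/- tolerance."""
    remaining = sorted(actual_starts)
    missed = []
    for due in sorted(expected):
        match = next((start for start in remaining if abs(start - due) <= tolerance), None)
        if match is None:
            missed.append(due)
        else:
            remaining.remove(match)  # each actual run satisfies only one expected run
    return missed


if __name__ == "__main__":
    expected = [datetime(2024, 1, 1, hour) for hour in (1, 2, 3)]
    actual = [datetime(2024, 1, 1, 1, 2), datetime(2024, 1, 1, 3, 10)]
    for due in find_missed_runs(expected, actual):
        print("Missed run scheduled for", due)
```

Consuming each matched start keeps one late run from covering two closely spaced schedule slots.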
To create the SSIS package, start with an empty package and add a Data Flow Task. In the task, add the following transformations.
Regardless of how you do it, knowing when jobs fail is important enough to build some infrastructure around answering this question.
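If you query `msdb.dbo.sysjobhistory` yourself, note that it stores `run_date` and `run_time` as integers (`yyyymmdd` and `hhmmss`), uses `run_status = 0` for failures, and keeps the overall job outcome in rows where `step_id = 0`. The Python sketch below decodes rows that are assumed to have been fetched already. The tuple layout and helper names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Iterator


def decode_agent_datetime(run_date: int, run_time: int) -> datetime:
    """Convert sysjobhistory's integer run_date (yyyymmdd) and run_time (hhmmss)."""
    year, month_day = divmod(run_date, 10000)
    month, day = divmod(month_day, 100)
    hour, minute_second = divmod(run_time, 10000)
    minute, second = divmod(minute_second, 100)
    return datetime(year, month, day, hour, minute, second)


def failed_job_outcomes(
    rows: Iterable[tuple[str, int, int, int, int]],
) -> Iterator[tuple[str, datetime]]:
    """Yield (job_name, run_datetime) for job-level rows (step_id 0) whose run_status is 0 (Failed).

    Each row is assumed to be (job_name, step_id, run_status, run_date, run_time).
    """
    for job_name, step_id, run_status, run_date, run_time in rows:
        if step_id == 0 and run_status == 0:
            yield job_name, decode_agent_datetime(run_date, run_time)


if __name__ == "__main__":
    history = [
        ("Nightly ETL", 0, 0, 20240115, 10000),  # job outcome: failed at 01:00:00
        ("Nightly ETL", 1, 0, 20240115, 10000),  # step-level row, ignored here
        ("Backup", 0, 1, 20240115, 20000),       # job outcome: succeeded
    ]
    for job, when in failed_job_outcomes(history):
        print(f"{job} failed at {when}")
```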
Create an SSIS package. Add a variable to your package called FolderSource and assign it the path to your SSIS packages. Add a Script Task to the package and then add User::FolderSource to its ReadOnlyVariables property.
Double-click the Script Task and choose Edit Script (assuming C# as the script language). When the editor opens, use the following script as your Main method:
Bill continues on with the contents of his script task, so click through for more.
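Bill's script task is written in C#, and its body is in his post. As a rough illustration of one plausible building block for a task like this, enumerating the packages under the `FolderSource` path, here is a Python sketch. It assumes the packages are plain `.dtsx` files on disk rather than projects deployed to SSISDB, and the `find_packages` name is hypothetical.

```python
import sys
from pathlib import Path


def find_packages(folder_source: str, recursive: bool = True) -> list[Path]:
    """Return the .dtsx files under folder_source, sorted for stable output."""
    root = Path(folder_source)
    candidates = root.rglob("*") if recursive else root.glob("*")
    # Compare the suffix case-insensitively so "Load.DTSX" is found too.
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() == ".dtsx")


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for package in find_packages(folder):
        print(package.name)
```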
Use Source Control
For anyone who was a developer in a past life, or is one now, this is a no-brainer, no-alternative best practice. In my case, because I come from a management and systems background, I’ve had to learn this the hard way. If this is your first foray into development, get ready, because you’re in for some mistakes, and you’re going to delete or change some code you really wish you hadn’t. Whether you need it for reference on something you want to change or you removed it by accident, you’re going to need that code you got rid of yesterday, and we both know you didn’t back up your Visual Studio projects… Hence, source control. GitHub and Microsoft offer great solutions for Visual Studio, and Redgate offers a great solution for SSMS. I highly recommend checking them out and using the tools! There are other options that are free or that save your code to local storage, but the cloud is there for a reason, and many of us are on the go, so having your code available from anywhere is very helpful.
Regarding source control for Integration Services packages, that’s a good reason to learn Biml: it works much better for source control than native packages, which change every time you open them and contain a lot of noise.