DILM In Practice

Andy Leonard continues his series on data integration lifecycle management with a discussion of package workflow:

The “Stage EDW Data” Framework Application is identified by ApplicationID 2. If you recall, ApplicationID 2 is mapped to PackageIDs 4, 5, and 6 (LoadWidgetsFlatFile.dtsx, LoadSalesFlatFile.dtsx, and ArchiveFile.dtsx) in the ApplicationPackages table shown above.

The cardinality between Framework Applications and Framework Packages is many-to-many. We see an Application can contain many Packages. The less-obvious part of the relationship is represented in this example: a single Package can participate in multiple Applications or even in the same Application more than once. Hence the need for a table that resolves this many-to-many relationship. I hope this helps explain why Application Package is the core object of our SSIS Frameworks.

Read on for the rest of the story.

Understanding SSIS Catalog References

Andy Leonard has a detailed post on setting up SSIS catalog environments and references:

Why This Complexity?

My argument for embracing (or ignoring) the complexity of SSIS Catalog configuration is straightforward: Externalization – the act of storing execution-time values outside of the object to be executed – is one way to achieve portability, a software development best practice.

As an architect I loathe complexity. Whenever I encounter (or opt to develop) complexity in a software project, I question it. I ask others to question it. I don’t always arrive at the right conclusion, but I challenge complexity at every turn.

I’ve helped several organizations implement SSIS Catalog portability using Environments and References. We all agree that the solution is complex, but most also agree that the complexity is worth the flexibility achieved thereby.

This is a fairly lengthy blog post, full of good information.

DILM Reports

Andy Leonard walks through executing an Integration Services package and then seeing the results in an SSRS report:

One of the first Data Integration Lifecycle Management (DILM) Suite solutions I built was Catalog Reports. Catalog Reports is a relatively simple and straightforward version of some of the SSIS Catalog Reports embedded in SSMS. The main difference is Catalog Reports is a SQL Server Reporting Services (SSRS) solution.

It’s free.

And it’s open source. Here’s a screenshot of the Overview Report for the same execution above

Check it out.

Understanding Data Integration Lifecycle Management

Andy Leonard explains DILM:

Data Integration Lifecycle Management (DILM) is not about data integration development.

DILM is about everything else:

  • Configurations Management
  • Version Management
  • Deployment
  • Execution

Although DILM is not about development, implementing DILM will impact the design of SSIS solutions.

This is the first part in a multi-part series, and covers some of the conceptual basicsbehind DILM.

REPLACENULL

Louis Davidson shows a quick SSIS function to replace NULL values:

Which I looked up every..single..time I used it. “?” means THEN…not IF? “:” means ELSE? Huh?  I know this comes from one of those cool languages that I have never mastered, but as I was searching for the syntax again a few days ago, I found REPLACENULL. I had never seen this function before, so I figured I might not be the only one. And perhaps if a commenter feels like telling me how dumb I am to not know about other new expression features I will not be offended. REPLACENULL won’t replace every use of the these and the other symbols one must use for SSIS expressions, it does replace one of the more common ones.

Click through for usage.  It’s a bit easier to understand than the ternary operator.  To answer Louis’s question, a ? b : c comes from C# syntax.

Broken References In SSISDB

Andy Leonard explains how broken environment references can come into being within the SSIS Catalog:

If the reference was broken after the SSIS package execution was scheduled, we may see an error similar to that shown below in the SQL Agent log for the job step that attempted to execute the SSIS package:

Failed to execute IS server package because of error 0x80131904. Server: vmSql16\Test, Package path: \SSISDB\Test\ParametersTest\SensitiveTest.dtsx, Environment reference Id: 35.  Description: The environment ‘env2’ does not exist or you have not been granted the appropriate permissions to access it.

Andy has an explanation of what these are, how you might find them, and how to fix them.

Data Flow Sequence Containers

Todd McDermid is excited about data flow groups in Integration Services:

Data Flow Groups

Data Flow Groups is what they’re calling it, and it’s deceptively simple to use.  One of the reasons I’m sure I (and SSIS people I talk to who DID NOT LET ME KNOW IT WAS THERE) missed it is because I was expecting it to be a component in the toolbox.  Not so.
Code up your Data Flow as you normally would.  Then go and select the components that you want to group together – via clicking and dragging a selection window, or click-selecting components.  Any component combinations you want.  Then right-click and select Group.

I admit that I didn’t know it existed either.  This does seem rather useful.

SSIS Fast Load

Chris Taylor runs into an issue with the OLE DB Destination’s fast load option in Integration Services:

What I do want to bring to your attention is the differences between the two when it comes to redirecting error rows, specifically rows that are truncated. One of the beauties of SSIS is the ability to output rows that fail to import through the error pipeline and push them into an error table for example. With fast load there is a downside to this, the whole batch will be output even if there is only 1 row that fails, there are ways to handle this and a tried and tested method is to push those rows into another OLE DB Destination where you can run them either in smaller batches and keep getting smaller or simply push that batch to run in row-by-row to eventually output the 1 error you want. Take a look at Marco Schreuder’s blog for how this can be done.

One of the issues we have exerienced in the past is that any truncation of a column’s data in fast load will not force the package to fail. What? So a package can succeed when in fact the data itself could potentially not be complete!?! Yes this is certainly the case, lets take a quick look with an example.

Read on for details and potential workarounds.

Test Connection With HDInsight

I have a post trying to test a connection using HDInsight:

WebHCat is a web-based REST API for HCatalog, a management layer for dealing with files in HDFS.  If you’re looking for configuration settings for WebHCat, you’ll want generally to look for “templeton” in config files, as Templeton was the project name before WebHCat.  In Ambari, you can go to the Hive configs and look at webhcat-site.xml for configuration settings.  For WebHCat, the default port in HDInsight is 30111, which you should find in the templeton.port configuration setting.

I don’t like the fact that WebHDFS is blocked, but at least WebHCat is functional.

Alert On SQL Jobs Missing Schedules

Brian Hansen wraps up a three-part series on scheduled job alerts:

The first two parts of this series addressed the general approach that I use in an SSIS script task to discover and alert on missed SQL Agent jobs. With apologies for the delay in producing this final post in the series, here I bring these approaches together and present the complete package.

To create the SSIS, start with an empty SSIS package and add a data flow task. In the task, add the following transformations.

Regardless of how you do it, knowing when jobs fail is important enough to build some infrastructure around answering this question.

Categories

April 2017
MTWTFSS
« Mar  
 12
3456789
10111213141516
17181920212223
24252627282930