Author: Kevin Feasel

Control Flow Package Parts

Todd McDermid explains a feature new to Integration Services 2016:

The basic idea behind package parts makes complete sense to a coder – they’re macros.  You take code you’ve used in several places, put it in a separate file that you then include and “expand” in multiple other files.
If you have multiple packages with parts of the Control Flow that are identical – setting up a database in a certain way, sending emails, calling a set of stored procedures, … – then Control Flow Package Parts can help.
The assistance isn’t just limited to the initial coding, either.  Yes – creating a new package with the “duplicate” code is much easier.  But the real gain of Control Flow Package Parts is when your “standard” code needs changes.  Instead of having to edit multiple packages to address the modifications – you only have to alter the package part.  Deploying the project(s) that depend on this part automatically incorporates those improvements.

I’d be a lot more interested in this if Biml weren’t already a better option.  Read on for Todd’s rundown.

Processing 2016 Tabular From SSIS 2014

Meagan Longoria shows how to process a Tabular Model with a compatibility level of 1200 in SQL Server Integration Services 2014:

Attempting to use the AS Processing Task results in the following error: “[Analysis Services Execute DDL Task] Error: This command cannot be executed on database ‘MySSASDB’ because it has been defined with StorageEngineUsed set to TabularMetadata. For databases in this mode, you must use Tabular APIs to administer the database”

SSAS processing was kept in an SSIS package because that gave consistent logging throughout their data refresh process. So we set out to find another solution.

Read on for the explanation and the solution.

.NET Producer For Kafka

I build a simple .NET console app to push messages to a Kafka topic:

That’s the core of our code.  The main function instantiates a new Kafka producer and gloms onto the Flights topic.  From there, we call the loadEntries function.  The loadEntries function takes a topic and filename.  It streams entries from the 2008.csv file and uses the ParallelSeq library to operate in parallel on data streaming in (one of the nice advantages of using functional code:  writing thread-safe code is easy!).  We filter out any records whose length is zero—there might be newlines somewhere in the file, and those aren’t helpful.  We also want to throw away the header row (if it exists) and I know that that starts with “Year” whereas all other records simply include the numeric year value.  Finally, once we throw away garbage rows, we want to call the publish function for each entry in the list.  The publish function encodes our text as a UTF-8 bytestream and pushes the results onto our Kafka topic.
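The filtering and publishing steps described above can be sketched in a few lines. This is a minimal Python illustration, not the post's F# code: `is_data_row` and `load_entries` are hypothetical names, the `publish` callback stands in for a real Kafka producer, and stripping whitespace before the length check is an added assumption.

```python
from typing import Callable, Iterable, List


def is_data_row(line: str) -> bool:
    """Keep non-empty rows that are not the header (which starts with "Year")."""
    stripped = line.strip()
    return len(stripped) > 0 and not stripped.startswith("Year")


def load_entries(lines: Iterable[str], publish: Callable[[bytes], None]) -> int:
    """Send each surviving row to publish as UTF-8 bytes; return how many were sent."""
    sent = 0
    for line in lines:
        if is_data_row(line):
            publish(line.strip().encode("utf-8"))
            sent += 1
    return sent


# Stand-in for a Kafka producer: collect the payloads in a list (hypothetical).
outbox: List[bytes] = []
sample = ["Year,Month,DayofMonth", "2008,1,3", "", "2008,1,4"]
count = load_entries(sample, outbox.append)
```

Passing the publisher in as a callback keeps the filtering logic testable without a running broker; a real producer's send method could be swapped in for `outbox.append`.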

All this plus a bonus F# pitch.

DAX Variables

Chris Webb shows how to define variables in DAX:

Variables are the best thing to happen to DAX since, well, forever – they are so cool I’m almost ready to like DAX as much as I like MDX. There are already several good articles and blog posts out there describing how to use them (see here and here), but I was looking at a Profiler trace the other day and saw something I hadn’t yet realised about them: you can declare and use variables in the DEFINE clause of a DAX query. Since my series of posts on DAX queries still gets a fair amount of traffic, I thought it would be worth writing a brief post showing how this works.

There are some limitations, but Chris shows a way of getting around one of them.

Hive And Impala

Carter Shanklin and Nita Dembla run a performance comparison of Hive LLAP versus Impala:

Before we get to the numbers, an overview of the test environment, query set and data is in order. The Impala and Hive numbers were produced on the same 10-node cluster of d2.8xlarge EC2 VMs. To prepare the Impala environment the nodes were re-imaged and re-installed with Cloudera’s CDH version 5.8 using Cloudera Manager. The defaults from Cloudera Manager were used to set up and configure Impala 2.6.0. It is worth pointing out that Impala’s Runtime Filtering feature was enabled for all queries in this test.

Data: While Hive works best with ORCFile, Impala works best with Parquet, so Impala testing was done with all data in Parquet format, compressed with Snappy compression. Data was partitioned the same way for both systems, along the date_sk columns. This was done to benefit from Impala’s Runtime Filtering and from Hive’s Dynamic Partition Pruning.

I’m impressed with both of these projects.

String Trimming

Richie Lee has a PowerShell cmdlet to trim a string:

When building up URLs from different parameters in something like TeamCity or Octopus, it’s simple enough to get double “//” in URLs if the parameters are not consistent. So little helper functions are always useful to have imported to manage such things. Below is an example of such a thing!
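To make the double-slash problem concrete, here is a minimal Python sketch of one way to clean it up. It is not Richie's PowerShell function: `collapse_slashes` is a hypothetical helper, and it assumes the only “://” in the string is the one after the scheme.

```python
import re


def collapse_slashes(url: str) -> str:
    """Collapse runs of '/' into one, leaving the '://' after the scheme alone."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        # No scheme present: treat the whole string as a path.
        return re.sub(r"/{2,}", "/", url)
    return scheme + sep + re.sub(r"/{2,}", "/", rest)


cleaned = collapse_slashes("https://example.com//api//v1/")
```

Splitting on the scheme separator first avoids turning “https://” into “https:/”, which a blanket replacement would do.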

Click through for the function.

Azure Data Lake Analytics Units

Yan Li explains the Azure Data Lake Analytics Unit:

An Azure Data Lake Analytics Unit, or AU, is a unit of computation resources made available to your U-SQL job. Each AU gives your job access to a set of underlying resources like CPU and memory. Currently, an AU is the equivalent of 2 CPU cores and 6 GB of RAM. As we see how people want to use the service, we may change the definition of an AU or add more options for controlling CPU and memory usage.

How AUs are used during U-SQL Query Execution

When you submit a U-SQL script for execution, the U-SQL compiler parallelizes the U-SQL script into hundreds or even thousands of tasks called vertices. Each vertex is allocated to one AU. The AU is dynamically allocated to the task and released once that particular task is completed.
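The arithmetic behind these figures can be sketched as follows. This is a rough Python illustration using the 2 cores and 6 GB per AU quoted above; the function names are hypothetical, and the wave count assumes an idealized job in which every vertex is independent and takes the same time, which real U-SQL jobs generally will not satisfy.

```python
import math

CORES_PER_AU = 2   # figure quoted in the excerpt
RAM_GB_PER_AU = 6  # figure quoted in the excerpt


def au_resources(au_count: int) -> tuple:
    """Return (CPU cores, GB of RAM) available to a job with au_count AUs."""
    return (au_count * CORES_PER_AU, au_count * RAM_GB_PER_AU)


def min_waves(vertices: int, au_count: int) -> int:
    """Lower bound on scheduling rounds when each vertex occupies one AU at a time."""
    return math.ceil(vertices / au_count)


cores, ram_gb = au_resources(10)
waves = min_waves(1000, 10)
```

Under those assumptions, a job with 1,000 vertices on 10 AUs needs at least 100 rounds, so adding AUs only helps while there are enough runnable vertices to occupy them.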

I appreciate the ADL team’s transparency in how they define a unit.  It’s much nicer to be able to tell someone that an AU is 2 CPU cores + 6 GB of RAM, rather than saying it’s some fuzzy measure of CPU + memory + I/O which has no direct bearing on your operations.

SSISDB Maintenance

Jesse Seymour shows how to trim the SSIS catalog size:

The options we are interested in are OPERATION_CLEANUP_ENABLED and RETENTION_WINDOW.  By default, RETENTION_WINDOW is 365 and OPERATION_CLEANUP_ENABLED is TRUE.

Since we want to set our retention window to 10 days, we need to update RETENTION_WINDOW to 10.  We could do this with a simple update statement, but Microsoft provides us with a stored procedure that will do that for us.  The benefit of the stored procedure over the UPDATE statement is that a vendor-provided stored procedure will typically encapsulate any additional steps required.
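The excerpt does not name the stored procedure. The documented SSISDB procedure for changing catalog properties is catalog.configure_catalog, so the sketch below assumes that is the one meant. It is a small Python helper (hypothetical name `retention_window_sql`) that only builds the T-SQL text; running it against a server is left to whatever client you already use.

```python
def retention_window_sql(days: int) -> str:
    """Build the T-SQL call that sets the SSISDB RETENTION_WINDOW property."""
    if not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive integer")
    return (
        "EXEC [SSISDB].[catalog].[configure_catalog] "
        "@property_name = N'RETENTION_WINDOW', "
        f"@property_value = {days};"
    )


statement = retention_window_sql(10)
```

Validating the value as an integer before formatting keeps arbitrary text out of the generated statement.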

I do not at all like the idea of running SHRINKDATABASE and definitely wouldn’t have that plus a backup in the deletion loop, but if you get caught in a nasty situation with SSISDB, this can serve as the starting point for digging yourself out.

Linked Server To Access

Jana Sattainathan walks through issues with setting up a linked server connection to an Access database:

Normally, it is easy enough to set up a Linked Server on SQL Server to other data sources. Problems are usually caused by one of the usual culprits that have to be addressed:

  • SQL Logins simply do not work well when trying to do this type of setup

  • The Windows login has to have permissions to the file (on a drive or network share)

  • The appropriate drivers have to be set up (64 bit / 32 bit)

Read on for a few different errors and their solutions.

Database Throughput Units

Randolph West looks at the Azure Database Throughput Unit Calculator:

The DTU Calculator, a third-party service created by Justin Henriksen (a Microsoft employee), will calculate the DTU requirements for an on-premises database that we want to migrate to Azure. It does this by first capturing a few performance monitor counters and then performing a calculation on those results to recommend a service tier for our database.

Justin provides a command-line application or PowerShell script to capture these performance counters:

  • Processor – % Processor Time

  • Logical Disk – Disk Reads/sec

  • Logical Disk – Disk Writes/sec

  • Database – Log Bytes Flushed/sec

For more details on DTUs, John Sterrett looks at the math.
