Press "Enter" to skip to content

Author: Kevin Feasel

Formatting SSIS Packages

Tim Mitchell shows us different options available when formatting SSIS packages:

Most folks I know go to one extreme or the other when it comes to the visual layout of SQL Server Integration Services packages: either they don’t care one bit about the appearance, or they insist on an easy-to-read layout before a project is considered complete. I am definitely in the second group, and will almost always spend the time to make sure my packages are properly laid out.

Regardless of which of these groups you are in, it is very easy to apply some simple visual formatting to your packages. In this quick tip, I’ll show you how to use the visual formatting options in SSIS to automate the layout.

I consider it quite important: it gives a future viewer an idea of your vision as a developer. If I can’t read what you’re doing, that does not bode well.

Deferred Compilation and Compatibility Level 140 Query Hints

Milos Radivojevic shows that table variables do not care about your QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140 hints:

The plan is created under CL 140, but the estimated number of rows for the table variable is not 1 but the actual number. Even if you specify the FORCE_LEGACY_CARDINALITY_ESTIMATION hint, the query will be deferred-compiled and the behavior of the table variable will be the same.

Table variable deferred compilation respects settings at the database scope and ignores all hints except one.

Click through to learn what that one hint is.
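
In the meantime, here is a minimal sketch of the deferred compilation behavior itself, assuming a SQL Server 2019 database at compatibility level 150; the dbo.Orders table is made up for the demo:

    DECLARE @T TABLE (OrderID INT);

    INSERT INTO @T
    SELECT OrderID FROM dbo.Orders;

    -- Under compatibility level 150, this statement is deferred-compiled, so
    -- the estimated number of rows for @T is its actual row count rather than
    -- 1, even with the CL 140 query hint in place.
    SELECT *
    FROM @T
    OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));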

Automating JSON to Tables

Dave Mason has a stored procedure for us:

The code for my stored procedure is below. It has a single input parameter for a string of JSON data. The name/value pairs of the first “row” of JSON data are parsed to obtain column names and types for the result set. A query string for the OPENJSON function is constructed and executed via EXEC. JSON’s support for data types is pretty sparse compared to SQL Server’s (notice the big CASE expression where I attempt to do some mapping). If you’re using SQL 2016, you’ll have to make an edit for the STRING_AGG function.

Click through for the code and demos.
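
As a taste, here is a minimal sketch (not Dave’s procedure itself) of the two building blocks it combines: OPENJSON with an explicit schema to shred the data, and STRING_AGG to assemble a column list for dynamic SQL:

    DECLARE @json NVARCHAR(MAX) =
        N'[{"Id":1,"Name":"Alice"},{"Id":2,"Name":"Bob"}]';

    -- Shred the JSON into a result set using an explicit schema.
    SELECT Id, Name
    FROM OPENJSON(@json)
         WITH (Id INT '$.Id', Name NVARCHAR(50) '$.Name');

    -- Derive a column list from the first "row" of JSON, as the procedure does.
    SELECT STRING_AGG(QUOTENAME([key]), ', ')
    FROM OPENJSON(JSON_QUERY(@json, '$[0]'));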

Parameters, Variables, and ForEach Loops in ADF

Cathrine Wilhelmsen has a few more posts in the Azure Data Factory series for us. First up is parameters:

We can build dynamic solutions!

Creating hardcoded datasets and pipelines is not a bad thing in itself. It’s only when you start creating many similar hardcoded resources that things get tedious and time-consuming. Not to mention, the risk of manual errors goes up drastically when you feel like you’re creating the same resource over and over and over again.

After that is variables:

Parameters are external values passed into pipelines. They can’t be changed inside a pipeline. Variables, on the other hand, are internal values that live inside a pipeline. They can be changed inside that pipeline.

Parameters and variables can be completely separate, or they can work together. For example, you can pass a parameter into a pipeline, and then use that parameter value in a set variable or append variable activity.

And the latest post in the series is all about ForEach loops:

By default, the foreach loop tries to run as many iterations as possible in parallel. You can choose to run them sequentially instead, for example if you need to copy data into a single table and want to ensure that each copy finishes before the next one starts.

If you choose to run iterations in parallel, you can limit the number of parallel executions by setting the batch count. The default number is 20 and the max number is 50.
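
For reference, here is a rough sketch of how those settings appear in a pipeline’s JSON definition; the names ForEachTable and TableList are made up for the example, and the inner activities are omitted. Note that the items expression also shows a pipeline parameter being consumed, tying these posts together:

    {
        "name": "ForEachTable",
        "type": "ForEach",
        "typeProperties": {
            "items": {
                "value": "@pipeline().parameters.TableList",
                "type": "Expression"
            },
            "isSequential": false,
            "batchCount": 10,
            "activities": []
        }
    }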

This has been a very nice series, and it looks like there is a little bit more to go.

Multi-Column Transformations with Power Query

Imke Feldmann shows how we can apply a function to multiple columns at once using Power Query:

You can apply simple transformations to multiple columns at once in Power Query using the UI only. In this article I show how you can apply advanced transformations on multiple columns at once instead. You can also use custom functions. And lastly, for the efficient fans of custom M-functions: you will get a new “TranformAllMyColumnsAtOnceHowILikeIt” function as well.

Read on for a few examples, including one using custom code.

Fixing the Small File Problem in Hadoop

Guy Shilo takes us through the Hadoop Archive format:

Hadoop has a hard time handling many small files. The memory footprint of the name nodes becomes high, as they have to keep track of many small blocks, and the performance of scans goes down.

The best way to fix this situation is, of course, to avoid it in the first place. This can be done when designing the application or the pipeline that inserts the data into HDFS, for example by bundling many files into one container such as a SequenceFile, Avro, or a Hadoop archive (.har file).

The Hadoop archive is a somewhat overlooked option that I want to demonstrate today. You will see that it can be very useful in some cases but not so great in others.

Read the whole thing before giving it a try, as there are some downsides.
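
If you want to experiment, the basic commands look something like this (the paths are made up for the example):

    # Bundle the logs directory under /user/guy into a single .har file.
    hadoop archive -archiveName logs.har -p /user/guy logs /user/guy/archives

    # The archived files can still be listed and read via the har:// scheme.
    hdfs dfs -ls har:///user/guy/archives/logs.har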

Sticky Partitioning in Kafka 2.4

Justine Olshan takes us through sticky partitioning in Kafka 2.4:

The sticky partitioner addresses the problem of spreading out records without keys into smaller batches by picking a single partition to send all non-keyed records. Once the batch at that partition is filled or otherwise completed, the sticky partitioner randomly chooses and “sticks” to a new partition. That way, over a larger period of time, records are about evenly distributed among all the partitions while getting the added benefit of larger batch sizes.

It looks like this is an improvement with few downsides.

Azure Data Factory Templates and Source Control

Cathrine Wilhelmsen continues a series on Azure Data Factory. First up is source control:

And yeah, I usually recommend that you set up source control early in your project, and not on day 18… However, it does require some external configuration, and in this series I wanted to get through the Azure Data Factory basics first. But by now, you should know enough to decide whether or not to commit to Azure Data Factory as your data integration tool of choice.

Next up is using the template gallery:

You can also create custom templates and share them with your team – or share them externally with others. Custom templates are saved in your code repository and will show up in the template gallery for you and your team. If you want to share them externally, you can easily export them, so others can import them in their Azure Data Factory.

Let’s take a look!

Read on to learn more.

SQL Server Truncating Numbers to Asterisks

Bert Wagner points out that some numeric types handle overflow in a weird way:

Why does SQL Server sometimes error when converting a number into a string, but other times succeeds and returns an asterisk?

I don’t know.

The best (and logical) answer I could find online is from Robert Sheldon, who attributes it to poor error handling practices, “…before error handling got a more reputable foothold.”

This makes it important to check your results. I imagine that there’s somebody who relies upon this exact functionality, but it’s pretty weird.
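
A quick way to see the behavior for yourself, on any reasonably recent version of SQL Server:

    -- An int that does not fit quietly becomes an asterisk...
    SELECT CAST(100 AS VARCHAR(2));   -- returns '*'

    -- ...while the equivalent decimal conversion raises an overflow error.
    SELECT CAST(100.0 AS VARCHAR(2)); -- error 8115: arithmetic overflow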
