Press "Enter" to skip to content

Author: Kevin Feasel

Populating Lookup Query Using A Variable

Meagan Longoria shows us how to use a variable to populate a lookup query in SSIS:

I already had my data flow populated with the lookup for MSA. I set it to full cache and entered a query in the connection to initially populate the fields that would be returned (simply my lookup query without the where clause).

Next, I opened the data flow properties, located Expressions and clicked on the ellipses to open the Property Expression Editor. I found the SQLCommand property for my MSA lookup and set it to my package variable that contained my query.

I've had issues in the past with full-cache lookups and variables.  Fortunately, you can get around a lot of those problems with expressions.
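
To give a rough sense of the pattern (the table, column, and variable names here are placeholders I made up, not Meagan's), the package variable might hold an expression along these lines:

    Variable @[User::MSALookupQuery] (EvaluateAsExpression = True):

    "SELECT MSA_Code, MSA_Name FROM dbo.DimMSA WHERE StateCode = '" + @[User::StateCode] + "'"

In the data flow's Expressions collection, the lookup's SqlCommand property then points at @[User::MSALookupQuery], while the query saved in the lookup editor itself stays the full-cache version without the WHERE clause so the output columns still resolve.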


Data Compression

Andy Mallon looks at the costs and benefits of data compression:

The obvious benefit is that compressed data takes up less space on disk. Since you probably keep multiple copies of your database (multiple environments, DR, backups, etc), this space savings can really add up. High-performance enterprise-class storage is expensive. Compressing your data to reduce footprint can have a very real benefit to your budget. I once worked on an SAP ERP database that was 12TB uncompressed, and was reduced to just under 4TB after we implemented compression.

My experience with compression is that the benefit vastly outweighs the cost.  Do your own testing, of course.
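
If you want a starting point for that testing, a minimal sketch (with placeholder object names) is to estimate the savings first and then rebuild with compression:

    -- Estimate how much space PAGE compression would save before committing to it
    EXEC sp_estimate_data_compression_savings
        @schema_name      = 'dbo',
        @object_name      = 'FactSales',
        @index_id         = NULL,
        @partition_number = NULL,
        @data_compression = 'PAGE';

    -- If the numbers look good, rebuild the table compressed
    -- (nonclustered indexes get their own ALTER INDEX ... REBUILD)
    ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);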


Installing SSRS On An Existing Clustered Instance

Corey Beck shows how to install SQL Server Reporting Services on an instance which sits on a Windows Failover Cluster:

You will also notice this is the end of your road for this installation without getting a success on this rule since the ‘Next’ button is grayed out.

Sure, you could go back and just create a new instance in the process to install SSRS on this node, but there has to be another way, right?  If only we could skip this rule in the installation….

We can, using the command prompt!

Knowing how to install SQL Server from the command line (or PowerShell) is important; this is just one reason why.
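
The flavor of it is a setup.exe call with /SkipRules (this is a sketch, not a copy of Corey's command; double-check the exact rule name against the failed rule in your own setup detail log):

    setup.exe /Action=Install /SkipRules=StandaloneInstall_HasClusteredOrPreparedInstanceCheck

That relaunches the wizard with the blocking rule skipped, so you can add Reporting Services to the existing instance instead of creating a new one.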


Type 6 Dimensions With BIML

Meagan Longoria shows us type 6 dimensions with BIML:

In my previous post, I provided the design pattern and BIML for a pure Type 2 Slowly Changing Dimension (SCD). When I say “pure Type 2 SCD”, I mean an ETL process that adds a new row for a change in any field in the dimension and never updates a dimension attribute without creating a new row.  In practice, I tend to create more hybrid Type 2 SCDs where updates to some attributes require a new row and others update the value on the existing rows. A similar pattern that I find I implement more often than a pure Type 2 is a Type 6 SCD. A Type 6 SCD builds on the Type 2 technique by adding current attributes alongside the historical attributes so related measures can be grouped by the historical or current dimension attribute values. The only difference between what I call a hybrid Type 2 and a Type 6 is that in the Type 6, there are no Type 1 attributes in the dimension that do not also have a Type 2 version in the dimension to capture the historical values.

Dear Mr. President:  there are too many types these days.  Please eliminate three.  I am NOT a crackpot.
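
If it helps to see it in table form, here's a minimal sketch of a Type 6 dimension (made-up names), where the tracked attribute gets both a historical column and a current column:

    CREATE TABLE dbo.DimCustomer
    (
        CustomerKey       int IDENTITY(1,1) NOT NULL,  -- surrogate key; a new row per Type 2 change
        CustomerID        int          NOT NULL,       -- business key
        Region            varchar(50)  NOT NULL,       -- historical (Type 2) value for this row's effective period
        CurrentRegion     varchar(50)  NOT NULL,       -- current value, overwritten on all of the customer's rows when it changes
        RowEffectiveDate  date         NOT NULL,
        RowExpirationDate date         NOT NULL,
        RowIsCurrent      bit          NOT NULL
    );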


Fill Factor And Fragmentation

Erik Darling wants to know if fill factor affects index fragmentation:

It’s not just queries that reading extra pages can slow down. DBCC CHECKDB, backups, and index and statistics maintenance all have to deal with all those pages. Lowering fill factor without good reason puts you in the same boat as index fragmentation does, except regular maintenance won’t “fix” the problem.

Like everything else, the appropriate fill factor depends upon your context.
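
For reference, fill factor is applied per index when it's built or rebuilt; a quick sketch with placeholder names:

    -- Leave 10% free space in each leaf page on rebuild
    ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
        REBUILD WITH (FILLFACTOR = 90);

    -- See what each index is currently set to (0 means the default, i.e. fully packed)
    SELECT name, fill_factor
    FROM sys.indexes
    WHERE object_id = OBJECT_ID('dbo.Orders');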


Why Data Lakes?

James Serra explains why you might want to use a data lake:

To refresh, a data lake is a landing zone, usually in Hadoop, for disparate sources of data in their native format.  Data is not structured or governed on its way into the data lake.  This eliminates the upfront costs of data ingestion, especially transformation.  Once data is in the lake, the data is available to everyone.  You don’t need an a priori understanding of how data is related when it is ingested; rather, it relies on the end-user to define those relationships as they consume it.  Data governance happens on the way out instead of on the way in.  This makes a data lake very efficient in processing huge volumes of data.  Another benefit is the data lake allows for data exploration and discovery, to find out if data is useful or to create a one-time report.

I’m still working on a “data swamp” metaphor, in which people toss their used mattresses and we expect to get something valuable if only we dredge a little more.  Nevertheless, read James’s article; data lakes are going to move from novel to normal over the next few years.


Strings Are Hard

Kenneth Fisher on varchar versus nvarchar:

In any study of Data Types in SQL Server you are going to have to look at the various string data types. One important component is the difference between nChar vs Char and nVarChar vs VarChar. Most people reading this are probably thinking “Well that’s ridiculously easy.” If you are one of them then I want you to read these two facts about these data types.

          Char and VarChar                  nChar and nVarChar
Stores    ASCII                             UNICODE
Size      Always one byte per character.    Always two bytes per character.

One of these is incorrect. Do you know which one?

The correct answer is “both are wrong.”  Then you get into debates about what a “character” is, how certain languages (like Hebrew and Arabic) have layers of modifiers which modify semantic context, etc. etc.  Strings are probably even harder than dates.
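
A quick demonstration of why both rows are wrong (the varchar result assumes a Latin1 code-page collation):

    -- 'é' is not ASCII, yet it fits in a single varchar byte under a Latin1 collation
    SELECT DATALENGTH(CAST(N'é' AS varchar(10))) AS VarcharBytes;  -- 1

    -- An emoji is one character but needs a surrogate pair: four bytes of nvarchar
    SELECT DATALENGTH(N'😀') AS NVarcharBytes;                     -- 4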


Things A Junior DBA Should Know

Kendra Little has a list of three things a junior DBA should know:

Confession: I was a Junior DBA for a long time before I had a clue about this. It’s not unusual– many DBAs pick up existing databases and it’s natural to accept that the settings are correct.

Except, usually they aren’t. Usually, the last person who set them up just kinda guessed.

Guess what? You’re responsible for whatever they guessed.

Kendra’s three items are definitely junior-level, but we all start somewhere.
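
If you've inherited databases and want to stop accepting the guesses, a quick audit against sys.databases is a decent place to start (a sketch; add whichever settings matter in your shop):

    SELECT name,
           recovery_model_desc,
           page_verify_option_desc,
           compatibility_level,
           is_auto_shrink_on,
           is_auto_close_on
    FROM sys.databases
    ORDER BY name;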


Columnstore In 2016

Niko Neugebauer has two new posts up on columnstore index changes with SQL Server 2016.

First, row group merging with clustered columnstore indexes:

Row Group merging & cleanup is a long-awaited improvement that came out in SQL Server 2016. When Microsoft announced this functionality, everyone who had worked with SQL Server 2014 & Clustered Columnstore Indexes rejoiced – one of the major problems, logical fragmentation caused by deleted data, is solved! Amazing!
Just as a reminder – logical fragmentation is what happens when we mark obsolete data in the Deleted Bitmap (in Columnstore Indexes there is no direct removal of data from the compressed Segments with the Delete command, and the Update command also uses the Deleted Bitmap, marking old versions of rows as deleted).
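
In SQL Server 2016 you can trigger that cleanup and merging yourself by reorganizing the columnstore index (placeholder names below; the tuple mover also does some of this automatically):

    -- Removes rows flagged in the Deleted Bitmap (where enough of a rowgroup is deleted)
    -- and merges undersized rowgroups
    ALTER INDEX CCI_FactSales ON dbo.FactSales REORGANIZE;

    -- Check rowgroup health before and after
    SELECT row_group_id, state_desc, total_rows, deleted_rows
    FROM sys.dm_db_column_store_row_group_physical_stats
    WHERE object_id = OBJECT_ID('dbo.FactSales');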

Second, Stretch DB with columnstore:

Stretch DB, or alternatively Stretch Database, is a way of spreading your table between SQL Server (on-premises, or a VM in Azure) and an Azure SQL Database. This means that the data of the table will be shared between the SQL Server and the Azure SQL Database, giving you the opportunity to lower the total cost of the local storage, since Azure SQL Database storage is cheap relative to the expensive storage typically used in local SQL Server installations.
This means that the table data will be separated into Hot Data & Cold Data, where Hot Data is data that is frequently accessed and extremely important (typically some OLTP data) and Cold Data is data that is rarely or almost never accessed (typically archival or log data).
For the end user, the experience should be the same as before – should he ask for some data that is not on the local SQL Server, it will be read from Azure SQL Database by the invocation of a remote query, joined with the local results (if any) and then presented to the user.

These two posts are must-reads if you work with columnstore indexes.
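
On the Stretch side, once the instance and database are enabled for remote data archive, the hot/cold split Niko describes is driven by a filter predicate on the table. A rough sketch (made-up names, and the inline function has to follow the Stretch predicate rules):

    -- dbo.ColdSalesPredicate is an inline table-valued function deciding which rows count as "cold"
    ALTER TABLE dbo.FactSales
        SET (REMOTE_DATA_ARCHIVE = ON (
                FILTER_PREDICATE = dbo.ColdSalesPredicate(SaleDate),
                MIGRATION_STATE = OUTBOUND));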
