SQL Server 2012 Slipstreaming

Cody Konior looks at slipstreaming SQL Server 2012:

There was a blog post by Boris Hristov which had some good images of the various places to look in the GUI to make sure that the updates have been picked up correctly. Using these and through experimentation I was able to answer those questions.

Interesting questions and good answers.

Bad Fixes

David Alcock looks at a few common “fixes” which end up causing their own problems:

I’m seeing lots of CXPACKETS waits, how do I fix these?

Bad Advice = Set the maximum degree of parallelism to 1, no more CXPACKET waits!

I’m seeing index fragmentation occur quite frequently on some of my indexes, what should I do?

Bad Advice = Set fill factor on the server to 70, no more fragmentation problems!

I’m worried about TempDB contention. What should I do?

Bad Advice = Set the number of files to the number of cores, no more contention issues!

Read the post for better advice.

Power BI And R

Kevin Feasel

2015-12-31

Power BI, R

Jan Mulkens has started a series on combining Power BI and R.

Part 1:

Fact is, R is here to stay. Even Microsoft has integrated R with SQL Server 2016 and it has made R scripting possible in it’s great Azure Machine Learning service.
So it was only a matter of time before we were going to see R integrated in Power BI.

Part 2:

From the previous point, it seems that R is just running in the background and that most of the functionality can be used.

Testing some basic functionality like importing and transforming data in the R visual worked fine.
I haven’t tried any predictive modelling yet but I assume that will just work as well.

Part 3:

So instead of printing “Hello world” to the screen, we’ll use a simple graph to say hello to the world.

First we need some data, Power BI enables us to enter some data in a familiar Excel style.
Just select “Enter Data” and start bashing out some data.

I’m looking forward to the rest of the series.

Populating Lookup Query Using A Variable

Meagan Longoria shows us how to use a variable to populate a lookup query in SSIS:

I already had my data flow populated with the lookup for MSA. I set it to full cache and entered a query in the connection to initially populate the fields that would be returned (simply my lookup query without the where clause).

Next, I opened the data flow properties, located Expressions and clicked on the ellipses to open the Property Expression  Editor. I found the SQLCommand property for my MSA lookup and set it to my package variable that contained my query.

I had issues in the past with full cached lookups and variables.  Fortunately, you can get around a lot of problems with expressions.

Data Compression

Andy Mallon looks at the costs and benefits of data compression:

The obvious benefit is that compressed data takes up less space on disk. Since you probably keep multiple copies of your database (multiple environments, DR, backups, etc), this space savings can really add up. High-performance enterprise-class storage is expensive. Compressing your data to reduce footprint can have a very real benefit to your budget. I once worked on an SAP ERP database that was 12TB uncompressed, and was reduced to just under 4TB after we implemented compression.

My experience with compression is that the benefit vastly outweighs the cost.  Do your own testing, of course.

Installing SSRS On An Existing Clutered Instance

Corey Beck shows how to install SQL Server Reporting Services on an instance which sits on a Windows Failover Cluster:

You will also notice this is the end of your road for this installation without getting a success on this rule since the ‘Next’ button is grayed out.

Sure, you could go back and just create a new instance in the process to install SSRS on this node, but there has to be another way, right?  If only we could skip this rule in the installation….

We can using command prompt!

Knowing how to install SQL Server from the command line (or Powershell) is important; this is just one reason why.

Type 6 Dimensions With BIML

Meagan Longoria shows us type 6 dimensions with BIML:

In my previous post, I provided the design pattern and BIML for a pure Type 2 Slowly Changing Dimension (SCD). When I say “pure Type 2 SCD”, I mean an ETL process that adds a new row for a change in any field in the dimension and never updates a dimension attribute without creating a new row.  In practice, I tend to create more hybrid Type 2 SCDs where updates to some attributes require a new row and others update the value on the existing rows. A similar pattern that I find I implement more often than a pure Type 2 is a Type 6 SCD. A Type 6 SCD builds on the Type 2 technique by adding current attributes alongside the historical attributes so related measures can be grouped by the historical or current dimension attribute values. The only difference between what I call a hybrid Type 2 and a Type 6 is that in the Type 6, there are no Type 1 attributes in the dimension that do not also have a Type 2 version in the dimension to capture the historical values.

Dear Mr. President:  there are too many types these days.  Please eliminate three.  I am NOT a crackpot.

Fill Factor And Fragmentation

Erik Darling wants to know if fill factor affects index fragmentation:

It’s not just queries that reading extra pages can slow down. DBCC CHECKDB, backups, and index and statistics maintenance all have to deal with all those pages. Lowering fill factor without good reason puts you in the same boat as index fragmentation does, except regular maintenance won’t “fix” the problem.

Like everything else, the appropriate fill factor depends upon your context.

Why Data Lakes?

James Serra explains why you might want to use a data lake:

To refresh, a data lake is a landing zone, usually in Hadoop, for disparate sources of data in their native format.  Data is not structured or governed on its way into the data lake.  This eliminates the upfront costs of data ingestion, especially transformation.  Once data is in the lake, the data is available to everyone.  You don’t need a priority understanding of how data is related when it is ingested, rather, it relies on the end-user to define those relationships as they consume it.  Data governorship happens on the way out instead of on the way in.  This makes a data lake very efficient in processing huge volumes of data.  Another benefit is the data lake allows for data exploration and discovery, to find out if data is useful or to create a one-time report.

I’m still working on a “data swamp” metaphor, in which people toss their used mattresses and we expect to get something valuable if only we dredge a little more.  Nevertheless, read James’s article; data lakes are going to move from novel to normal over the next few years.

Strings Are Hard

Kenneth Fisher on varchar versus nvarchar:

In any study of Data Types in SQL Server you are going to have to look at the various string data types. One important component is the difference between nChar vs Char and nVarChar vs VarChar. Most people reading this are probably thinking “Well that’s ridiculously easy.” If you are one of them then I want you to read these two facts about these data types.

Char and VarCharnChar and nVarChar
StoresASCIIUNICODE
SizeAlways one byte per character.Always two bytes per character.

One of these is incorrect. Do you know which one?

The correct answer is “both are wrong.”  Then you get into debates about what a “character” is, how certain languages (like Hebrew and Arabic) have layers of modifiers which modify semantic context, etc. etc.  Strings are probably even harder than dates.

Categories

December 2015
MTWTFSS
« Nov Jan »
 123456
78910111213
14151617181920
21222324252627
28293031