
Author: Kevin Feasel

Unboxing ISPACs

It’s an early Christmas for Richie Lee:

The first file that we’re going to look at is the [Content_Types].xml file, and this is the file that confirms that the ZipPackage class is used. There’s an article here that is ten years old but is still valid (scroll down to the System.IO.Packaging Includes Zip Support section to read up on this). This is because we know that the content_types file is part of the output when using the ZipPackage class to zip up a bunch of files into a .zip. The content_types file contains both the extension and content type of the three other files that are included in the ispac:

  • dtsx
  • params
  • manifest

Note that the content_types file does not specify the files, either in quantity or in content, other than the fact that they will contain XML.

Read on for a good amount of detail on what’s included in an Integration Services package.
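For the curious, a [Content_Types].xml produced by System.IO.Packaging is a short piece of OPC boilerplate. A rough sketch of what the ispac version might look like (the exact ContentType values here are my assumption, not taken from Richie’s post):

<?xml version="1.0" encoding="utf-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <!-- One Default entry per extension; all three are declared as XML content -->
  <Default Extension="dtsx" ContentType="text/xml" />
  <Default Extension="params" ContentType="text/xml" />
  <Default Extension="manifest" ContentType="text/xml" />
</Types>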


No Hierarchies In SSAS DirectQuery

Thomas LeBlanc points out that Analysis Services Tabular’s DirectQuery functionality does not include hierarchies on dimensions:

You can go to this link from Microsoft to read about more limitations.

There was also a difference in labeling when designing a Tabular Model in Visual Studio (SQL Server Data Tools – SSDT) and when making a change to the Model from SQL Server Management Studio (SSMS).

In SSDT, the property for the Model is DirectQuery and the values are On and Off.

It’s certainly not the end of the world, but I can see it being a bit annoying to deal with.


I/O Read-Ahead In SQL Server

Kendra Little shows how read-ahead works, using the example of index seeks:

Looking at an actual execution plan, I dig into the index seek operator and it shows me information about the physical IO. Almost all of the requests were read-ahead reads.

Read-ahead is a mechanism that SQL Server can use when it’s pulling a lot of information from disk. Instead of pulling 8K pages onesy-twosy-threesy, SQL Server can suck up big chunks of pages from disk with a vacuum cleaner.

If you’re running Developer or Enterprise Edition, you may get a larger vacuum cleaner.

Read-ahead is a good piece of functionality, but those reads still have a cost associated with them, and the cheapest read is the read you don’t do.
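If you want to watch read-ahead happen without cracking open an execution plan, STATISTICS IO reports it per table. A minimal sketch, with a hypothetical table and filter:

-- Clear the buffer pool first on a test box so SQL Server has to hit disk.
-- DBCC DROPCLEANBUFFERS;

SET STATISTICS IO ON;

SELECT COUNT(*)
FROM dbo.Posts                      -- any reasonably large table
WHERE CreationDate >= '2017-01-01';

-- The Messages tab then shows something like:
-- Table 'Posts'. Scan count 1, logical reads 12345, physical reads 3,
-- read-ahead reads 12240, ...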


Finding Progress On A Long-Running Statement

David Fowler shows us how to track how far we’ve gotten on a long-running data modification statement:

Well, it would turn out that there is, and to find out we need to turn to our trusty friend, the transaction log.

As we know, the transaction log will squirrel away an entry each time that a row is modified.  We can count up all the LOP_MODIFY_ROW, LOP_INSERT_ROWS and LOP_DELETE_ROWS entries for our transaction and that will tell us just how many rows our transaction has altered so far.

Click through for a script, as well as an important disclaimer.
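The core of the technique looks something like this rough sketch (not David’s actual script; the transaction ID below is made up, and you’d find the real one on your transaction’s LOP_BEGIN_XACT record in fn_dblog):

-- fn_dblog is undocumented, so treat this as a diagnostic aid rather than production code.
DECLARE @TranID nvarchar(14) = N'0000:00001f4a';  -- hypothetical transaction ID

SELECT COUNT(*) AS RowsModifiedSoFar
FROM fn_dblog(NULL, NULL)
WHERE [Transaction ID] = @TranID
  AND Operation IN (N'LOP_MODIFY_ROW', N'LOP_INSERT_ROWS', N'LOP_DELETE_ROWS');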


Finding Broken Views

Bill Fellows has a script to test each view to see if it is broken:

Shh, shhhhhh, we’re being very very quiet, we’re hunting broken views. Recently, we were asked to migrate some code changes and after doing so, the requesting team told us we had broken all of their views, but they couldn’t tell us what was broken, just that everything was. After a quick rollback to snapshot, thank you Red Gate SQL Compare, I thought it’d be enlightening to see whether anything was broken before our code had been deployed.

You’ll never guess what we discovered.

Read on to see what they discovered (spoilers:  broken views) and how Bill fixed the problem.
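If you just need the basic idea, here is a minimal sketch of that kind of check, my own simplification rather than Bill’s script: attempt to compile each view and report the ones that fail.

DECLARE @schema sysname, @view sysname, @sql nvarchar(max);

DECLARE view_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT SCHEMA_NAME(v.schema_id), v.name
    FROM sys.views AS v;

OPEN view_cursor;
FETCH NEXT FROM view_cursor INTO @schema, @view;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- A zero-row select forces the view to compile without returning data.
    SET @sql = N'SELECT TOP (0) * FROM ' + QUOTENAME(@schema) + N'.' + QUOTENAME(@view) + N';';

    BEGIN TRY
        EXEC sys.sp_executesql @sql;
    END TRY
    BEGIN CATCH
        PRINT QUOTENAME(@schema) + N'.' + QUOTENAME(@view) + N' is broken: ' + ERROR_MESSAGE();
    END CATCH;

    FETCH NEXT FROM view_cursor INTO @schema, @view;
END

CLOSE view_cursor;
DEALLOCATE view_cursor;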


Using Akka In A Streaming Solution

Artem Rukavytsia shows us how you can easily integrate Akka into a solution with Kafka and Spark Streaming:

Akka gives you the opportunity to make logic for producing/consuming messages from Kafka with the Actor model. It’s very convenient if actors are widely used in your code and it significantly simplifies making data pipelines with actors. For example, you have your Akka Cluster, one part of which allows you to crawl web pages and the other part of which makes it possible to index and send indexed data to Kafka. The consumer can aggregate this logic. Producing data to Kafka looks as follows:

The Actor model, which Akka implements, is something I kind of understand, but have never spent much time trying to implement.  I can see how it’d make perfect sense communicating with Kafka, though, given the scale and independence of consumers within a consumer group that Kafka provides.


What’s New In Spark 2.2?

Geetika Gupta shows us some of the updates in Apache Spark 2.2:

The major addition to this release is Structured Streaming. It has been marked as production ready and its experimental tag has been removed.

Some of the high-level changes and improvements:

  • Production-ready Structured Streaming
  • Expanded SQL functionality
  • New distributed machine learning algorithms in R
  • Additional algorithms in MLlib and GraphX

Read on for more details.


Errors Using Native Prediction In SQL Server

Sacha Tomey walks us through a few potential issues when converting code which uses SQL Server Machine Learning Services’ sp_execute_external_script procedure to native PREDICT calls:

Stumble One:

Error occurred during execution of the builtin function 'PREDICT' with HRESULT 0x80004001. 
Model type is unsupported.

Reason:

Not all models are supported. At the time of writing, only the following models are supported:

  • rxLinMod
  • rxLogit
  • rxBTrees
  • rxDTree
  • rxDForest

sp_rxPredict supports additional models, including those available in the MicrosoftML package for R (I was attempting to use rxFastTrees). I presume this limitation will ease over time. The list of supported models is referenced in the PREDICT function (Documentation).

sp_rxPredict does require CLR, but it’s a viable alternative if you need to use a model not currently supported—like rxNeuralNet.
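For anyone who hasn’t seen the native syntax yet, here’s a hedged sketch of a PREDICT call. The table, column, and model names are hypothetical, and the model needs to have been serialized with rxSerializeModel and stored as varbinary(max):

-- Load a previously serialized model (names invented for illustration).
DECLARE @model varbinary(max) =
    (SELECT model_object FROM dbo.Models WHERE model_name = N'churn_rxLogit');

-- Score new rows natively, with no external R runtime in the loop.
SELECT d.CustomerID, p.churn_Pred
FROM PREDICT(MODEL = @model, DATA = dbo.NewCustomers AS d)
WITH (churn_Pred float) AS p;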


More On The New Service Model

Randolph West summarizes the new SQL Server patching model:

  • Every twelve months after GA, the installation files will be updated to contain all the Cumulative Updates in what is effectively now a service pack, but won’t be called that. This will also become the slipstream update. In other words, you’re more likely to be up to date when installing from scratch, later in the release cycle.

  • Customers on the GDR (General Distribution Release) release cycle will only get important security and corruption fixes, as before. You can switch to the standard CU release cadence any time, but once you do, you can’t switch back to GDR.

Brent Ozar thinks CU12 might become the new SP1 in the minds of managers:

So now fast forward to late 2018, early 2019. You’re about to build a new SQL Server for a project, and you have two choices:

  • SQL Server 2018 – which is basically the new dev branch, getting monthly updates, or
  • SQL Server 2017 (or 2016, or 2014) – which is the stable branch, getting quarterly updates

Once a version has hit CU12, and it only gets updates once a quarter, it might be considered Good Enough For Our Apps. Managers might see 2017/2016/2014 interchangeably at that point – which might be great for the second most recent version’s adoption.

It will be interesting to see how companies adopt this new model.


Active Directory On CentOS

Drew Furgiuele shows how to configure a box running CentOS to work with Active Directory:

Before we start though, there are a few things you’re going to need to have already set up:

  • An Active Directory Domain to test in, and rights to administer it. Since we’re going to be creating (and possibly deleting, if there are errors) computer objects and a service account, you’ll need a domain account with adequate permissions.

  • My example assumes you have a Microsoft DNS server running alongside your domain services. It is possible to use a separate DNS server to get this to work, but you might need some additional network configuration (see below). Also, depending on your environment, you might need a reverse lookup zone defined. If you notice long ping times or other weird lookups, I’d set one up in your DNS.

  • A machine (virtual or otherwise) that is running CentOS 7 or later (and this guide was written and tested against CentOS 7). For this demo, we’ll be using the Server (minimal install) installation option.  If you’re new to Linux, you might opt for a desktop version (server with a GUI). When you download a CentOS disk image to install it, you get all these options on the default media; you won’t need separate downloads.

There are a few more prereqs, so read the whole thing.  This route is easier than it is on Ubuntu, as Drew notes.
