CISL 1.3.1

Niko Neugebauer has released the newest version of his columnstore index library:

Here is the small description of what is new in this release:

  • The database snapshots (.dacpac) for all platforms are now included in the Releases\DacPacs.

  • Includes new Powershell functions for installing and removing CISL from the Instances:

    • Install-CISL.ps1 will allow you to install the CISL at multiple databases of a SQ Server Instance (or Azure SQLDB).
    • Remove-CISL.ps1 will allow you to remove the CISL from the multiple databases of a SQL Server Instance (or Azure SQLDB).
  • Support for the different collation is included.

  • Includes information on all recent SQL Server updates.

  • Included support of the new Columnstore Indexes Trace Flags in SQL Server 2016.

  • Basic Unit Tests (based on t-sqlt) are included for SQL Server 2012 & SQL Server 2014, guaranteeing the quality of the released code.

  • A good number of bug fixes.

  • Further parameter enhancements for the existing functions.

Sounds like there’s a lot packed into this release.

Deploying SSIS Packages In VS 2015

Neil Gelder notes that you can deploy different versions of SSIS packages using Visual Studio 2015:

For years I’ve dream’t of having one set of tools for developing SSIS packages! not a lot to ask really and  great step towards this from Microsoft was decoupling the development IDE from the main SQL Server install to produce the standalone SSDT (SQL Server data tools)

But like most people I work in an environment which has legacy versions for SQL Server in production, but equally like most tech folk (giddy kids wanting new toys) I always try and use the most current and exciting  version of VS.  This however proves a problem when developing for SSIS, for example if you developed a SSIS package in VS 2013 you’d not be able to deploy this correctly to a SQL Server 2012 version of Integration services catalog.  In the past this resulted in having two IDE’s installed, SSDT 2012 (VS shell) for any 2012 catalog development and VS 2013 installed for other work.

I had one person mention during a talk I gave that this isn’t foolproof, but my experience (limited to SQL Server 2012 and 2014) was that deployment worked fine.  As always, test before making changes.

Troubleshooting Data Factory Errors

Ginger Grant discusses Azure Data Factory errors:

Unfortunately, while developing Data Factory I became very familiar with errors. All of the errors show up at the end and provide very little insight as to what in the process failed. Here’s an example.

Database operation failed on server ‘Sink:DBName01.database.windows.net’ with SQL Error Number ‘40197’. Error message from database execution : The service has encountered an error processing your request. Please try again. Error code 4815. A severe error occurred on the current command. The results, if any, should be discarded.

This sounds like classic Microsoft error messages:  “An error occurred.  Here is a code you can put into Google and hope desperately that someone has already figured out the answer.  Good luck!”

Thoughts On Linked Servers

Dave Mason defends the honor of linked servers:

I seem to be in the minority when it comes to SQL Server linked servers. When it’s another SQL Server instance on the other end, I quite like them for administrative purposes. But other SQL pros have some reservations and gripes. I’ve even seen the word “hate” thrown around freely. Most of the complaints seem to fall into one of these categories: poor performance, insufficient permissions, poorly configured security, and challenges related to remote execution of queries.

I think Dave’s reasoning makes a lot of sense.  Linked servers are not themselves evil.  I think it’s likely a mistake to incorporate them into your mainline application (a mistake I’ve made in the past), but for the kinds of administrative tasks Dave mentions, it’s certainly not a bad idea.

Thoughts On Dynamic Data Masking

John Martin on Dynamic Data Masking:

For the rest of this blog post, I will be working with the following scenario:

  • I have an SSRS Server, hosting a number of reports that display information about my SQL Server estate. From performance metrics through to details of failed jobs and poorly performing queries. I want to add an additional layer of security, restricting who can see the names of servers, databases, and other internal infrastructure information. Permission to view these reports will be granted to both support teams and business users, with the business users not being permitted to see the sensitive data.

John is much more optimistic about this feature than I am.

Hive Data Ingestion In AWS

Kevin Feasel

2016-08-26

Hadoop

Songzhi Liu shows how to use the AWS stack to move data into Hive:

S3 bucket
In this framework, S3 is the start point and the place where data is landed and stored. You will configure the S3 bucket notifications as the event source that triggers the Lambda function. When a new object is stored/copied/uploaded in the specified S3 bucket, S3 sends out a notification to the Lambda function with the key information.

Lambda function
Lambda is a serverless technology that lets you run code without a server. The Lambda function is triggered by S3 as new data lands and then adds new partitions to Hive tables. It parses the S3 object key using the configuration settings in the DynamoDB tables.

DynamoDB table
DynamoDB is a NoSQL database (key-value store) service. It’s designed for use cases requiring low latency responses, as it provides double-digit millisecond level response at scale. DynamoDB is also a great place for metadata storage, given its schemaless design and low cost when high throughput is not required. In this framework, DynamoDB stores the schema configuration, table configuration, and failed actions for reruns.

EMR cluster
EMR is the managed Hadoop cluster service. In the framework, you use Hive installed on an EMR cluster.

This is a detailed post, but well worth a read if you’re on AWS.

Logging Python-Style

Kevin Feasel

2016-08-26

R

Jonathan Callahan wants to generate nice-looking logs in R:

Our real world scenario involves R scripts that process raw smoke monitoring data that is updated hourly. The raw data comes from various different instruments, set up by different agencies and transmitted over at least two satellites before eventually arriving on our computers.

Data can be missing, delayed or corrupted for a variety of reasons before it gets to us. And then our R scripts perform QC based on various columns available in the raw (aka “engineering level”) data.

Logging is one of the differences between toy code (even very useful toy code) and production-quality code.  Read on for an easy way to do this in R.

R And Power BI

Kevin Feasel

2016-08-26

Power BI, R

David Smith points out some of the work the Power BI team has done to integrate R into their product:

Power BI, Microsoft’s data visualization and reporting platform, has made great strides in the past year integrating the R language. This Computerworld article describes the recent advances with Power BI and R. In short, you can:

Click through for more things you can do, as well as additional links and resources.

Database Containment Checks

Parikshit Savjani discusses a nice little performance optimization in the latest versions of SQL Server 2014 and 2016:

Starting SQL 2014 SP1 CU8, SQL 2014 SP2 CU1 and SQL 2016 CU1, the spinlock to check the database containment property is replaced by the “load acquire and store release” lock semantics, which is a non-blocking lock-free synchronization mechanism between the concurrent threads. This avoids exclusive spinlocks and thereby avoids the spinlock collisions between the concurrent threads executing stored procedures from same database as described earlier. This change improves the overall concurrency and scalability of the system especially if all the worker threads are simultaneously executing a stored procedure from same database.

On extremely busy systems, this might be a reason to update.

Documenting A Data Warehouse

Jesse Seymour discusses a few forms of documentation for a data warehouse:

Extended properties are a great way to internally document the data warehouse.  The key advantage here is that the values of these extended properties can be retrieved with a T-SQL query.  This allows us to access this information with a view as needed.  My favorite method of using this is to create an SSRS report that end users can run to look up the attributes and comments I store in the extended property.  Data warehouse tools take some of the pain out of the process.  Unfortunately, not all tools support use of extended properties.  Make sure your tool does or consider changing tools.  Be sure to document the names and use cases for each property you create.  Consistency is the key to the value here.

I’ve never been a big fan of extended properties, mostly because I typically don’t work with tools which expose that information easily.  Regardless, there are other important forms of documentation, so read on.

Categories

November 2018
MTWTFSS
« Oct  
 1234
567891011
12131415161718
19202122232425
2627282930