Press "Enter" to skip to content

Month: November 2016

Polybase With Compression

I have a post looking at Polybase support for different compression formats:

This is a very interesting set of results.  First, 7Zip archived files do not work with the default encoding.  I’m not particularly surprised by this result, as 7Zip support is relatively scarce across the board and it’s a niche file format (though a very efficient format).

The next failure case is tar.  Tar is a weird case because it missed the first row in the file but was able to collect the remaining 776 records.  Same goes for .tar.gz.  I unpackaged the .tar file and the constituent SecondBasemen.csv file did in fact have all 777 records, so it’s something weird about the codec.

Stick to BZip2 and GZip if you’re using flat files.
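If you do go the GZip route, the external file format is where you declare the codec. Here is a minimal sketch; the column list, data source name, and file path are placeholders rather than the exact setup from the post:

CREATE EXTERNAL FILE FORMAT CsvGzipFormat
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', USE_TYPE_DEFAULT = TRUE),
    -- GzipCodec and DefaultCodec are the two codecs documented for delimited text.
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);

-- Hypothetical table, columns, and data source, just to show how the format plugs in.
CREATE EXTERNAL TABLE dbo.SecondBasemen
(
    FirstName NVARCHAR(50),
    LastName NVARCHAR(50),
    Age INT
)
WITH
(
    LOCATION = '/tmp/secondbasemen/',
    DATA_SOURCE = HadoopDataSource,
    FILE_FORMAT = CsvGzipFormat
);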


Flow From Power BI

Chris Webb uses Microsoft Flow’s REST API to pass messages via Power BI:

So basically, in this case I’ve used Flow to create a web service without writing a single line of code. I can see a lot of potential uses for this and I suspect I’ll be blogging about Flow a lot in the future. A word of warning though: do not try to use this as a way of updating a data source. As I mentioned last time, when you run your query you’ll find Power Query/Power BI calls the web service twice. For example, I created a Flow similar to the one above that used the Insert Row step to take text sent to a Request trigger and add it to a table in an Excel workbook, and of course every time I refreshed my query I got two identical rows in my Excel table.

Read the whole thing.


Building Your 2016 VM

James Serra looks at putting together a VM for 2016:

With Windows Server 2016 having just been released, now is the perfect time to build an Azure VM with SQL Server 2016 on Windows Server 2016.  In a matter of minutes you can be playing with and learning both platforms.  Below I will document the steps I took to build the VM, along with the additional software I installed.  This is a fully-loaded VM that I use for demos and to build small projects:

(Software updates as of 11/4/2016)

There’s a lot of good software here.  And Java.


Considerations For Backups

Kevin Hill goes back to basics on backups:

Backing up SQL Server databases has been documented in thousands of books, Microsoft documentation and blog posts since the product was released in the 90s.

I’m not about to try and tell you HOW to back up your SQL databases. Rather, I want to take you through the basic considerations of deciding what to back up, when, how often, etc., in clear language that non-SQL folks can work through.

Understanding how to take and tune backups is important; this is more fundamental advice.
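For anyone mapping those considerations onto actual commands, the two basic building blocks look something like this; the database name, paths, and options are placeholders, not Kevin’s recommendations:

-- Full backup: the baseline restore point.
BACKUP DATABASE [SalesDB]
TO DISK = N'D:\Backup\SalesDB_Full.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;

-- Log backup: only meaningful under the FULL (or BULK_LOGGED) recovery model;
-- how often you take these determines how much work you can afford to lose.
BACKUP LOG [SalesDB]
TO DISK = N'D:\Backup\SalesDB_Log.trn'
WITH COMPRESSION, CHECKSUM;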


Making Cortana Work

Rob Sewell shows how to integrate Cortana with dbareports via Power BI:

When you start using Cortana to query your data you will find that at times it is very frustrating. My wife was in fits of giggles listening to me trying to record the video below as Cortana refused to understand that I was saying "servers" and repeatedly searched Bing for "service". Whilst you can negate the effect by using the alternate names for the Q and A settings, it is still a bit hit and miss at times.

This is an interesting post.


External File Formats

I look at file formats in Polybase:

Delimited text is exactly as it sounds:  you can use a comma, tab, pipe, tilde, or any other delimiter (including multi-character delimiters).  So let’s go through the options here.  First, FORMAT_TYPE must be DELIMITEDTEXT.  From there, we have a few FORMAT_OPTIONS.  I mentioned FIELD_TERMINATOR, which is how we separate the values in a record.  We can also use STRING_DELIMITER if there are quotes or other markers around our string values.

DATE_FORMAT makes it easier for Polybase to understand how dates are formatted in your file.  The MSDN document gives you hints on how to use specific date formats, but you can’t define a custom format today, or even use multiple date formats.

It feels like there’s a new Hadoop file format every day.
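As a quick reference, a delimited text file format pulling those options together might look like this; the format name, delimiter, and date format choices here are illustrative, not prescriptive:

CREATE EXTERNAL FILE FORMAT PipeDelimitedFormat
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS
    (
        FIELD_TERMINATOR = '|',              -- how values in a record are separated
        STRING_DELIMITER = '"',              -- marker wrapped around string values
        DATE_FORMAT = 'yyyy-MM-dd HH:mm:ss', -- one fixed format; no custom or multiple formats
        USE_TYPE_DEFAULT = TRUE              -- missing values become type defaults instead of NULL
    )
);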


Temporal Tables For Error Recovery

Dimitri Furman looks at using temporal tables to recover from user errors:

Have you ever got that sinking feeling after hitting the Execute button in SSMS, thinking “I should not have done that”? DML statements with missing WHERE clause, DROP statements accidentally targeting slightly mistyped (but existing) tables or databases, RESTORE statements overwriting databases with new data that haven’t been backed up, are all examples of actions prompting an “Oops…” (or worse) shortly thereafter. “Oops recovery” is the term that became popular to describe the process of fixing the consequences.

For most of these scenarios, the usual, and often the only, recovery mechanism is to restore the database from backup to a point in time just before the “oops”, known as point-in-time recovery (PITR). Even though PITR remains the most general and the most effective recovery mechanism, it does have some drawbacks and limitations: the recovery process requires a full database restore, taking the time proportional to the size of the database; a sequence of restores may be needed if multiple “oops” transactions have occurred; in the general case, there will be difficulties reconciling recovered data with data modified after the “oops” point in time, etc. Nevertheless, PITR remains the most widely applicable recovery method for SQL Server databases, both on-premises and in the cloud.

The solution entails a non-trivial amount of effort, but it is interesting.
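In heavily simplified form, the idea reduces to a system-versioned table plus an AS OF query to pull back the pre-“oops” rows; the names and timestamp below are placeholders, not the article’s scripts:

CREATE TABLE dbo.Customer
(
    CustomerID INT NOT NULL PRIMARY KEY CLUSTERED,
    CustomerName NVARCHAR(100) NOT NULL,
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));

-- After an accidental UPDATE or DELETE, read the table as it looked just before the mistake...
DECLARE @JustBeforeOops DATETIME2 = '2016-11-04 10:00:00';

SELECT CustomerID, CustomerName
FROM dbo.Customer
FOR SYSTEM_TIME AS OF @JustBeforeOops;

-- ...and use those rows to repair the current data, with no database restore involved.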


Sparklyr On HDInsight

Ali Zaidi has a walkthrough on using sparklyr on HDInsight:

The majority of Spark is written in Scala (~80% of Spark core), which is a functional programming language. Functional programming languages emphasize functional purity (the output only depends on the inputs) and strive to avoid side-effects. One important component of most functional programming languages is their lazy evaluation. While it might seem odd that we would appreciate laziness from our computing tools, lazy evaluation is an effective way of ensuring computations are evaluated in the most efficient manner possible.

Lazy evaluation allows Spark SQL to highly optimize the queries. When a user submits a query to Spark SQL, Spark composes the components of the SQL query into a logical plan. The logical plan is basically a recipe Spark SQL creates in order to evaluate the desired query. Spark SQL then submits the logical plan to its highly optimized engine called Catalyst, which optimizes this plan into a physical plan of action that is executed inside the Spark computation engine (a series of coordinating JVMs).

Read on for more description and code.
