Author: Kevin Feasel

External Tables To Hadoop

I have a post looking at creating external tables in Polybase to hit a Hadoop folder:

The DATA_SOURCE and DATA_FORMAT options are easy:  pick your external data source and external file format of choice.

The last major section deals with rejection.  We’re going from a semi-structured system to a structured system, and sometimes there are bad rows in our data, as there are no strict checks of structure before inserting records.  The Hadoop mindset is that there are two places in which you can perform data quality checks:  in the original client (pushing data into HDFS) and in any clients reading data from HDFS.  To make things simpler for us, the Polybase engine will outright reject any records which do not adhere to the quality standards you define when you create the table.  For example, let’s say that we have an Age column for each of our players, and that each age is an integer.  If the first row of our file has headers, then the first row will literally read “Age” and conversion to integer will fail.  Polybase rejects this row (removing it from the result set stream) and increments a rejection counter.  What happens next depends upon the reject options.

Creating an external table is pretty easy once you have the foundation prepared.
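
The rejection behavior described above can be sketched in a few lines of Python. This is only a simulation of the idea, not Polybase itself: the function name `read_ages`, the exception class, and the `reject_value` parameter are hypothetical, and the threshold is meant to be roughly analogous to a value-based reject option, where the read fails once the rejected-row count exceeds the limit.

```python
class RejectThresholdExceeded(Exception):
    """Raised when more rows are rejected than the threshold allows."""


def read_ages(lines, reject_value=0):
    """Convert each line to an integer age, rejecting rows that fail.

    reject_value is a hypothetical stand-in for a value-based reject
    threshold: the read fails once rejected rows exceed it.
    """
    ages = []
    rejected = 0
    for line in lines:
        try:
            ages.append(int(line.strip()))
        except ValueError:
            # Bad row: drop it from the result set and bump the counter.
            rejected += 1
            if rejected > reject_value:
                raise RejectThresholdExceeded(
                    f"{rejected} rows rejected, limit is {reject_value}"
                )
    return ages, rejected


# The header row "Age" cannot be converted, so it counts as one rejection.
rows = ["Age", "23", "31", "27"]
ages, rejected = read_ages(rows, reject_value=1)
print(ages, rejected)
```

With `reject_value=1`, the header row is dropped and the remaining three ages come back. With `reject_value=0`, the same input raises `RejectThresholdExceeded`, which mirrors a query failing because of a single bad row.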

Growing New Speakers

Andy Yun hosted this month’s T-SQL Tuesday and it was a huge success:

Welcome to this month’s T-SQL Tuesday Round-Up! A few weeks ago, I sent out a call for bloggers and must say that I’m utterly blown away by the response. A whopping FORTY bloggers responded last week with contributions for Growing New Speakers!  Four – zero!  You people are all amazing!!!

There’s a lot to read here.  If you’ve ever thought about speaking, give it a try; there are 40 people trying to convince you this month.

SQL Server In Containers

Andrew Pruski shows how to install Docker on Windows Server 2016 and pull down a SQL Express container:

But what about connecting remotely? This isn’t going to be much use if we can’t remotely connect!

Actually connecting remotely is the same as connecting to a named instance. You just use the server’s IP address (not the container’s private IP) and the non-default port that we specified when creating the container (remember to allow access to the port in the firewall).
Easy, eh?

Containers are great, though I do have trouble wrapping my head around containerized databases and have had struggles getting containerized Hadoop to work the way I want.

DBCC OPTIMIZER_WHATIF

Derik Hammer shows how to use DBCC OPTIMIZER_WHATIF to get an idea of how your query would run with different hardware:

DBCC OPTIMIZER_WHATIF can be used to pull down your resources or augment them. Often the differences in the execution plans have to do with parallelism and memory grants. This is an example of an execution plan running on an underpowered development machine.

This is a good tool to help figure out what an execution plan probably would look like in production when your test environment is much smaller.

SQL Server R Service Users

John Pertell shows how to figure out which user account is running SQL Server R Services code:

You’re not running as yourself, even though that’s the account you signed into SSMS as.

You’re not running under the server account that SQL or SQL Launchpad run under.

You’re running as a new account created when you installed SQL R Service In Database for the purpose of running R code.

John also looks at a couple of ways of showing which user is running this code and notes that this solves his file share issue.

Get-DbaTcpPort

Steve Jones looks at one PowerShell function inside dbatools:

I like using PoSh for some tasks, especially when I don’t have an easy way to do something in SSMS or want to run a task across a variety of instances. In this case, as I glanced through the September updates, I found a good one.

Get-DbaTcpPort

I don’t love the mixed naming, and I’ll get used to it, but I do love the autocomplete in PoSh.

Steve has lots of screenshots walking you through this function.

Tar And Polybase

I look at what the deal is with Polybase and Tar files:

The select statement returned 3104 records, exactly 4 shy of the 3108 I would have expected (777 * 4 = 3108).  In each case, the missing row was the first, meaning when I search for LastName = ‘Turgeon’ (the first player in my data set), I get zero rows.  When I search for another second baseman in the set, I get back four rows, exactly as I would have expected.

What’s really interesting is the result I get back from Wireshark when I run a query without pushdown:  it does actually return the row for Casey Turgeon.

This isn’t an ideal scenario, but it did seem to be consistent in my limited testing.
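
One way to confirm how many rows a query should return is to count the rows in each file inside the archive and compare the total against the query result. The sketch below does that with Python's standard `tarfile` module. The helper names and the sample file names are hypothetical, and the small in-memory archive stands in for the real data set, which is not reproduced here.

```python
import io
import tarfile


def count_rows_per_member(tar_bytes):
    """Return {member name: line count} for each regular file in a tar."""
    counts = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            data = archive.extractfile(member).read()
            counts[member.name] = len(data.splitlines())
    return counts


def build_sample_tar(files):
    """Build an uncompressed in-memory tar from {name: text} (demo only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, text in files.items():
            payload = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


# Four hypothetical files, three rows each, so 12 rows are expected.
sample = {
    f"players{i}.csv": "Turgeon,2B\nSmith,SS\nJones,CF\n" for i in range(1, 5)
}
counts = count_rows_per_member(build_sample_tar(sample))
expected_rows = sum(counts.values())
print(counts, expected_rows)
```

If the external table returns fewer rows than `expected_rows`, and the shortfall equals the number of files, that points to one row going missing per file, which matches the pattern in the excerpt.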

Ways To Crash Elasticsearch

Roi Ravhon shows how to take down an Elasticsearch instance:

Cardinality aggregation is used to count distinct values in a data set. For example, if you want to know the number of IPs used in your system, you can use this aggregation on an IP field and then count the results.

Despite the usefulness, cardinality can also be a touchy Elasticsearch feature to use. Performing a unique count on a field with a multitude of possible values when configuring a visualization, for example, can bring Elasticsearch to a halt.

Most of it comes down to writing good queries.  But if you don’t know what good Elasticsearch queries look like, read on.
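
For reference, a cardinality aggregation like the one described above takes a small request body. The sketch below builds that body in Python and prints it as JSON. The helper name `cardinality_request`, the field name `client_ip`, and the aggregation name `unique_ips` are hypothetical, and the sketch does not send the request anywhere.

```python
import json


def cardinality_request(field, agg_name="unique_values"):
    """Build a search body that asks for a distinct count on one field."""
    return {
        "size": 0,  # return no documents, only the aggregation result
        "aggs": {
            agg_name: {
                "cardinality": {"field": field}
            }
        },
    }


# Hypothetical field and aggregation names, for illustration only.
body = cardinality_request("client_ip", agg_name="unique_ips")
print(json.dumps(body, indent=2))
```

The request itself is tiny. The cost comes from how many distinct values the chosen field holds, which is why a high-cardinality field can be expensive to aggregate.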

Index Create Dates

Kenneth Fisher looks to see when his indexes were created (or at least updated):

SQL Server stores a create date and a change date for each object in the sys.objects system view.

Unfortunately, while tables, views and even constraints are objects, indexes are not. Or at least they aren’t stored in the sys.objects system view. And the sys.indexes system view doesn’t have any dates associated with it. So how do we get the create/update date on an index? Well, short answer is you don’t. Long answer is that in some cases you can get some information.

These aren’t ideal answers, but they can be better than nothing.
