Press "Enter" to skip to content

Author: Kevin Feasel

Polybase With HDP 2.5

I ran into some issues with Polybase and Hortonworks Data Platform 2.5:

First, it’s interesting to note that the Polybase engine uses “pdw_user” as its user account.  That’s not a blocker here because I have an open-door policy on my Hadoop cluster: no security lockdown, because it’s a sandbox with no important information.  Second, my IP address on the main machine is 192.168.58.1 and the name node for my Hadoop sandbox is at 192.168.58.129.  The audit logs show that my main machine runs a getfileinfo command against /tmp/ootp/secondbasemen.csv.  Then the Polybase engine asks permission to open /tmp/ootp/secondbasemen.csv and is granted permission.  Then…nothing.  It waits 20-30 seconds and tries again.  After four failures, it gives up.  This is why it takes about 90 seconds to return an error message: it tries four times.

Aside from this audit log, there was nothing interesting on the Hadoop side.  The YARN logs had nothing in them, indicating that whatever request happened never made it that far.
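
One sanity check from the SQL Server machine (my own addition, not from the post) is to replicate the getfileinfo/open sequence over WebHDFS.  Polybase speaks the native HDFS protocol rather than WebHDFS, but if even this probe stalls on the open step, the problem is data node reachability rather than anything Polybase-specific.  It assumes WebHDFS is enabled on the sandbox’s default port, 50070.

```python
# Hypothetical connectivity probe; the IP and path are taken from the post above.
import requests

NAMENODE = "192.168.58.129"
PATH = "/tmp/ootp/secondbasemen.csv"
BASE = f"http://{NAMENODE}:50070/webhdfs/v1{PATH}"

# Step 1: the WebHDFS equivalent of the getfileinfo call in the audit log.
status = requests.get(BASE, params={"op": "GETFILESTATUS"}, timeout=30)
print(status.status_code, status.json())

# Step 2: the open.  The name node answers with a redirect to a data node;
# if that data node is unreachable from this machine, the request hangs or
# fails, much like the stalls described above.
data = requests.get(BASE, params={"op": "OPEN"}, timeout=30)
print(data.status_code, len(data.content), "bytes read")
```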

Here’s hoping there’s a solution in the future.

ReaderWriterSpinlock

Ewald Cress looks at the new ReaderWriterSpinlock in SQL Server 2016 CU2:

As a quick refresher, a traditional SQLOS spinlock is a 32-bit integer, or of course 64-bit as of 2016, with a value of either zero (lock not acquired) or the 32-bit Windows thread ID of the thread that owns it. All very simple and clean in terms of atomic acquire semantics; the only fun part is the exponential backoff tango that results from a collision.

We have also observed how the 2016 flavour of the SOS_RWLock packs a lot of state into 64 bits, allowing more complicated semantics to be implemented in an atomic compare-and-swap. What seems to be politically incorrect to acknowledge is that these semantics boil down to a simplified version of a storage engine latch, who is the unloved and uncool grandpa nowadays.

Clearly a lot can happen in the middle of 64 bits.
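
To make both ideas concrete, here is a toy sketch (mine, with a hypothetical bit layout, not the actual SQLOS structure): all of the lock state lives in a single 64-bit word, so one compare-and-swap can admit a reader or a writer, and collisions fall back to exponential backoff.  Python has no atomic compare-and-swap on plain integers, so a guard lock stands in for the hardware instruction.

```python
# Toy reader-writer spinlock: hypothetical layout, not the real SOS_RWLock.
import threading
import time

WRITER_BIT = 1 << 63           # top bit: an exclusive writer holds the lock
READER_MASK = (1 << 32) - 1    # low 32 bits: count of shared readers

class ToyReaderWriterSpinlock:
    def __init__(self):
        self._word = 0                      # 0 means "not acquired"
        self._guard = threading.Lock()      # stand-in for the atomic CAS

    def _cas(self, expected, new):
        """Simulated compare-and-swap on the 64-bit state word."""
        with self._guard:
            if self._word == expected:
                self._word = new
                return True
            return False

    def _spin(self, try_once):
        backoff = 1e-6
        while not try_once():
            time.sleep(backoff)             # exponential backoff on collision
            backoff = min(backoff * 2, 1e-3)

    def acquire_shared(self):
        def try_once():
            word = self._word
            # Readers may enter only while no writer holds the word.
            return not (word & WRITER_BIT) and self._cas(word, word + 1)
        self._spin(try_once)

    def release_shared(self):
        def try_once():
            word = self._word
            return self._cas(word, word - 1)
        self._spin(try_once)

    def acquire_exclusive(self):
        # A writer needs the word to be exactly zero: no readers, no writer.
        self._spin(lambda: self._cas(0, WRITER_BIT))

    def release_exclusive(self):
        self._spin(lambda: self._cas(WRITER_BIT, 0))

    def readers(self):
        return self._word & READER_MASK     # shared holders currently inside
```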

Definitely worth a read, as it seems that this is going to get more play in the years to come.

Blank Pages On SSRS Report

Vladimir Oselsky fixes an odd (at first) Reporting Services issue with printing of blank pages:

Recently I ran into an issue that caused me to spend more time trying to figure out what to do than it did to fix it. I got a very simple ticket: the client reports that extra pages are being printed on an SSRS report when it is sent to a specific printer, but other printers are fine; additionally, printing to PDF is fine.

After some research, I found multiple articles online that talk about improper page and body setup resulting in extra pages. Since I’m not used to working on SSRS reports inside BIDS (Business Intelligence Development Studio), which was a precursor to SSDT (SQL Server Data Tools), it took me far longer than I would expect to accomplish a simple task. Therefore I’m hoping the following screenshots will save someone (most likely me) time in fixing this issue.

Click through for screenshots.

Proportional Fill Algorithm

Paul Randal discusses the proportional fill algorithm that SQL Server uses for extent allocation:

Proportional fill works by assigning a number to each file in the filegroup, called a ‘skip target’. You can think of this as an inverse weighting, where the higher the value is above 1, the more times that file will be skipped when going round the round robin loop. During the round robin, the skip target for a file is examined, and if it’s equal to 1, an allocation takes place. If the skip target is higher than 1, it’s decremented by 1 (to a minimum value of 1), no allocation takes place, and consideration moves to the next file in the filegroup.

(Note that there’s a further twist to this: when the -E startup parameter is used, each file with a skip target of 1 will be used for 64 consecutive extent allocations before the round robin loop progresses. This is documented in Books Online here and is useful for increasing the contiguity of index leaf levels for very large scans – think data warehouses.)
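
As a quick illustration, here is a toy model of that skip-target round robin (my own sketch, not Paul’s code, and it ignores how skip targets get recalculated from free space):

```python
# Simulate the skip-target round robin described above: a skip target of 1
# means "allocate here"; anything higher is decremented and the file skipped.
from itertools import cycle

def allocate_extents(skip_targets, n_allocations):
    """skip_targets maps file name -> initial skip target; returns the
    sequence of files that receive the next n_allocations extents."""
    targets = dict(skip_targets)
    order = []
    round_robin = cycle(targets)
    while len(order) < n_allocations:
        f = next(round_robin)
        if targets[f] == 1:
            order.append(f)       # allocation takes place
        else:
            targets[f] -= 1       # skipped; decrement toward the minimum of 1
    return order

# Two files, the second starting with a skip target of 3: it is skipped
# twice before settling into the rotation.
print(allocate_extents({"data1": 1, "data2": 3}, 8))
# ['data1', 'data1', 'data1', 'data2', 'data1', 'data2', 'data1', 'data2']
```

Under the -E startup parameter mentioned above, each file with a skip target of 1 would instead receive 64 consecutive extent allocations per visit.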

Read on for some implementation details as well as a good scenario for why it’s important to know about this.

SSMS And Memory

Daniel Janik looks into those out-of-memory errors Management Studio blesses us with:

Has SSMS (SQL Server Management Studio) been crashing on you? Have you been getting Out of Memory messages when attempting to run queries?

You may have noticed that this tends to occur after you’ve opened and closed 40 to 50 query windows. I’ve noticed this when I have had as few as 5 query windows open after having already opened and closed 30 or so other query windows.

It’s crazy that Management Studio is still a 32-bit application after all of these years.

Connect Items Around Temporal Tables

Adam Machanic has a roundup of Connect items pertaining to temporal tables in SQL Server 2016:

I’ve been thinking a lot about SQL Server 2016 temporal tables of late. I think it’s possibly the most compelling feature in the release, with broad applications across a number of different use cases. However, just like any v.1 feature, it’s not without its faults.

I created a couple of new Connect items and decided to see what other things people had submitted. I combed the list and came up with a bunch of interesting items, all of which I think have great merit. Following is a summary of what I found. I hope you’ll consider voting these items up and hopefully we can push Microsoft to improve the feature in forthcoming releases.

I particularly like the idea about dropped column retention, at least as an optional feature.  If temporal tables are interesting to you, click through and check out these Connect items.

Stream Processing With Kafka And Spark

Satendra Kumar has a slide deck looking at combining Spark Streaming with Kafka:

Knoldus organized a Meetup on Friday, 9 September 2016. The topics covered in this meetup were:

  1. Overview of Spark Streaming.

  2. Fault-tolerance Semantics & Performance Tuning.

  3. Spark Streaming Integration with Kafka.

Click through for the slide deck.  Combine that with the AWS blog post on the same topic and you get a pretty good intro.
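
For a taste of the integration the deck covers, here is a minimal direct-stream sketch in PySpark (mine, not from the slides; the broker address and topic name are placeholders, and it assumes the era-appropriate spark-streaming-kafka package is available at submit time):

```python
# Count words arriving on a Kafka topic in 5-second micro-batches.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="KafkaWordCount")
ssc = StreamingContext(sc, batchDuration=5)

# Direct (receiverless) stream; each record arrives as a (key, value) pair.
stream = KafkaUtils.createDirectStream(
    ssc, ["events"], {"metadata.broker.list": "localhost:9092"})

counts = (stream.map(lambda kv: kv[1])
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()
```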

Sparklyr

RStudio has announced an interface between R and Apache Spark, named sparklyr:

Over the past couple of years we’ve heard time and time again that people want a native dplyr interface to Spark, so we built one! sparklyr also provides interfaces to Spark’s distributed machine learning algorithms and much more. Highlights include:

  • Interactively manipulate Spark data using both dplyr and SQL (via DBI).

  • Filter and aggregate Spark datasets then bring them into R for analysis and visualization.

  • Orchestrate distributed machine learning from R using either Spark MLlib or H2O Sparkling Water.

  • Create extensions that call the full Spark API and provide interfaces to Spark packages.

  • Integrated support for establishing Spark connections and browsing Spark DataFrames within the RStudio IDE.

So what’s the difference between sparklyr and SparkR?

This might be the package I’ve been awaiting.
