Press "Enter" to skip to content

Day: October 5, 2016

Graph Analytics With Spark

Mirko Kämpf looks at using GraphFrames on Spark:

Next, we’ll define a DataFrame by loading data from a CSV file, which is stored in HDFS.

Our datafile facebook_combined.txt contains two columns to represent links between network nodes. The first column is called source (src), and the second is the destination (dst) of the link. (Some other systems, such as Gephi, use “source” and “target” instead.)

First we define a custom schema, and then we load the DataFrame, using SQLContext.
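
For context, a rough sketch of what that setup might look like in Scala is below. It uses the newer SparkSession API rather than the SQLContext mentioned in the excerpt, and the HDFS path, the space delimiter, and the object name are assumptions rather than Mirko's actual code:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StructType, StructField, StringType}
import org.graphframes.GraphFrame

object FacebookGraphSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("facebook-graph").getOrCreate()
    import spark.implicits._

    // Custom schema: two columns, the source and destination of each link
    val edgeSchema = StructType(Seq(
      StructField("src", StringType, nullable = false),
      StructField("dst", StringType, nullable = false)
    ))

    // Load the edge list from HDFS (path and delimiter are assumptions)
    val edges = spark.read
      .schema(edgeSchema)
      .option("delimiter", " ")
      .csv("hdfs:///tmp/facebook_combined.txt")

    // Vertices are simply the distinct node ids appearing on either end of an edge
    val vertices = edges.select($"src".as("id"))
      .union(edges.select($"dst".as("id")))
      .distinct()

    val g = GraphFrame(vertices, edges)
    g.inDegrees.orderBy($"inDegree".desc).show(10)   // e.g. the ten best-connected nodes

    spark.stop()
  }
}
```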

It sounds like Spark graph database engines are early in their lifecycle, but they might already be useful for simple analysis.


Hadoop’s S3 Support

Steve Loughran and Sanjay Radia give us a history lesson on Hadoop’s support for Amazon S3:

Hadoop’s ability to work with Amazon S3 storage goes back to 2006 and the issue HADOOP-574, “FileSystem implementation for Amazon S3”. This filesystem client, “s3://”, implemented an inode-style filesystem atop S3: it could support bigger files than S3 could then support, and some of its operations (directory rename and delete) were fast. The s3 filesystem allowed Hadoop to be run in Amazon’s EMR infrastructure, using S3 as the persistent store of work. This piece of open source code predated Amazon’s release of EMR, “Elastic MapReduce”, by over two years. It’s also notable as the piece of work which gained Tom White, author of “Hadoop: The Definitive Guide”, committer status.

It’s interesting to see how this project has matured over the past decade.


Polybase With HDP 2.5

I run into some issues with Polybase and Hortonworks Data Platform 2.5:

First, it’s interesting to note that the Polybase engine uses “pdw_user” as its user account.  That’s not a blocker here because I have an open door policy on my Hadoop cluster:  no security lockdown because it’s a sandbox with no important information.  Second, my IP address on the main machine is 192.168.58.1 and the name node for my Hadoop sandbox is at 192.168.58.129.  These logs show that my main machine runs a getfileinfo command against /tmp/ootp/secondbasemen.csv.  Then, the Polybase engine asks permission to open /tmp/ootp/secondbasemen.csv and is granted permission.  Then…nothing.  It waits for 20-30 seconds and tries again.  After four failures, it gives up.  This is why it’s taking about 90 seconds to return an error message:  it tries four times.

Aside from this audit log, there was nothing interesting on the Hadoop side.  The YARN logs had nothing in them, indicating that whatever request happened never made it that far.

Here’s hoping there’s a solution in the future.


ReaderWriterSpinlock

Ewald Cress looks at the new ReaderWriterSpinlock in SQL Server 2016 CU2:

As a quick refresher, a traditional SQLOS spinlock is a 32-bit integer, or of course 64-bit as of 2016, with a value of either zero (lock not acquired) or the 32-bit Windows thread ID of the thread that owns it. All very simple and clean in terms of atomic acquire semantics; the only fun part is the exponential backoff tango that results from a collision.

We have also observed how the 2016 flavour of the SOS_RWLock packs a lot of state into 64 bits, allowing more complicated semantics to be implemented in an atomic compare-and-swap. What seems to be politically incorrect to acknowledge is that these semantics boil down to a simplified version of a storage engine latch, who is the unloved and uncool grandpa nowadays.

Clearly a lot can happen in the middle of 64 bits.
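
To make that first paragraph concrete, here is a toy Scala sketch of the classic scheme Ewald describes: zero means the lock is free, any other value is the owning thread’s ID, acquisition is a single compare-and-swap, and a collision triggers exponential backoff. This is purely illustrative, not SQLOS code, and the backoff constants are invented:

```scala
import java.util.concurrent.atomic.AtomicLong
import java.util.concurrent.locks.LockSupport

// Toy model only: 0 = lock not acquired, non-zero = owning thread's id.
final class ToySpinlock {
  private val owner = new AtomicLong(0L)

  def acquire(): Unit = {
    val me = Thread.currentThread().getId           // stand-in for a Windows thread ID
    var backoffNanos = 1000L                        // made-up starting backoff
    while (!owner.compareAndSet(0L, me)) {          // atomic acquire: free -> mine
      LockSupport.parkNanos(backoffNanos)           // lost the race: back off...
      backoffNanos = math.min(backoffNanos * 2, 1000000L)  // ...exponentially, with a cap
    }
  }

  def release(): Unit = owner.set(0L)               // only the owner should call this
}
```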

Definitely worth a read, as it seems that this is going to get more play in the years to come.


Blank Pages On SSRS Report

Vladimir Oselsky fixes an odd (at first) Reporting Services issue with printing of blank pages:

I recently ran into an issue that caused me to spend more time trying to figure out what to do than it did to fix it. I got a very simple ticket. The client reports that extra pages are being printed on an SSRS report when it is sent to a specific printer, but other printers are fine; additionally, printing to PDF is fine.

After some research, I found multiple articles online that talk about improper page and body setup that results in extra pages. Since I’m not used to working on SSRS reports inside BIDS (Business Intelligence Development Studio), which was a precursor to SSDT (SQL Server Data Tools), it took me far longer than I would expect to accomplish a simple task. Therefore, I’m hoping the following screenshots will save someone (most likely me) time in fixing this issue.

Click through for screenshots.


Proportional Fill Algorithm

Paul Randal discusses the proportional fill algorithm that SQL Server uses for extent allocation:

Proportional fill works by assigning a number to each file in the filegroup, called a ‘skip target’. You can think of this as an inverse weighting, where the higher the value is above 1, the more times that file will be skipped when going round the round robin loop. During the round robin, the skip target for a file is examined, and if it’s equal to 1, an allocation takes place. If the skip target is higher than 1, it’s decremented by 1 (to a minimum value of 1), no allocation takes place, and consideration moves to the next file in the filegroup.

(Note that there’s a further twist to this: when the -E startup parameter is used, each file with a skip target of 1 will be used for 64 consecutive extent allocations before the round robin loop progresses. This is documented in Books Online here and is useful for increasing the contiguity of index leaf levels for very large scans – think data warehouses.)
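
As a rough illustration, here is a toy Scala simulation of that skip-target round robin. The file names and initial skip targets are made up, and the periodic recalculation of skip targets from free space is not modeled; it only shows the “allocate at 1, otherwise decrement and skip” loop described above:

```scala
// Toy simulation: each file has a 'skip target' (an inverse weighting).
final case class DataFile(name: String, var skipTarget: Int, var allocations: Int = 0)

object ProportionalFillSketch {
  def main(args: Array[String]): Unit = {
    val files = Array(
      DataFile("data1.mdf", skipTarget = 1),   // emptiest file: never skipped
      DataFile("data2.ndf", skipTarget = 3),
      DataFile("data3.ndf", skipTarget = 5)    // fullest file: skipped most often
    )

    var idx = 0
    for (_ <- 1 to 24) {                       // simulate 24 extent allocations
      var allocated = false
      while (!allocated) {
        val f = files(idx)
        if (f.skipTarget == 1) {               // skip target of 1: allocate here
          f.allocations += 1
          allocated = true
        } else {                               // otherwise decrement (floor of 1), no allocation
          f.skipTarget = math.max(1, f.skipTarget - 1)
        }
        idx = (idx + 1) % files.length         // consideration moves to the next file
      }
    }

    files.foreach(f => println(s"${f.name}: ${f.allocations} extents allocated"))
  }
}
```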

Read on for some implementation details as well as a good scenario for why it’s important to know about this.


SSMS And Memory

Daniel Janik looks into those out-of-memory errors Management Studio blesses us with:

Has SSMS (SQL Server Management Studio) been crashing on you? Have you been getting Out of Memory messages when attempting to run queries?

You may have noticed that this tends to occur after you’ve opened and closed 40 to 50 query windows. I’ve noticed this when I have had as few as 5 query windows open after having already opened and closed 30 or so other query windows.

It’s crazy that Management Studio is still a 32-bit application after all of these years.


Connect Items Around Temporal Tables

Adam Machanic has a roundup of Connect items pertaining to temporal tables in SQL Server 2016:

I’ve been thinking a lot about SQL Server 2016 temporal tables of late. I think it’s possibly the most compelling feature in the release, with broad applications across a number of different use cases. However, just like any v.1 feature, it’s not without its faults.

I created a couple of new Connect items and decided to see what other things people had submitted. I combed the list and came up with a bunch of interesting items, all of which I think have great merit. Following is a summary of what I found. I hope you’ll consider voting these items up and hopefully we can push Microsoft to improve the feature in forthcoming releases.

I particularly like the idea about dropped column retention, at least as an optional feature.  If temporal tables are interesting to you, click through and check out these Connect items.
