Blank Pages On SSRS Report

Vladimir Oselsky fixes an odd (at first) Reporting Services issue with printing of blank pages:

Recently I ran into an issue that caused me to spend more time figuring out what to do than it took to fix it. I got a very simple ticket: the client reported that extra pages were being printed on an SSRS report when it was sent to a specific printer, but other printers were fine; additionally, printing to PDF was fine.

After some research, I found multiple articles online that talk about improper page and body setup resulting in extra pages. Since I’m not used to working on SSRS reports inside BIDS (Business Intelligence Development Studio), the precursor to SSDT (SQL Server Data Tools), it took me far longer than I would expect to accomplish a simple task. Therefore, I’m hoping the following screenshots will save someone (most likely me) time in fixing this issue.

Click through for screenshots.
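
The usual culprit is a report body that is wider than the printable area: the body width plus the left and right margins must not exceed the page width, or SSRS pads the printed output with blank pages. As a rough sanity check outside of BIDS, here is a minimal Python sketch that reads those values straight from an .rdl file; it assumes the 2008 RDL namespace and present margin elements, both of which vary by report version.

  import re
  import sys
  import xml.etree.ElementTree as ET

  # Assumes the 2008 RDL namespace; other schema versions differ.
  NS = {"rdl": "http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition"}

  def to_inches(size):
      """Convert an RDL size such as '8.5in' or '21cm' to inches."""
      value, unit = re.match(r"([\d.]+)\s*(in|cm|mm|pt)", size).groups()
      return float(value) * {"in": 1.0, "cm": 1 / 2.54, "mm": 1 / 25.4, "pt": 1 / 72}[unit]

  def dim(parent, name, default="0in"):
      node = parent.find("rdl:" + name, NS)
      return to_inches(node.text) if node is not None else to_inches(default)

  root = ET.parse(sys.argv[1]).getroot()
  body_width = dim(root, "Width")          # in RDL 2008, body width is the report-level Width
  page = root.find("rdl:Page", NS)
  printable = (dim(page, "PageWidth", "8.5in")
               - dim(page, "LeftMargin") - dim(page, "RightMargin"))

  print("body %.2fin vs printable %.2fin" % (body_width, printable))
  if body_width > printable:
      print("Body is wider than the printable area -- expect blank pages.")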

Proportional Fill Algorithm

Paul Randal discusses the proportional fill algorithm that SQL Server uses for extent allocation:

Proportional fill works by assigning a number to each file in the filegroup, called a ‘skip target’. You can think of this as an inverse weighting, where the higher the value is above 1, the more times that file will be skipped when going round the round robin loop. During the round robin, the skip target for a file is examined, and if it’s equal to 1, an allocation takes place. If the skip target is higher than 1, it’s decremented by 1 (to a minimum value of 1), no allocation takes place, and consideration moves to the next file in the filegroup.

(Note that there’s a further twist to this: when the -E startup parameter is used, each file with a skip target of 1 will be used for 64 consecutive extent allocations before the round robin loop progresses. This is documented in Books Online here and is useful for increasing the contiguity of index leaf levels for very large scans – think data warehouses.)
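
To make the mechanics concrete, here is a toy Python model of the loop Paul describes. The skip-target values are invented, and in the real engine they are recalculated as free space changes rather than simply decaying to 1; the consecutive parameter mimics the -E behavior.

  from collections import Counter

  def allocate_extents(skip_targets, n_allocations, consecutive=1):
      """Round robin with skip targets; consecutive=64 mimics -E."""
      names = list(skip_targets)
      counts = Counter()
      i, done = 0, 0
      while done < n_allocations:
          f = names[i % len(names)]
          if skip_targets[f] == 1:
              take = min(consecutive, n_allocations - done)
              counts[f] += take          # allocate (64 in a row under -E)
              done += take
          else:
              skip_targets[f] -= 1       # skip this file this time around
          i += 1
      return counts

  # Hypothetical filegroup: file2 has the most free space (skip target 1),
  # so it receives the bulk of the allocations.
  print(allocate_extents({"file1": 4, "file2": 1, "file3": 2}, 100))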

Read on for some implementation details as well as a good scenario for why it’s important to know about this.

SSMS And Memory

Daniel Janik looks into those out-of-memory errors Management Studio blesses us with:

Has SSMS (SQL Server Management Studio) been crashing on you? Have you been getting Out of Memory messages when attempting to run queries?

You may have noticed that this tends to occur after you’ve opened and closed 40 to 50 query windows. I’ve noticed it with as few as 5 query windows open, after having already opened and closed 30 or so other query windows.

It’s crazy that Management Studio is still a 32-bit application after all of these years.

Connect Items Around Temporal Tables

Adam Machanic has a roundup of Connect items pertaining to temporal tables in SQL Server 2016:

I’ve been thinking a lot about SQL Server 2016 temporal tables of late. I think it’s possibly the most compelling feature in the release, with broad applications across a number of different use cases. However, just like any v.1 feature, it’s not without its faults.

I created a couple of new Connect items and decided to see what other things people had submitted. I combed the list and came up with a bunch of interesting items, all of which I think have great merit. Following is a summary of what I found. I hope you’ll consider voting these items up and hopefully we can push Microsoft to improve the feature in forthcoming releases.

I particularly like the idea of dropped column retention, at least as an optional feature.  If temporal tables are interesting to you, click through and check out these Connect items.

Stream Processing With Kafka And Spark

Satendra Kumar has a slide deck looking at combining Spark Streaming with Kafka:

Knoldus organized a Meetup on Friday, 9 September 2016. The topics covered in this meetup were:

  1. Overview of Spark Streaming.

  2. Fault-tolerance Semantics & Performance Tuning.

  3. Spark Streaming Integration with Kafka.

Click through for the slide deck.  Combine that with the AWS blog post on the same topic and you get a pretty good intro.

Sparklyr

RStudio has announced an interface between R and Apache Spark, named sparklyr:

Over the past couple of years we’ve heard time and time again that people want a native dplyr interface to Spark, so we built one! sparklyr also provides interfaces to Spark’s distributed machine learning algorithms and much more. Highlights include:

  • Interactively manipulate Spark data using both dplyr and SQL (via DBI).

  • Filter and aggregate Spark datasets then bring them into R for analysis and visualization.

  • Orchestrate distributed machine learning from R using either Spark MLlib or H2O Sparkling Water.

  • Create extensions that call the full Spark API and provide interfaces to Spark packages.

  • Integrated support for establishing Spark connections and browsing Spark DataFrames within the RStudio IDE.

So what’s the difference between sparklyr and SparkR?

This might be the package I’ve been awaiting.

Finding Long-Running Queries

Peter Schott has an Extended Event to find long-running queries:

This has been something I’ve wanted to investigate for a while now. I knew you could use Profiler and set up server-side traces to capture long-running events, but I was curious how to do the same with Extended Events. I then came across this post from Pinal Dave ( b | t ) that pointed me in the right direction. I followed along with the guidelines he was suggesting but had trouble finding the “Duration” filter. It turns out I had a bit too much selected in my filtering options, or perhaps the wizard was giving me fits, but once I selected just the Batch Completed and RPC Completed events I could see and set the Duration filter. The one change I’d make to Dave’s script is to set the duration to 5,000,000, because Duration in SQL 2012 is measured in microseconds, not milliseconds, and I want to catch queries running longer than 5 seconds.

Click through for the script.

Kafka Plus Spark Streaming

Prasad Alle shows how to integrate Kafka with Spark Streaming on AWS:

Stream processing walkthrough

The entire pattern can be implemented in a few simple steps:

  1. Set up Kafka on AWS.

  2. Spin up an EMR 5.0 cluster with Hadoop, Hive, and Spark.

  3. Create a Kafka topic.

  4. Run the Spark Streaming app to process clickstream events.

  5. Use the Kafka producer app to publish clickstream events into the Kafka topic.

  6. Explore the clickstream event data with Spark SQL.

This is a pretty easy-to-follow walkthrough with some good tips at the end.
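
For a taste of steps 4 and 5, here is a minimal Python sketch under assumed conditions: a broker on localhost:9092, a hypothetical topic named clickstream, the kafka-python client, and the Spark 2.0-era spark-streaming-kafka-0-8 integration (submitted with --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0). The actual walkthrough ships its own producer app and topic names.

  # --- producer.py: step 5, publish hypothetical clickstream events ---
  import json
  import random
  import time

  from kafka import KafkaProducer

  producer = KafkaProducer(bootstrap_servers="localhost:9092",
                           value_serializer=lambda v: json.dumps(v).encode("utf-8"))
  for i in range(1000):
      event = {"user": random.randint(1, 50), "page": "/page/%d" % (i % 10)}
      producer.send("clickstream", event)
      time.sleep(0.01)
  producer.flush()

  # --- streaming_app.py: step 4, count page hits per micro-batch ---
  import json

  from pyspark import SparkContext
  from pyspark.streaming import StreamingContext
  from pyspark.streaming.kafka import KafkaUtils

  sc = SparkContext(appName="ClickstreamDemo")
  ssc = StreamingContext(sc, batchDuration=10)    # 10-second batches
  stream = KafkaUtils.createDirectStream(
      ssc, ["clickstream"], {"metadata.broker.list": "localhost:9092"})
  pages = stream.map(lambda kv: json.loads(kv[1])["page"])
  pages.countByValue().pprint()                   # page -> hit count, per batch
  ssc.start()
  ssc.awaitTermination()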

SKLearn To Azure ML

David Crook shows how to build a model using Python’s SciKit library and then operationalize it in Azure ML:

Why Model Outside Azure ML?

Sometimes you run into things like various limitations, speed, data size, or perhaps you just iterate better on your own workstation.  I find myself significantly faster on my workstation, or in a Jupyter notebook that lives on a big ol’ server, when doing my experiments.  Modelling outside Azure ML allows me to use the full capabilities of whatever infrastructure and framework I want for training.

So Why Operationalize with Azure ML?

Azure ML has several benefits, such as auto-scaling, token generation, high-speed Python execution modules, API versioning, sharing, and tight PaaS integration with things like Stream Analytics, among many others.  This really does make life easier for me.  Sure, I could deploy a Flask app via Docker somewhere, but then I’d need to worry about things like load balancing and security, and I really just don’t want to do that.  I want to build a model, deploy it, and move on to the next one.  My value is A.I., not web management, so the more time I spend delivering my value, the more impactful I can be.
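
To illustrate the split David describes, here is a compact Python sketch: the training half is ordinary scikit-learn on local hardware, and the scoring half shows the general shape of a call to an Azure ML (classic) request-response endpoint. The dataset, model, URL, API key, and column names are all placeholders, and the exact request schema depends on the published service.

  # Local experimentation: train and evaluate entirely on your own hardware.
  from sklearn.datasets import load_iris
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import train_test_split

  X, y = load_iris(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
  model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
  print("holdout accuracy:", model.score(X_test, y_test))

  # Operationalized scoring: once published as an Azure ML (classic)
  # request-response web service, prediction is an authenticated POST.
  # URL, API key, and column names below are placeholders.
  import json
  import urllib.request

  body = {"Inputs": {"input1": {"ColumnNames": ["f1", "f2", "f3", "f4"],
                                "Values": [[5.1, 3.5, 1.4, 0.2]]}},
          "GlobalParameters": {}}
  req = urllib.request.Request(
      "https://<region>.services.azureml.net/workspaces/<ws>/services/<id>/execute?api-version=2.0",
      data=json.dumps(body).encode("utf-8"),
      headers={"Content-Type": "application/json",
               "Authorization": "Bearer <api-key>"})
  print(json.loads(urllib.request.urlopen(req).read()))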

Read the whole thing.
