Saving To Excel From Azure Data Studio

Bob Pusateri shows us how you can export to Excel from Azure Data Studio:

In SQL Server Management Studio, there’s no single-step way to save a result set to Excel. Most commonly I will just copy/paste a result set into a spreadsheet, but depending on the size of the result set and the types of data involved, that doesn’t always play nicely.

But Azure Data Studio does it WAY better, trust me. If you want that result set in a spreadsheet, just save it as one and poof – you have an Excel file!

Considering that Excel is the most popular BI tool, it makes sense to support it.

Things To Know About Databricks UAP

Kara Annanie has five things you should know about the Databricks Unified Analytics Platform:

4.     A Spark Dataframe is not the same as a Pandas/R Dataframe
Spark Dataframes are specifically designed to use distributed memory to perform operations across a cluster whereas Pandas/R Dataframes can only run on one computer. This means that you need to use a Spark Dataframe to realize the benefits of the cluster when coding in Python or R within Databricks.

This is a nice set of tips for people getting started with Spark and Databricks.

Updating Power BI Reports Based On Website Changes

Kasper de Jonge shows how you can scrape a webpage using an M function and check for specific updates:

A quick hack today. Got this question from someone who needed to be able to update a report and show users that something had changed. This is easy when you have access to a database and can add data to it but in this case that was not possible.
So I came up with a hacky (and great :P) way to do this. and wanted to share it in case it came handy in your box of tricks :).

It is a little bit hacky, but much less so on a website with a last updated date visible someplace.

When SLEEP_TASK Waits Are Important

Ginger Keys gives an example where the SLEEP_TASK wait indicates a performance problem:

Ordinarily SLEEP_TASK is a nonspecific wait type in SQL Server which occurs when a task sleeps while waiting for a generic event to occur, according to Microsoft documentation.  This wait type can usually be safely ignored, however on some occasions it can happen when a script does not execute completely or hangs up for long periods of time.

The SLEEP_TASK wait means that a thread is waiting on a resource or waiting for some event to occur, and could indicate background task scheduling, a query plan exchange operator that isn’t tracked by CXPACKET, or it could be a hashing operation that spills to tempdb.

Read the whole thing. For a bit more information, check out the SQLskills description of this wait type.

SOS_SCHEDULER_YIELD And Hypervisors

Jonathan Kehayias has an interesting article on generating SOS_SCHEDULER_YIELD waits due to CPU over-subscription on the host:

The test environment that I used for this is a portable lab I’ve used for demos for VM content over the last eight years teaching our Immersion Events. The ESX host has 4 cores and 8GB RAM and hosts three virtual machines, a 4vCPU SQL Server with 4GB RAM, and two 2vCPU Windows Server VM’s with 2GB RAM that are used strictly to run Geekbench to produce load.  Within SQL Server, I a reproducible workload that drives parallelism and is repeatable consistently, that I have also used for years in teaching classes.
For the tests, I first ran a baseline where the SQL Server VM is the only machine executing any tasks/workload at all, but the other VMs are powered on and just sitting there on the hose.  This establishes a base metric in the host and had 1% RDY time average during the workload execution, after which I collected the wait stats for SOS_SCHEDULER_YIELD.

From there, Jonathan starts cranking up the load on the application servers and sees what it does to SQL Server ready time. This is a great reason not to over-subscribe on CPUs on mission-critical hosts.

Curated SQL Returns Wednesday

In honor of almost everybody being out of the office today and tomorrow, Curated SQL returns for its Boxing Day extravaganza, which looks exactly like any other day to me, but my producer promises me it will be extravagant.

What’s New With Kafka 2.1

Stephane Maarek updates us on the goings-on with Apache Kafka:

Kafka 2.1 is quite a special upgrade because you cannot downgrade due to a schema change in the consumer offsets topics. Otherwise the procedure to upgrade Kafka is still the same as before, see: https://kafka.apache.org/documentation/#upgrade

One of the big changes is support for Java 11. It’s a shame that Spark currently doesn’t support versions past 8.

What’s New With Cloudera Enterprise 6.1.0

Krishna Maheshwari announces Cloudera Enterprise 6.1.0:

Platform Support & Security
Cloudera now supports deploying with OpenJDK 8 in addition to Oracle’s JDK. With this release, we also support AWS CloudHSM for HDFS encryption-at-rest.
As customers are increasingly implementing security, we are changing defaults to be secure in order to reduce setup complexity and configuration misses. As a part of this release, several defaults in Kafka, Impala, Sqoop, and Flume have been changed to be more secure and added BDR replication from insecure to secure (Kerberized) clusters to ease the transition to secure clusters.

There are several improvements and new features here worth checking out.

Window Functions And Default Frames

Steve Jones elaborates on the default frame that a window function has:

There is a framing clause that I can use after the ORDER BY in the OVER clause. The default frame is RANGE UNBOUNDED PRECEDING AND CURRENT ROW. At least, this is what appears when you include an ORDER BY clause. Many of us do this, but still get confused with the LAST_VALUE() and FIRST_VALUE functions.
What I really want is a complete set of data, which is either starting from the current row to the end, or  includes all values. If I modify my framing clause, I’ll get what I expect.

Click through for a demonstration.

Building A Secure Microservice Which Uses Kafka Streams

George Vetticaden has a reference architecture with sample code for a secured microservice running atop Kafka Streams:

One of the key benefits of using Kafka Streams over other streaming engines is that the stream processing apps / microservices don’t need a cluster. Rather, each microservice can be run as a standalone app (e.g: jvm container). You can then spin multiple instances of each to scale up the microservice. Kafka will treat this as a single consumer group with multiple instances. Kafka streams takes care of consumer partition reassignments for scalability.
You can see how to start these three microservices here.

If you’re trying to wrap your head around streaming apps, give this a try.  George has all of the code in his GitHub repo.

Categories

February 2019
MTWTFSS
« Jan  
 123
45678910
11121314151617
18192021222324
25262728