Press "Enter" to skip to content

Month: November 2017

SQL On Docker And Running SQL Queries

Rob Sewell loads SQL Server on Windows and Linux via Docker and shows how to run a query against it using PowerShell:

In only a few seconds you have a SQL 2017 instance up and running (take a look at Andrew's blog at dbafromthecold.com for a great container series with much greater detail).

Now that we have our container, we need to connect to it. First we need to gather its IP address. We can do this with the docker inspect command, but I like to make things a little more programmatic. This works on my Windows 10 machine for Windows SQL Server containers; it appears to throw errors on some other machines, but there is an alternative below.

Read the whole thing.
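If you'd rather script the same idea from Python instead of PowerShell, the pattern Rob describes looks roughly like this: pull the container's IP address out of docker inspect, then run a query against it. This is a minimal sketch, not Rob's code; the container name, sa password, and ODBC driver below are placeholders.

import json
import subprocess

import pyodbc  # assumes the Microsoft ODBC Driver for SQL Server is installed

CONTAINER = "sql2017"        # hypothetical container name
SA_PASSWORD = "ChangeMe123"  # placeholder; use whatever you passed to docker run

# Ask Docker for the container's network settings and pull out the IP address.
inspect_output = subprocess.run(
    ["docker", "inspect", CONTAINER],
    capture_output=True, text=True, check=True
).stdout
networks = json.loads(inspect_output)[0]["NetworkSettings"]["Networks"]
ip_address = next(iter(networks.values()))["IPAddress"]

# Connect to the containerized instance and run a simple query.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    f"SERVER={ip_address};UID=sa;PWD={SA_PASSWORD}"
)
print(conn.cursor().execute("SELECT @@VERSION;").fetchone()[0])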


Rendering Ten Million Points With ggplot2

Antonio Sánchez Chinchón shows how to draw Clifford attractors in R:

From a technical point of view, the challenge is creating a data frame with all locations, since it must have 10 million rows and must be populated sequentially. A very fast way to do it is using the Rcpp package. To render the plot I use ggplot2, which works quite well. Here you have the code to play with Clifford attractors if you want:

Click through for the code, as well as sample output images.
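For a sense of the computation without the R/Rcpp setup, here is a minimal Python sketch (not Antonio's code): the Clifford attractor is just the pair of recurrences x[n+1] = sin(a*y[n]) + c*cos(a*x[n]) and y[n+1] = sin(b*x[n]) + d*cos(b*y[n]), iterated sequentially from a starting point. The parameter values below are illustrative, not the ones from the post.

import numpy as np
import matplotlib.pyplot as plt

def clifford(n, a=-1.25, b=-1.25, c=-1.82, d=-1.91, x0=0.0, y0=0.0):
    """Iterate the Clifford attractor n times; parameters are illustrative."""
    x = np.empty(n)
    y = np.empty(n)
    x[0], y[0] = x0, y0
    for i in range(1, n):  # must be sequential: each point depends on the previous one
        x[i] = np.sin(a * y[i - 1]) + c * np.cos(a * x[i - 1])
        y[i] = np.sin(b * x[i - 1]) + d * np.cos(b * y[i - 1])
    return x, y

x, y = clifford(1_000_000)  # bump to 10_000_000 if you have the patience
plt.figure(figsize=(8, 8))
plt.scatter(x, y, s=0.05, alpha=0.1, color="black")
plt.axis("off")
plt.savefig("clifford.png", dpi=300)

A plain Python loop over ten million points will be slow; that is exactly the role Rcpp plays in Antonio's R version, and numba or Cython would be the rough Python analogue.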


Dynamic Pivoting In Redshift

Maria Zakourdaev is not Redshift’s biggest fan:

Several days ago I spent a few hours of my life figuring out how to do a dynamic pivot in Amazon Redshift. To tell you the truth, I expected much more from this DBMS's SQL language.

Redshift is based on PostgreSQL 8.0.2 (which was released in 2005!!!!)

Anything you would want for this not-too-difficult task does not exist. No stored procedures. No JSON datatype. No variables outside of UDFs, no queries inside UDFs. “UDF can be used to calculate values but cannot be used to call SQL functions”. Python UDFs also cannot query the data, only perform calculations.

Finally, I found one useful function, LISTAGG, which helped me get the distinct values of all pivoted columns.

Read on to see how Maria solved this problem.  And to tell the truth, I’m not Redshift’s biggest fan either.
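The general shape of the workaround (this is a sketch of the pattern, not Maria's exact code) is: use LISTAGG to collect the distinct pivot values, then assemble the pivot query client-side, since Redshift gives you no stored procedures to do it server-side. A hedged Python example against a hypothetical sales(region, product, amount) table:

import psycopg2  # Redshift speaks the PostgreSQL wire protocol

# Placeholder connection details; swap in your own cluster endpoint.
conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",
    port=5439, dbname="dev", user="admin", password="***"
)
cur = conn.cursor()

# Step 1: LISTAGG returns the distinct pivot values as one delimited string.
cur.execute("SELECT LISTAGG(DISTINCT product, ',') FROM sales;")
products = cur.fetchone()[0].split(",")

# Step 2: build one CASE expression per distinct value, then run the pivot.
# (Assumes the product values are trusted; quote them properly in real code.)
case_exprs = ",\n  ".join(
    f"SUM(CASE WHEN product = '{p}' THEN amount ELSE 0 END) AS \"{p}\""
    for p in products
)
pivot_sql = f"SELECT region,\n  {case_exprs}\nFROM sales\nGROUP BY region;"
cur.execute(pivot_sql)
for row in cur.fetchall():
    print(row)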


Testing Event Hub To Stream Analytics Performance

Rolf Tesmer tries a few different settings for optimizing performance when streaming data from Azure Event Hub to Azure Stream Analytics:

When you configure Azure Stream Analytics, you have only two levers:

  • Streaming Units (SU) – Each SU is a blend of compute, memory, and throughput; a job can use between 1 and 48 SUs (or more by contacting support). The factors that impact SU are query complexity, latency, and volume of data. SU can be used to scale out a job to achieve higher throughput: depending on query complexity and the throughput required, more SUs may be necessary to meet your performance requirements. A level of SU6 assigns an entire Stream Analytics node. For our test we won't change SU.

  • SQL Query Design – Queries are expressed in a SQL-like query language. These queries are documented in the query language reference guide, which includes several common query patterns. The design of the query can greatly affect job throughput, in particular whether and how the PARTITION BY clause is used.

Rolf tests along three margins:  2 versus 16 input partitions, 2 versus 16 output partitions, and whether to partition the data or not.  Read on to see which combination was fastest.


Query Store And Availability Groups FAQ

Erin Stellato has a few follow-up questions from her Query Store sessions:

Q: Can you enable Query Store for a read-only replica?

A: No.  Because the replica is read-only, and Query Store inherently writes data TO the database, you cannot enable it to capture queries that are executed against that read-only copy.  I did create a Connect item for this request.  If this is of interest to you, please up-vote it: Enable Query Store for collection on a read-only replica in an Availability Group.  The more votes this has, the better the possibility that Microsoft will implement it, so feel free to share with your friends and have them vote too!

Read on for more questions and answers, and if you’re interested in it, vote on the Connect item above.


How Columnstore Delta Stores Get Created

Joe Obbish has a list of ways that delta stores get created:

I briefly reviewed the documentation written by Microsoft concerning the appearance of delta stores. Here’s a quote:

Rows go to the deltastore when they are:

  • Inserted with the INSERT INTO VALUES statement.
  • At the end of a bulk load and they number less than 102,400.
  • Updated. Each update is implemented as a delete and an insert.

There are also a few mentions of how partitioning can lead to the creation of multiple delta stores from a single insert. It seems as if the document is incomplete or a little misleading, but I admit that I didn’t exhaustively review everything. After all, Microsoft hides columnstore documentation all over the place.

This is a great compendium of ways in which you can shoot yourself in the foot with clustered columnstore indexes.


Migrating DBMail Settings

Frank Gill has a T-SQL script to help with database mail migration:

This week, I was working on a migration for a client.  The migration was moving databases from a stand-alone instance to a two-node Availability Group.  When it came to moving the Database Mail settings, I discovered they had 21 sets of profiles and accounts.  Not wanting to manually create 42 Database Mail profiles, I set out to automate the process.  A web search yielded this blog post by Iain Elder. This script does what I was looking for, but would only generate settings for a single Database Mail profile.  Using Iain’s code as a starting point, I modified it to create Database Mail settings for all profiles on an instance.  The script is listed below. I hope this simplifies your SQL Server migrations.

Click through for Frank’s script, and you might also be interested in Iain Elder’s script, linked above.


How Per-Second AWS Billing Helps With Data Processing

Prakash Chockalingam explains how AWS per-second billing can make resource allocation easier:

Because of the hourly increments in billing, users spend a lot of time playing a giant game of Tetris with their big data workloads — figuring out how to pack jobs to use every minute of the compute hour. Examples:

  • If a job could run on 10 nodes and finish in 20 minutes, it was better to run it on fewer nodes so that it took around 50 minutes. As a result, you would pay less.
  • Running two 10-node jobs that took 20 minutes each in parallel would cost twice as much as running them sequentially.

The above problem got compounded if there were many such jobs to run. To handle this challenge, many organizations turned to a resource scheduler like YARN, following the traditional on-premises model of setting up one or more big multi-tenant clusters in the cloud and running YARN to bin-pack the different jobs.

Read on to see how this has changed as a result of per-second billing.
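To make the arithmetic behind those two examples concrete, here is a tiny back-of-the-envelope calculation (the $1.00 per node-hour price is made up):

# Back-of-the-envelope cost comparison; $1.00/node-hour is a made-up price.
PRICE_PER_NODE_HOUR = 1.00

def hourly_cost(nodes, minutes):
    # Hourly billing: every started hour on every node is billed in full.
    hours_billed = -(-minutes // 60)  # ceiling division
    return nodes * hours_billed * PRICE_PER_NODE_HOUR

def per_second_cost(nodes, minutes):
    # Per-second billing: pay only for the seconds actually used.
    return nodes * (minutes * 60) * PRICE_PER_NODE_HOUR / 3600

# 10 nodes for 20 minutes vs. 4 nodes for 50 minutes (same total work).
print(hourly_cost(10, 20), hourly_cost(4, 50))          # 10.0 vs 4.0  -- fewer nodes wins
print(per_second_cost(10, 20), per_second_cost(4, 50))  # 3.33 vs 3.33 -- identical cost

Under hourly billing the packing game matters; under per-second billing the two shapes of the same job cost the same.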


Tracking Kafka Consumer Lag

Simarpreet Kaur Monga has a Scala-based example showing how to calculate Kafka offset lag for consumers:

The consumer can subscribe to multiple topics; you need to pass the list of topics you want to consume from. For the sake of simplicity, I have passed just a single topic to consume from.

Now that the consumer has subscribed to the topic, it can consume from that topic.

The consumer maintains an offset to keep track of the next record it needs to read.

Now, let us see how we can find the consumer offsets.

Consumer offsets can be found using the offset method of the ConsumerRecord class. This offset points to the record's position in a Kafka partition. The consumer consumes records from the topic in the form of ConsumerRecord objects. A ConsumerRecord also carries the topic name and partition number from which the record was received, and a timestamp as marked by the corresponding ProducerRecord (the record sent by the producer).

Click through for the rest of the story.
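The post uses the Scala client; here is a rough equivalent sketch with the kafka-python package, where lag is the gap between a partition's end offset and the consumer's current position. The broker, topic, and group names are placeholders.

from kafka import KafkaConsumer  # pip install kafka-python

# Placeholder broker, topic, and group names; adjust for your cluster.
consumer = KafkaConsumer(
    "my-topic",
    bootstrap_servers="localhost:9092",
    group_id="lag-demo",
)

while True:
    # poll() returns a dict of {TopicPartition: [ConsumerRecord, ...]}.
    for tp, records in consumer.poll(timeout_ms=1000).items():
        for record in records:
            # Each record carries its offset within the partition, plus the
            # topic, partition, and the producer-supplied timestamp.
            print(f"{record.topic}[{record.partition}] offset={record.offset}")
        # Lag for this partition: latest available offset minus our position.
        end_offset = consumer.end_offsets([tp])[tp]
        lag = end_offset - consumer.position(tp)
        print(f"{tp.topic}[{tp.partition}] lag={lag}")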
