Day: November 8, 2017

Rendering Ten Million Points With ggplot2

Antonio Sánchez Chinchón shows how to draw Clifford attractors in R:

From a technical point of view, the challenge is creating a data frame with all locations, since it must have 10 million rows and must be populated sequentially. A very fast way to do it is using the Rcpp package. To render the plot I use ggplot, which works quite well. Here you have the code to play with Clifford Attractors if you want:

Click through for the code, as well as sample output images.
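
To make the "populated sequentially" point concrete, here is a minimal Python sketch of the standard Clifford attractor recurrence. It is not Antonio's R/Rcpp code; the function name, the parameter values and the point count are illustrative assumptions. Each point is computed from the previous one, so the loop cannot be vectorized, which is why a compiled loop helps at ten million rows.

```python
import math


def clifford_points(a, b, c, d, n, x0=0.0, y0=0.0):
    """Return n successive points of a Clifford attractor as two lists.

    Recurrence (standard definition):
        x_next = sin(a * y) + c * cos(a * x)
        y_next = sin(b * x) + d * cos(b * y)
    """
    xs = [0.0] * n  # preallocate instead of growing the lists
    ys = [0.0] * n
    x, y = x0, y0
    for i in range(n):
        # Both updates use the previous (x, y), hence the tuple assignment.
        x, y = (
            math.sin(a * y) + c * math.cos(a * x),
            math.sin(b * x) + d * math.cos(b * y),
        )
        xs[i] = x
        ys[i] = y
    return xs, ys


# Illustrative parameters (an assumption, not taken from the post).
xs, ys = clifford_points(a=-1.4, b=1.6, c=1.0, d=0.7, n=10_000)
```

Because sine and cosine are bounded, every x stays within 1 + |c| of zero and every y within 1 + |d|, which gives a cheap sanity check before plotting.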


Dynamic Pivoting In Redshift

Maria Zakourdaev is not Redshift’s biggest fan:

Several days ago I spent a few hours of my life figuring out how to do a dynamic pivot in Amazon Redshift. To tell you the truth, I expected much more from this DBMS's SQL language.

Redshift is based on PostgreSQL 8.0.2 (which was released in 2005!)

Anything you would want for this not-too-difficult task does not exist. No stored procedures. No JSON datatype. No variables outside of UDFs, no queries inside UDFs. “UDF can be used to calculate values but cannot be used to call SQL functions”. Python UDFs also cannot query the data, only perform calculations.

Finally, I found one useful function, LISTAGG, that helped me get the distinct values of all pivoted columns.

Read on to see how Maria solved this problem.  And to tell the truth, I’m not Redshift’s biggest fan either.
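
One common way to work around the lack of server-side dynamic SQL is to fetch the distinct pivot values first (for instance as the delimited string an aggregate like LISTAGG returns) and then generate the pivot query on the client. The Python sketch below illustrates that general idea only; it is not Maria's solution, and the function name, table name and column names are hypothetical.

```python
def build_pivot_sql(table, group_col, pivot_col, value_col, pivot_values):
    """Build a CASE-based pivot query with one output column per pivot value."""

    def quote_literal(value):
        # Escape single quotes so the value is a safe SQL string literal.
        return "'" + str(value).replace("'", "''") + "'"

    def quote_ident(name):
        # Double-quote the generated column alias, escaping embedded quotes.
        return '"' + str(name).replace('"', '""') + '"'

    columns = [
        f"    SUM(CASE WHEN {pivot_col} = {quote_literal(v)} "
        f"THEN {value_col} ELSE 0 END) AS {quote_ident(v)}"
        for v in pivot_values
    ]
    return (
        f"SELECT\n    {group_col},\n"
        + ",\n".join(columns)
        + f"\nFROM {table}\nGROUP BY {group_col};"
    )


# Hypothetical input: a comma-delimited list of distinct values,
# shaped like the result of a LISTAGG query.
distinct_regions = "north,south,o'brien"
pivot_sql = build_pivot_sql(
    table="sales",
    group_col="product",
    pivot_col="region",
    value_col="amount",
    pivot_values=distinct_regions.split(","),
)
print(pivot_sql)
```

The table and column names are interpolated as-is, so this sketch assumes they come from trusted code rather than user input; only the pivot values are escaped.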


Testing Event Hub To Stream Analytics Performance

Rolf Tesmer tries a few different settings for optimizing performance when streaming data from Azure Event Hub to Azure Stream Analytics:

When you configure Azure Stream Analytics you have only two levers:

  • Streaming Units (SU) – Each SU is a blend of compute, memory and throughput, and a job can be assigned between 1 and 48 SUs (or more by contacting support). The factors that impact SU are query complexity, latency, and volume of data. SUs can be used to scale out a job to achieve higher throughput. Depending on query complexity and the throughput required, more SUs may be necessary to meet your performance requirements. A level of SU6 assigns an entire Stream Analytics node. For our test we won't change SU.

  • SQL Query Design – Queries are expressed in a SQL-like query language. These queries are documented in the query language reference guide, which includes several common query patterns. The design of the query can greatly affect the job throughput, in particular if and how the PARTITION BY clause is used.

Rolf tests along three margins:  2 versus 16 input partitions, 2 versus 16 output partitions, and whether to partition the data or not.  Read on to see which combination was fastest.
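
As a rough intuition for why partitioning can affect throughput, the plain-Python analogy below groups events by partition and aggregates each partition on its own, with no partition needing to see another's data. This is not Stream Analytics code and says nothing about Rolf's measured results; the event shape and function names are hypothetical.

```python
from collections import Counter, defaultdict


def count_by_key(events):
    """Count events per key over a single stream of (partition, key) pairs."""
    return Counter(key for _partition, key in events)


def count_per_partition(events):
    """Aggregate each partition independently of the others."""
    by_partition = defaultdict(list)
    for partition, key in events:
        by_partition[partition].append((partition, key))
    # Each entry could be computed by a separate worker, since no
    # partition's result depends on another partition's events.
    return {p: count_by_key(evts) for p, evts in by_partition.items()}


# Hypothetical events: (partition id, key).
events = [(0, "a"), (1, "b"), (0, "a"), (1, "a"), (0, "b")]

partial = count_per_partition(events)
merged = sum(partial.values(), Counter())  # combine the per-partition counts
```

Merging the per-partition counts gives the same totals as counting the unpartitioned stream, which is the property that lets the work be split up.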


Query Store And Availability Groups FAQ

Erin Stellato has a few follow-up questions from her Query Store sessions:

Q: Can you enable Query Store for a read-only replica?

A: No.  Because the replica is read-only, and Query Store inherently writes data TO the database, you cannot enable it to capture queries that are executed against that read-only copy.  I did create a Connect item for this request.  If this is of interest to you, please up-vote it: Enable Query Store for collection on a read-only replica in an Availability Group.  The more votes this has, the better the possibility that Microsoft will implement it, so feel free to share with your friends and have them vote too!

Read on for more questions and answers, and if you’re interested in it, vote on the Connect item above.


How Columnstore Delta Stores Get Created

Joe Obbish has a list of ways that delta stores get created:

I briefly reviewed the documentation written by Microsoft concerning the appearance of delta stores. Here’s a quote:

Rows go to the deltastore when they are:

  • Inserted with the INSERT INTO VALUES statement.

  • At the end of a bulk load and they number less than 102,400.

  • Updated. Each update is implemented as a delete and an insert.

There are also a few mentions of how partitioning can lead to the creation of multiple delta stores from a single insert. It seems as if the document is incomplete or a little misleading, but I admit that I didn’t exhaustively review everything. After all, Microsoft hides columnstore documentation all over the place.

This is a great compendium of ways in which you can shoot yourself in the foot with clustered columnstore indexes.


Migrating DBMail Settings

Frank Gill has a T-SQL script to help with database mail migration:

This week, I was working on a migration for a client.  The migration was moving databases from a stand-alone instance to a two-node Availability Group.  When it came to moving the Database Mail settings, I discovered they had 21 sets of profiles and accounts.  Not wanting to manually create 42 Database Mail profiles, I set out to automate the process.  A web search yielded this blog post by Iain Elder. This script does what I was looking for, but would only generate settings for a single Database Mail profile.  Using Iain’s code as a starting point, I modified it to create Database Mail settings for all profiles on an instance.  The script is listed below. I hope this simplifies your SQL Server migrations.

Click through for Frank’s script, and you might also be interested in Iain Elder’s script, linked above.
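
To show the shape of the automation, here is a minimal Python sketch that emits the T-SQL calls needed to recreate profiles and their accounts on a new instance. It is not Frank's or Iain's script (both are T-SQL and read the settings from msdb); the input dictionaries and function names are hypothetical, and only the stored procedure and parameter names come from SQL Server's Database Mail API.

```python
def tsql_literal(value):
    """Render a value as an escaped T-SQL Unicode string literal."""
    return "N'" + str(value).replace("'", "''") + "'"


def profile_script(profile):
    """Return T-SQL that creates one profile, its accounts, and their links."""
    name = tsql_literal(profile["name"])
    lines = [f"EXEC msdb.dbo.sysmail_add_profile_sp @profile_name = {name};"]
    # sequence_number sets the failover order of accounts within a profile.
    for seq, account in enumerate(profile["accounts"], start=1):
        account_name = tsql_literal(account["name"])
        lines.append(
            "EXEC msdb.dbo.sysmail_add_account_sp "
            f"@account_name = {account_name}, "
            f"@email_address = {tsql_literal(account['email'])}, "
            f"@mailserver_name = {tsql_literal(account['server'])};"
        )
        lines.append(
            "EXEC msdb.dbo.sysmail_add_profileaccount_sp "
            f"@profile_name = {name}, "
            f"@account_name = {account_name}, "
            f"@sequence_number = {seq};"
        )
    return "\n".join(lines)


# Hypothetical settings; a real migration would read these from the source instance.
profiles = [
    {
        "name": "Alerts",
        "accounts": [
            {"name": "AlertsAccount", "email": "alerts@example.com",
             "server": "smtp.example.com"},
        ],
    },
]

script = "\n".join(profile_script(p) for p in profiles)
print(script)
```

The sketch leaves out ports, credentials and SSL settings, which a real migration would also need to carry across.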
