Press "Enter" to skip to content

Month: March 2019

Interactive ggplot Plots with plotly

Laura Ellis takes us through ggplotly:

As someone very interested in storytelling, I find ggplot2 is easily my data visualization tool of choice. It is like the Swiss army knife for data visualization. One of my favorite features is the ability to pack a graph chock-full of dimensions. This ability is incredibly handy during the data exploration phases. However, sometimes I find myself wanting to look at trends without all the noise. Specifically, I often want to look at very dense scatterplots for outliers. ggplot2 is great at this, but when we’ve isolated the points we want to understand, we can’t easily examine all possible dimensions right in the static charts.

Enter plotly. The plotly package and ggplotly function do an excellent job of taking our high-quality ggplot2 graphs and making them interactive.

Read on for several quality, interactive visuals.


Goodbye, gather and spread; Hello pivot_long and pivot_wide

John Mount covers a change in tidyr which mimics Mount and Nina Zumel’s pivot_to_rowrecs and unpivot_to_blocks functions in the cdata package:

If you want to work in the above way, we suggest giving our cdata package a try. We named the functions pivot_to_rowrecs and unpivot_to_blocks. The idea was: by emphasizing the record structure one might eventually internalize what the transforms are doing. On the way to that we have a lot of documentation and tutorials.

This is your regular reminder that the Tidyverse is very useful, but it is not the entirety of R.


Near-Zero Downtime Identity Column Changes

I’m getting close to the end of my series on near-zero downtime deployments. This latest post involves identity column changes:

There are some tables where you create an identity value and expect to cycle through data. An example of this might be a queue table, where the data isn’t expected to live permanently, but it is helpful to have a monotonically increasing value to determine order (just watch out for those wrap-arounds and you’re fine). An example of reseeding is below:

DBCC CHECKIDENT('dbo.MyTable', RESEED, 1);

This operation needs to take an LCK_M_SCH_M lock, otherwise known as a schema modification lock. Any transactions which are writing to the table will block your transaction, but so will any readers, unless you have Read Committed Snapshot Isolation turned on or the reader is in the READ UNCOMMITTED or SNAPSHOT transaction isolation level.

If you are using RCSI and don’t have extremely long-running transactions, this is an in-and-out operation, so even though there’s a little bit of blocking, it’s minimal.
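
If you want to guarantee the blocking window stays brief, one option (a sketch of mine, not from the post; the table name and timeout value are placeholders) is to cap how long the reseed will wait for its schema modification lock:

-- Give up after 3 seconds instead of queueing behind a long-running writer
-- (the statement fails with error 1222 if the lock can't be acquired in time).
SET LOCK_TIMEOUT 3000;
DBCC CHECKIDENT('dbo.MyTable', RESEED, 1);
SET LOCK_TIMEOUT -1;  -- restore the default of waiting indefinitely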

Not all changes are this easy, though.


The State of SQL Server Monitoring Survey

Kendra Little would like a few minutes of your time:

Calling all Database Administrators, Developers, Analysts, Consultants, and Managers: Redgate has a survey open asking how you monitor your SQL Servers.

Take the survey before April 5, 2019.

Your time is valuable. The survey will take 5–10 minutes to complete. That’s not a ton of time, but it’s a noticeable part of your day, and there should be something in it for you. Here’s why it’s worthwhile to take the survey.

Read the whole thing and take that survey.


Exporting SQL Server Tables to Excel with Powershell

Aaron Nelson shows how you can export the tables in a SQL Server database to Excel, using a warehouse as an example:

Obviously, you have to have the module installed, and a copy of AdventureWorksDW2017 db restored to a SQL Server. After that, all you have to do is loop through the tables, ‘query’ them with the Read-SqlTableData cmdlet, and pipe the results to the Export-Excel cmdlet.

I did some trial and error with this yesterday. I settled on exporting all of the Dimension tables to separate Worksheets within the same Excel file, and exporting all of the Fact tables to their own individual files (since they tend to be much larger).

Click through for Aaron’s script.
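
If you want a preview of which tables would land where before running the script, a quick T-SQL sketch works (this assumes the standard Dim/Fact naming prefixes in AdventureWorksDW2017; it is not part of Aaron's post):

-- Dimension tables: each becomes a worksheet in one workbook
SELECT s.name + '.' + t.name AS TableName
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE t.name LIKE N'Dim%';

-- Fact tables: each gets its own file, since they tend to be larger
SELECT s.name + '.' + t.name AS TableName
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE t.name LIKE N'Fact%';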


Postgres Support in Azure Data Studio

Constantine Kokkinos is excited about a big addition to Azure Data Studio:

I know I have been writing a lot about ADS recently, but this is even bigger than the Notebook announcement.

A Postgres plugin has been announced in the insider release of ADS, and it just works!

If the term Postgres is unfamiliar – PostgreSQL is one of the preeminent open source database solutions and is showing wide adoption due to its quality and, of course, price.

Read on for additional notes.


Using the StreamSets Snowflake Destination

Dash Desai shows how you can use StreamSets to write data into SnowflakeDB:

In particular, we’ll look at an example scenario that addresses Data Drift – where new information is added mid-stream; when that occurs, the new table structure and new column values are created in Snowflake automatically.

To illustrate, let’s take HTTP web server logs generated by an Apache web server (for example) as our main source of data. Here’s what a typical log line looks like:
150.47.54.136 - - [14/Jun/2014:10:30:19 -0400] "GET /department/outdoors/category/kids'%20golf%20clubs/product/Polar%20Loop%20Activity%20Tracker HTTP/1.1" 200 1026 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36"

Click through for the demonstration.


Database-First or Kafka-First for Event Streaming

Gwen Shapira takes us through a scenario where database-first writes for event streaming make the most sense:

Note that the DB does quite a lot for you: it enforces serializability, locks, your logical constraints, etc. If the DB is distributed (Vitess, Cockroach, Spanner, Yugabyte), it does even more.

If you were to go Kafka-first… well, it isn’t impossible. But all those responsibilities now belong to you as a developer. And if you are thinking there may be multiple webservers handling user requests and passing them to Kafka, you have to solve fairly challenging problems.
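
To make the first point concrete, here is a generic sketch (mine, not Gwen's) of guarantees a relational database hands you declaratively, guarantees a Kafka-first design would have to rebuild in application code:

-- Uniqueness, NOT NULL, and check constraints are enforced across all
-- concurrent writers with no application-side coordination.
CREATE TABLE accounts (
    account_id INT PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    balance DECIMAL(19, 4) NOT NULL CHECK (balance >= 0)
);

-- The DB also serializes conflicting writes for you: concurrent transfers
-- can't corrupt balances because the transaction takes the necessary locks.
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;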

Read the whole thing.


Tuning Azure SQL Database

Tim Radney walks us through some of the tools we have available to tune Azure SQL Databases:

Many instance-level items that you are used to configuring on full installations are off limits. Some of these items include:
– Setting min and max server memory
– Enabling optimize for ad hoc workloads
– Changing cost threshold for parallelism
– Changing instance-level max degree of parallelism
– Optimizing tempdb with multiple data files
– Trace flags

Tim does point out workarounds for some of these and gives us the list of things which are possible, so check that out.
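
For example, while the instance-level max degree of parallelism setting is unavailable, Azure SQL Database does expose a database-scoped equivalent (a sketch; pick a value appropriate to your workload):

-- MAXDOP can be set per database even though the instance-level setting is off limits.
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;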


Calculating Weighted Averages in SQL

Lukas Eder shows how you can calculate weighted averages using SQL:

As can be seen, this schema is slightly denormalised, as the number of lines per transaction is precalculated in the transactions.lines column. This will turn out to be quite useful for this calculation, but it isn’t strictly necessary.

Now, in the previously linked Stack Overflow question, a report was desired that would calculate:
– An aggregation of sums as provided by the line items
– An aggregation of averages as provided by the transactions

As Lukas points out, doing this in two queries is easy, but doing it in one is sublime.
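
The underlying pattern, stripped of Lukas's specific schema, is the classic sum-of-products-over-sum-of-weights calculation. A minimal sketch with hypothetical transactions and lines tables:

-- Weighted average price per transaction: each line's price is weighted by
-- its quantity. Assumes price is a decimal type; multiply by 1.0 first if
-- the columns are integers, to avoid integer division.
SELECT t.transaction_id,
       SUM(l.price * l.quantity) / SUM(l.quantity) AS weighted_avg_price
FROM transactions t
JOIN lines l
    ON l.transaction_id = t.transaction_id
GROUP BY t.transaction_id;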
