Press "Enter" to skip to content

Curated SQL Posts

ggplot2 Mappings And Geoms

I continue my ggplot2 series:

We have used a new geom here, geom_smooth.  The geom_smooth function creates a smoothed conditional mean.  Basically, we’re drawing some line as a result of a function based on this input data.  Notice that there are two parameters that I set:  method and se.  The method parameter tells the function which method to use.  There are five methods available, including using a Generalized Additive Model (gam), Locally Weighted Scatterplot Smoothing (loess), and three varieties of Linear Models (lm, glm, and rlm).  The se parameter controls whether we want to see the standard error bar.

I don’t cover all of the mapping options and all of the geoms, but I think it’s enough to get a grip on the concept.

Comments closed

Using FreeTDS To Connect To SQL Server

Steph Locke embraces the pain of FreeTDS:

If you use SQL Server (or Azure SQL DB) as your data store and you need to connect to the databasse from shinyapps.io, you’re presently stuck with FreeTDS. If you have any control over infrastructure I cannot recommend highly enough the actual ODBC Driver on Linux for ease. Alas, shinyapps.io does not let you control the infrastructure. We have to make do with with FreeTDS and it can be pretty painful to get right.

Due to how obtuse the error messages you end up getting back from FreeTDS in your shiny app and the time to deploy an app, you might just want to cry a little. I know I did. Determined to succeed, here is my solution to getting a working database connection that you can also use to test you’re doing it right. If you’re on a particularly old version of SQL Server though, I can’t guarantee this will work for you.

Read on for more.  I also have an older post on working with FreeTDS, though I ended up using TDS_Version = 8.0 instead of 7.4.

Comments closed

Power BI Cumulative Totaling

Martin Schoombee runs into an interesting issue with cumulative totals in Power BI:

A common practice in the data warehousing world is to use a Date Key as unique identifier in a date dimension. This attribute is usually a number in the format yyyymmdd. I’m not going to dive into all the reasons why it is used in data warehouse environments here, but (for fun) let’s change our data model to use the Date Keyattribute in the relationship between the two tables.

If we look at our visualizations again, we see a very different picture. Sales by date still looks the same, but the sales by month seems a little out of whack (image below). If you had cumulative sales at any other aggregated level (quarter, year, etc.) it would also have been incorrect.

The answer is not immediately intuitive, so it’s good to know this ahead of time rather than have to struggle with it later.

Comments closed

Processing Tabular Models With Helpful Information

Ust Oldfeld has a stored procedure which runs a SQL Agent job and provides notice when processing completes:

Recently, at a client, I was challenged to create a stored procedure that would process a tabular model. This stored procedure would then be executed from a web application. The process behind it being: a user enters data into a web application, which gets written to a database. That data then needs to be immediately surfaced up into reports, with additional calculations and measures along the way. Therefore the tabular model, which does all the additional calculation and measures, needs to be processed by a user from the web application.

Click through for the script.

Comments closed

Ungrouped Results In Query Store

Erin Stellato explains why you might see two rows for the same query plan in Query Store’s run-time stats:

You can see that there are two rows for the same query_id and plan_id, but the count_executions is different, as are the avg_duration and avg_logical_io_reads values .  The data is not truly a duplicate.  This behavior occurs because the Query Store data is stored in memory before it is flushed to disk, and when you query the data SQL Server is pulling it from both locations (and doing a UNION ALL) and displaying it in the output.  If I waited a bit and ran the query again, the two rows for that interval would probably disappear – most likely because the in memory data had been flushed to disk.

Read the whole thing.

Comments closed

Wrapping Up SSMS Tricks

Wayne Sheffield wraps up his 31 SSMS tricks in 31 days series.  First, he shows off the presentation settings in SSMS.  Presenter mode isn’t nearly as full-featured as I’d like to see it be, but it’s an improvement over the old version, at least.

He then shows standard reports built into SSMS.

The last post in the series is all about the Activity Monitor:

The Activity Monitor can be useful for seeing a mile-high view of a SQL Server instance. However, leaving it running can be as big a drag on the instance as is the use of SQL Profiler. The Resource Waits section of Activity Monitor, which would have been one of the strongest features, has been dumbed down by the filtering of wait types, and that many others are grouped together into categories. Sure, using the Activity Monitor convenient – but spend the time to develop your own scripts or XE sessions to get this information in a more efficient, with less impact. Overall, refrain from using it… and especially don’t set up SSMS to open it automatically on startup.

Agreed.  For all of Wayne’s tips and tricks, check out his index page.

Comments closed

Structural Topic Models In R

Julia Silge has a great post on building Structural Topic Models in R using stm and tidytext:

The stm package has a summary() method for trained topic models like these that will print out some details to your screen, but I want to get back to a tidy data frame so I can use dplyr and ggplot2 for data manipulation and data visualization. I can use tidy() on the output of an stm model, and then I will get the probabilities that each word is generated from each topic.

I haven’t watched the video yet, but that’s on my to-do list for today.

Comments closed

The Grammar of Graphics

I’ve started a new series:

Instead, we will start with Wickham’s paper on ggplot2.  This gives us the basic motivation behind the grammar of graphics by covering what a grammar does for us:  “A grammar provides a strong foundation for understanding a diverse range of graphics. A grammar may also help guide us on what a well-formed or correct graphic looks like, but there will still be many grammatically correct but nonsensical graphics. This is easy to see by analogy to the English language: good grammar is just the first step in creating a good sentence” (3).

With a language, we have different language components like nouns (which can be subjects, direct objects, or indirect objects), verbs, adjectives, adverbs, etc.  We put together combinations of those individual components to form complete sentences and transmit ideas.  Our particular word choice and language component usage will affect the likelihood of success in idea transmission, but to an extent, we can work iteratively on a sentence, switching words or adding phrases to get the point across the way we desire.

With graphics, we can do the same thing.  Instead of thinking of “a graph” as something which exists in and of itself, we should think of different objects that we combine into its final product:  a graph.

I call this first post the poor man’s literature review.  The rest of the series is code- and visual-heavy.

Comments closed

The Cost Of Downtime

Allan Hirt points out a new rule in the UK:

However, as of this week, if you have a company or work in the UK, things just got a whole lot more interesting. The UK government officially released a statement on January 28 which affects “critical industries”. Long story short: if you fall under the classification which seems to be limited right now to energy, transport, water, and health firms, you could be fined up to £17 million ($24 million in US Dollars at today’s exchange rate) in the event of a cyber attack taking you down. It was the WannaCry outages that precipitated the response (as an example, FedEx says WannaCry cost them about $300 million US Dollars). Remember this doozie from British Airways? Also covered under this new Network and Information Systems (NIS) Directive; it’s not just about security, but includes other things like power outages, hardware failure, and environmental hazards.

Read the whole thing.

Comments closed

Finding Errors In A SQL Query Using SSMS

Bert Wagner points out a useful feature of SSMS:

While all of these error messages point me in the right direction, I’m too lazyto try and remember how each version of SSMS refers to the error location. This becomes particularly problematic when jumping around environments and not being able to use a consistent version of SSMS.

There is one feature that works consistently though that makes navigating to the specific error lines easily.

Click through for that answer.  It doesn’t work for dynamic SQL, but it can be quite helpful when it does work.

Comments closed