Press "Enter" to skip to content

Curated SQL Posts

Basic Forensic Accounting Techniques

I continue my series on forensic accounting techniques:

Growth analysis focuses on changes in ratios over time. For example, you may plot annual revenue, cost, and net margin by year. Doing this gives you an idea of how the company is doing: if costs are flat but revenue increases, you can assume economies of scale or economies of scope are in play and that’s a great thing. If revenue is going up but costs are increasing faster, that’s not good for the company’s long-term outlook.

For our data set, I’m going to use the following SQL query to retrieve bus counts on the first day of each year. To make the problem easier, I add and remove buses on that day, so we don’t need to look at every day or perform complicated analyses.

I get into quite a bit in this post, including a quick tour of multicollinearity, which is only my second-favorite of the three linear regression amigos (heteroskedasticity being my favorite and autocorrelation the hanger-on).

Comments closed

Scaling Power BI Premium Capacity

Matt Allington gives us instructions on how to scale Power BI Premium capacity:

This is the third article in my series about how to make Power BI Premium more affordable for small to medium sized enterprises (SMEs).  In my first article I explained the problem and the logic behind how to configure a workable solution. In my second article I provided step by step instructions on how to configure Flow to start/stop Power BI Premium capacities.  In the article today I am covering a way to scale the capacity up/down either on demand, or on a timed schedule.

The cloud is generally more expensive than on-prem, though it can potentially become cheaper if you are smart about scaling and have more scaling-friendly workloads. Matt even provides a really cool cost analysis to help you figure out what (if anything) you end up saving using this technique.

Comments closed

T-SQL Tuesday 113 Roundup

Todd Kleinhans takes us through T-SQL Tuesday #113:

Wow, we had a variety of responses to the April 2019 topic of “What Do YOU Use Databases For?

I think the overall response to the question and the theme is both mixed and varied.

I have been struggling with the personal use of databases for a long time. Things I wish would have been easier but seems to just get more complicated over time. Ever heard of GDPR? Although we think we have absolute control and access to data about ourselves, we really do not. The right to be forgotten is NOT the same as having access to all of the data about ourselves in all of the systems before they disappear. Sometimes companies will delete your data about you before you ask.

Todd starts out with an essay and then moves on to the roundup.

Comments closed

When Scans are Superior to Seeks

Brent Ozar shows that index seeks are not always better than index scans:

Somewhere along the way in your career, you were told that:
– Index seeks are quick, lightweight operations
– Table scans are ugly, slow operations

And ever since, you’ve kept an eye on your execution plans looking for those performance-intensive clustered index scans. When you see ’em, you go root ’em out, believing you’ve got a performance problem.

Thing is, … they lied to you. Seeks aren’t necessarily good, nor are scans necessarily bad. To straighten you out, we’re going to walk through a series of demos.

The rule of thumb I like to use is: if you need to go through more than 20% of the data, you’re generally better off scanning. If you need to go through less than 0.5% of the data, you’re generally better off seeking. Everything in between is the “it depends” zone.

Comments closed

Clustered Columnstore Index Memory Timeouts

Joe Obbish takes a deep look at a clustered columnstore index insertion scenario:

Why should we care about memory grant timeouts for CCI insert queries? Simply put, lots of bad things can happen when those queries can time out, both for serial and for parallel inserts. For serial insert queries, I’ve observed deadlocks, extremely poor performance along with long SLEEP_TASK waits, and extremely long rollbacks. For parallel insert queries, I’ve observed queries that run seemingly forever, poor performance of the SELECT part, and error 8645. You probably don’t want any of that occurring in production. It would be very helpful if it was possible to extend the 25 second time-out for queries that insert into columnstore tables.

Read through as Joe learns the true meaning of Christmas a KB article.

Comments closed

Reviewing the Stack Overflow Developer Survey

Michael Toth looks at the recently-released 2019 Stack Overflow Developer Survey:

Since 2011, Stack Overflow has been surveying their users each year to answer questions about the technologies they use, their work experience, their compensation, and their satisfaction at work. Given Stack Overflow’s place in the broader programming world, they are able to draw quite the audience for their annual surveys.

This year, nearly 90,000 developers participated in the survey! There’s a lot in this survey, and I recommend reviewing it yourself, but I wanted to surface some of the key findings that I thought were particularly relevant to data professionals here.

Stack Overflow says they will be releasing the underlying data for this survey in the coming weeks, so I hope to return to this for a deeper analysis once that’s made available. For now, let’s get into the results!

Michael’s lede involves R versus Python in terms of salaries, but for me, the top line is that functional programmers make more money. Clojure, F#, Scala, Elixir, and Erlang make the top 10 on the global list, including positions 1, 2, 4, and 5. Within the US, Scala, Clojure, Erlang, Kotlin, F#, and Elixir make the top 10, including positions 1, 2, and 4. H/T R-Bloggers

Comments closed

Understanding DNS for Developers

RJ Zaworski explains DNS for web developers:

DNS can use a similar TCP/IP stack, but being parts of a simple system, most DNS operations can also travel the wire on the Internet’s favorite Roulette wheel: the User Datagram Protocol, UDP.

On a good day, UDP is fast, simple, and stripped bare of unnecessary niceties like delivery guarantees and congestion management. But a UDP message may also never be delivered, or it may be delivered twice. It may never get a response, which makes for fun client design–particularly coming from the relatively safe and well-adjusted world of HTTP. With TCP, you get an established connection and all kinds of accommodations when Things Inevitably Go Wrong. UDP? “Best effort” delivery. Which means a packet thrown over the fence with a prayer for a soft landing.

It’s a good read if you’re new to DNS.

Comments closed

Automating Command Line Sessions with Powershell

James Livingston shows us how to redirect the standard in flow with .NET, using Powershell as the example language:

Command Line Interface (CLI) tools can be very useful for interacting with certain applications. However, some CLIs do not let a user pass in parameters which makes it difficult to automate. Instead, they lock a user into an interactive session and force the user to enter commands. Fortunately, some programming languages allow for a redirection of the standard input, output, and error of an running process. A developer co-worker of mine has been very successful doing this in Java, which got me thinking… And turns out, using the .NET libraries, we can implement this functionality for any CLI in PowerShell.

Automating interactions with CLI programs is often not ideal but when you’re stuck, it does the trick.

Comments closed