Press "Enter" to skip to content

Author: Kevin Feasel

Truncate Table And Stats

Kendra Little shows that TRUNCATE TABLE does not always reset stats:

You might expect to see that the statistic on Quantity had updated. I expected it, before I ran through this demo.

But SQL Server never actually had to load up the statistic on Quantity for the query above. So it didn’t bother to update the statistic. It didn’t need to, because it knows that the table is empty, and this doesn’t show up in our column or index specific statistics.
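
If you want to reproduce the behavior, a minimal sketch looks something like this (this isn't Kendra's exact demo; the table and statistic names are made up):

```sql
CREATE TABLE dbo.Sales (SaleID INT IDENTITY PRIMARY KEY, Quantity INT);
INSERT dbo.Sales (Quantity) VALUES (5), (10), (15);

-- Build a fully-scanned statistic on Quantity.
CREATE STATISTICS st_Quantity ON dbo.Sales (Quantity) WITH FULLSCAN;

TRUNCATE TABLE dbo.Sales;

-- SQL Server knows from metadata that the table is empty, so this
-- query never loads (or updates) the Quantity statistic.
SELECT SaleID FROM dbo.Sales WHERE Quantity > 10;

-- The histogram still reflects the pre-truncate data.
DBCC SHOW_STATISTICS ('dbo.Sales', st_Quantity);
```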

Check it out.

Wiring A Raspberry Pi 3

Drew Furgiuele begins his project to build an easy button for backups:

I should also pause for a second and talk about wiring hobby boards like this. Good news first: you won’t electrocute yourself on it. I mean, if you do something really dumb like try to wire it underwater or eat it, then maybe you could, but you shouldn’t ever receive a shock while working with a board like this, even plugged in. The bad news is that even though you won’t damage yourself, you could very well damage the board if you just randomly plug things in. Here’s a hard and fast rule: if you’re not an electronics expert or an electrical engineer, leave it to experts to tell you where and how to wire. I’m not calling myself an expert here, but I have sort of a basic understanding of how to wire these things up. The point I’m attempting to make is: if you want to really learn and understand circuit design, there are lots of great resources on where to get started. And it’s quite a rabbit hole to go down, but it’s well worth your time if you want to learn more.

Read the whole thing, ideally over a weekend with your Pi 3 at hand.

Recurring Server-Side Traces

Kevin Hill shows how to set up a server-side trace which runs periodically:

How to set up a recurring Server-side SQL trace that runs every hour for 10 minutes.

Issues:

  • 6 people in the room are staring at me waiting for the last second request to be done at the end of an 11 hour day (3 of them from the VBV – Very Big Vendor)

  • Trace file names must be different, or you get errors

  • Trace files cannot end with a number

  • I can’t tell time when I am hungry and tired

Extended Events are still the preferred method over server-side traces for gathering this kind of information, but when a vendor demands traces, the scope for saying “there’s a better way” diminishes quickly. It’s good to know how to create a server-side trace so you aren’t opening Profiler regularly.
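
To give a flavor of the moving parts, here is a hedged sketch (not Kevin's script; the path is a placeholder) of creating a trace whose file name is unique and doesn't end in a number, which a SQL Agent job could kick off every hour:

```sql
DECLARE @TraceID int,
        @maxfilesize bigint = 50;

-- The timestamp keeps each run's file name unique; the trailing 'x'
-- keeps it from ending in a number. The .trc extension is appended
-- automatically, so don't include it.
DECLARE @tracefile nvarchar(245) =
    N'X:\Traces\HourlyTrace_'
    + REPLACE(REPLACE(REPLACE(CONVERT(varchar(19), GETDATE(), 120),
              '-', ''), ':', ''), ' ', '_')
    + N'x';

EXEC sp_trace_create @TraceID OUTPUT, 0, @tracefile, @maxfilesize, NULL;

-- Add events and columns with sp_trace_setevent and filters with
-- sp_trace_setfilter here, then start the trace:
EXEC sp_trace_setstatus @TraceID, 1;   -- 1 = start

-- Ten minutes later, another job step stops and closes it:
-- EXEC sp_trace_setstatus @TraceID, 0;  -- 0 = stop
-- EXEC sp_trace_setstatus @TraceID, 2;  -- 2 = close and delete definition
```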

Querying Genomic Data With Athena

Aaron Friedman explains how to use Amazon Athena to query S3 files:

Recently, we launched Amazon Athena as an interactive query service to analyze data on Amazon S3. With Amazon Athena there are no clusters to manage and tune, no infrastructure to set up or manage, and customers pay only for the queries they run. Athena is able to query many file types straight from S3. This flexibility gives you the ability to interact easily with your datasets, whether they are in a raw text format (CSV/JSON) or specialized formats (e.g. Parquet). By being able to flexibly query different types of data sources, researchers can more rapidly progress through the data exploration phase for discovery. Additionally, researchers don’t have to know nuances of managing and running a big data system. This makes Athena an excellent complement to data warehousing on Amazon Redshift and big data analytics on Amazon EMR.

In this post, I discuss how to prepare genomic data for analysis with Amazon Athena, as well as demonstrate how Athena is well-adapted to address common genomics query paradigms. I use the Thousand Genomes dataset hosted on Amazon S3, a seminal genomics study, to demonstrate these approaches. All code that is used as part of this post is available in our GitHub repository.
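
As a flavor of what those queries look like, here's a hedged sketch (the bucket, table, and column names are invented, not taken from the Thousand Genomes walkthrough); an Athena table is just a schema-on-read definition over files already sitting in S3:

```sql
-- Define a table over tab-delimited files in S3; no data is loaded
-- or moved anywhere.
CREATE EXTERNAL TABLE variants (
    chromosome string,
    position   bigint,
    ref_allele string,
    alt_allele string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://my-genomics-bucket/variants/';

-- Queries scan the files in place and you pay per query.
SELECT chromosome, COUNT(*) AS variant_count
FROM variants
GROUP BY chromosome
ORDER BY variant_count DESC;
```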

This feels a lot like a data lake PaaS process where they’re spinning up a Hadoop cluster in the background, but one which you won’t need to manage. Cf. Azure Data Lake Analytics.

Importing CSV Files In Power BI

Gil Raviv explains the new “combine binaries” feature of Power BI Desktop:

The Power BI team has recently released an enhanced “combine binaries” experience as part of November 2016 update to Power BI Desktop. (Jargon Alert:  “Combine Binaries” is a scary term.  Instead it should be named “Magically combine multiple files together into one table and make me SUPER happy.”)  The improved experience can drastically help you to import multiple Excel or other files from a folder and avoid writing advanced query functions. But today we will focus on a specific scenario, which is so common that it deserves this special post – Handling CSV files.

In fact, today’s blog post is actually the first post in “The CSV Series”. I hope you will enjoy it. To celebrate the November update of Power BI Desktop, we will review the improved experience, and will walk you through one of the most common scenarios that is now so easy to implement – Importing multiple CSV files from a folder, including parts of their filenames.
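
Under the covers, the queries that Power BI Desktop generates boil down to Power Query M along these lines (a hedged sketch with a hypothetical folder path, not Gil's generated code):

```m
let
    Source   = Folder.Files("C:\Data\Sales"),
    CsvOnly  = Table.SelectRows(Source, each [Extension] = ".csv"),
    WithData = Table.AddColumn(CsvOnly, "Data",
                   each Csv.Document([Content])),
    // Keep the file name so parts of it can become columns later.
    Kept     = Table.SelectColumns(WithData, {"Name", "Data"}),
    Combined = Table.ExpandTableColumn(Kept, "Data",
                   Table.ColumnNames(Kept{0}[Data]))
in
    Combined
```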

This looks very useful.

Query Optimizer Hotfixes

SQL Scotsman covers the query optimizer hotfixes which you can turn on with trace flag 4199:

The query optimiser hotfixes contained under Trace Flag 4199 are intentionally not enabled by default.  This means when upgrading from SQL Server 2008 R2 to SQL Server 2012 for example, new query optimiser logic is not enabled.   The reason behind this according to the article linked above is to prevent plan changes that could cause query performance regressions.  This makes sense for highly optimised environments where application critical queries are tuned and rely on specific execution plans and any change in query optimiser logic could potentially cause unexpected / unwanted query regressions.
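
For reference, the documented ways to opt in (the SELECT below is just a stand-in query):

```sql
-- Server-wide, for all sessions (or use the -T4199 startup parameter):
DBCC TRACEON (4199, -1);

-- Per-query, without enabling the flag globally (requires sysadmin):
SELECT COUNT(*) FROM sys.objects
OPTION (QUERYTRACEON 4199);

-- SQL Server 2016 and later: per-database, no trace flag needed.
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;
```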

Read the whole thing.

Switching In PowerShell

Chrissy LeMaire explains the switch statement in PowerShell:

Even less code and makes total sense. Awesome. There’s even more to switch — the evaluations can get full on complex, so long as the evaluation ultimately equals $true. Take this example from sevecek. Well, his example with Klaas’ enhancement.
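
To give a flavor of that, here's a sketch of the general pattern (not sevecek's or Klaas' code): the case conditions can be arbitrary script blocks, as long as they evaluate to $true:

```powershell
$size = 42
switch ($size) {
    { $_ -lt 10 }                 { 'small';  break }
    { $_ -ge 10 -and $_ -lt 100 } { 'medium'; break }
    default                       { 'large' }
}
# Without break, switch keeps evaluating the later conditions, so a
# single value can match (and run) more than one case.
```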

The refrain with switch is: always make sure you cover every case, and don’t let cases fall through when you don’t intend them to. Fortunately, PowerShell doesn’t allow C-style fallthrough, so that makes it easier.

Microsoft R Server 9.0

David Smith reports that Microsoft R Server 9.0 is now available:

Microsoft R Server 9.0, Microsoft’s R distribution with added big-data, in-database, and integration capabilities, was released today and is now available for download to MSDN subscribers. This latest release is built on Microsoft R Open 3.3.2, and adds new machine-learning capabilities, new ways to integrate R into applications, and additional big-data support for Spark 2.0.

There’s also a new version of Microsoft R Client and Microsoft R Open.

Range-Based Dimensions

Jana Sattainathan has a couple of blog posts on range dimensions. The first covers durations:

The data is in increments of 300 seconds going from 0 to 31536000 seconds (1 year). So, this table can be used to analyze activities that take less than 1 year. The last row’s Dimension value should be used for everything that takes over one year (or you can generate more rows based on your need).
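
Generating a table like that is a short T-SQL exercise; here's a hedged sketch (not Jana's exact script) using a numbers CTE:

```sql
-- One row per 300-second bucket, from 0 seconds up to one year.
WITH n AS (
    SELECT TOP (31536000 / 300)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS i
    FROM sys.all_objects AS a
         CROSS JOIN sys.all_objects AS b
)
SELECT i * 300                                       AS RangeStartSeconds,
       i * 300 + 299                                 AS RangeEndSeconds,
       CONCAT(i * 300, ' - ', i * 300 + 299, ' sec') AS DurationRange
FROM n;
```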

The second covers size ranges:

In the middle there, one of the bar charts is “Backup Count & Duration by Size”. As the title says, this chart helps me determine which backups are small/large and determine how many backups are in each of those “Duration” buckets. The duration bucket that I used in this case could have been easily changed from GB ranges to TB ranges. For example, I filtered the chart to check counts of backups that are over 1 TB.  As one can see, I have a couple of databases that are in the 2.5 to 3 TB backup size range.

Oftentimes, ranges are enough for analysis, and the greater detail of a backup being 12.8 GB versus 12.81 GB obscures more useful information.
