Power BI Conditional Formatting

Matt Allington shows us how to implement conditional formatting in Power BI:

Conditional formatting is one of the easiest ways to turn tables of boring data into a visual that almost makes the numbers jump out at you on a page.  There is nothing worse than looking at pages and pages of numbers and then trying to find insights from those numbers.  Conditional formatting helps you format your tables (and matrices) of data so that the patterns and outliers in the data are easier to spot at a glance.  Take a look at the before and after images below and see how much easier it is to see the variations in performance.

It turns out to be pretty easy, so check it out.

The Database Dialectic

Rob Farley sees a series of database syntheses, and the Big Data movement is a part of that:

When CLR came in, people said it was a T-SQL killer. I remember a colleague of mine telling me that he didn’t need to learn T-SQL, because CLR meant that he would be able to do it all in .Net. Over time, we’ve learned that CLR is excellent for all kinds of things, but it’s by no means a T-SQL killer. It’s excellent for a number of reasons – CLR stored procedures or functions have been great for things like string splitting and regular expressions – and we’ve learned its place now.

I don’t hear people talking about NoSQL like they once did, and it’s been folded somehow into BigData, but even that seems to have lost a little of its lustre from a year or two ago when it felt like it was ‘all the rage’. And yet we still have data which is “Big”. I don’t mean large, necessarily, just data that satisfies one of the three Vs – volume, velocity, variety.

Rob brings an interesting perspective to the topic, particularly as one of the early Parallel Data Warehouse bloggers.

The Market For IoT Analytics

Kennie Nybo Pontoppidan tells a story for T-SQL Tuesday:

At the time in Rehfeld R&D, we experimented with making Effektor a metadata repository for a Hadoop data warehouse, where instead of generating tables and ETL processes in the different data warehouse layers, the synchronization engine in the product would generate the Hive objects on top of Hadoop tables. We never made more than an overall spec and a prototype, but the experiment gave us some insight into the technologies around Hadoop.

Around that time, Philips released the Hue lightbulbs, and our COO bought us two packs to play with.  The idea was to create a physical BI dashboard, where lightbulbs would display KPIs, changing color according to each KPI’s value and threshold.  I still think that was a brilliant idea, and I would love to see more use of consumer electronics in enterprise BI.
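To make the Hive idea concrete, here’s a hypothetical fragment of what such a generated layer might look like; the table, columns, and paths are all invented for illustration:

```sql
-- Hypothetical generator output: an external table over raw HDFS files,
-- plus a view serving as the corresponding data warehouse layer object.
CREATE EXTERNAL TABLE raw_sales (
    sale_date  STRING,
    product_id INT,
    amount     DECIMAL(18,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/raw/sales';

CREATE VIEW dw_sales AS
SELECT CAST(sale_date AS DATE) AS sale_date, product_id, amount
FROM raw_sales;
```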

His basic maturity model makes sense; as much as I really want to disagree with it, I can’t.  Good read.

Automatic Identity Value Reseeding

Tracy Boggiano shows how to build an automated identity column reseeding solution:

You have tables that use identity values, with a lot of data inserted into and deleted from them, and they run out of integers to use.  I have 3,000+ databases where this can occur, so we have an alert set up that checks the tables, then checks a configuration table to see if each one is set up to be auto-reseeded, based on whether our database engineers have indicated it is safe to do so.  If it is, the table is automatically reseeded to either one or the maximum negative number for the data type; otherwise, we are alerted and we check with our database engineers on how to handle that table.  Keep in mind we are reseeding tables that have been deemed OK to reseed automatically.

Click through for the code, which includes reseeding logic, a job to run reseed operations, and a whitelist table for the tables which you want to allow automatic reseeding.
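In the meantime, here is a minimal sketch of the two building blocks involved: finding identity columns near exhaustion and reseeding a whitelisted table.  The table name dbo.EventLog is a placeholder for anything your engineers have approved:

```sql
-- Check how much headroom each int identity column has left.
SELECT SCHEMA_NAME(t.schema_id) + '.' + t.name AS table_name,
       c.name AS column_name,
       c.last_value AS current_identity,
       2147483647 AS int_max
FROM sys.tables t
INNER JOIN sys.identity_columns c
    ON c.object_id = t.object_id
WHERE TYPE_NAME(c.system_type_id) = 'int';

-- For a table on the reseed whitelist, restart at the most negative int value.
DBCC CHECKIDENT('dbo.EventLog', RESEED, -2147483648);
```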

NOLOCK Doesn’t Mean No Locks

Bert Wagner points out that SELECT queries with NOLOCK can still cause blocking to occur:

This is where my understanding of NOLOCK was wrong: while NOLOCK won’t lock row-level data, it will take out a schema stability lock.

A schema stability (Sch-S) lock prevents the structure of a table from changing while the query is executing.  All SELECT statements, including those in the read uncommitted/NOLOCK isolation level, take out a Sch-S lock.  This makes sense, because we wouldn’t want to start reading data from a table and then have the column structure change halfway through the data retrieval.

However, this also means there might be some operations that get blocked by a Sch-S lock. For example, any command requesting a schema modification (Sch-M) lock gets blocked in this scenario.
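A quick way to see the blocking for yourself is a two-session repro along these lines (dbo.BigTable is a stand-in for any table that takes a while to scan):

```sql
-- Session 1: even with NOLOCK, this scan holds a Sch-S lock while it runs.
SELECT COUNT(*) FROM dbo.BigTable WITH (NOLOCK);

-- Session 2: adding a column requires a Sch-M lock, so it waits on session 1.
ALTER TABLE dbo.BigTable ADD NewColumn INT NULL;

-- Session 3: the Sch-M request shows up as WAIT behind the granted Sch-S lock.
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT';
```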

Read on to see which types of commands take schema modification locks, and ways to minimize the pain.

tibbletime: Time-Aware Data Sets In R

At Business Science, they’ve announced a new R package:

We are excited to announce the release of tibbletime v0.0.2 on CRAN. Loads of new functionality have been added, including:

  • Generic period support: Perform time-based calculations by a number of supported periods using a new period formula.
  • Creating series: Use create_series() to quickly create a tbl_time object initialized with a regular time series.
  • Rolling calculations: Turn any function into a rolling version of itself with rollify().
  • A number of smaller tweaks and helper functions to make life easier.

As we further develop tibbletime, it is becoming clearer that the package is a tool that should be used in addition to the rest of the tidyverse.  The combination of the two makes time series analysis in the tidyverse much easier to do!
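Here’s a rough sketch of how those pieces fit together, based on the announcement; the window size and random values are my own:

```r
library(dplyr)
library(tibbletime)

# Build a daily series for October 2017, then add a rolling 5-day mean.
series <- create_series('2017-10-01' ~ '2017-10-31', period = 'daily')

rolling_mean_5 <- rollify(mean, window = 5)

series %>%
  mutate(value = rnorm(n()),
         value_avg = rolling_mean_5(value))
```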

Check out their demos comparing New York and San Francisco weather.  It looks like it’ll be a useful package.  H/T R-bloggers

ANOVA

Mala Mahadevan explains what ANOVA is and why it’s interesting:

ANOVA, or analysis of variance, is a term given to a set of statistical models that are used to analyze differences among groups and determine whether those differences are statistically significant enough to draw any conclusions.  The models were developed by the statistician and evolutionary biologist Ronald Fisher.  To give a very simplistic definition, ANOVA is an extension of the two-sample t-test to more than two groups.
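For reference, the one-way ANOVA statistic is the ratio of between-group to within-group variability.  With $K$ groups, $n_i$ observations in group $i$, $N$ observations in total, group means $\bar{Y}_i$, and grand mean $\bar{Y}$:

$$F = \frac{\sum_{i=1}^{K} n_i (\bar{Y}_i - \bar{Y})^2 \,/\, (K - 1)}{\sum_{i=1}^{K} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 \,/\, (N - K)}$$

Under the null hypothesis that all group means are equal, $F$ follows an F distribution with $K-1$ and $N-K$ degrees of freedom.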

ANOVA is an older test and a fairly simple process, but is quite useful to understand.

Updating A Collection Of Variables In Powershell

Klaas Vanbenberghe shows how to update multiple variables in the same way using Powershell:

We turn to two cmdlets: Get-Variable and Set-Variable.  They may seem redundant, as we get and set variables all the time without them; maybe you have never even heard of these two cmdlets.  Well, here they prove their usefulness: we will set the variables using their names, and use the current value as a starting point.  An important rule to remember is this:

$ is a token to indicate we’re dealing with a variable, but it is not part of the variable name!
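As a small sketch of the pattern (the counter variables here are invented), you can walk a list of variable names and update each one from its current value:

```powershell
# Three counters we want to update the same way.
$ServerCount = 3
$DatabaseCount = 10
$TableCount = 250

foreach ($name in 'ServerCount', 'DatabaseCount', 'TableCount') {
    # Pass the name without the $ token; $name holds the name, not the value.
    $current = (Get-Variable -Name $name).Value
    Set-Variable -Name $name -Value ($current + 1)
}

Get-Variable -Name ServerCount, DatabaseCount, TableCount
```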

It’s worth reading the whole thing.

Neural Nets Optimizing For Imperfection

John Cook describes a paradox with neural nets:

Deep neural networks have enough parameters to overfit the data, but there are various strategies to keep this from happening. A common way to avoid overfitting is to deliberately do a mediocre job of fitting the model.

When it works well, the shortcomings of the optimization procedure yield a solution that differs from the optimal solution in a beneficial way. But the solution could fail to be useful in several ways. It might be too far from optimal, or deviate from the optimal solution in an unhelpful way, or the optimization method might accidentally do too good a job.

Conceptually, this feels a little weird but isn’t really much of a problem, as we have other analogues:  rational ignorance in economics (where we knowingly choose not to know something because the benefit is not worth the opportunity cost of learning), OPTIMIZE FOR UNKNOWN with SQL Server (where we knowingly do not use the passed-in parameter because we might get stuck in a lesser path), etc.  But the specific process here is interesting.
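If you want to poke at the idea, here’s a toy sketch (mine, not John’s) in which stopping gradient descent early keeps an overparameterized fit deliberately mediocre:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=x.size)

# 15 polynomial coefficients for 20 points: plenty of room to overfit.
X = np.vander(x, 15)
w = np.zeros(X.shape[1])

# A handful of steps acts as implicit regularization; run the loop to
# convergence and the fit chases the noise instead of the signal.
for step in range(100):
    grad = X.T @ (X @ w - y) / len(y)
    w -= 0.1 * grad

print("training MSE:", np.mean((X @ w - y) ** 2))  # nonzero by design
```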
