The Risk Of Data Silos

Kevin Feasel

2018-08-23

Data

Gaurav Dhillon argues that data silos are a major impediment to effective use of data:

The greatest stumbling block our respondents identified as hindering their attempts at better utilizing data is one that has existed for some time but seems to have worsened as data volumes have grown – data silos. Only 2 percent of our respondents considered their business to be completely effective at data sharing – for the rest, data silos are a real problem.

The causes for this are numerous, and span inconsistency of systems being used (42 percent), different data formats (38 percent), and a lack of coordinated data strategy (37 percent). On top of this, over a third highlight a lack of technology integration (36 percent) and/or legacy technology barriers (36 percent) as blocking attempts to effectively share data.

My first response is to say that this is in part due to the growth of microservices architecture, which seems to push data siloing. But data silos were a problem long before microservices, so I don’t think microservices are either a necessary or a sufficient explanation.

Solving Linear Optimization Problems In R

Kevin Feasel

2018-08-23

R

Mic walks us through a linear optimization problem and solves it with the lpSolve package:

I’m going to implement in R an example of linear optimization that I found in the book “Modeling and Solving Linear Programming with R” by Jose M. Sallan, Oriol Lordan and Vincenc Fernandez. The example is named “Production of two models of chairs” and can be found on page 57, section 3.5. I’m going to solve only the first point.

The problem text is the following:

A company produces two models of chairs: 4P and 3P. The model 4P needs 4 legs, 1 seat and 1 back. On the other hand, the model 3P needs 3 legs and 1 seat. The company has an initial stock of 200 legs, 500 seats and 100 backs. If the company needs more legs, seats and backs, it can buy standard wood blocks, whose cost is 80 euro per block. The company can produce 10 seats, 20 legs and 2 backs from a standard wood block. The cost of producing the model 4P is 30 euro/chair, while the cost of the model 3P is 40 euro/chair. Finally, the company informs that the minimum number of chairs to produce is 1000 units per month. Define a linear programming model that minimizes the total cost (the production costs of the two chairs, plus the buying of new wood blocks).

I remember solving this exact problem (down to the four legs versus three legs bit) in grad school.  We used LINGO to do this, though I haven’t seen that language since.  H/T R-Bloggers
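Below is a minimal lpSolve sketch of the model. It is not Mic's code. It assumes one reading of the problem text: each wood block is cut into only one kind of part, yielding 20 legs, 10 seats or 2 backs. There are five decision variables: the two chair counts plus blocks bought for legs, seats and backs. The objective is 30·x4 + 40·x3 + 80·(bl + bs + bb). Each part constraint says that parts used must not exceed stock plus parts cut from blocks. The variable names are illustrative.

```r
library(lpSolve)

# Decision variables, in column order (all continuous, non-negative by default):
#   x4 = 4P chairs, x3 = 3P chairs,
#   bl, bs, bb = wood blocks cut into legs, seats and backs
#   (assumption: each block is cut into one kind of part only)
cost <- c(30, 40, 80, 80, 80)  # euro per chair / per block

constraints <- matrix(c(
  4, 3, -20,   0,  0,  # legs used minus legs from blocks    <= 200 in stock
  1, 1,   0, -10,  0,  # seats used minus seats from blocks  <= 500 in stock
  1, 0,   0,   0, -2,  # backs used minus backs from blocks  <= 100 in stock
  1, 1,   0,   0,  0   # total chairs                        >= 1000 per month
), nrow = 4, byrow = TRUE)

directions <- c("<=", "<=", "<=", ">=")
rhs <- c(200, 500, 100, 1000)

solution <- lp("min", cost, constraints, directions, rhs)

solution$status    # 0 means lpSolve found an optimal solution
solution$objval    # minimum total cost in euro
solution$solution  # values of x4, x3, bl, bs, bb
```

Under this reading, the optimum can be checked by hand. A 4P chair costs 54 euro when its back comes from stock, while a 3P chair costs 60 euro, so the 4P is cheaper only until the 100 backs in stock run out. The result is 100 4P chairs and 900 3P chairs, with 145 blocks bought for legs and 50 for seats, for 54,600 euro in total. If instead one block is read as yielding all three part types at once, the three block columns collapse into a single variable and the answer changes.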

The Luminance Illusion With gganimate

Kevin Feasel

2018-08-23

R

David Smith highlights an example of the luminance illusion:

Colin created this animation in R using the gganimate package (available on GitHub from author Thomas Lin Pedersen), and the process is delightfully simple. It begins with a chart of 10 “points”, each being the same grey square equally spaced across the shaded background. Then, a simple command animates the transitions from one point to the next, and interpolates between them smoothly:

library(gganimate)
gg_animated <- gg + transition_time(t) + ease_aes('linear')

Check it out, both as a parlor trick and a way of getting a grip on the gganimate package.
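The snippet above assumes an existing static plot `gg` and a time column `t`. Here is a self-contained sketch of the same idea. It is not Colin's code: the data frames, positions and colours are hypothetical stand-ins. One grey square sits at 10 time steps across a background that shades from black to white, and `transition_time()` plus `ease_aes('linear')` turn it into an animation.

```r
library(ggplot2)
library(gganimate)

# Hypothetical background: a horizontal gradient from black (left) to white (right)
background <- data.frame(x = seq(0, 10, length.out = 200))

# Hypothetical square positions: the same grey square at 10 evenly spaced times
square <- data.frame(t = 1:10, x = 1:10 - 0.5, y = 0.5)

gg <- ggplot() +
  geom_tile(data = background, aes(x = x, y = 0.5, fill = x), height = 1) +
  scale_fill_gradient(low = "black", high = "white", guide = "none") +
  # shape 15 is a filled square; its colour never changes between frames
  geom_point(data = square, aes(x = x, y = y), shape = 15, size = 10,
             colour = "grey50") +
  theme_void()

# Animate over t, interpolating the square's position linearly between frames
gg_animated <- gg + transition_time(t) + ease_aes('linear')
```

Calling `animate(gg_animated)` renders the frames; GIF output needs a renderer such as the gifski package. The background layer has no `t` column, so it should stay fixed while only the square moves. That fixed backdrop is what makes the constant grey appear to change brightness.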

Styling In ggplot2

The folks at Jumping Rivers show an example of creating a nice-looking plot with ggplot2:

The changes we’ve made so far would be impossible for any package to do for us – how would the package know the plot title? We can now improve the look and feel of the plot. There are two complementary ways of doing this: scales and themes. The ggplot scales control things like colours and point size. In the latest version of ggplot2, version 3.0.0, the Viridis colour palette was introduced. This palette is particularly useful for creating colour-blind friendly palettes.

g + scale_colour_viridis_d() # d for discrete

With a few lines of code, those default graphs can look a lot nicer.
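The one-liner above assumes an existing plot `g`. The following minimal, self-contained sketch uses the built-in `iris` data as a stand-in, not the Jumping Rivers data. It shows a scale and a theme applied to the same plot. `Species` is a factor, so the discrete Viridis scale is the one to use.

```r
library(ggplot2)

# Illustrative plot: colour is mapped to a discrete variable (Species)
g <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point()

# Scales control how data values map to aesthetics such as colour
p <- g + scale_colour_viridis_d()  # d for discrete; scale_colour_viridis_c() for continuous

# Themes control non-data elements such as the background and gridlines
p + theme_minimal()
```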

Formatting Queries As JSON With FOR JSON

Kevin Feasel

2018-08-23

JSON

Eduardo Pivaral shows off the FOR JSON functionality in SQL Server 2016 and later:

For most real-world applications, JSON AUTO will not give you the control you may need over your file format. For more control over it, use the JSON PATH option, along with the ROOT option, as follows:

SELECT TOP 10 id, dataVarchar, dataNumeric, dataInt, dataDate
FROM [dbo].[MyTestTable]
ORDER BY id -- without ORDER BY, which 10 rows TOP returns is not guaranteed
FOR JSON PATH, ROOT('TestTable');

Eduardo has several examples along these lines.

KPIs In A Table With Power BI

Kasper de Jonge shows us how to embed SVGs into a Power BI table or matrix to show visual forms of KPIs:

Got a question today about how we can do KPIs in the matrix or table with Power BI, just like we can with PowerPivot through the KPI functionality there.

This is where the great SVG trick from David Eldersveld comes into play. You can read all about it in his blog posts, starting at https://dataveld.com/2018/01/13/use-svg-images-in-power-bi-part-1, and in this post by Chris Webb with even more cool examples.

In this case I am using his trick and a post I found on the community site to add a KPI to my table.

Read on for an example.

Azure SQL Database Elastic Job False Successes

Kevin Feasel

2018-08-23

Cloud

Arun Sirpal walks us through a case where an Azure SQL Database elastic job appears to be running correctly but isn’t:

I use elastic pools. They are a fabulous way of saving money when running many Azure SQL Databases, assuming, that is, that you understand the resource utilization patterns of the databases involved.

I recently created a couple of elastic jobs for my pool, went through a successful setup (so I thought) and proceeded to execute the query. Because the query is elastic, it runs across all my databases from a single point. This is where it got puzzling: if you look at the screenshot below, you can clearly see that the status is RUNNING, and drilling into it, it says RETRYING. This obviously is not right.

Click through to see the root cause of this problem.

Causing tempdb Spills

Kendra Little shows us a quick and easy way to cause tempdb spills:

Sometimes it’s useful to know how to cause a problem.

Maybe you’ve never encountered the problem, and want to get hands-on experience. Maybe you’re testing a monitoring tool, and want to see if a condition flags an alert. Maybe you’re testing out a new client tool, and want to see how it displays it.

I recently was going through some demos in SQL Operations Studio, and I found that a spill on a sort operator wasn’t causing a warning to visibly show in the graphic execution plan.

I wanted to file an issue on this and let the Ops Studio team know that a visible warning would be helpful, but my demo code was somewhat complex and required restoring a rather large database. So I set up a quick code sample to cause a spill that could be run in any database.

It’s important to know how to cause problems if you want to make sure you’ve solved them correctly.
