# Day: August 23, 2018

The greatest stumbling block our respondents identified as hindering their attempts at better utilizing data is one that has existed for some time but seems to have worsened as data volumes have grown – data silos. Only 2 percent of our respondents considered their business to be completely effective at data sharing – for the rest, data silos are a real problem.

The causes for this are numerous, and span inconsistency of systems being used (42 percent), different data formats (38 percent), and a lack of coordinated data strategy (37 percent). On top of this, over a third highlight a lack of technology integration (36 percent) and/or legacy technology barriers (36 percent) as blocking attempts to effectively share data.

My first response is to say that this is in part due to the growth of microservices architecture, which seems to push data siloing.  But at the same time, this has been the case for a long time, so I don’t think it’s either a necessary or a sufficient explanation.

I’m going to implement in R an example of linear optimization that I found in the book “Modeling and Solving Linear Programming with R” by Jose M. Sallan, Oriol Lordan and Vincenc Fernandez.  The example is named “Production of two models of chairs” and can be found at page 57, section 3.5. I’m going to solve only the first point.

The problem text is the following

A company produces two models of chairs: 4P and 3P. The model 4P needs 4 legs, 1 seat and 1 back. On the other hand, the model 3P needs 3 legs and 1 seat. The company has a initial stock of 200 legs, 500 seats and 100 backs. If the company needs more legs, seats and backs, it can buy standard wood blocks, whose cost is 80 euro per block. The company can produce 10 seats, 20 legs and 2 backs from a standard wood block. The cost of producing the model 4P is 30 euro/chair, meanwhile the cost of the model 3P is 40 euro/chair. Finally, the company informs that the minimum number of chairs to produce is 1000 units per month. Define a linear programming model, which minimizes the total cost (the production costs of the two chairs, plus the buying of new wood blocks).

I remember solving this exact problem (down to the four legs versus three legs bit) in grad school.  We used LINGO to do this, though I haven’t seen that language since.  H/T R-Bloggers

Colin created this animation in R using the gganimate package (available on GitHub from author Thomas Lin Pederson), and the process is delightfully simple. It begins with a chart of 10 “points”, each being the same grey square equally spaced across the shaded background. Then, a simple command animates the transitions from one point to the next, and interpolates between them smoothly:

```library(gganimate)
gg_animated <- gg + transition_time(t) + ease_aes('linear')```

Check it out, both as a parlor trick and a way of getting a grip on the gganimate package.

The changes we’ve made so far would impossible for any package to do for us – how would the package know the plot title? We can now improve the look and feel of the plot. There are two ways of complementary ways of doing this: scales and themes. The ggplot scales control things like colours and point size. In the latest version of ggplot2, version 3.0.0, the Viridis colour palette was introduced. This palette is particularly useful for creating colour-blind friendly palettes

``g + scale_colour_viridis_d() # d for discrete``

With a few lines of code, those default graphs can look a lot nicer.

For most of real-world applications, the JSON AUTO will not give you the control you could need over your file format, for having more control over it, you must use the JSON PATH option, along with the ROOT option as follows:

```SELECT TOP 10 id, dataVarchar, dataNumeric, dataInt, dataDate
FROM [dbo].[MyTestTable]
FOR JSON PATH, ROOT('TestTable')```

Eduardo has several examples along these lines.

Got a question today on how we can do KPI’s in the matrix or table with Power BI, just like we can with PowerPivot through the KPI functionality there.

This is where the great SVG trick from David Eldersveld comes into play, you can read all about it here in his blog posts: https://dataveld.com/2018/01/13/use-svg-images-in-power-bi-part-1 and this post by Chris Webb with even more cool examples.

In this case I am using his trick and a post I found on the community site to add a KPI to my table.

I use elastic pools. They are a fabulous way of saving money when running many Azure SQL Databases, that is assuming you understand the resource utilization patterns of the databases involved.

I recently created couple of elastic jobs for my pool and went through a successful setup (so I thought) and proceeded to execute the query. The query being elastic in nature basically runs across all my databases from a single point. This is where it got puzzling, if you look at the below screen shot you can clearly see that the status is RUNNING, drilling into it, it says RETRYING. This obviously is not right.

Click through to see the root cause of this problem.