Press "Enter" to skip to content

Author: Kevin Feasel

IoT Versus Event Hub

James Serra clarifies the differences between Azure’s IoT Hub and its Event Hub:

The majority of the time, if the data is coming from the devices themselves, either directly or via a field-based gateway, IoT Hub will be the more appropriate choice.  Event Hub will generally be the more appropriate choice if the data will not be coming to Azure straight from the devices, but rather cloud-to-cloud through another provider or intra-cloud, or if the data is already landing on-premises and needs to be streamed to the cloud from a small number of internal endpoints.  There are exceptions to both conditions, of course.

Both solutions offer very high throughput data ingestion and can handle tremendous streaming data volumes.  In fact, today, IoT Hub is primarily a set of additional services that wrap an underlying Event Hub.

Read on for more scenarios and limitations in each.  They definitely serve different use cases.

Understanding Deadlock Priority

Kenneth Fisher explains deadlock priority:

Everyone deals with deadlocks from time to time. But sometimes we need to control who’s the deadlock victim and who isn’t. For example, say I’m doing a big delete on a table in a 24×7 environment. I can’t afford downtime, so I’m doing my delete in small chunks to reduce transaction size and blocking time. My delete needs to happen, but I’m in no hurry, and I really can’t afford to deadlock some other transaction. So how do I make sure?

Or on the other hand, I’m running an update that absolutely has to happen right now. It’s going to take a bit and I can’t afford the time for it to be started over. A deadlock would be a disaster. What do I do?

That’s where deadlock priority comes into play.

Click through for the explanation.
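
The mechanics boil down to a single session-level SET statement.  Here’s a minimal sketch of both sides of Kenneth’s scenario, with hypothetical table and column names:

    -- The chunked delete volunteers to be the deadlock victim:
    SET DEADLOCK_PRIORITY LOW;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (5000)
        FROM dbo.StagingData
        WHERE LoadDate < DATEADD(MONTH, -6, GETDATE());

        IF @@ROWCOUNT = 0
            BREAK;
    END;

    -- The must-run-now update takes the opposite stance:
    SET DEADLOCK_PRIORITY HIGH;  -- accepts LOW, NORMAL, HIGH, or an integer from -10 to 10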

Jepsen: MongoDB 3.4.0-rc3

Kyle Kingsbury takes a new look at MongoDB:

In April 2015, we discussed stale and dirty reads in MongoDB 2.6.7. However, writes appeared to be safe; update-only workloads with majority write concern were linearizable. This conclusion was not entirely correct. In this Jepsen analysis, we develop new tests which show the MongoDB v0 replication protocol is intrinsically unsafe, allowing the loss of majority-committed documents. In addition, we show that the new v1 replication protocol has multiple bugs, allowing data loss in all versions up to MongoDB 3.2.11 and 3.4.0-rc4. While the v0 protocol remains broken, patches for v1 are available in MongoDB 3.2.12 and 3.4.0, and now pass the expanded Jepsen test suite. This work was funded by MongoDB, and conducted in accordance with the Jepsen ethics policy.

Mongo has grown up when it comes to data integrity, though be sure you’re using the v1 replication protocol.

Dual KPI Custom Visual

Adam Saxton has a quick video demonstrating a dual KPI custom visual:

The Dual KPI efficiently visualizes two measures over time. It shows their trend based on a joint timeline, while absolute values may use different scales, for example, Profit and Market share or Sales and Profit.

Each KPI can be visualized as a line chart or an area chart. The visual has dynamic behavior and can show the historical value and the change from the latest value when you hover over it. It also has small icons and labels to convey KPI definitions and alerts about data freshness.

It looks cool, but I dunno; my philosophy is that man cannot serve two KPIs.

Gap Analysis Custom Visual

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the Gap Analysis Power BI Custom Visual.  The Gap Analysis visual is used to analyze the difference between two groups of data.  For example, you might use it to analyze the gap between two answers people gave in survey response data.

I like the gap analysis visual; it works well as a cross-category comparison visual, giving you an idea of the relative importance of each category as well as the change from one time period to the next.  It’s a good way of fitting two useful pieces of information into the same visual.

Query Store Space Allocation

Grant Fritchey demonstrates how Query Store allocates disk space:

I love the questions I get while I’m presenting because they force me to think and learn. The question in the title is one I received recently. The answer, now that I’m not standing in front of people, is easy. Of course the space is not pre-allocated. Query Store tables are just system tables. They have a limit on how big they can grow (100 MB by default), but that space isn’t going to be pre-allocated in any way. The space will just get used as and when it’s needed, just like any other system table. However, don’t take my word for it, let’s prove that.

Read on for the proof.
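
If you’d like to check your own numbers before reading the proof, the relevant figures live in a catalog view (assuming SQL Server 2016 or later with Query Store enabled):

    -- Compare actual space used against the configured cap:
    SELECT actual_state_desc,
           current_storage_size_mb,
           max_storage_size_mb
    FROM sys.database_query_store_options;

    -- The cap is just a setting; no space is reserved up front:
    ALTER DATABASE CURRENT
    SET QUERY_STORE (MAX_STORAGE_SIZE_MB = 500);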

BCP Output In JSON Line-Delimited Format

Jovan Popovic shows how to use FOR JSON PATH to output rows in a table to JSON line-delimited format:

Although this is not a valid JSON format, many systems use it to exchange data.

One advantage of the line-delimited JSON format compared to standard JSON is that you can append new JSON objects to the end of the file without having to remove and re-add the closing array bracket.

This might be a niche use case, but I’m sure that in this post-XML-all-the-things era, this is more common than you might first expect.
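
One way to get there is to apply WITHOUT_ARRAY_WRAPPER per row, so each row becomes a standalone JSON object on its own line.  A rough sketch, with hypothetical table and column names:

    -- Each row is emitted as one standalone JSON object, no enclosing array:
    SELECT (SELECT p.ProductID, p.Name
            FOR JSON PATH, WITHOUT_ARRAY_WRAPPER)
    FROM dbo.Products AS p;

    -- Wrapped in bcp to stream the results to a file, roughly:
    -- bcp "SELECT (SELECT p.ProductID, p.Name FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) FROM dbo.Products AS p" queryout products.json -c -T -S .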

Looping In R

Klodian Dhana explains how to build a for loop in R:

I used a linear mixed effect model and therefore I loaded the lme4 library. The loop should work with other regression analyses (e.g., linear regression) if you modify it according to your regression model. If you don’t know which part to modify, leave a comment below and I will try to help.

As with other loops, this one calls the variables of interest one by one and, for each of them, extracts and stores the betas, standard errors, and p values. Remember, this code is specific to linear mixed effect models.

Read the whole thing.  It’s good to keep in mind, though, that set-based R operations tend to perform best, so save looping for cases in which you can’t build a set-based function.

Table Variables And TF2453

Tara Kizer investigates Trace Flag 2453:

I recently saw a server with trace flag 2453 configured. I hadn’t come across this trace flag before, so I did a little research. Microsoft says it allows “a table variable to trigger recompile when enough number of rows are changed”. This can lead to a more efficient execution plan. Trace flag 2453 is available in SP2 or greater for SQL Server 2012, CU3 or greater for SQL Server 2014, and RTM or greater for SQL Server 2016.

I was curious how a query using a table variable performed as compared to the “same” query using:

  • trace flag 2453

  • OPTION (RECOMPILE)

  • a temporary table

Click through for a relative performance comparison.
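
As a rough sketch of the moving parts (using catalog views so it runs anywhere; note that the -1 argument applies the trace flag server-wide):

    -- Enable the trace flag globally (requires sysadmin):
    DBCC TRACEON (2453, -1);

    -- A table variable holding far more than the single row the optimizer assumes:
    DECLARE @Ids TABLE (Id int PRIMARY KEY);

    INSERT INTO @Ids (Id)
    SELECT object_id
    FROM sys.objects;

    -- With 2453 enabled, the row count change can trigger a recompile here,
    -- similar to temp table behavior:
    SELECT o.name
    FROM sys.objects AS o
    INNER JOIN @Ids AS i
        ON i.Id = o.object_id;

    -- The per-query alternative Tara tests is appending OPTION (RECOMPILE)
    -- to the SELECT instead of enabling the trace flag.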

Mapping Geospatial Data

The folks at Sharp Sight Labs have a great blog post on mapping geospatial data using R:

If you’ve learned the basics of data visualization in R (namely, ggplot2) and you’re interested in geospatial visualization, use this as a small, narrowly-defined exercise to practice some intermediate skills.

There are at least three things that you can learn and practice with this visualization:

  1. Learn about color: Part of what makes this visualization compelling are the colors. Notice that in the area surrounding the US, we’re not using pure black, but a dark grey. For the title, we’re not using white, but a medium grey. Also, notice that for the rivers, we’re not using “blue” but a very specific hexadecimal color. These are all deliberate choices. As an exercise, I highly recommend modifying the colors. Play around a bit and see how changing the colors changes the “feel” of the visualization.

  2. Learn to build visualizations in layers: I’ve emphasized this several times recently, but layering is an important principle of data visualization. Notice that we’re layering the river data over the USA country map. As an exercise, you could also layer in the state boundaries between the country map and the rivers. To do this, you can use map_data().

  3. Learn about ‘Spatial’ data: R has several classes for dealing with ‘geospatial’ data, such as ‘SpatialLines’, ‘SpatialPoints’, and others. Spatial data is a whole different animal, so you’ll have to learn its structure. This example will give you a little experience dealing with it.

I also like the iterative approach they discuss.  You’ll almost never get it right the first go-around, but one of the nice things about ggplot2 is that it’s designed to be iterative:  you layer your changes on, making it a bit easier to fiddle with them to get the visualization just right.
