Month: January 2020

Using Trellis Charts to Display Small Multiples Over Time

Mike Cisneros shows us the evolution of three-point shooting in the NBA using a trellis chart:

 This small multiple chart shows two variables for each team in the league for each of the last 30 seasons: on the x-axis, the number of 3-pointers attempted per game; on the y-axis, the percent of attempted 3-point shots that were successful. Each point is a single team in a single season. The individual panels step you forward in time as the data changes and evolves. They help you see how the pack of all NBA teams is inexorably moving towards more and more 3-point attempts per game (the data points shift rightwards as you progress through the frames). We can also see that there are no longer any teams with sub-30% shooting percentages on those attempts (illustrated by tighter clustering upwards as you move forward in time).

This is a good way of showing movement over time in a static medium, like a printed page. If you’re giving a presentation, this would probably be a bubble chart with a play axis.

Comments closed

Solving Sudoku with R

Tomaz Kastrun builds a validation function for Sudoku:

The validater function checks, for a given sudoku board, whether a particular number is valid at a particular position:

validater(sudoku, 1, c(1,4))

In the matrix, at position x=1, y=4, where there is a 0, it will test whether the number 1 is valid. If the number is valid, it returns TRUE (number) to the outer function that searches for the complete solution.

This function iterates through all the possible 0-positions and iterates through solutions that are still available based on the rules:

Click through for that validation function.
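
As a rough illustration of the idea (not Tomaz's R code), here is a minimal Python sketch of a validity check plus the outer backtracking search described above. The names is_valid, solve, and PUZZLE are hypothetical, the sample board is a commonly used example puzzle rather than one from the post, and positions are 0-based (row, column) pairs instead of R's 1-based indexes.

```python
# A commonly used example puzzle; 0 marks an empty cell.
# (Assumption: this board is for illustration and does not come from the post.)
PUZZLE = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]


def is_valid(board, number, position):
    """Return True if `number` may be placed at the empty cell `position` (row, col)."""
    row, col = position
    # The number must not already appear in the same row.
    if number in board[row]:
        return False
    # The number must not already appear in the same column.
    if any(board[r][col] == number for r in range(9)):
        return False
    # The number must not already appear in the enclosing 3x3 box.
    top, left = 3 * (row // 3), 3 * (col // 3)
    for r in range(top, top + 3):
        for c in range(left, left + 3):
            if board[r][c] == number:
                return False
    return True


def solve(board):
    """Fill the board in place by backtracking; return True once it is complete."""
    for row in range(9):
        for col in range(9):
            if board[row][col] == 0:
                # Try each candidate that the rules still allow for this cell.
                for number in range(1, 10):
                    if is_valid(board, number, (row, col)):
                        board[row][col] = number
                        if solve(board):
                            return True
                        board[row][col] = 0  # undo and try the next candidate
                return False  # no candidate fits, so backtrack
    return True  # no empty cells remain
```

Calling solve on a copy of PUZZLE fills that copy in place and leaves the original board untouched.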

Schema Evolution in Kafka

The Hadoop in Real World group takes us through schema changes in Apache Kafka:

Meetup.com went live with this new way of distributing RSVPs – that is, through Kafka. Both the producer and consumer agree on the schema and everything is great. It is silly to think that the schema would stay like that forever. Let’s say meetup.com didn’t feel the value in providing the member_id field and removes it. What do you think will happen – will it affect consumers?

The member_id field doesn’t have a default value and is considered a required column, so this change will affect the consumers. When a producer removes a required field, the consumer will see an error something like the one below:

Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 63
Caused by: org.apache.avro.AvroTypeException: found com.hirw.kafkaschemaregistry.producer.Rsvp,
expecting com.hirw.kafkaschemaregistry.producer.Rsvp, missing required field member_id

This is an interesting review of the schema registry in Kafka and what the different modes allow for.
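
To make the failure mode concrete, here is a minimal Python sketch of the rule behind that error: when the reader's schema has a field that the writer's schema lacks, the reader can only fill it in from a default. This is a simplified stand-in for Avro's schema resolution, not the Avro library or the Schema Registry itself. The resolve function, the SchemaResolutionError class, the rsvp_id field, and the default of -1 are all hypothetical names and values chosen for illustration.

```python
class SchemaResolutionError(Exception):
    """Raised when a record cannot be projected onto the reader's schema."""


def resolve(record, writer_fields, reader_fields):
    """Project a record written with `writer_fields` onto `reader_fields`."""
    writer_names = {field["name"] for field in writer_fields}
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_names:
            # The writer supplied the field, so use its value.
            resolved[name] = record[name]
        elif "default" in field:
            # The writer dropped the field, but the reader has a fallback.
            resolved[name] = field["default"]
        else:
            # No value and no default: the consumer cannot read the record.
            raise SchemaResolutionError(f"missing required field {name}")
    # Fields known only to the writer are ignored.
    return resolved


# Producer schema before and after member_id is removed (rsvp_id is hypothetical).
WRITER_V1 = [{"name": "rsvp_id", "type": "long"}, {"name": "member_id", "type": "long"}]
WRITER_V2 = [{"name": "rsvp_id", "type": "long"}]

# Consumer schema without and with a default for member_id.
READER_REQUIRED = [{"name": "rsvp_id", "type": "long"}, {"name": "member_id", "type": "long"}]
READER_WITH_DEFAULT = [
    {"name": "rsvp_id", "type": "long"},
    {"name": "member_id", "type": "long", "default": -1},
]
```

Resolving a WRITER_V2 record against READER_REQUIRED raises the error, while READER_WITH_DEFAULT reads the same record and fills member_id from the default.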

Loading Event Hubs from Cosmos DB

Annie Xu shows us how we can use Azure Functions to take data from Cosmos DB and populate Event Hubs:

One way to load data from Cosmos DB to Event Hubs is to use an Azure Function. There are many coding samples out there for creating such an Azure Function, but if you, like me, do not have much application development experience, reading those code samples is a bit challenging. Luckily, the Azure Portal makes it easy.

Annie has a step-by-step walkthrough which makes it easy.

Toggling Query Results with Azure Data Studio

Taiob Ali shows us how you can hide the results and messages panes in Azure Data Studio:

I know CTRL+R works for the same when using SQL Server Management Studio (SSMS). That hotkey did not work in Azure Data Studio (ADS). Knowing how brilliant a technologist Derik is, I realized that a standard Google search would not work.

I know ADS is an open-source project. Someone might have requested this feature already.

That functionality is in there, as Taiob shows.

Failed to Open Loopback Connection

Steve Stedman diagnoses an error:

Recently while working on a SQL Server for a client I came across the following error when I attempted to view the SQL Server logs in SSMS.

From there I decided I needed to track this down, so I attempted to run the sp_readerrorlog stored procedure, which also failed, but it provided more information.

The full error message was:

Msg 22004, Level 16, State 1, Line 0

Failed to open loopback connection. Please see event log for more information.

Msg 22004, Level 16, State 1, Line 0

Error log location not found.

There are several potential solutions, and the one Steve landed on definitely was not my first thought.

Monitoring Power BI On-Premises Gateways

Jeff Pries shows us how we can monitor Power BI Enterprise Gateways:

To tie all of these steps together, I put together a brief C# console application which is capable of using stored credentials or prompting for credentials interactively, then performs the 4 steps listed above. Finally, the application receives the JSON response, de-serializes it, and outputs it in a friendly way to the screen (or optionally inserts it into a SQL database table).

For the basics on how I perform the authentication and API query in the application, see my blog post here.

There are a few more steps than I would have anticipated.

Investigating the Big Data Cluster Data Pool

Mohammad Darab takes us through Big Data Cluster data pools:

Data pools enable the creation of scale-out data marts. Whether your data is being ingested from Spark jobs or SQL, it is stored in the data pool. Data is distributed across one or two SQL Server instances, so running queries against it is more efficient.

Whether the data is being ingested from an IoT device, Kafka, or another relational data source (like Oracle or Teradata), it is all stored in the data pool instances and is available as “data marts” for the consumer to work with. There is no need to go back out to the original data source each time you want to query the data. It is all available inside the data pool instances.

This lets you cache data brought in via PolyBase and spread it across a number of instances. That’s pretty powerful.
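
To illustrate the general idea of spreading rows across instances and combining per-instance results, here is a minimal Python sketch. It is not how Big Data Clusters are implemented. The round-robin placement is an assumption made for the example, and distribute_round_robin and scatter_gather_sum are hypothetical names.

```python
def distribute_round_robin(rows, instance_count):
    """Spread rows across `instance_count` shards, one row at a time in turn."""
    shards = [[] for _ in range(instance_count)]
    for index, row in enumerate(rows):
        # Assumption: simple round-robin placement, chosen for illustration.
        shards[index % instance_count].append(row)
    return shards


def scatter_gather_sum(shards, column):
    """Sum `column` on each shard separately, then combine the partial sums."""
    partial_sums = [sum(row[column] for row in shard) for shard in shards]
    return sum(partial_sums)
```

Each shard computes its partial result over a fraction of the rows, which is where the efficiency gain in the quoted passage would come from once the shards run on separate instances.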

Don’t Miss These Settings in Azure SQL DB

Arun Sirpal takes us through a few things administrators tend to miss in Azure SQL Database:

2. Allow Azure Services and resources to access this server setting set to on/off?

I always set this to off. I do not like it ON.

Why? Because I like to control things via vnets (maybe IPs if really needed – it depends on your solution). Nowadays you can use private endpoint connections, which allow connections from within a vnet to a private IP. Sure, you may want to use IP addresses. If you do, then I suggest database-level firewall rules over server-level ones, especially if you use failover groups.

There are several good ones here.

Static Analysis of Hadoop Libraries

Maxim Stefanov ran a static analysis of several Hadoop libraries and here are the findings:

After the analysis was completed, I chose the most interesting warnings and noticed that I had the same number of warnings in production code and in tests. Normally, I don’t consider analyzer warnings from tests. But when I divided them, I couldn’t leave ‘tests’ warnings unattended. “Why not take a look at them,” I thought, “because bugs in tests might also have adverse consequences.” They can lead to incorrect or partial testing, or even to mishmash. 

After I selected the most intriguing warnings, I divided them into the following groups: production, test, and the four main Hadoop modules. And now I’m glad to offer the review of analyzer warnings.

Read on for the list. Hopefully Maxim submitted a few pull requests or at least Jira tickets for the projects.
