Author: Kevin Feasel

Stoppable, Async Shiny Interfaces

Ian at Fells Stats wants to make a long-running Shiny app a bit more user-friendly:

Shiny operates in a reactive programming framework. Fundamentally this means that any time any UI element that affects the result changes, so does the result. This happens automatically, with your analysis code running every time a widget is changed. In a lot of cases, this is exactly what you want and it makes Shiny programs concise and easy to make; however in the case of long running processes, this can lead to frozen UI elements and a frustrating user experience.

The easiest solution is to use an Action Button and only run the analysis code when the action button is clicked. Another important component is to provide your user with feedback as to how long the analysis is going to take. Shiny has nice built in progress indicators that allow you to do this.

There are a couple of false starts in there, but by the time you reach the third act, the story makes sense. H/T R-Bloggers

Classes And Vectors In R

Dave Mason continues his journey toward learning R.  He looks next at the class() function:

Note the value assigned to horse_power is a whole number (integer) and the value assigned to miles_per_gallon is a rational number. But R tells us they are both of the “numeric” class. R does have an integer class. A variable’s class will be an integer if the value is followed by a capital “L”. Let’s reassign a value to horse_power to demonstrate:

> horse_power <- 240L
> class(horse_power)
[1] "integer"

Another way to determine the class of a variable is to use one of the is.*() functions. For example, is.integer() and is.numeric() tell us the miles_per_gallon is not an integer, and is a numeric:

> is.integer(miles_per_gallon)
[1] FALSE
> is.numeric(miles_per_gallon)
[1] TRUE

There’s also the typeof() function and the mode() function, and all three can differ under certain circumstances.

Next up, Dave hits vectors, the simplest of the interesting data types in R:

It’s important to know that the elements of a vector must be of the same class (data type). If the values passed to the c() function are of different classes, some of them will be coerced to a different class to ensure all classes of the vector are the same. Below, the parameter classes passed to the c() function include character, numeric, and integer. The corresponding numeric and integer parameter values are coerced to character within the vector:

> some_data <- c("a", "b", 7.5, 25L)
> some_data
[1] "a"   "b"   "7.5" "25" 
>

Read on for more about vectors.

Configuring SQL Server Management Studio

Brent Ozar shares his configuration settings for SQL Server Management Studio:

Under Query Results, SQL Server, Results to Grid, I change my XML data size to unlimited so that it brings back giant query plans. (Man, does my job suck sometimes.)

A lot of presenters like to check the box for “Display results in a separate tab” and “Switch to results tab after the query executes” because this gives them more screen real estate for the query and results. I’m just really comfortable with Control-R to hide the results pane.

And I just went and removed a bunch of menu bar icons I never use…  Good advice from Brent.

Pivoting And Unpivoting Data In T-SQL

Jeanne Combrinck shows how to use the PIVOT and UNPIVOT operators in SQL Server:

One thing that I still get confused about writing is pivot queries. I find myself needing to look up the syntax every time. Basically, you use Pivot and Unpivot to change the output of a table. If you would like rows turned into columns, you can use pivot, and for the opposite you can use unpivot.

One thing to note is the column identifiers in the unpivot clause follow the catalog collation. For SQL Database, the collation is always SQL_Latin_General_CP1_CI_AS. For SQL Server partially contained databases, the collation is always Latin1_General_100_CI_AS_KS_WS_SC. If the column is combined with other columns, then a collate clause (COLLATE DATABASE_DEFAULT) is required to avoid conflicts.

Click through for an example of each.
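
If you just want the general shape of the syntax before clicking through, here is a minimal sketch of each operator against a hypothetical dbo.SalesOrders table (SalesPerson, OrderYear, Amount); the table and column names are made up, so treat it as a template rather than a copy-and-paste answer.

-- PIVOT: one row per salesperson, with yearly totals turned into columns
SELECT SalesPerson, [2016], [2017], [2018]
FROM
(
    SELECT SalesPerson, OrderYear, Amount
    FROM dbo.SalesOrders
) AS src
PIVOT
(
    SUM(Amount)
    FOR OrderYear IN ([2016], [2017], [2018])
) AS pvt;

-- UNPIVOT: turn those year columns back into (SalesPerson, OrderYear, Amount) rows
SELECT SalesPerson, OrderYear, Amount
FROM dbo.SalesByYear
UNPIVOT
(
    Amount FOR OrderYear IN ([2016], [2017], [2018])
) AS unpvt;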

Blocking A Truncate Statement

Arun Sirpal shows that the TRUNCATE command needs to take locks like any other data modification command:

The truncate option is fast and efficient but did you know that it takes a certain lock where you could actually be blocked?

What am I talking about? When you issue a truncate, it takes a Sch-M lock, which it uses when moving the allocation units to the deferred drop queue. So if it takes this lock and you look at the locking compatibility matrix below, you will see what can cause a conflict (C).

Arun includes an image which shows what can block what, and also shows us an example.
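
If you want to reproduce the blocking yourself, here is a rough two-session sketch (dbo.DemoTable is just a stand-in): the open transaction in session 1 holds an IS lock at the table level, which is incompatible with the Sch-M lock the truncate needs.

-- Session 1: open a transaction and hold locks on the table
BEGIN TRANSACTION;
SELECT TOP (1) * FROM dbo.DemoTable WITH (HOLDLOCK);

-- Session 2: TRUNCATE requests a Sch-M lock on dbo.DemoTable,
-- so it sits and waits until session 1 finishes
TRUNCATE TABLE dbo.DemoTable;

-- Session 1: commit (or roll back) to release the locks and unblock session 2
COMMIT TRANSACTION;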

The Blocking Monitoring Framework

Dmitri Korotkevitch announces a new tool:

Troubleshooting blocking and concurrency issues is, in a nutshell, a simple process. You need to identify the processes involved in blocking conditions or deadlocks and analyze why those processes acquire locks on the same resources. In the majority of cases, you need to analyze queries and their execution plans, identifying possible inefficiencies that lead to an excessive number of locks being acquired.

Collecting this information is not a trivial task. The information is exposed through DMVs (you can download the set of scripts here); however, it requires you to run the queries at the time the blocking occurred. Fortunately, SQL Server allows you to capture blocking and deadlock conditions with the blocked process report and deadlock graph, analyzing them later.

There is a caveat, though. Neither the blocked process report nor the deadlock graph provides you with the execution plans of the statements. Nor do they always include the affected statements in plain text. You may need to query the plan cache and other DMVs to get this information, and the longer you wait, the lower the chance that the information is still available. Moreover, SQL Server may generate an enormous number of blocked process reports in cases of prolonged blocking and complex blocking chains, which complicates the analysis.

The framework is confirmed to work with SQL Server 2012 and later, but it might work on earlier versions as well.  Dmitri has released it to the public, so check it out.
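
Dmitri's framework packages all of this up for you, but if you want a feel for the built-in pieces it relies on, this is roughly how you turn on the blocked process report and capture both it and the deadlock graph with Extended Events (the session name and ten-second threshold here are arbitrary choices, not part of his tool):

-- Emit a blocked process report after ten seconds of blocking
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold (s)', 10;
RECONFIGURE;

-- Capture blocked process reports and deadlock graphs to an event file
CREATE EVENT SESSION [Blocking] ON SERVER
ADD EVENT sqlserver.blocked_process_report,
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.event_file (SET filename = N'Blocking');

ALTER EVENT SESSION [Blocking] ON SERVER STATE = START;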

In Defense Of Inline Table-Valued Functions

Riley Major defends the honor of inline table-valued functions:

So no, user-defined functions are not the devil. Scalar user-defined functions can cause big problems if misused, but generally inline user-defined functions do not cause problems.

The real rule of thumb is not to avoid functions, but rather to avoid adorning your index fields with logic or functions. Because when you hide your intentions from the optimizer with complex syntax, you risk not getting the better performing index seek.

Riley shows an example where his inline table-valued UDF produces an execution plan just as efficient as the query without the UDF.
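
For reference, here is a minimal sketch of what an inline table-valued function looks like; the table and column names are hypothetical, not Riley's example. Because the body is a single SELECT with no BEGIN/END, the optimizer expands it into the calling query much like a view, so a supporting index on dbo.Orders (CustomerID) can still produce a seek.

CREATE FUNCTION dbo.OrdersForCustomer (@CustomerID int)
RETURNS TABLE
AS
RETURN
(
    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
);
GO

-- The function inlines into the outer query rather than running row by row
SELECT c.CustomerName, o.OrderID, o.TotalDue
FROM dbo.Customers AS c
CROSS APPLY dbo.OrdersForCustomer(c.CustomerID) AS o;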

Things Not To Do In SQL Server

Randolph West has a how-not-to guide for SQL Server:

Don’t use TIMESTAMP

We covered this in detail in a previous post, What about TIMESTAMP? It’s better to pretend that this data type doesn’t exist.

Why not?

It is not what you think it is. TIMESTAMP is actually a row version value based on the amount of time since SQL Server was started. If you need to record an actual date and time, use DATETIME2 instead.

When should we?

Never.

I appreciate that Randolph includes a “when should you not listen to my overall pronouncement?” bit, as there are commonly exceptions to “do not do X” style rules.
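
As a quick illustration of the advice (the table here is made up), a sketch of a design that records an actual modification time with DATETIME2 and, where the change-detection behavior really is wanted, uses the ROWVERSION synonym so the intent is explicit:

CREATE TABLE dbo.Widget
(
    WidgetID     int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    WidgetName   nvarchar(100) NOT NULL,
    -- an actual point in time: DATETIME2, never TIMESTAMP
    ModifiedDate datetime2(3) NOT NULL
        CONSTRAINT DF_Widget_ModifiedDate DEFAULT (SYSUTCDATETIME()),
    -- automatically incrementing row version for optimistic concurrency checks
    RowVer       rowversion NOT NULL
);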

Joining Streams Of Data

Chuck Blake gives an example of joining two streams of data together in Wallaroo:

The joining event streams pattern takes multiple data pipelines and joins them to produce a new signal message that can be acted upon by a later process.

This pattern is used in a variety of use cases. Here are a few examples:

  • Merging data for an individual across a variety of social media accounts.

  • Merging click data from a variety of devices (e.g. mobile and desktop) for an individual user.

  • Tracking locations of delivery vehicles and assets that need to be delivered.

  • Monitoring electronic trading activity for clients on a variety of trading venues.

Conceptually, it’s very similar to normal join operations, but there is a time element which complicates things.

Confluent Platform 5.0 Released

Raj Jain and Michael Noll walk through the latest version of Confluent Platform, Confluent’s Kafka solution:

With Confluent Platform 5.0, operators can secure infrastructure using the new, easy-to-use LDAP authorizer plugin and can deliver faster disaster recovery (DR) thanks to automatic offset translation in Confluent Replicator. In Confluent Control Center, operators can now view broker configurations and inspect consumer lag to ensure that they are getting the most out of Kafka and that applications are performing as expected.

We have also introduced advanced capabilities for developers. In Confluent Control Center, developers can now better understand the data in Kafka topics due to the new topic inspection feature and Confluent Schema Registry integration. Control Center presents a new graphical user interface (GUI) for writing KSQL, making stream processing more effortless and intuitive as well. The latest version of KSQL itself introduces exciting additions, such as support for nested data, user-defined functions (UDFs), new types of joins and an enhanced REST API. Furthermore, Confluent Platform 5.0 includes the new Confluent MQTT Proxy for easier Internet of Things (IoT) integration with Kafka. The latest release is built on Apache Kafka 2.0, which features several new functionalities and performance improvements.

Looks like there have been some nice incremental improvements here.
