Press "Enter" to skip to content

Month: June 2020

Text Customization with ggtext

Abdul Majed Raja shows an example of using the ggtext library:

ggplot2 is the go-to R package for anyone who wants to make beautiful static visualizations in R. But most ggplot2 plots look almost the same, and few data analysts or data scientists care about customizing them, primarily because it’s not very intuitive to do so. That’s where ggplot2 extensions come in very handy. ggtext is an R package (by Claus O. Wilke) that helps in customizing the text present in ggplot2 plots. It could be the text outside the plot canvas or the text (annotation) within the plot canvas.
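To give a taste of the package (a minimal sketch of my own, not the code from the post), ggtext's element_markdown() lets you render Markdown and HTML in theme elements such as the plot title:

```r
library(ggplot2)
library(ggtext)  # assumes ggtext is installed

# element_markdown() renders Markdown/HTML in theme elements
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(title = "Fuel efficiency **falls** as <span style='color:red;'>weight</span> rises") +
  theme(plot.title = element_markdown())
```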

Click through for the code sample and video. H/T R-Bloggers.


Measuring the User Experience

Kayode Osinusi takes us through different methods for measuring user experience:

Task success rate is one of the most widely used and easily understood UX metrics. It shows the percentage of participants that successfully complete a task and helps designers identify user experience issues. As long as tasks have clearly defined goals, success rates can be measured.

Tasks like completing a signup process or adding a specific item to a shopping cart are well-suited for this metric. Keep in mind, success rate doesn’t explain how well users perform tasks or why they fail them.

Read on for several additional measures and frameworks.


Improving Async Stats Update Concurrency

Dimitri Furman announces a change in Azure SQL Database:

In Azure SQL Database and Azure SQL Managed Instance, the background process that updates statistics asynchronously can now wait for the schema modification lock on a low priority queue. This improves concurrency for workloads with frequent query plan (re)compilations.

The new behavior is enabled with the ASYNC_STATS_UPDATE_WAIT_AT_LOW_PRIORITY database-scoped configuration option. This feature is currently in public preview.
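A minimal sketch of enabling it, assuming your database already uses (or you also enable) asynchronous statistics updates:

```sql
-- asynchronous statistics updates are a prerequisite for this feature
ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS_ASYNC ON;

-- let the background stats process wait for schema modification locks at low priority
ALTER DATABASE SCOPED CONFIGURATION
    SET ASYNC_STATS_UPDATE_WAIT_AT_LOW_PRIORITY = ON;
```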

Dimitri does a good job of explaining what this means and how it can make life a little better for people querying tables with statistics updates.


Handling Messages with Service Broker

Chris Johnson continues a series on Service Broker:

This stored procedure is used by the queue to handle messages that arrive. When a message arrives, the queue will execute the procedure over and over until there are no more messages on the queue, so your stored procedure needs to remove messages as it goes. Using the MAX_QUEUE_READERS setting, you can have multiple instances of the procedure executing at once, to clear down a queue faster or to keep up with a high volume of messages. You can turn activation on or off using the STATUS setting; while it is set to OFF nothing will happen, but as soon as it is set to ON the procedure will start processing messages again. Finally, you need to specify which user the procedure will execute under. The options here are SELF (the current user, i.e. the person who runs the CREATE or ALTER script), OWNER (the person who owns the queue), or a username that the current user has impersonate permissions for.
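A minimal sketch of how those options fit together (the queue, procedure, and message-handling body here are hypothetical stand-ins, not Chris's code):

```sql
-- activation procedure: drains the queue, one message per loop iteration
CREATE PROCEDURE dbo.ProcessTargetQueue
AS
BEGIN
    DECLARE @handle UNIQUEIDENTIFIER,
            @message_type SYSNAME,
            @body VARBINARY(MAX);

    WHILE 1 = 1
    BEGIN
        -- take the next message off the queue, waiting up to a second
        WAITFOR (
            RECEIVE TOP (1)
                @handle       = conversation_handle,
                @message_type = message_type_name,
                @body         = message_body
            FROM dbo.TargetQueue
        ), TIMEOUT 1000;

        IF @@ROWCOUNT = 0 BREAK;  -- queue is empty; let activation wind down

        -- ... handle @body according to @message_type ...
    END;
END;
GO

-- the queue ties the procedure, reader count, and execution context together
CREATE QUEUE dbo.TargetQueue
WITH STATUS = ON,
     ACTIVATION (
         STATUS = ON,                             -- OFF pauses processing
         PROCEDURE_NAME = dbo.ProcessTargetQueue,
         MAX_QUEUE_READERS = 5,                   -- up to 5 concurrent readers
         EXECUTE AS OWNER
     );
```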

Chris provides an example of what such a procedure could look like and also briefly covers the concept of poison message handling.


Batch Mode on Rowstore in SQL Server

Monica Rathbun introduces us to one of the biggest internal improvements for SQL Server 2019:

Under compatibility level 150, in both SQL Server 2019 and Azure SQL Database, you can now use batch mode for CPU-bound analytic-type workloads without requiring columnstore indexes. There is no action needed to turn on batch mode aside from being on the proper compatibility level. You also have the ability to enable it as a database scoped configuration option (as shown below), and you can hint individual queries to either use or not use batch mode (also shown below). If you recall in my earlier blogs on columnstore, it is batch mode in conjunction with page compression that drastically increases query performance. This feature, Batch Mode on Rowstore, allows all operators enabled for batch mode to operate as such in SQL Server.

What does this mean? It means that query operations can process data faster and more efficiently, mimicking what makes columnstore so fast. Instead of reading row by row (row store mode), the engine can read in chunks, i.e. batches of 900 rows at a time. The performance impact of this can be tremendous, as batch processing uses the CPU much more efficiently.
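A quick sketch of the knobs Monica mentions (the database, table, and query are hypothetical; the configuration option and hints are the documented ones):

```sql
-- batch mode on rowstore comes on automatically at compatibility level 150
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 150;

-- or control it database-wide
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = ON;

-- or hint individual queries in either direction
SELECT COUNT_BIG(*) AS Orders, SUM(Quantity) AS TotalQty
FROM dbo.SalesOrderDetail
OPTION (USE HINT('DISALLOW_BATCH_MODE'));  -- force row mode

SELECT COUNT_BIG(*) AS Orders, SUM(Quantity) AS TotalQty
FROM dbo.SalesOrderDetail
OPTION (USE HINT('ALLOW_BATCH_MODE'));     -- allow batch mode even if the scoped option is OFF
```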

Read on to see the limitations and benefits.


Dynamic Measures in DAX

Kristyna Hughes explains the concept of dynamic measures in DAX:

For example, you may have three major stakeholders looking at a report: CEO, CFO, and COO. The CEO may be more interested in future profits and looks at trends in the number of loads booked to see where we are headed. The CFO wants to see profit trends on loads that have already been completed (actualized profit). Additionally, the COO would rather see what’s currently in transit and look at only picked up loads. You could make three pages, one based on booked loads for the CEO, another based on delivered loads for the CFO, and one based on picked up loads for the COO. But what happens if your report is already going to be multiple pages and needs room to grow? You would need to triple your page count to accommodate the needs of your users or build three completely separate reports! Don’t worry, dynamic measures can solve this problem for you without clogging up your reports or workspaces with extra measures, visuals, and reports.
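The core of the technique (a sketch using a hypothetical disconnected 'Measure Selection' table and made-up base measures, not Kristyna's actual model) is a SWITCH over whatever the slicer selects:

```dax
Selected Measure =
SWITCH (
    SELECTEDVALUE ( 'Measure Selection'[Measure Name] ),
    "Booked Loads",     [Booked Load Count],
    "Delivered Profit", [Delivered Profit],
    "Picked Up Loads",  [Picked Up Load Count],
    BLANK ()  -- nothing selected: show blank
)
```

Putting [Selected Measure] in the visuals and 'Measure Selection'[Measure Name] in a slicer lets one page serve all three stakeholders.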

Click through for the demonstration on how to combine this with a slicer to change reported measures.


Frequently Asked Availability Group Questions

Sean Gallardy answers the voices swarming in his head:

Q3: Do my secondary replicas really need to be the same level of hardware as my primary replicas?
Sean: They should; this assumes you are eventually going to fail over to them or they might need to run your workload. It does *not* assume they will be used for readable workloads. Remember that the secondary replica is already accepting and writing log data, redoing that data, and then if you’re running a bunch of read workloads on top of that, it might actually be *busier* than your primary. Yes, I’ve witnessed secondary replicas getting hammered harder than their associated primary.

Read on for informative and entertaining answers, and avoid those rusty nails.


Using Apache Flink in Zeppelin Notebooks

Jeff Zhang walks us through reviewing data streamed through Apache Flink in an Apache Zeppelin notebook:

In this post, we explained how the redesigned Flink interpreter works in Zeppelin 0.9.0 and provided some examples for performing streaming ETL jobs with Flink and Zeppelin. In the next post, I will talk about how to do streaming data visualization via Flink on Zeppelin. Besides that, you can find an additional tutorial for batch processing with Flink on Zeppelin, as well as using Flink on Zeppelin for more advanced operations like resource isolation, job concurrency & parallelism, multiple Hadoop & Hive environments, and more in our series of posts on Medium. And here’s a list of Flink on Zeppelin tutorial videos for your reference.
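To give a flavor of what a streaming ETL paragraph looks like in Zeppelin's Flink interpreter (a sketch: the table names are hypothetical and assume the source and sink tables were already declared, e.g. over Kafka connectors):

```sql
%flink.ssql

-- continuously copy cleaned records from a source stream to a sink
INSERT INTO cleaned_events
SELECT user_id,
       LOWER(event_type) AS event_type,
       event_time
FROM raw_events
WHERE user_id IS NOT NULL;
```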

Click through for the demo, and stay tuned for part 2.


The Architecture of Apache Kafka

Michael Carter walks us through the key components and concepts behind Apache Kafka:

Despite its name’s suggestion of Kafkaesque complexity, Apache Kafka’s architecture actually delivers an easier-to-understand approach to application messaging than many of the alternatives. Kafka is essentially a commit log with a very simplistic data structure. It just happens to be an exceptionally fault-tolerant and horizontally scalable one.

The Kafka commit log provides a persistent ordered data structure. Records cannot be directly deleted or modified, only appended onto the log. The order of items in Kafka logs is guaranteed. The Kafka cluster creates and updates a partitioned commit log for each topic that exists. All messages sent to the same partition are stored in the order that they arrive. Because of this, the sequence of the records within this commit log structure is ordered and immutable. Kafka also assigns each record a unique sequential ID known as an “offset,” which is used to retrieve data.
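To make offsets concrete, here is a minimal sketch using Kafka's Java consumer client (the broker address and topic name are placeholders):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OffsetDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // pin to one partition of a hypothetical topic and rewind to offset 0
            TopicPartition p0 = new TopicPartition("events", 0);
            consumer.assign(List.of(p0));
            consumer.seekToBeginning(List.of(p0));

            // records come back in offset order: the log is append-only
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        r.partition(), r.offset(), r.value());
            }
        }
    }
}
```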

Read on to learn more about the key ideas, such as producers, consumers, and partitions.
