Press "Enter" to skip to content

Curated SQL Posts

What Is Kafka?

I start a new series on Apache Kafka:

The broker serves several purposes:

  1. Know who the producers are and who the consumers are.  This way, the producers don’t care who exactly consumes a message and aren’t responsible for the message after they hand it off.
  2. Buffer for performance.  If the consumers are a little slow at the moment but don’t usually get overwhelmed, that’s okay—messages can sit with the broker until the consumer is ready to fetch.
  3. Let us scale out more easily.  Need to add more producers?  That’s fine—tell the broker who they are.  Need to add consumers?  Same thing.
  4. What about when a consumer goes down?  That’s the same as problem #2:  hold their messages until they’re ready again.

So brokers add a bit of complexity, but they solve some important problems.  The nice part about a broker is that it doesn’t need to know anything about the messages, only who is supposed to receive it.

This is an introduction to the product and part one of an eight-part series.

Comments closed

Dashboards Or Reports

Reza Rad compares dashboards to reports:

Dashboard: General

Stephen Few‘s definition of Dashboard: A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance.

Report: General

A Report on the other hand is any informational work. This information can be at any format. Table, Chart, text, number or anything else.

Reza then ties it back to Power BI, showing how to take advantage of both of these concepts.

Comments closed

Interactive Graphics With ggiraph

David Smith sheds some light on the ggiraph project:

R’s ggplot2 package is a well-known tool for producing beautiful static data visualizations that you can include in a printed report. But what if you want to include a ggplot2 graphic on a webpage and provide the ability for the user to interact with the data? The ggiraph package by David Gohel  (available for installation via CRAN). WIth ggiraph, you can take an existing ggplot2 bar chart, scatterplot, boxplot, map, or many other types of chart and add one or both of the following iteractions:

  • Display a tooltip of your choice (e.g. data values or labels) when the cursor hovers over sections of the chart

  • Perform an action (a javascript function you provide: jump to another page, for example) when the viewer clicks on an element of the chart

I like it.

Comments closed

Identity As A Service

Cristian Satnic argues that we should look at Identity as a Service solutions for our applications:

What exactly is Azure Active Directory B2C?

  • Cloud identity service with support for social accounts and app-specific (local) accounts

  • For enterprises and ISVs building consumer facing web, mobile & native apps

  • Builds on Azure Active Directory – a global identity service serving hundreds of millions of users and billions of sign-ins per day (same directory system used by Microsoft online properties – Office 365, XBox Live and so on)

  • Worldwide, highly-available, geo-redundant service – globally distributed directory across all of Microsoft Azure’s datacenters

I am a big fan of OAuth and making it easy for line-of-business developers to deal with authentication (lest they get harebrained ideas like rolling their own encryption algorithms).

Comments closed

Word Cloud Visual

Devin Knight shows off the word cloud custom visual in Power BI:

Key Takeaways

  • Great for parsing unstructured data

  • Utilize stop words to remove commonly used filler words like a, the, an, etc…

    • You can use the default stop word that are provided and add your own that you would like to remove from the visual.
  • The size of the words in the visual tell you how frequently the word is used.

Cf. yesterday’s word cloud example.  I’m not sure how truly valuable word clouds are for visualization purposes, but at least they’re fun to peruse.

Comments closed

Nonclustered Columnstore Indexes On Indexed Views

Niko Neugebauer notes that non-clustered columnstore indexes can now sit on top of indexed views, as of SQL Server 2016:

From the perspective of the disk access, this is where you will definitely win at least a couple of times with the amount of the disk access while processing the information, amount of memory that you will need to store and process (think hashing and sorting for the late materialisation phases), and you will pay less for the occupied storage.

Another noticeable thing was that the memory grants for the Indexed Views query was smaller compared to the query that was processing the original columnstore table FactOnlineSales.

Clustered indexes are currently not available as an option; we’ll see if that changes in the next version of SQL Server.

Comments closed

Auditing In Power BI

Ginger Grant shows off some of the auditing capabilities within Power BI:

As you can see by looking at the available Power BI options, there are a number of options to choose from. If you select the top item PowerBI activities, then everything gets selected. After doing that click outside of the menu for the menu to go away. Select a date and time range of your choosing, select specific users if you wish, then click on the Search button. Depending on how big your date range is, this may take some time to load. Once you see the results, you have the ability to filter as well.

Another day, another two dozen new Power BI features…  This one’s a good one.

Comments closed

Word Clouds In Python

Allison Tharp shows how to generate a word cloud using Python:

Every week, someone on Reddit posts a “word cloud” on all of the NFL team’s subreddits.  These word clouds show the most used words on that subreddit for the week (the larger the word, the more it was used).  These word plots are always really fascinating to me, so I wanted to try to make some for myself.  In this tutorial, we’ll be making the following word cloud from my board game stats twitter feed, @BGGStats

Looks like the implementation is fairly straightforward, so check it out.

Comments closed

Azure Data Lake Updates

Saveen Reddy points out a few updates to Azure Data Lake Store & the Azure Data Lake Analytics portal:

Use Custom Delimeters when Previewing Files

Previously, we had supported comma, colon, space, tab, ampersand, and bar delimiters. With the many different kinds of files used in Azure Data Lake Store and Azure Storage, we’ve added a “Custom” delimiter options for you to define your own delimiter.

To change the delimiter on the Azure Portal:

  1. Open the file you want to preview using Data Explorer.

  2. Click on Format

  3. Under Delimiter, click the dropdown and change it to Custom

  4. A new Custom Delimiter field will appear, type in your delimiter here

  5. Click OK

Read on for more updates.

Comments closed

Change With Automatic Stats Update

Jack Li notes that SQL Server 2016 has changed when automatic statistics update gets called:

Old threshold: it takes 20% of row changes before auto update stats kicks (there are some tweaks for small tables, for large tables, 20% change is needed).  For a table with 100 million rows, it requires 20 million row change for auto stats to kick in. For vast majority of large tables, auto stats basically doesn’t do much.

New threshold: Starting SQL 2008 R2 SP1, we introduced a trace flag 2371 to control auto update statistics better (new threshold).  Under trace flag 2371, percentage of changes requires is dramatically reduced with large tables.  In other words, trace flag 2371 can cause more frequent update.  This new threshold is off by default and is enabled by the trace flag.  But in SQL 2016, this new threshold is enabled by default for a database with compatibility level 130.

Important to know.

Comments closed