
Curated SQL Posts

NetworkD3

Vessy combines JavaScript and R to visualize networks:

The networkD3 package provides a function called igraph_to_networkD3 that converts an igraph object into the format networkD3 uses to create a network representation. As I used an igraph object to store my network, including node and edge properties, I was hoping that I may only need to use this function to create a visualization of my network. However, this function does not work exactly like that (which is not that surprising, given the differences in how D3.js works and how an igraph object is defined). Instead, it extracts lists of nodes and edges from the igraph object, but not the information about all node and edge properties (the exception is a priori specified information about node membership in groups/clusters, which can be derived from one or more network properties, e.g., node degree). Additionally, the igraph_to_networkD3 function does not plot the network itself, but only extracts parameters that are later used in the forceNetwork function that plots the network.

This is the kind of thing I want to see when working with network data.  It doesn’t necessarily scale, but given how well the human eye tracks relationships, this is very useful.


Understanding Page Splits

Wayne Sheffield goes into detail on page splits:

In considering which of these methods is preferred, we need to consider whether page splits impact these methods – especially nasty page splits. Furthermore, how will index maintenance affect each choice? So let’s think this through.

When there are negative values in this column, and the index is rebuilt, there will be a page with both negative and positive values in it. If the identity column is set to (-1, -1), there won’t be a gap (excluding the 0) in the values, and newly added rows will get a new page allocated – a good page split. If the identity column is set to (-2147483648, 1), then there will be a full page with the records for the most recently used identity value, and with the values starting with 1 – a rather large gap.

This is worth reading in its entirety.
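
To make the two configurations concrete, here is roughly what they look like in T-SQL (a sketch with hypothetical table names, not Wayne’s demo script):

-- Seed -1, increment -1: values walk downward from -1, so the negative
-- range sits right next to the existing positive values (no gap other
-- than 0) and new rows land on freshly allocated pages.
CREATE TABLE dbo.PageSplitDemo_CountDown
(
    Id int IDENTITY(-1, -1) NOT NULL PRIMARY KEY CLUSTERED,
    Payload char(100) NOT NULL
);

-- Seed at the minimum int, increment 1: values start at -2147483648 and
-- climb, leaving a very large gap below the existing positive values.
CREATE TABLE dbo.PageSplitDemo_MinSeed
(
    Id int IDENTITY(-2147483648, 1) NOT NULL PRIMARY KEY CLUSTERED,
    Payload char(100) NOT NULL
);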


Log Buffers

Mark Broadbent has started a series on transaction durability.  His first topic is the log buffer:

SQL Server is a highly efficient transaction processing platform, and nearly every single operation it performs is usually first performed within memory. When operations are performed within memory, the need to touch physical resources (such as physical disk IOPS) is also reduced, and reducing the need to touch physical resources means those physical boundaries (and their limitations) have less impact on overall system performance. Cool right?!

Click through to read more about how log buffers work and why they help improve SQL Server’s performance.
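
If you want to see how much your own instance waits on the log while you read, the wait stats DMV gives a quick view (an illustrative query of mine, not from Mark’s post):

-- WRITELOG:  waiting for the log buffer to be flushed to disk.
-- LOGBUFFER: waiting for free space in the log buffer itself.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN (N'WRITELOG', N'LOGBUFFER');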


Navigating Complex Tabular Models

Bill Anton has a method for understanding large tabular models on legacy platforms:

Unfortunately, the Tabular Model Explorer is only available for 2016 (compatibility 1200) tabular models – which many folks haven’t moved over to just yet (despite the overwhelming list of reasons why SQL 2016 is one of the best releases in a very long time).

Those of us stuck with 2012/2014 environments have no other option than to comb through the diagram view for that one table we’re looking for…or scan the unordered list of tables/columns in grid view, or arrow-key through a bunch of cells in the calculation pane to find a particular measure… or so I thought up until a few weeks ago when I discovered a better way!

At least 60-70% of the DBA population would chafe at the idea that these are “legacy” platforms, so maybe I’m overstepping a bit by calling 2014 legacy.  But seriously, 2016 is a huge improvement, well worth the jump.
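
Bill’s actual trick is in the linked post.  As a related stopgap that also works on 2012/2014: you can list every measure (and its DAX expression) by querying the SSAS schema rowset DMVs from an MDX query window in SSMS — a sketch, and not necessarily his method:

-- Run against the Analysis Services instance, not the database engine.
SELECT [MEASURE_NAME],
       [MEASUREGROUP_NAME],
       [EXPRESSION]
FROM $SYSTEM.MDSCHEMA_MEASURES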


Sentiment Analysis With NiFi

Timothy Spann ties together a bunch of interesting things with Apache NiFi, including integrations with Twitter, Slack, TensorFlow, and Zeppelin:

First up, I used GetTwitter to read tweets and filtered on these terms:

strata, stratahadoop, strataconf, NIFI, FutureOfData, ApacheNiFi, Hortonworks, Hadoop, ApacheHive, HBase, ApacheSpark, ApacheTez, MachineLearning, ApachePhoenix, ApacheCalcite, ApacheStorm, ApacheAtlas, ApacheKnox, Apache Ranger, HDFS, Apache Pig, Accumulo, Apache Flume, Sqoop, Apache Falcon

Input:

InvokeHttp: I used this to download the first image URL from tweets.

It’s interesting to see this all tie together relatively easily.


Documenting Tabular Models

Meagan Longoria shows how to document a tabular model using Excel and Power BI:

The Database Info tab uses cube functions to display the values shown. The rest of the tabs use pivot tables and slicers.

You can download and use this model if you have Excel 2016 or 2013 with Power Query. Just download the file, change the values in the TabularInstanceName and TabularDBName queries, and refresh. I was able to open the file and update the queries with no issues and no changes in Excel 2013.

This is great if you just need documentation or a quick way to look up specific information. But we could really analyze this using visualizations other than tables in Power BI Desktop.

That’s pretty fancy.  Check it out.


Running U-SQL Jobs

Ginger Grant shows how to run a U-SQL job locally from Visual Studio:

There are a few steps required before any code is run. If the Data Lake Analytics Tools are not installed within Visual Studio, download them here and install them. When the tools are installed, the menu item Data Lake appears in Visual Studio. The second step is to mirror the file structure of your data lake on your PC. The default location in which the Data Lake tools will look for your data structure is C:\Users\<<insertyourname>>\AppData\Local\USQLDataRoot. What this means is if you have folders and subfolders created in your data lake, your PC needs to have the same structure, including the data.

This gives you a way to test these jobs locally before you spend that Azure money spinning up Data Lake jobs.


Rounding

Kenneth Fisher looks at decimal truncation methods:

ROUND

This is the most complicated of the three. It does a standard rounding. If value is .5 or over then you get back 1. If it’s less than .5 you get back 0. On top of that you get to pass the place you want to round to. So for example 0 rounds to the nearest ones place, -1 rounds to the tens place, 2 rounds to the hundredths.

There’s a bonus here:  if you use ROUND in T-SQL, the results are different from those of the default Round method in .NET; that method uses banker’s rounding by default, whereas the T-SQL rounding does not.
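
A quick illustration of both points, with the results I’d expect from T-SQL noted in comments:

SELECT ROUND(123.456, 0)  AS ones_place,        -- 123.000
       ROUND(123.456, -1) AS tens_place,        -- 120.000
       ROUND(123.456, 2)  AS hundredths_place,  -- 123.460
       ROUND(2.5, 0)      AS midpoint_case;     -- 3.0 here, whereas .NET's
                                                -- Math.Round(2.5) returns 2
                                                -- (banker's rounding)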


MOSL

Niko Neugebauer has a new script library, this time for memory-optimized objects:

Today, I am announcing the initial availability of the brand new library for diving into the internals of SQL Server: MOSL – the Memory Optimised Scripts Library, the library that will help you to discover, maintain & troubleshoot the Hekaton in SQL Server 2014, 2016 & Azure SQL Database. At the moment, I am releasing the scripts for SQL Server 2014, but at the PASS Summit I am hoping to have all 3 versions online.

I am inviting everyone working with Memory-Optimised Technology to join me in this project, which is targeted on helping everyone who is working or aspiring to work with this amazing and ground-breaking technology.

Check them out if you use memory-optimized tables in your environment.
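
If you want a taste of the kind of information before grabbing the library, the Hekaton DMVs it presumably builds on are queryable directly — an illustrative query of mine, not one of Niko’s scripts:

-- Memory consumed by each memory-optimized table and its indexes
SELECT OBJECT_NAME(tms.object_id) AS table_name,
       tms.*
FROM sys.dm_db_xtp_table_memory_stats AS tms;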


Graph Analytics With Spark

Mirko Kämpf looks at using GraphFrames on Spark:

Next, we’ll define a DataFrame by loading data from a CSV file, which is stored in HDFS.

Our datafile facebook_combined.txt contains two columns to represent links between network nodes. The first column is called source (src), and the second is the destination (dst) of the link. (Some other systems, such as Gephi, use “source” and “target” instead.)

First we define a custom schema, and then we load the DataFrame using SQLContext.

It sounds like Spark graph database engines are early in their lifecycle, but they might already be useful for simple analysis.
