Author: Kevin Feasel

Understanding Page Splits

Wayne Sheffield goes into detail on page splits:

In considering which of these methods is preferred, we need to consider whether page splits impact these methods – especially nasty page splits. Furthermore, how will index maintenance affect each choice? So let’s think this through.

When there are negative values in this column, and the index is rebuilt, there will be a page with both negative and positive values in it. If the identity column is set to (-1, -1), there won’t be a gap (excluding the 0) in the values, and newly added rows will get a new page allocated – a good page split. If the identity column is set to (-2147483648, 1), then there will be a full page with the records for the most recently used identity value, and with the values starting with 1 – a rather large gap.

This is worth reading in its entirety.

Log Buffers

Mark Broadbent has started a series on transaction durability.  His first topic is the log buffer:

SQL Server is a highly efficient transaction processing platform and nearly every single operation performed by it is usually first performed within memory. When operations are performed within memory, the need to touch physical resources (such as physical disk IOPS) is also reduced, and reducing the need to touch physical resources means those physical boundaries (and their limitations) have less impact on overall system performance. Cool right?!

Click through to read more about how log buffers work and why they help improve SQL Server’s performance.
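
As a rough illustration of the idea (and emphatically not SQL Server’s actual implementation), here is a toy sketch in Python of why buffering helps: many small log records are collected in memory and written to disk in larger blocks, so the number of physical writes drops sharply. The 60 KB cap loosely mirrors the documented maximum size of a log buffer; everything else is made up for the example.

class ToyLogBuffer:
    """Collects log records in memory and writes them out in blocks."""

    def __init__(self, path, capacity=60 * 1024):
        self.path = path
        self.capacity = capacity          # flush once this many bytes are buffered
        self.records = []
        self.buffered_bytes = 0
        self.physical_writes = 0

    def append(self, record: bytes):
        # No disk IO here: the record only lands in memory.
        self.records.append(record)
        self.buffered_bytes += len(record)
        if self.buffered_bytes >= self.capacity:
            self.flush()

    def flush(self):
        # One physical write covers everything buffered so far (think "commit").
        if not self.records:
            return
        with open(self.path, "ab") as f:
            f.write(b"".join(self.records))
        self.physical_writes += 1
        self.records, self.buffered_bytes = [], 0

buf = ToyLogBuffer("toy.log")
for i in range(10_000):
    buf.append(f"log record {i}\n".encode())
buf.flush()
print("physical writes:", buf.physical_writes)   # a handful, not 10,000

In SQL Server the flush happens on events like a commit or the buffer filling up, but the principle is the same: memory absorbs the many small writes.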

Navigating Complex Tabular Models

Bill Anton has a method for understanding large tabular models on legacy platforms:

Unfortunately, the Tabular Model Explorer is only available for 2016 (compatibility 1200) tabular models – which many folks haven’t moved over to just yet (despite the overwhelming list of reasons why SQL 2016 is one of the best releases in a very long time).

Those of us stuck with 2012/2014 environments have no other option than to comb through the diagram view for that one table we’re looking for…or scan the unordered list of tables/columns in grid view, or arrow-key through a bunch of cells in the calculation pane to find a particular measure… or so I thought up until a few weeks ago when I discovered a better way!

At least 60-70% of the DBA population would chafe at the idea that 2014 is a “legacy” platform, so maybe I’m overstepping a bit in calling it that. But seriously, 2016 is a huge improvement, well worth the jump.

Sentiment Analysis With NiFi

Timothy Spann ties together a bunch of interesting things with Apache NiFi, including integrations with Twitter, Slack, TensorFlow, and Zeppelin:

First up, I used GetTwitter to read tweets and filtered on these terms:

strata, stratahadoop, strataconf, NIFI, FutureOfData, ApacheNiFi, Hortonworks, Hadoop, ApacheHive, HBase, ApacheSpark, ApacheTez, MachineLearning, ApachePhoenix, ApacheCalcite, ApacheStorm, ApacheAtlas, ApacheKnox, Apache Ranger, HDFS, Apache Pig, Accumulo, Apache Flume, Sqoop, Apache Falcon

Input:

InvokeHttp: I used this to download the first image URL from tweets.

It’s interesting to see this all tie together relatively easily.

Documenting Tabular Models

Meagan Longoria shows how to document a tabular model using Excel and Power BI:

The Database Info tab uses cube functions to display the values shown. The rest of the tabs use pivot tables and slicers.

You can download and use this model if you have Excel 2016 or 2013 with Power Query. Just download the file, change the values in the TabularInstanceName and TabularDBName queries, and refresh. I was able to open the file and update the queries with no issues and no changes in Excel 2013.

This is great if you just need documentation or a quick way to look up specific information. But we could really analyze this using visualizations other than tables in Power BI Desktop.

That’s pretty fancy.  Check it out.

Running U-SQL Jobs

Ginger Grant shows how to run a U-SQL job locally from Visual Studio:

There are a few steps required before any code is run. If the Data Lake Analytics Tools are not installed within Visual Studio, download them here and install them. When the tools are installed, the menu item Data Lake appears in Visual Studio. The second step is to model your PC with the same file structure as your data lake. The default location where the Data Lake tools will look for your data structure is C:\Users\<<insertyourname>>\AppData\Local\USQLDataRoot. What this means is if you have folders and subfolders created in your data lake, your PC needs to have the same structure, including the data.
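
If you would rather script that folder mirroring than build it by hand, something along these lines does the job; the helper and the folder list are hypothetical, and only the USQLDataRoot location comes from the quote above.

import os

# Default local data root the Data Lake tools look in (per the quote above)
local_root = os.path.expandvars(r"%LOCALAPPDATA%\USQLDataRoot")

# Hypothetical folder layout copied from your Data Lake Store; replace with your own
lake_folders = [
    r"raw\sales\2016",
    r"raw\sales\2017",
    r"curated\sales",
]

for folder in lake_folders:
    local_path = os.path.join(local_root, folder)
    os.makedirs(local_path, exist_ok=True)   # recreate the same structure locally
    print("created", local_path)

# Remember: the data files themselves also need to be copied into these folders.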

There is also a way to test these jobs locally before you spend that Azure money spinning up Data Lake jobs.

Rounding

Kenneth Fisher looks at decimal truncation methods:

ROUND

This is the most complicated of the three. It does a standard rounding. If value is .5 or over then you get back 1. If it’s less than .5 you get back 0. On top of that you get to pass the place you want to round to. So for example 0 rounds to the nearest ones place, -1 rounds to the tens place, 2 rounds to the hundredths.

There’s a bonus here: if you use ROUND in T-SQL, the results can differ from the default Round method in .NET; that method uses banker’s rounding by default, whereas the T-SQL rounding does not.
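
Python’s decimal module gives a quick way to see the two behaviors side by side; this is only an illustration of the rounding modes, not the actual T-SQL or .NET calls.

from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

midpoint = Decimal("2.5")

# "Standard" rounding, like T-SQL's ROUND: a midpoint rounds away from zero
print(midpoint.quantize(Decimal("1"), rounding=ROUND_HALF_UP))    # 3

# Banker's rounding, the .NET default: a midpoint rounds to the nearest even digit
print(midpoint.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))  # 2

# The place argument Kenneth describes, e.g. rounding to the hundredths place
print(Decimal("123.456").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))   # 123.46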

MOSL

Niko Neugebauer has a new script library, this time for memory-optimized objects:

Today, I am announcing the initial availability of the brand new library for diving into the internals of SQL Server: the MOSL – Memory Optimised Scripts Library, the library that will help you to discover, maintain & troubleshoot Hekaton in SQL Server 2014, 2016 & Azure SQL Database. At the moment, I am releasing the scripts for SQL Server 2014 but at the PASS Summit I am hoping to have all 3 versions online.

I am inviting everyone working with Memory-Optimised Technology to join me in this project, which is targeted on helping everyone who is working or aspiring to work with this amazing and ground-breaking technology.

Check them out if you use memory-optimized tables in your environment.

Graph Analytics With Spark

Mirko Kämpf looks at using GraphFrames on Spark:

Next, we’ll define a DataFrame by loading data from a CSV file, which is stored in HDFS.

Our datafile facebook_combined.txt contains two columns to represent links between network nodes. The first column is called source (src), and the second is the destination (dst) of the link. (Some other systems, such as Gephi, use “source” and “target” instead.)

First we define a custom schema, and then we load the DataFrame using SQLContext.
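
A minimal PySpark sketch of those two steps might look like this; the article works through SQLContext, so the exact calls there differ, and the HDFS path, delimiter, and the derived vertex list below are assumptions made for the sake of a runnable example.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType
from graphframes import GraphFrame   # requires the graphframes package

spark = SparkSession.builder.appName("facebook-graph").getOrCreate()

# Custom schema: each line of facebook_combined.txt holds a source and a destination node id
schema = StructType([
    StructField("src", LongType(), False),
    StructField("dst", LongType(), False),
])

edges = (spark.read
         .option("sep", " ")                                  # assuming a space-separated file
         .schema(schema)
         .csv("hdfs:///user/demo/facebook_combined.txt"))     # hypothetical HDFS path

# GraphFrames wants a vertex DataFrame with an "id" column; derive it from the edge list
vertices = (edges.selectExpr("src AS id")
            .union(edges.selectExpr("dst AS id"))
            .distinct())

g = GraphFrame(vertices, edges)
g.inDegrees.orderBy("inDegree", ascending=False).show(5)      # quick sanity check

From there the usual GraphFrame routines (PageRank, connected components, motif finding) are available on g.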

It sounds like Spark graph database engines are early in their lifecycle, but they might already be useful for simple analysis.

Hadoop’s S3 Support

Steve Loughran and Sanjay Radia give us a history lesson on Hadoop’s support for Amazon S3:

Hadoop’s ability to work with Amazon S3 storage goes back to 2006 and the issue HADOOP-574, “FileSystem implementation for Amazon S3”. This filesystem client, “s3://”, implemented an inode-style filesystem atop S3: it could support bigger files than S3 could then support, and some of its operations (directory rename and delete) were fast. The s3 filesystem allowed Hadoop to be run in Amazon’s EMR infrastructure, using S3 as the persistent store of work. This piece of open source code predated Amazon’s release of EMR, “Elastic MapReduce”, by over two years. It’s also notable as the piece of work which gained Tom White, author of “Hadoop: The Definitive Guide”, committer status.

It’s interesting to see how this project has matured over the past decade.
