Kevin Feasel – Page 1543

Automatic Soft-NUMA In SQL Server

Published 2017-01-03 by Kevin Feasel

Robert Davis wants to find information on soft-NUMA in his SQL Server instance:

So having read up on automatic soft-NUMA, I was eager to see what it did with my main production servers when I upgraded them. My main pair of production servers (they are paired into an Availability Group) have 4 NUMA nodes with 16 physical cores per node and hyperthreading for a total of 32 logical cores per node with 1.5 TB of RAM. Obviously, we are using core-based Enterprise Edition for these servers. I thought I knew what automatic soft-NUMA would do, and wanted to confirm if my expectations were right.

Read on, but it looks like there’s a “to be continued…” here.

Comments closed

Split Query Processing In Polybase

Published 2017-01-02 by Kevin Feasel

David DeWitt, et al, describe the Polybase engine in an academic article:

When compiling a SQL query that references an external table stored in an HDFS file, the PDW Engine Service contacts the Hadoop Namenode for information about the file. This information, combined with the number of DMS instances in the PDW cluster, is used to calculate the portion (offset and length) of the input file(s) each DMS instance should read from HDFS. This information is passed to DMS in the HDFS Shuffle step of the DSQL (distributed SQL) plan along with other information needed to read the file, including the file’s path, the location of the appropriate Namenode, and the name of the RecordReader that the bridge should use.

The system attempts to evenly balance the number of bytes read by each DMS instance. Once the DMS instances obtain split information from the Namenode, each can independently read the portion of the file it is assigned, directly communicating with the appropriate Datanodes without any centralized control.

This is a very clear paper which helps describe the core constructs of Polybase. Highly recommended.

Comments closed

Dijkstra’s Algorithm

Published 2017-01-02 by Kevin Feasel

One of the most important algorithms for graphs is Dijkstra’s Algorithm. Melissa Yan has a nice presentation on it, which I recommend reading before the paper itself, which is only a couple pages long.

Comments closed

No Curation Today

Published 2017-01-02 by Kevin Feasel

Today is New Year’s Day observed, so instead of linking to blog posts, there will be a couple links to academic papers coming up.

Comments closed

Forecasting Restaurant Inspection Failures

Published 2016-12-30 by Kevin Feasel

David Smith writes about an R model which predicts which restaurants are more likely to fail inspection:

Chicago’s Department of Public Health used the R language to build and deploy the model, and made the code available as an open source project on GitHub. The reasons given are twofold:

An open source approach helps build a foundation for other models attempting to forecast violations at food establishments. The analytic code is written in R, an open source, widely-known programming language for statisticians. There is no need for expensive software licenses to view and run this code.

Read on for more details and check out their GitHub repo.

Comments closed

7 Visualizations In R

Published 2016-12-30 by Kevin Feasel

Dikesh Jariwala provides sample R code for seven common visualizations:

In your day-to-day activities, you’ll come across the below listed 7 charts most of the time.

Scatter Plot

Histogram

Bar & Stack Bar Chart

Box Plot

Area Chart

Heat Map

Correlogram

We’ll use ‘Big Mart data’ example as shown below to understand how to create visualizations in R. You can download the full dataset from here.

That’s a nice set of visuals, covering a broad swath of potential visualization scenarios.

Comments closed

SQL Server Regex

Published 2016-12-30 by Kevin Feasel

Dev Nambi has a new open-source project:

Databases store text, and the best way to manipulate text is to use a regular expression (‘regex’). Using regular expressions in SQL queries has been possible in many database engines for decades.

Now you can use regular expressions in SQL Server queries, too. I’ve created an open-source project, sql-server-regex, that lets you run regular expressions in T-SQL queries using scalar and table-valued functions.

This is a set of CLR functions which use the built-in .NET regular expressions functionality. That makes it pretty easy to see how the code works.

Comments closed

Power Saving Mode

Published 2016-12-30 by Kevin Feasel

Mike Walsh recommends ensuring your servers are not using any form of power saving mode:

Balanced power mode has a major impact on SQL Server performance. Simply put, you’ve told Windows Server (through Control Panel) or your server hardware (through BIOS settings) to sacrifice a few performance-minded things for the sake of using a little less power. In fact, in plenty of studies and blog posts by folks in the community (including this post by Glenn Berry), you can see the effect of CPU power saving, especially. In essence, the CPUs will run at a lower clock multiplier when demand isn’t deemed high enough, and that clock speed will only increase when demand is high enough for long enough. This results in a slower CPU speed during normal operations. That setting actually works well on my laptop or tablet when I want to conserve battery life and don’t have a workload that is sensitive to CPU speed.

For a SQL Server, though? That is horrible for performance. Windows balanced mode also can cause other components to run slower or behave differently than when the server is running in High Performance mode. For instance, USB ports can be underpowered and network interfaces can be under-powered or even go to sleep. Frankly, for a SQL Serve, nothing good comes out of these modes.

Read on for a few methods for checking whether your servers are affected.

Comments closed

SSRS Data Preview

Published 2016-12-30 by Kevin Feasel

Kathi Kellenberger points out a potential risk with the new Data Preview functionality in SQL Server Reporting Services 2016:

One of the features that took me by surprise is the ability to view data directly from a shared dataset. This feature is called Data Preview, and is available to anyone who has permission to view the dataset and the security at the data source works out. I’m not sure how often shared datasets have been used in previous versions of SSRS. They were not actually needed in many cases, and I generally recommended them for datasets that would be frequently reused such as common parameter lists. This advice will have to change with 2016, because shared datasets are required for the new KPI reports and Mobile Reports. Stored credentials will be used in the data sources in many cases, because Kerberos delegation is not supported yet with Mobile Reports.

This is a potential data leakage scenario, so if you have potentially sensitive data sets, you’ll want to read this post.

Comments closed

Polybase: Inserting Into Azure Blob Storage

Published 2016-12-30 by Kevin Feasel

I have a post which uses Polybase to insert into Azure Blob Storage:

One additional question I have involves whether the process for loading data is round-robin on a row-by-row basis. My conjecture is that it is not (particularly given that our first example had 4 files with zero records in them!), but I figured I’d create a new table and test. In this case, I’m using three fixed-width data types and loading 10 million identical records. I chose to use identical record values to make sure that the text length of the columns in this line were exactly the same; the reason is that we’re taking data out of SQL Server (where an int is stored in a 4-byte block) and converting that int to a string (where each numeric value in the int is stored as a one-byte character). I chose 10 million because I now that’s well above the cutoff point for data to go into each of the eight files, so if there’s special logic to handle tiny row counts, I’d get past it.

Read on for the exciting(?) conclusion.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Author: Kevin Feasel