Month: January 2017

Tracking Applications

Andy Levy explains how to use connection strings to track which application is hogging database resources:

Fortunately, the .NET SqlClient (and other ODBC drivers as well) has a built-in solution. Your application’s connection string has quite a few parameters available to provide configuration and information, and one that seems to get overlooked is Application Name. This one does exactly what it says on the tin – it lets you specify a name that will be displayed to anyone looking for it in SQL Server, including sp_whoisactive. Anyplace you have the ability to write a connection string, you can use this. It costs you nothing!
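
As a minimal sketch of how this looks in practice (the application name "InventoryService" and the connection string values are hypothetical): once the application sets Application Name in its connection string, its sessions show up under that name server-side:

    -- Connection string on the application side (hypothetical values):
    --   Server=MyServer;Database=MyDb;Integrated Security=true;Application Name=InventoryService;
    -- On the server, find that application's sessions by name.
    SELECT session_id, login_name, host_name, program_name
    FROM sys.dm_exec_sessions
    WHERE program_name = N'InventoryService';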

You can also get fancy with Resource Governor, segmenting resource pools based on application name.
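
Here is a hedged sketch of that idea: a Resource Governor classifier function (created in master) routing the hypothetical "InventoryService" application into its own workload group based on APP_NAME():

    -- Hypothetical workload group; without a USING clause it maps to the default pool.
    CREATE WORKLOAD GROUP InventoryGroup;
    GO
    -- Classifier functions must live in master and be schema-bound.
    CREATE FUNCTION dbo.rg_classifier()
    RETURNS sysname
    WITH SCHEMABINDING
    AS
    BEGIN
        -- APP_NAME() returns the Application Name from the connection string.
        IF APP_NAME() = N'InventoryService'
            RETURN N'InventoryGroup';
        RETURN N'default';
    END;
    GO
    ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
    ALTER RESOURCE GOVERNOR RECONFIGURE;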

Where Azure Analysis Services Fits

Melissa Coates explains where Azure Analysis Services fits in common BI architectures:

(2) Data Sources

  • From a single source such as a data warehouse. This is the most traditional path for BI development, and still has a very valid place in many BI/analytics deployments. This scenario puts the work of data integration on the ETL process that loads the data warehouse, which is the most appropriate place for it.

  • Directly from various systems.  This can be done, but works well only in specific cases – it definitely won’t work well if there are a lot of highly normalized tables, or if there’s not a straightforward way to relate the disparate data together. Trying to go directly to the source systems & skip an intermediary data warehouse puts the “integration” burden on the data source view in Analysis Services, so plan for plenty of time testing if you’re going to try this route (i.e., it can be much harder, not easier). Note that this option only makes sense if the data is stored in Analysis Services because it needs to be related together somehow (i.e., DirectQuery mode, discussed next in #3, with > 1 data source won’t work if a user tries to combine data sources because the data is not inherently related).

If you’re thinking about Azure Analysis Services, this post is a good one.

Testing Transactional Replication

Jes Borland wraps up her series on transactional replication from an on-prem Availability Group to Azure SQL Database:

Congratulations, you’ve configured a remote distributor, configured all of your AG replicas as publishers, and configured your SQL Database as a subscriber! Now you want to ensure that transactions are replicating to the database, and that they continue to do so if there is a failover in the AG.

Read on for the two testing scenarios.
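
If you just want a quick smoke test before working through the full scenarios, a minimal version looks like this (the table and columns are hypothetical stand-ins for one of your published tables):

    -- On the publisher (the current AG primary), insert a marker row.
    INSERT INTO dbo.Orders (OrderDate, CustomerName)
    VALUES (SYSDATETIME(), N'Replication smoke test');

    -- Then, connected to the Azure SQL Database subscriber, confirm it arrived
    -- (allow the distribution agent a few seconds of latency).
    SELECT TOP (1) OrderDate, CustomerName
    FROM dbo.Orders
    ORDER BY OrderDate DESC;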

Columnstore Elimination

Sunil Agarwal has a two-part series on columnstore data elimination.  First up is column elimination:

Now, let us run the same query on the table with a clustered columnstore index as shown in the picture below. Note that the logical IOs for the LOB data are reduced by three-quarters for the second query, as only one column needs to be fetched. You may wonder why LOB? Well, the data in each column is compressed and then stored as a BLOB. Another point to note is that the queries with the columnstore index run much faster: 25x for the first query and 4x for the second query.
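
You can see the column-elimination effect yourself with a sketch like the following, assuming a hypothetical dbo.Sales table with a clustered columnstore index:

    SET STATISTICS IO ON;

    -- Touches every column, so every column's compressed segments must be read.
    SELECT * FROM dbo.Sales;

    -- Only the SalesAmount segments need to be fetched,
    -- so LOB logical reads drop sharply.
    SELECT SUM(SalesAmount) FROM dbo.Sales;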

Next up is rowgroup elimination:

In the context of rowgroup elimination, let us revisit the previous example with sales data:

  • You may not even need partitioning to filter the rows for the current quarter, as rows are inserted in SalesDate order, allowing SQL Server to pick only the rowgroups that contain rows for the requested date range.
  • If you need to filter the data for a specific region within a quarter, you can partition the columnstore index at the quarterly boundary and then load the data into each partition after sorting on region. If the incoming data is not sorted on region, you can follow these steps (shown in the sketch after this list): (a) switch the partition out into a staging table T1; (b) drop the clustered columnstore index (CCI) on T1 and create a clustered B-tree index on T1 on the 'region' column to order the data; (c) create the CCI again while dropping the existing clustered index. A general recommendation is to create the CCI with DOP=1 to keep the perfect ordering.
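
Here is a hedged sketch of steps (a) through (c); the table, partition number, and index names are hypothetical:

    -- (a) Switch the partition out into a staging table with a matching schema.
    ALTER TABLE dbo.Sales SWITCH PARTITION 3 TO dbo.T1;

    -- (b) Replace the CCI with a clustered B-tree on region to physically order
    --     the rows. Reusing the index name lets DROP_EXISTING work in step (c).
    DROP INDEX cci_T1 ON dbo.T1;
    CREATE CLUSTERED INDEX cci_T1 ON dbo.T1 (region);

    -- (c) Convert back to a CCI over the ordered rows; MAXDOP = 1 preserves the sort.
    CREATE CLUSTERED COLUMNSTORE INDEX cci_T1 ON dbo.T1
    WITH (DROP_EXISTING = ON, MAXDOP = 1);

    -- Finally, switch the ordered partition back in.
    ALTER TABLE dbo.T1 SWITCH TO dbo.Sales PARTITION 3;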

The upshot of these two articles: queries which hit a small percentage of columns and stick to a relatively small number of rowgroups will likely perform better.  For people who understand normal B-tree indexes, the second point seems clear enough, but the first point is at least as important.

Automatic Soft-NUMA In SQL Server

Robert Davis wants to find information on soft-NUMA in his SQL Server instance:

So having read up on automatic soft-NUMA, I was eager to see what it did with my main production servers when I upgraded them. My main pair of production servers (they are paired into an Availability Group) have 4 NUMA nodes with 16 physical cores per node and hyperthreading for a total of 32 logical cores per node with 1.5 TB of RAM. Obviously, we are using core-based Enterprise Edition for these servers. I thought I knew what automatic soft-NUMA would do, and wanted to confirm if my expectations were right.
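
If you want to check this on your own instance, the soft-NUMA configuration is visible in the DMVs on SQL Server 2016 and later:

    -- 0 = OFF, 1 = automatic soft-NUMA, 2 = manual (registry-configured).
    SELECT softnuma_configuration, softnuma_configuration_desc
    FROM sys.dm_os_sys_info;

    -- How the scheduler nodes were carved up.
    SELECT node_id, memory_node_id, online_scheduler_count, active_worker_count
    FROM sys.dm_os_nodes
    WHERE node_state_desc = N'ONLINE';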

Read on, but it looks like there’s a “to be continued…” here.

Split Query Processing In Polybase

David DeWitt, et al., describe the Polybase engine in an academic article:

When compiling a SQL query that references an external table stored in an HDFS file, the PDW Engine Service contacts the Hadoop Namenode for information about the file. This information, combined with the number of DMS instances in the PDW cluster, is used to calculate the portion (offset and length) of the input file(s) each DMS instance should read from HDFS. This information is passed to DMS in the HDFS Shuffle step of the DSQL (distributed SQL) plan along with other information needed to read the file, including the file’s path, the location of the appropriate Namenode, and the name of the RecordReader that the bridge should use.

The system attempts to evenly balance the number of bytes read by each DMS instance. Once the DMS instances obtain split information from the Namenode, each can independently read the portion of the file it is assigned, directly communicating with the appropriate Datanodes without any centralized control.
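
For context, the external tables the paper references are defined with T-SQL along these lines (a hedged sketch; the data source location, file path, and table definition are hypothetical):

    -- Point PolyBase at the Hadoop Namenode.
    CREATE EXTERNAL DATA SOURCE HadoopCluster
    WITH (TYPE = HADOOP, LOCATION = 'hdfs://namenode.example.com:8020');

    -- Describe how the underlying file is laid out.
    CREATE EXTERNAL FILE FORMAT PipeDelimited
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

    -- Queries against this table trigger the HDFS bridge reads described above.
    CREATE EXTERNAL TABLE dbo.SensorReadings (
        SensorId     INT,
        ReadingValue FLOAT
    )
    WITH (LOCATION = '/data/sensors/',
          DATA_SOURCE = HadoopCluster,
          FILE_FORMAT = PipeDelimited);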

This is a very clear paper which helps describe the core constructs of Polybase.  Highly recommended.

No Curation Today

Today is New Year's Day (observed), so instead of links to blog posts, there will be a couple of links to academic papers coming up.
