Curated SQL – Page 1671 – A Fine Slice Of SQL Server

TempDB And Parallelism

Published 2017-01-04 by Kevin Feasel

Kendra Little looks at cases when a query uses multiple tempdb data files:

As you might guess, things may not always get evenly accessed, even if you have evenly sized tempdb files. One of my queries did a select into a temp table. Although it used all four tempdb files whether or not it went parallel, there were more file_read events against the first tempdb file than against the other four.

It’s an interesting look at this specific question, but also as pedagogical technique.

Comments closed

BatchMode Execution

Published 2017-01-04 by Kevin Feasel

Sunil Agarwal describes BatchMode execution with columnstore indexes:

You may be wondering what is this magic number 900 rows within a batch? Well, when executing a query in BatchMode, SQL Server allocates a 64k bytes structure to group the rows. The number of rows in this structure can vary between 64 to 900 depending upon number of columns selected. For the example above, there are two columns that are referenced and X marks the rows that qualified in the BatchMode structure shown in the picture below. If SCAN is part of a bigger query execution tree, the pointer to this structure is passed to the next operator for further processing. Not all operators can be executed in BatchMode. Please refer to Industry leading analtyics query performance for details on BatchMode Operators.

Under the right circumstances, BatchMode execution can be a major performance benefit.

Comments closed

Extended Events Viewer Behind

Published 2017-01-04 by Kevin Feasel

Dave Mason looks at an issue in which the Extended Events viewer seems to be consistently behind:

The database_detached event was clearly received by the ring_buffer and event_file targets. But if we return to the “Live Data” tab/grid in SSMS, we still have nothing. I don’t have an explanation for this. I know, that’s not exactly gratifying. Nonetheless, let’s press on.

Read on for alternatives.

Comments closed

Virtual Function Calls

Published 2017-01-03 by Kevin Feasel

Ewald Cress is thinking about virtual function calls:

A virtual function call, on the other hand, is only resolved at runtime. The compiler literally does not know what address is going to get called, and neither does the runtime except in the heat of the moment, because that is going to depend on the type of the object instance that the function is called on. Bear with me, I’ll try and simplify.

A C++ object is just a little chunk of memory: a bunch of related instance variables if you like. All objects of the same class have the same structure in this regard. If you’re wondering about functions (a.k.a. methods), these belong to the class, or put differently, to ALL objects of that class. Once compiled, each method is a chunk of memory with a known address, containing the compiled instructions.

From there, it’s a harrowing journey through bigger layers of indirection.

Comments closed

Spark 2.1

Published 2017-01-03 by Kevin Feasel

Reynold Xin announces Apache Spark 2.1:

Structured Streaming

Introduced in Spark 2.0, Structured Streaming is a high-level API for building continuous applications. The main goal is to make it easier to build end-to-end streaming applications, which integrate with storage, serving systems, and batch jobs in a consistent and fault-tolerant way.
- Event-time watermarks: This change lets applications hint to the system when events are considered “too late” and allows the system to bound internal state tracking late events.
- Support for all file-based formats and all file-based features: With these improvements, Structured Streaming can read and write all file-based formats, e.g. JSON, text, Avro, CSV. In addition, all file-based features—e.g. partitioned files and bucketing—are supported on all formats.
- Apache Kafka 0.10: This adds native support for Kafka 0.10, including manual assignment of starting offsets and rate limiting.

This is a pretty hefty release. Click through to read the whole thing.

Comments closed

Ten Notes On SparkR

Published 2017-01-03 by Kevin Feasel

Neil Dewar has a notebook with ten important things when migrating from R to SparkR:

Apache Spark Building Blocks. A high-level overview of Spark describes what is available for the R user.
SparkContext, SQLContext, and SparkSession. In Spark 1.x, SparkContext and SQLContext let you access Spark. In Spark 2.x, SparkSession becomes the primary method.
A DataFrame or a data.frame? Spark’s distributed DataFrame is different from R’s local data.frame. Knowing the differences lets you avoid simple mistakes.
Distributed Processing 101. Understanding the mechanics of Big Data processing helps you write efficient code—and not blow up your cluster’s master node.
Function Masking. Like all R libraries, SparkR masks some functions.
Specifying Rows. With Big Data and Spark, you generally select rows in DataFrames differently than in local R data.frames.
Sampling. Sample data in the right way, and use it as a tool for converting between big and small data.
Machine Learning. SparkR has a growing library of distributed ML algorithms.
Visualization.It can be hard to visualize big data, but there are tricks and tools which help.
Understanding Error Messages. For R users, Spark error messages can be daunting. Knowing how to parse them helps you find the relevant parts.

I highly recommend checking out the notebook.

Comments closed

Non-Trusted Foreign Keys

Published 2017-01-03 by Kevin Feasel

Daniel Janik explains what happens when you don’t have trusted foreign key constraints:

Why is it untrusted? Perhaps we disabled the check to load data and neglected to re-enable it?

No matter what the reason is the next part is not as simple. This is for two reasons.

The data in the child table may not be valid. Since the key was not being checked I may have data in my table that isn’t represented in the parent.
The syntax is a bit silly. As Mike Byrd in Austin, TX says, Microsoft studders. The syntax to reenable is “CHECK CHECK”. Let’s look at how we reenable the Address key check.

Read on for pros and cons of disabling (or not trusting) foreign key constraints.

Comments closed

Tracking Applications

Published 2017-01-03 by Kevin Feasel

Andy Levy explains how to use connection strings to track which application is hogging database resources:

Fortunately, the .NET SqlClient (and other ODBC drivers as well) has a built-in solution. Your application’s connection string has quite a few parameters available to provide configuration and information, and one that seems to get overlooked is Application Name. This one does exactly what it says on the tin – it lets you specify a name that will be displayed to anyone looking for it in SQL Server, including sp_whoisactive. Anyplace you have the ability to write a connection string, you can use this. It costs you nothing!

You can also start getting fancy with resource governor as well, segmenting pools based on application name.

Comments closed

Where Azure Analysis Services Fits

Published 2017-01-03 by Kevin Feasel

Melissa Coates explains where Azure Analysis Services fits in common BI architectures:

(2) Data Sources

From a single source such as a data warehouse. This is the most traditional path for BI development, and still has a very valid place in many BI/analytics deployments. This scenario puts the work of data integration on the ETL process into the data warehouse, which is the most appropriate place.
Directly from various systems. This can be done, but works well only in specific cases – it definitely won’t work well if there are a lot of highly normalized tables, or if there’s not a straightforward way to relate the disparate data together. Trying to go directly to the source systems & skip an intermediary data warehouse puts the “integration” burden on the data source view in Analysis Services, so plan for plenty of time testing if you’re going to try this route (i.e., it can be much harder, not easier). Note that this option only makes sense if the data is stored in Analysis Services because it needs to be related together somehow (i.e., DirectQuery mode, discussed next in #3, with > 1 data source won’t work if a user tries to combine data sources because the data is not inherently related).

If you’re thinking about Azure Analysis Services, this post is a good one.

Comments closed

Testing Transactional Replication

Published 2017-01-03 by Kevin Feasel

Jes Borland wraps up her series on transactional replication from an on-prem Availability Group to Azure SQL Database:

Congratulations, you’ve configured a remote distributor, configured all of your AG replicas as publishers, and configured your SQL Database as a subscriber! Now you want to ensure that transactions are replicating to the database, and that they continue to do so if there is a failover in the AG.

Read on for the two testing scenarios.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Curated SQL Posts

Structured Streaming

(2) Data Sources