
Author: Kevin Feasel

Rolling A Log Analytics System

Michael Sun and Jeff Shmain put together a log analytics system using several technologies:

This is an example of tiered system design. A tiered system is a design pattern in which data is categorized and each category is stored in the data store that best matches it. It can both improve performance and lower the cost of a system. One of the most famous tiered system designs is the computer memory hierarchy. In the log analytics use case, analysts mostly search for logs from recent months, but often run batch jobs to get long-term trends from logs in recent years. Therefore, recent logs are indexed and stored in Solr for search, while years of logs are stored in HBase for batch processing. As such, the index in Solr is small, which both improves performance and reduces cost, among other benefits.

Although only months of logs are stored in Solr, the logs before that period are stored in HBase and can be indexed on demand for further analysis.

Now that we have covered the high-level architecture of a log analytics system, we will dive into the details of the individual components.

This looks like a solid architecture for a logging system and can apply to other cases as well.
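To make the tiering concrete, here is a minimal Python sketch of the routing decision the quote describes. It assumes every log is written to the archive tier (HBase) and only logs newer than a cutoff are also indexed in the search tier (Solr). The 90-day window and the `choose_tier` and `needs_on_demand_index` helpers are hypothetical, not part of the authors' system.

```python
from datetime import date, timedelta

# Hypothetical cutoff for the search tier; the post only says "recent months".
HOT_WINDOW = timedelta(days=90)


def choose_tier(log_date: date, today: date, hot_window: timedelta = HOT_WINDOW) -> set[str]:
    """Return the stores a log entry belongs in under a two-tier design.

    Assumption: every log is kept in the archive tier (HBase) for batch jobs,
    and only logs inside the hot window are also indexed in Solr for search.
    """
    tiers = {"hbase"}
    if today - log_date <= hot_window:
        tiers.add("solr")
    return tiers


def needs_on_demand_index(log_date: date, today: date, hot_window: timedelta = HOT_WINDOW) -> bool:
    """Older logs are not in Solr, so searching them means indexing on demand."""
    return "solr" not in choose_tier(log_date, today, hot_window)


if __name__ == "__main__":
    today = date(2017, 3, 1)
    print(sorted(choose_tier(date(2017, 2, 15), today)))  # ['hbase', 'solr']
    print(sorted(choose_tier(date(2015, 6, 1), today)))   # ['hbase']
```

Keeping the window as a parameter mirrors the trade-off in the quote: a shorter window keeps the Solr index small, at the price of more on-demand indexing for older data.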

Comments closed

Advanced Report Design

Paul Turley excerpts a chapter from his new Reporting Services book:

With respect to page layout, reports have two sizing modes: interactive and printable. When users run a report in their web browser and use it interactively, they typically don’t care that much about the page size. This is particularly true with reports that have wide content like a matrix region that can dynamically grow horizontally with the data. When a report is printed or rendered to a printable format like a PDF or Word file, we need to be mindful about fitting the content on pages.

The report designer does not make page sizing and dimensions particularly obvious, so it’s an easy thing to miss. Fortunately, the science behind page sizing is pretty simple. Page dimension properties are grouped into two objects that you can select in the designer; these are shown in Figure 7-1. With the Properties window visible, click outside the report body to show properties for the report. Here you will see the InteractiveSize and PageSize properties. Expand these to see the individual Width and Height properties for each group.

Read on to get the better part of a full chapter’s worth of material.
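The page-fitting concern above often comes down to simple arithmetic. The sketch below assumes the common print-rendering rule of thumb that the report body width plus the left and right margins must not exceed the page width; otherwise renderers such as PDF spill content onto extra pages. The `PrintLayout` class, its 1-inch default margins, and the helper names are hypothetical illustrations, not Reporting Services APIs.

```python
from dataclasses import dataclass


@dataclass
class PrintLayout:
    """Hypothetical print settings in inches: page width plus side margins."""
    page_width: float
    left_margin: float = 1.0
    right_margin: float = 1.0


def usable_width(layout: PrintLayout) -> float:
    """Horizontal space left for the report body once the side margins are removed."""
    return layout.page_width - layout.left_margin - layout.right_margin


def fits_horizontally(body_width: float, layout: PrintLayout) -> bool:
    """True when body + margins fit within the page width (assumed rule of thumb)."""
    return body_width <= usable_width(layout)


if __name__ == "__main__":
    letter = PrintLayout(page_width=8.5)
    print(usable_width(letter))            # 6.5
    print(fits_horizontally(7.0, letter))  # False
```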


Introduction To Amazon Kinesis

Jen Underwood describes Amazon Kinesis:

Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. Amazon Kinesis is ideal for Internet of Things (IoT) use cases. It can collect and process hundreds of terabytes of data per hour from hundreds of thousands of sources, allowing you to easily write applications that process information in real-time, from sources such as web site click-streams, Raspberry Pi gadgets, devices, social media, operational logs, metering data and more.

With Amazon Kinesis, you can build real-time dashboards, capture exceptions, execute algorithms, and generate alerts. With point-and-click menus, you can ingest data, query it and then send output to a variety of destinations including but not limited to Amazon S3, Amazon EMR, Amazon DynamoDB, or Amazon Redshift.

Kinesis is powerful, especially if you’re already locked into the AWS platform. My preference is Apache Kafka, but Kinesis is definitely worth learning about.
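Kinesis spreads a stream across shards by hashing each record's partition key: AWS documents that the partition key is run through MD5 to produce a 128-bit integer, and each shard owns a range of that hash key space. This Python sketch assumes the space is split into equal contiguous ranges (as in an evenly split stream) and that the digest is read as a big-endian integer; `hash_key` and `shard_for` are hypothetical helpers, not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # Kinesis hash keys range over 0 .. 2**128 - 1


def hash_key(partition_key: str) -> int:
    """MD5 digest of the partition key, read as a big-endian 128-bit integer (assumed byte order)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose hash key range contains the key.

    Assumes shard_count equal, contiguous ranges covering the whole hash space.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return hash_key(partition_key) * shard_count // HASH_SPACE


if __name__ == "__main__":
    counts = [0] * 4
    for i in range(1000):
        counts[shard_for(f"device-{i}", 4)] += 1
    print(counts)  # records per shard for 1,000 hypothetical device IDs
```

The practical takeaway is that a high-cardinality partition key (such as a device ID) spreads load across shards, while a single hot key always lands on the same shard.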


Create Index With Drop_Existing Bug

Kendra Little describes a bug that she encountered in discussions with a reader:

My first thought was that perhaps there is some process that runs against the production system and the test system that goes to sleep with an open transaction, holding an X or an IX lock against this table. If the index create can’t get its shared lock, then it could be part of a blocking chain.

So I asked first if the index create was the head of the blocking chain, or if it was perhaps blocked by something else. The answer came back that no, the index create was NOT blocked. It was holding the shared lock for a long time.

My new friend even sent a screenshot of the index create running against the test instance in sp_WhoIsActive with blocking_session_id null.

Read on for the full story and keep those systems patched.
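Kendra's first question, whether the index create was the head of a blocking chain, can be answered mechanically from a snapshot of session_id and blocking_session_id pairs such as sp_WhoIsActive returns. In this Python sketch, `blocking_heads` is a hypothetical helper: a head is any session that blocks another but is not itself waiting (blocking_session_id null). A blocker missing from the snapshot, such as a filtered-out sleeping session with an open transaction, is also treated as a head.

```python
from typing import Optional


def blocking_heads(blocked_by: dict[int, Optional[int]]) -> set[int]:
    """Return the sessions at the head of a blocking chain.

    `blocked_by` maps session_id -> blocking_session_id, with None meaning the
    session is not waiting on anyone. A head blocks at least one session but is
    not blocked itself; blockers absent from the snapshot count as heads.
    """
    blockers = {b for b in blocked_by.values() if b is not None}
    return {s for s in blockers if blocked_by.get(s) is None}


if __name__ == "__main__":
    # Hypothetical snapshot: 55 is the index create, 61 waits on 55, 62 waits on 61.
    snapshot = {55: None, 61: 55, 62: 61}
    print(blocking_heads(snapshot))  # {55}
```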


Aggregate Predicate Pushdown And Data Types

Niko Neugebauer shows an example of how a slightly different data type can cause columnstore queries to be much faster:

Even though they are estimated to cost the same (50% each, with an estimated cost of 0.275286 to be more precise), in reality you will notice Aggregate Predicate Pushdown taking place on the first query, while the second query uses the Storage Engine to read all 2 million rows from the table and filter them in the Hash Match iterator.

Actual Number of Locally Aggregated Rows is the one property on the Columnstore Index Scan iterator that will give you insight into what happened within the Columnstore Index Scan, since Aggregate Predicate Pushdown is not shown as a filter on the property. This is not the most fortunate solution as far as I am concerned, but the 0 rows flowing out of the Columnstore Index Scan serve as a good indication that Aggregate Predicate Pushdown took place; if you want to be sure of all the details, you will need to check the properties of the involved iterators.

Definitely worth reading.
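The quote's point is that, with pushdown, the aggregation happens inside the scan, so rows are counted as locally aggregated instead of flowing out to a separate operator. The Python below is a toy model of that idea only, not SQL Server's implementation. `ScanStats`, its fields (loose stand-ins for the Actual Number of Rows and Actual Number of Locally Aggregated Rows properties), and the `pushdown` flag are hypothetical, and the real eligibility rules, including data type restrictions, are SQL Server's own.

```python
from dataclasses import dataclass


@dataclass
class ScanStats:
    """Toy stand-ins for two plan properties on a Columnstore Index Scan."""
    actual_rows: int              # rows that flowed out of the scan
    locally_aggregated_rows: int  # rows aggregated inside the scan


def scan_and_sum(values: list[int], pushdown: bool) -> tuple[int, ScanStats]:
    """Return SUM(values) plus scan statistics under a toy execution model.

    With pushdown, the sum is computed inside the scan and no rows leave it.
    Without it, every row flows up to a separate aggregate operator
    (the Hash Match iterator in the quote). The answer is the same either way;
    only where the work happens differs.
    """
    if pushdown:
        return sum(values), ScanStats(actual_rows=0, locally_aggregated_rows=len(values))
    return sum(values), ScanStats(actual_rows=len(values), locally_aggregated_rows=0)


def pushdown_took_place(stats: ScanStats) -> bool:
    """The signal the quote recommends checking: locally aggregated rows > 0."""
    return stats.locally_aggregated_rows > 0


if __name__ == "__main__":
    values = list(range(1, 2001))
    for flag in (True, False):
        result, stats = scan_and_sum(values, pushdown=flag)
        print(flag, result, stats, pushdown_took_place(stats))
```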


Understanding DBCC TEC

Ewald Cress explains (but does not document!) an undocumented DBCC command:

Boring old disclaimer: What I am describing here is undocumented, unsupported, likely to change between versions, and will probably make you go blind. In fact, the depth of detail exposed illustrates one reason why Microsoft would not want to document it: if end users of SQL Server found a way to start relying on this not changing, it would hamstring ongoing SQL Server improvement and refactoring.

With that out of the way, let’s dive right into DBCC TEC, a command which can dump a significant chunk of the object tree supporting a SQL Server session. This result is the same thing that shows up within a dump file, namely the output of the CSession::Dump() function – it’s just that you can invoke this through DBCC without taking a dump (cue staring match with Kendra). Until corrected, I shall imagine that TEC stands for Thread Execution Context.

I appreciate Ewald’s ability to make sense out of the madness of database internals.


Using OUTPUT To Get Change Counts

Manoj Pandey shows how to use the OUTPUT clause to determine the number of records inserted, updated, or deleted after a DML statement:

–> Question:

How can I get the number of records affected by the MERGE statement (INSERT, UPDATE, DELETE) separately and store each in a variable so I can get it on the application side?

Thanks!

–> My Answer:

You need to use the OUTPUT clause with the MERGE statement.

Click through for a code sample. The OUTPUT clause also works for non-MERGE statements like INSERT, UPDATE, and DELETE, though the “get changes by type” problem is really limited to the MERGE statement.
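In T-SQL, `OUTPUT $action` returns 'INSERT', 'UPDATE', or 'DELETE' for each row a MERGE touches; a common approach is to output those values into a table variable and count them per action. Since the actual code is behind the link, the following is only a Python simulation of that idea, assuming a MERGE with WHEN MATCHED THEN UPDATE, WHEN NOT MATCHED BY TARGET THEN INSERT, and WHEN NOT MATCHED BY SOURCE THEN DELETE branches. `merge_with_output` and `count_by_action` are hypothetical names.

```python
from collections import Counter


def merge_with_output(target: dict[str, int], source: dict[str, int]) -> list[str]:
    """Apply MERGE-like semantics to `target` in place and return one $action per row.

    Assumed branches: WHEN MATCHED THEN UPDATE, WHEN NOT MATCHED BY TARGET THEN
    INSERT, WHEN NOT MATCHED BY SOURCE THEN DELETE.
    """
    actions: list[str] = []
    for key, value in source.items():
        actions.append("UPDATE" if key in target else "INSERT")
        target[key] = value
    # Rows present only in the target are deleted (copy keys before mutating).
    for key in [k for k in target if k not in source]:
        del target[key]
        actions.append("DELETE")
    return actions


def count_by_action(actions: list[str]) -> dict[str, int]:
    """Equivalent of grouping the OUTPUT rows by $action and counting them."""
    counts = Counter(actions)
    return {action: counts[action] for action in ("INSERT", "UPDATE", "DELETE")}


if __name__ == "__main__":
    target = {"a": 1, "b": 2, "c": 3}
    source = {"b": 20, "d": 4}
    actions = merge_with_output(target, source)
    print(count_by_action(actions))  # {'INSERT': 1, 'UPDATE': 1, 'DELETE': 2}
    print(target)                    # {'b': 20, 'd': 4}
```

Returning all three keys, even when a count is zero, matches the question's goal of filling three separate variables on the application side.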


doAzureParallel

JS Tan announces a new R package:

For users of the R language, scaling up their work to take advantage of cloud-based computing has generally been a complex undertaking. We are therefore excited to announce doAzureParallel, a lightweight R package built on Azure Batch that allows you to easily use Azure’s flexible compute resources right from your R session. The doAzureParallel package complements Microsoft R Server and provides the infrastructure you need to run massively parallel simulations on Azure directly from R.

The doAzureParallel package is a parallel backend for the popular foreach package, making it possible to execute multiple processes across a cluster of Azure virtual machines with just a few lines of R code. The package helps you create and manage the cluster in Azure, and register it as a parallel backend to be used with foreach.

It’s an interesting alternative to building beefy R servers.
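doAzureParallel itself is R: it registers an Azure Batch cluster as a backend for foreach, so loop iterations can run across Azure virtual machines. As a local analogue of that foreach pattern (not the package's API), this Python sketch maps independent, seeded Monte Carlo tasks over a thread pool; `simulate` and `run_parallel` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate(seed: int, n: int = 10_000) -> float:
    """One independent task: a Monte Carlo estimate of pi from n random points."""
    rng = random.Random(seed)  # a private, seeded generator keeps each task reproducible
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n


def run_parallel(seeds: list[int], workers: int = 4) -> list[float]:
    """Run one task per seed on a pool; map() returns results in input order, like foreach."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, seeds))


if __name__ == "__main__":
    estimates = run_parallel(list(range(8)))
    print(sum(estimates) / len(estimates))  # averaged estimate of pi
```

Threads keep the example portable, but Python's GIL limits CPU parallelism; for CPU-bound work you would swap in `ProcessPoolExecutor`, and the appeal of doAzureParallel is that the workers are cloud VMs rather than local cores.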


Operating System Error 3

Stacy Brown provides common reasons for why you might get Operating System Error 3:

Users of SQL Backup Master may sometimes face the following error while executing a database backup job:

Msg 3201, Level 16, State 1, Line 1
Job Execution Error: Cannot open backup device ‘’ Operating System error 3 (The system cannot find the path specified.)

There are several possible reasons for this error. The following sections discuss each possible cause along with its solution, which you can use to resolve SQL Server Operating System error 3 ("The system cannot find the path specified").

Click through for solutions to several potential causes of this error.
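The path in a BACKUP statement is resolved by the SQL Server service on the server host, not by the client that submits the command, so a folder that exists only on your workstation can still produce error 3. A simple pre-flight check, run on the server host (ideally as the service account), can catch an empty path or a missing folder. This Python helper, `check_backup_target`, is a hypothetical sketch and does not cover every cause discussed in the linked post.

```python
import os


def check_backup_target(backup_path: str) -> list[str]:
    """Return likely causes of 'Operating System error 3' for a backup file path.

    Run this on the SQL Server host (ideally as the service account), because
    the server, not the client, resolves the path in a BACKUP statement.
    """
    problems: list[str] = []
    if not backup_path.strip():
        problems.append("backup path is empty")
        return problems
    folder = os.path.dirname(os.path.abspath(backup_path))
    if not os.path.isdir(folder):
        problems.append(f"folder does not exist on this machine: {folder}")
    return problems


if __name__ == "__main__":
    print(check_backup_target(os.path.join(os.getcwd(), "db.bak")))  # []
```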


Explaining DTUs

Andy Mallon explains what a Database Transaction Unit is:

I’d like to point out that the definition of a DTU is that it’s “a blended measure of CPU, memory, and data I/O and transaction log I/O…” None of the perfmon counters used by the DTU Calculator take memory into account, but it is clearly listed in the definition as being part of the calculation. This isn’t necessarily a problem, but it is evidence that the DTU Calculator isn’t going to be perfect.

I’ll upload some synthetic load into the DTU Calculator, and see if I can figure out how that black box works. In fact, I’ll fabricate the CSVs completely so that I can totally control the perfmon numbers that we load into the DTU Calculator. Let’s step through one metric at a time. For each metric, we’ll upload 25 minutes (1500 seconds–I like round numbers) worth of fabricated data, and see how that perfmon data is converted to DTUs.

Andy then goes on to show how the DTU Calculator estimates DTU usage given different resource patterns. It’s a very interesting process, and Andy clarified it considerably.
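Andy's approach of fabricating 25 minutes (25 × 60 = 1500 one-second samples) of perfmon data can be scripted. This Python sketch writes a CSV in which one counter holds a constant value and the rest are zero, so each upload isolates a single metric. The counter names, the `Timestamp` header, and the timestamp format are assumptions to check against the DTU Calculator's own capture script, and `fabricate_perfmon_csv` is a hypothetical helper.

```python
import csv
import io
from datetime import datetime, timedelta

# Assumed counter columns; verify against the DTU Calculator's capture script.
COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\LogicalDisk(_Total)\Disk Reads/sec",
    r"\LogicalDisk(_Total)\Disk Writes/sec",
    r"\SQLServer:Databases(_Total)\Log Bytes Flushed/sec",
]

DURATION_SECONDS = 25 * 60  # 25 minutes = 1500 one-second samples


def fabricate_perfmon_csv(metric: str, value: float, start: datetime,
                          seconds: int = DURATION_SECONDS) -> str:
    """Build a CSV where `metric` holds a constant `value` and every other counter is 0."""
    if metric not in COUNTERS:
        raise ValueError(f"unknown counter: {metric}")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Timestamp", *COUNTERS])  # placeholder first-column header
    for i in range(seconds):
        ts = (start + timedelta(seconds=i)).strftime("%m/%d/%Y %H:%M:%S.000")
        writer.writerow([ts, *(value if c == metric else 0 for c in COUNTERS)])
    return buf.getvalue()


if __name__ == "__main__":
    text = fabricate_perfmon_csv(COUNTERS[0], 50.0, datetime(2017, 3, 1))
    print(len(text.splitlines()))  # 1501 (header + 1500 samples)
```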
