Distributed Transactions With Always-On Availability Groups

Dave Bermingham looks at distributed transactions within Always-On Availability Groups in SQL Server 2016:

In SQL Server 2016, Distributed Transactions are only supported if the transaction is distributed across multiple instances of SQL Server. It is NOT supported if the transaction is distributed between different databases within the same instance of SQL Server. So in the picture above, if the databases are on separate SQL instances it will work, but not if the databases reside on the same instance, which is the more likely setup.

This seems like a half-finished job.  We’ll see if Microsoft improves on this later.

Incorporating NiFi Into Brownfield Code

Kevin Feasel

2016-06-07

ETL, Hadoop

Paul Boal discusses how he incorporated Apache NiFi in an existing process:

Typically, data warehousing and ETL tool vendors do not recommend that you write your own custom components. After all, the target market for ETL tools is a space where the tools are specifically marketed as reducing the need for “error prone and time consuming” manual coding. When I ran across this tutorial on writing your own NiFi processor it occurred to me that NiFi is the exact opposite. It’s both Open Source and designed for extensibility from the ground up. I found it quite reasonable to write a custom NiFi processor that leverages our existing code base.

The existing code is a Java program with separate classes for each device vendor, all with the same interface to abstract the nuances of each vendor from the main data export program. This interface follows a traditional paradigm: login, query, query, query, logout. Given that my input to NiFi above takes in simple username, password, and query criteria arguments, it seems trivial to create a NiFi processor class that adapts the existing code into the NiFi API. Here’s a slightly abbreviated version of the actual code. (In reality, it’s all of 70 lines of code.)

In almost any realistic scenario, you’re not going to have the opportunity to start from scratch.  You will always have legacy components, external dependencies, and existing user bases to satisfy.  I like this article because it moves forward from that starting point.
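To make the "login, query, query, query, logout" shape concrete, here is a minimal Python sketch of that adapter idea. It is not Paul's code and does not use the NiFi API (his processor is Java); `VendorClient`, `FakeVendorClient`, and `run_export` are hypothetical names made up for illustration.

```python
from typing import Iterable, List, Protocol


class VendorClient(Protocol):
    """Hypothetical stand-in for the shared per-vendor interface."""

    def login(self, username: str, password: str) -> None: ...

    def query(self, criteria: str) -> List[str]: ...

    def logout(self) -> None: ...


class FakeVendorClient:
    """Toy vendor implementation that records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def login(self, username: str, password: str) -> None:
        self.calls.append("login")

    def query(self, criteria: str) -> List[str]:
        self.calls.append(f"query:{criteria}")
        return [f"row for {criteria}"]

    def logout(self) -> None:
        self.calls.append("logout")


def run_export(
    client: VendorClient,
    username: str,
    password: str,
    criteria_list: Iterable[str],
) -> List[str]:
    """Wrap the whole session in one call, as a processor entry point might."""
    client.login(username, password)
    try:
        rows: List[str] = []
        for criteria in criteria_list:
            rows.extend(client.query(criteria))
        return rows
    finally:
        # Always log out, even if one of the queries raises.
        client.logout()
```

The point of the wrapper is that the caller only supplies a username, a password, and query criteria, while the existing vendor classes stay untouched behind the shared interface.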

Starting Extended Events Is Just As Fast

Erin Stellato shows she can create an Extended Event as fast as a Profiler trace:

I haven’t gotten a ton of comments, but I did get a few (thank you to those who have responded!), and I decided to take one of them and create a Trace and create an Extended Events session and see how long it took for each.  Jonathan has mentioned before that he can create an XE session as fast as a Trace, and I’ve been thinking that I can as well so I thought I’d test it.  It’s a straightforward Trace versus Extended Events test.  Want to see what’s faster?  Watch the video here.

I love the “I would pop up the timer on the screen but I don’t know how to do that” bit; very Friday afternoonish.

Qlik Sold For $3 Billion

Alex Woodie reports that Qlik Technologies has been acquired by a private equity firm:

After loading data into a server-based associative, in-memory database, Qlik customers could explore the data in a variety of ways from an AJAX Web GUI, enabling them to create and publish all sorts of reports and dashboards. The approach is not entirely dissimilar to the one taken by its rival, Tableau Software, which has also benefited from the big data boom and the democratization of BI.

The combination of market forces and a keen eye for product development were propellant for growth at Qlik. In 2009, the Radnor, Pennsylvania-based company had 11,400 customers and $157 million in revenues. By 2010, it had grown to 13,000 customers and had an IPO. By 2015, the company boasted 37,000 customers, $612 million in revenue, and a market cap north of $2.8 billion.

Qlik is definitely one of the big players in the visualization market, a market that includes Tableau and Power BI/SSRS in Gartner’s Leaders quadrant, plus a bunch of competitors nipping at their heels.

Getting Started With Security Analytics

Michael Schiebel has an introduction to the thought process behind security analytics:

Now, we’re getting somewhere.  Looking at this graph we see we have four high-level problems we are trying to solve.

  1. (Unknown/Unknown) The first step in realizing that we have a problem is accepting that we may not have the answer.  We may not have the right mental or computational models; or even the right data to find bad things.

  2. (Known/Unknown) We’ve invested time and energy brainstorming what could happen, sought out and collected the data we believe will help, and created mental and conceptual models that SHOULD detect/visualize these bad things.  Now, we need to hunt and seek to see if we’re right.

  3. (Unknown/Known) We’ve been hunting and seeking for some time tuning and training our analytical models until they can automatically detect this new bad thing. Now we need to spend some time formalizing our response process to this new use case.

  4. (Known/Known) Great, we’ve matured this use case to a point that we can trust our ability to detect; maybe even to the point of efficient rules/signatures.  We have mature response playbooks written for our SOC analysts to follow.  Now we can feel comfortable enough to design and implement an automated response for this use case.

I think his breakdown is correct, and also would reiterate that within any organization, all four zones come into play, meaning you have different teams of people working concurrently; you’ll never automate away all the problems.

Remove Chart Clutter

Melissa Yu provides advice on improving your data visualization skills:

Common chart clutter items include:

  • 3-dimensional effects

  • Dark gridlines (use soft gray gridlines or eliminate gridlines when possible)

  • Overuse of bright, bold colors

  • Unnecessary use of all uppercase text (uppercase text is only necessary when calling attention to an element)

Basically, remove every visualization “feature” that Excel 97 gave you…

Power BI Tables Without Data Sources

Chris Webb shows how to create a table in Power BI’s M language without a backing data source:

No data source is needed – this is a way of defining a table value in pure M code. The first parameter of the function takes a list of column names as text values; the second parameter is a list of lists, where each list in the list contains the values on each row in the table.

In the last example the columns in the table were of the data type Any (the ABC123 icon in each column header tells you this), which means that they can contain values of any data type including numbers, text, dates or even other tables. Here’s an example of this

This is a helpful trick.
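For readers who think more easily in Python than in M, the "list of column names plus list of row lists" shape can be mimicked roughly as below. This is an analogy only, not M code and not how Power BI implements it; `make_table` is a hypothetical helper.

```python
from typing import Any, Dict, List, Sequence


def make_table(
    columns: Sequence[str], rows: Sequence[Sequence[Any]]
) -> List[Dict[str, Any]]:
    """Build a table from column names and a list of row lists.

    Values are left untyped, loosely mirroring columns of type Any.
    """
    table: List[Dict[str, Any]] = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(
                f"row has {len(row)} values but there are {len(columns)} columns"
            )
        # Pair each value with its column name, in order.
        table.append(dict(zip(columns, row)))
    return table


fruit_sales = make_table(
    ["Fruit", "Sales"],
    [["Apples", 1], ["Oranges", 2]],
)
```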

San Francisco Crime Analysis

Vimal Natarajan shows off some R charts using crime incident data:

By analyzing the plot above, we can arrive at the following insights:

  • The number of crimes steadily declines from midnight and is at its lowest during the early morning hours; it then starts increasing and peaks around 6 PM in the evening. This is the same insight we arrived at in my previous analysis, but here we have categorized by the Police district and still see the same pattern.

  • As seen in the previous plot, Park and Richmond districts have the lowest number of crimes throughout the day.

  • As highlighted in red in the plot above, the maximum number of crimes happens in Southern district around 6 PM in the evening.

I would prefer to see code here, but it does serve to give you an idea of what R can do.
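In that spirit, here is a minimal sketch of the kind of aggregation behind an hour-by-district plot. It is in Python rather than R, it is not Vimal's code, and the `incidents` records are invented purely to show the mechanics; they are not the San Francisco data.

```python
from collections import Counter
from typing import Iterable, Tuple

# Made-up (district, hour of day) records for illustration only.
incidents = [
    ("SOUTHERN", 18),
    ("SOUTHERN", 18),
    ("PARK", 18),
    ("RICHMOND", 3),
    ("SOUTHERN", 3),
    ("SOUTHERN", 12),
]


def count_by_district_and_hour(
    records: Iterable[Tuple[str, int]]
) -> Counter:
    """Count incidents per (district, hour) pair."""
    return Counter(records)


def busiest(counts: Counter) -> Tuple[str, int, int]:
    """Return (district, hour, count) for the most frequent pair."""
    (district, hour), n = counts.most_common(1)[0]
    return district, hour, n


counts = count_by_district_and_hour(incidents)
peak = busiest(counts)
```

A heat map like the one in the post is essentially these counts laid out with hour on one axis and district on the other.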

Downloading SQL Express 2016

Dave Mason tries out SQL Server Express 2016:

I’m not a fan of the filename “SQLEXPRADV_x64_ENU.exe”. It’s not very descriptive IMO. But if you hover your mouse over the file, there’s a helpful file description tool tip. I’ll probably rename the file anyway.

The download process has changed significantly and I have to admit I’m surprised that I like it so much. I can be set in my ways and averse to change. But once I launched that initial “SQLServer2016-SSEI-Expr.exe” download, everything made sense.

Think back to SQL Server 2012 Express. Remember the “Choose the download you want” dialog? Those file names aren’t very intuitive. I had to Google them every time to make sure I picked the right one. It was slightly better for SQL Server 2014 Express. But still. Yuck!

Sounds like they’ve improved the download experience for Express edition.

Lipwig

Kevin Feasel

2016-06-07

Hadoop

Peter Coates shows how to make Hive EXPLAIN plans a lot prettier:

As you probably know, if you prepend the word EXPLAIN to your SQL query and then run it, Hive prints out a text description of the query plan. This lets you explore the effects of such variations as code changes, the use of analyze, turning on/off the cost-based optimizer (CBO), and so on. It’s an essential tool for optimizing Hive.

The output of EXPLAIN is far from pretty, but fortunately, a simple pipeline of Linux commands can give you a slick graphical rendition like the one below.

I’m going to have to keep this in mind.
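To give a feel for the idea, here is a rough Python sketch that turns the stage dependency lines of an EXPLAIN plan into Graphviz DOT text. It is not Peter's Linux pipeline, and it assumes only the simple "is a root stage" and "depends on stages:" line forms; `stages_to_dot` and the sample plan text are made up for illustration.

```python
import re

# Assumed line shapes from the STAGE DEPENDENCIES section of a plan.
ROOT = re.compile(r"^\s*(Stage-\d+)\s+is a root stage\s*$")
DEPENDS = re.compile(r"^\s*(Stage-\d+)\s+depends on stages:\s*(.+)$")


def stages_to_dot(explain_text: str) -> str:
    """Convert stage dependency lines into a DOT digraph string."""
    lines = ["digraph plan {"]
    for line in explain_text.splitlines():
        root = ROOT.match(line)
        if root:
            # A root stage becomes a node with no incoming edge.
            lines.append(f'  "{root.group(1)}";')
            continue
        dep = DEPENDS.match(line)
        if dep:
            child = dep.group(1)
            # Each listed parent stage gets an edge pointing at the child.
            for parent in re.findall(r"Stage-\d+", dep.group(2)):
                lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


sample_plan = """STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
"""

dot_text = stages_to_dot(sample_plan)
```

Saving `dot_text` to a file and running it through Graphviz (for example `dot -Tpng`) would render the stages as a small graph. Plans with conditional stages use longer dependency lines that this sketch does not try to handle.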
