Cross-Container Transactions With Memory-Optimized Objects

Ned Otter continues his series on In-Memory OLTP isolation levels:

Why will it fail?

It will fail because the initiation mode of this transaction is not autocommit, which is required for READ COMMITED SNAPSHOT when referencing memory-optimized tables (the initiation mode is explicit, because we explicitly defined a transaction).  So to be totally clear, for queries that only reference memory-optimized tables, we can use the READ COMMITTED or READ COMMITTED SNAPSHOT isolation levels, but the transaction initiation mode must be autocommit. Keep this in mind, because in a moment, you’ll be questioning that statement….

There are some interesting implications that Ned teases out, so I recommend giving it a careful read.

A dplyr Quiz

John Mount wants to know how well you understand dplyr:

dplyr is one of the most popular R packages. It is powerful and important. But is it in fact easily comprehensible?

dplyr makes sense to those of us who use it a lot. And we can teach part time R users a lot of the common good use patterns.

But, is it an easy task to study and characterize dplyr itself?

Take John’s quiz and find out.  He wasn’t kidding about it being an advanced quiz.

Power BI Dot Plot Custom Visual

Devin Knight continues his Power BI custom visuals series:

In this module you will learn how to use the Dot Plot Custom Visual by MAQ Software.  The Dot Plot is used to show distribution using dots among multiple categories or attributes.

Dot plots are nice because they tend to be informative while keeping the ink to whitespace ratio low.

Basics Of Dplyr

Dave Mason is dipping his toes into the R waters:

I think my first exposure to R was at PASS Summit 2016. Since then, I’ve made an effort to attend R sessions at SQL Saturdays. The one commonality I seem to find in all of them is a demo with (or mention of) the dplyr package. It’s a package of functions that manipulate data in data frame objects (think of them as SQL Server/relational tables…or if you’re a .NET developer, a System.Data.DataTable object). R feels inexorably tied to dplyr at this early stage for me. R is probably way more vast than I realize, but what would it be without dplyr? Would it still be as popular? Would it still be as powerful?

What’s It Good For

I’m not sure if I’m perceiving this the right way yet, but dplyr sure feels a lot like LINQ, a .NET Framework technology that provides query-like capability for C#. For instance, you can select a subset of objects from an array, sort them, find a minimum or maximum, etc. It’s kind of like querying SQL Server, just without SQL Server.

I like the comparison of dplyr against LINQ, as they’re both data querying and transformation tools whose motif is a series of functions chained together.

Powershell: Text Search In A Table Value

Shane O’Neill clarifies a misunderstanding in Powershell:

and you are running the following PowerShell command to check if the results contain a value…

1
2
3
$String = "abc"
$Array = @(Invoke-Sqlcmd -ServerInstance "SQLServer" -Database "Database" -Query "SELECT code FROM dbo.users")
$Array.Contains($string)

It will return FALSE.

Now we know that the FALSE is false because we know that the string is in there!
This code is proven to work with arrays as stated here by the “Hey, Scripting Guy!”s so this was getting filed under “WTF PowerShell”

It’s a good post, so check it out and remember that DataRows aren’t strings.

Getting Started With SQL Server For Free

Jeff Mlakar points out some free resources for getting started with SQL Server:

The basics of the “Learn SQL Server Starter Pack”:

  1. SQL Server 2016 DE
    1. You can get the Developer Edition (DE) for…wait for it…FREE!
    2. Out in the wild you’ll see mostly Standard Edition (SE) or Enterprise Edition (EE). The great thing about DE is that it is identical to EE (it has all the features) in every aspect except that it cannot be licensed on a production machine. It must only be used for TEST or DEV environments. For home lab purposes you can use it as your development environment and have access to all the features to learn on!
    3. Download it here – SQL Server Downloads
    4. While you are here get used to reading the release notes and what is new in the version. You don’t need to understand everything in here right away but get used to the jargon and how Microsoft describes their features.
  2. Windows Server 2016 Evaluation Edition
    1. Download Windows Server 2016 Evaluation Edition
    2. You can evaluate the software for 180 days then will need to activate it. Then you can try to register for another eval and try again
  3. Virtual Box
    1. Virtual Box is a free, simple, and reliable virtualization tool. You’ll be able to do a lot to get started and build up your virtualization knowledge with this.
    2. Download the latest version of Virtual Box
    3. You don’t need to know very much about hypervisors and such – Virtual Box is very easy to learn with good documentation.

Evaluation versions are good for learning because they force you to tear down and rebuild your environment!

Jeff then links to a number of free resources to help out with the learning experience.

A Simple Example With Spark And Kafka

Gary Dusbabek has a nice example showing how to build a simple application with Spark and Kafka:

This is a hands-on tutorial that can be followed along by anyone with programming experience. If your programming skills are rusty, or you are technically minded but new to programming, we have done our best to make this tutorial approachable. Still, there are a few prerequisites in terms of knowledge and tools.

The following tools will be used:

  • Git—to manage and clone source code

  • Docker—to run some services in containers

  • Java 8 (Oracle JDK)—programming language and a runtime (execution) environment used by Maven and Scala

  • Maven 3—to compile the code we write

  • Some kind of code editor or IDE—we used the community edition of IntelliJ while creating this tutorial

  • Scala—programming language that uses the Java runtime. All examples are written using Scala 2.12. Note: You do not need to download Scala.

The Hello World of streaming apps is a Twitter client.

Talking To Secure Hadoop Clusters

Mubashir Kazia shows how to connect to a secured Hadoop cluster using Active Directory:

The primary form of strong authentication used on a secure cluster is Kerberos. Kerberos supports credentials delegation where a server process to which a user has authenticated, can perform actions on behalf of the user. This involves the server process accessing databases or other web services as the authenticated user. Historically the form of delegation that was supported by Kerberos is now called “full delegation”. In this type of delegation, the Ticket Granting Ticket (TGT) of the user is made available to the server process and server can then authenticate to any service where the user has been granted authorization. Until recently most Kerberos Key Distribution Center(KDC)s other than Active Directory supported only this form of delegation. Also Java until Java 7 supported only this form of delegation. Starting with Java 8, Java now supports Kerberos constrained delegation (S4U2Proxy), where if the KDC supports it, it is possible to specify which particular services the server process can be delegated access to.

Hadoop within its security framework has implemented impersonation or proxy support that is independent of Kerberos delegation. With Hadoop impersonation support you can assign certain accounts proxy privileges where the proxy accounts can access Hadoop resources or run jobs on behalf of other users. We can restrict proxy privileges granted to a proxy account to act on behalf of only certain users who are members of certain groups and/or only for connections originating from certain hosts. However we can’t restrict the proxy privileges to only certain services within the cluster.

What we are discussing in this article is how to setup Kerberos constrained delegation and access a secure cluster. The example here involves Apache Tomcat, however you can easily extend this to other Java Application Servers.

This is a good article showing specific details on using Kerberos in applications connecting to Hadoop.

Power Query: Joining On Date Ranges

Reza Rad shows how to build merge joins between date ranges in Power Query:

Customer’s table has the history details of changes through the time. For example, the customer ID 2, has a track of change. John was living in Sydney for a period of time, then moved to Melbourne after that.

The problem we are trying to solve is to join these two tables based on their customer ID, and find out the City related to that for that specific period of time. We have to check the Date field from Sales Table to fit into FromDate and ToDate of the Customer table.

This is a common type 2 SCD scenario.  I’d be concerned that this solution would not work with large data sets which may already be pushing the size limits of the Vertipaq engine.

K Nearest Cliques

Vincent Granville explains an algorithm built around finding cliques of data points:

The cliques considered here are defined by circles (in two dimensions) or spheres (in three dimensions.) In the most basic version, we have one clique for each cluster, and the clique is defined as the smallest circle containing a pre-specified proportion p of the points from the cluster in question. If the clusters are well separated, we can even use p = 1. We define the density of a clique as the number of points per unit area. In general, we want to build cliques with high density.

Ideally, we want each cluster in the training set to be covered by a small number of (possibly slightly overlapping) cliques, each one having a high density.  Also, as a general rule, a training set point can only belong to one clique, and (ideally) to only one cluster. But the circles associated with two cliques are allowed to overlap.

It’s an interesting approach, and I can see how it’d be faster than K Nearest Neighbors, but I do wonder how accurate the results would be in comparison to KNN.

Categories

August 2017
MTWTFSS
« Jul  
 123456
78910111213
14151617181920
21222324252627
28293031