Author: Kevin Feasel

HDInsight Tool For IntelliJ

Xiaoyong Zhu introduces the new HDInsight Tool for IntelliJ:

This tool extends IntelliJ to support the full Spark job life cycle: creating, authoring, and debugging jobs, submitting them to an Azure cluster, and viewing results. The IntelliJ HDInsight tool integrates well with Azure, letting users navigate HDInsight Spark clusters and view the associated Azure storage accounts. To further boost developer productivity, it also offers the ability to view Spark job history, display detailed job logs, and show job output. A few usability improvements have been implemented based on user preview feedback, including automatically locating artifacts, remembering assembly locations, and caching Spark logs.

It looks like this is specifically designed for Spark-enabled clusters.

Machine Learning Packages In R

Khushbu Shah discusses good R packages to help with your machine learning projects:

If missing values are something that haunts you, then the MICE package is a true friend.

When we face missing values, we generally fall back on basic imputations: replacing them with 0, with the mean, with the mode, and so on. None of these methods is versatile, though, and each can introduce data discrepancies.

The MICE package helps you impute missing values using multiple techniques, depending on the kind of data you are working with.
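
As a quick illustration of the package in action (a hedged sketch: it uses the nhanes demo data set that ships with mice, and the method and parameter choices here are arbitrary):

    # Minimal sketch of multiple imputation with the mice package.
    library(mice)

    data(nhanes)                  # small demo data set with missing values, bundled with mice
    md.pattern(nhanes)            # inspect which variables are missing, and how often

    # Generate five imputed data sets via predictive mean matching ("pmm").
    imp <- mice(nhanes, m = 5, method = "pmm", seed = 42)

    completed <- complete(imp, 1) # extract the first completed data set
    summary(completed)

From there, the usual pattern is to fit your model with with() and combine results across the imputed data sets with pool(), rather than analyzing a single completed data set.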

I’d heard of a couple of these, but most of them are new to me.

T-SQL Tuesday: SQL 2016

Michael J. Swart is hosting this month’s T-SQL Tuesday:

SQL Server 2016 went RTM this week, and so naturally we're going to write about it. Here are a few writing prompts for you:

  • Check out what’s new. Microsoft has written a lot about their new features. Thomas Larock has written a really nice landing page for those posts, SQL Server 2016: It Just Runs Faster. Look through those links. Do you feel optimistic about 2016? Or maybe a bit disappointed? Let us know either way.

  • Haven’t had time to download the bits, install them, explore and form thoughts on 2016 yet? Have no fear, check out Microsoft’s Virtual Labs. It lets you explore features without worrying about all the setup. In minutes you’ll be typing SELECT 'hello world';

Get writing!

SQL Server 2016 Licensing

Slava Murygin has notes on licensing SQL Server 2016:

  • Two major licensed editions: Enterprise and Standard.
  • Enterprise Edition can be licensed only “by core”; Standard is also available on a Server+CAL (Client Access License) basis.
  • If you have Software Assurance (SA), you can still use your old CAL licenses with SQL Server 2016 Enterprise Edition, but you will be limited to 20 cores on your server.
  • Standard Edition is limited to 4 sockets/24 cores and 128 GB of memory.

Licensing is boring, painful, and ultimately necessary to understand.

Thoughts On Stretch Database

Kevin Hill looks at Stretch Database:

  • Lowest performance rate is $1.25/hr, or just under $1K/mo. It only goes up from there.

  • “Stretch Database currently does not support stretching to another SQL Server.” It’s Azure only.

  • Lame/minimal filters…you have to roll your own functions, and they must be deterministic…no “GETDATE() - 30”. The GUI is only slightly better than the horrible nightmare that was Notification Services…

I see the negatives overwhelming the positives at this point.  You also can’t modify schema while Stretch is active.
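
To make the filter complaint concrete: the documented route is an inline, schema-bound table-valued function whose predicate is deterministic, so a hard-coded cutoff works but GETDATE() arithmetic does not. A minimal sketch, with hypothetical table and column names:

    -- The filter must be an inline TVF created WITH SCHEMABINDING, and its
    -- predicate must be deterministic: a constant cutoff, not GETDATE() - 30.
    CREATE FUNCTION dbo.fn_stretchpredicate (@OrderDate datetime2)
    RETURNS TABLE
    WITH SCHEMABINDING
    AS
    RETURN SELECT 1 AS is_eligible
           WHERE @OrderDate < CONVERT(datetime2, '1/1/2016', 101);
    GO

    -- Attach the filter while enabling Stretch on a (hypothetical) table; rows
    -- matching the predicate become eligible for migration to Azure.
    ALTER TABLE dbo.Orders
        SET (REMOTE_DATA_ARCHIVE = ON (
            FILTER_PREDICATE = dbo.fn_stretchpredicate(OrderDate),
            MIGRATION_STATE = OUTBOUND
        ));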

SQL Data Partners Podcast: The Wide World Of Data

Carlos L. Chacon was nice enough to interview me on his podcast:

The expansion of data sets and increased expectations of businesses for analysis and modeling of data has led developers to create a number of database products to meet those needs. As data professionals, it is incumbent upon us to understand how these tools work and put them to their best use–before somebody else puts them to sub-optimal use.  I am joined by Kevin Feasel who walks us through some of the technologies available and sorts out under what circumstances we want to consider using each one.

Show notes are on the SQL Data Partners podcast site.  My presentation slides are available online.  And if I get just a few more people to dig Aphyr as much as I do, the world will be a better place.

Distributed Transactions With Always-On Availability Groups

Dave Bermingham looks at distributed transactions within Always-On Availability Groups in SQL Server 2016:

In SQL Server 2016, distributed transactions are only supported if the transaction is distributed across multiple instances of SQL Server. They are NOT supported if the transaction is distributed between different databases within the same instance of SQL Server. So, in the picture above, it will work if the databases are on separate SQL Server instances, but not if they reside on the same instance, which is the more likely configuration.

This seems like a half-finished job.  We’ll see if Microsoft improves on this later.
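
For a concrete sense of what is and is not supported, here is a hedged sketch. It assumes the availability group was created with DTC_SUPPORT = PER_DB (a creation-time option in 2016) and that a linked server named RemoteSrv points at a second instance; all object names are hypothetical:

    -- Supported in SQL Server 2016: a distributed transaction spanning two
    -- separate instances, enlisted through MSDTC via a linked server.
    BEGIN DISTRIBUTED TRANSACTION;

    UPDATE dbo.Orders                        -- local instance, AG database
    SET Status = 'Shipped'
    WHERE OrderID = 42;

    UPDATE RemoteSrv.Shipping.dbo.Shipments  -- second instance, via linked server
    SET ShippedDate = SYSUTCDATETIME()
    WHERE OrderID = 42;

    COMMIT TRANSACTION;

    -- NOT supported with 2016 availability groups: the same pattern against two
    -- databases on the same instance (a cross-database transaction).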

Incorporating NiFi Into Brownfield Code

Paul Boal discusses how he incorporated Apache NiFi into an existing process:

Typically, data warehousing and ETL tool vendors have recommended against writing your own custom components. After all, the target market for ETL tools is a space where the tools are specifically marketed as reducing the need for “error-prone and time-consuming” manual coding. When I ran across this tutorial on writing your own NiFi processor, it occurred to me that NiFi is the exact opposite: it’s both open source and designed for extensibility from the ground up. I found it quite reasonable to write a custom NiFi processor that leverages our existing code base.

The existing code is a Java program with separate classes for each device vendor, all with the same interface to abstract the nuances of each vendor from the main data export program. This interface follows a traditional paradigm: login, query, query, query, logout. Given that my input to NiFi above takes in simple username, password, and query criteria arguments, it seems trivial to create a NiFi processor class that adapts the existing code into the NiFi API. Here’s a slightly abbreviated version of the actual code. (In reality, it’s all of 70 lines of code.)
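
The actual adapter code is in the original post; as a rough sketch of the general shape such a processor takes (this is not Boal's code, and the class, property, and relationship names are illustrative), a minimal NiFi processor looks something like this:

    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import java.util.Set;

    import org.apache.nifi.components.PropertyDescriptor;
    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.exception.ProcessException;
    import org.apache.nifi.processor.util.StandardValidators;

    public class DeviceQueryProcessor extends AbstractProcessor {

        // Configuration surfaced in the NiFi UI: credentials and query criteria.
        static final PropertyDescriptor USERNAME = new PropertyDescriptor.Builder()
                .name("Username").required(true)
                .addValidator(StandardValidators.NON_EMPTY_VALIDATOR).build();
        static final PropertyDescriptor PASSWORD = new PropertyDescriptor.Builder()
                .name("Password").required(true).sensitive(true)
                .addValidator(StandardValidators.NON_EMPTY_VALIDATOR).build();
        static final PropertyDescriptor QUERY = new PropertyDescriptor.Builder()
                .name("Query Criteria").required(true)
                .addValidator(StandardValidators.NON_EMPTY_VALIDATOR).build();

        static final Relationship SUCCESS = new Relationship.Builder()
                .name("success").description("Exported device data").build();

        @Override
        protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
            return List.of(USERNAME, PASSWORD, QUERY);
        }

        @Override
        public Set<Relationship> getRelationships() {
            return Set.of(SUCCESS);
        }

        @Override
        public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
            // This is where the existing login/query/logout vendor classes would
            // be called; for the sketch, we emit the query criteria as content.
            final String result = context.getProperty(QUERY).getValue();

            FlowFile flowFile = session.create();
            flowFile = session.write(flowFile,
                    out -> out.write(result.getBytes(StandardCharsets.UTF_8)));
            session.transfer(flowFile, SUCCESS);
        }
    }

NiFi handles scheduling, back pressure, and provenance around onTrigger; the existing vendor classes slot in where the comment indicates.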

In almost any realistic scenario, you’re not going to have the opportunity to start from scratch.  You will always have legacy components, external dependencies, and existing user bases to satisfy.  I like this article because it moves forward from that starting point.
