Author: Kevin Feasel

Don’t Install Hadoop on Windows

Published 2020-05-11 by Kevin Feasel

A few days ago, I published the installation guides for Hadoop, Hive, and Pig on Windows 10. And yesterday, I finished installing and configuring the ecosystem. The only conclusion I have is: “Think 1000 times before installing Hadoop and related technologies on Windows!”

The biggest problem is that Microsoft got flaky about this. Back in 2012-2013, they backed running Hadoop on Windows as part of getting HDInsight up and running. I even remember the HDInsight emulator which could run on a local desktop. By 2014 or so, they shifted directions and decided it wasn’t worth the effort. Because Apache Spark (which does have pretty decent Windows support, at least for development) really wants Hive, you can fake it with winutils.


foldLeft and foldRight in Scala

Published 2020-05-11 by Kevin Feasel

Sarfaraz Hussain explains the difference between foldLeft and foldRight in Scala:

The fold method is a higher-order function in Scala, and it has two variants, namely:
i. foldLeft
ii. foldRight
In this blog, we will look into them in detail and try to understand how they work.
Before moving ahead, I want to clarify that the fold method is just a wrapper to foldLeft, i.e. the fold method internally invokes the foldLeft method. So, now let’s get started.

Folding is an extremely powerful technique for getting rid of loops in code. Being comfortable with folding is (in my eyes) one of the signs which indicate that you’ve reached a mid-level understanding of functional programming.
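To make the difference concrete, here is a minimal sketch that is not taken from Sarfaraz's post. The object name `FoldDemo` and the sample list are made up for illustration. It uses a non-commutative operation (subtraction) so that the direction of the fold changes the result, and it shows a fold standing in for a simple accumulation loop.

```scala
object FoldDemo {
  // Hypothetical sample data, chosen only for illustration.
  val xs: List[Int] = List(1, 2, 3, 4)

  // foldLeft starts from the left: (((0 - 1) - 2) - 3) - 4
  // The accumulator is the FIRST argument of the function.
  val left: Int = xs.foldLeft(0)((acc, x) => acc - x)

  // foldRight starts from the right: 1 - (2 - (3 - (4 - 0)))
  // The accumulator is the SECOND argument of the function.
  val right: Int = xs.foldRight(0)((x, acc) => x - acc)

  // Appending to a string makes the traversal order visible.
  val leftOrder: String = xs.foldLeft("")((acc, x) => acc + x)
  val rightOrder: String = xs.foldRight("")((x, acc) => acc + x)

  // A loop with a mutable accumulator...
  def sumWithLoop(values: List[Int]): Int = {
    var total = 0
    for (v <- values) total += v
    total
  }

  // ...and the same computation written as a fold.
  def sumWithFold(values: List[Int]): Int = values.foldLeft(0)(_ + _)

  def main(args: Array[String]): Unit = {
    println(left)       // -10
    println(right)      // -2
    println(leftOrder)  // 1234
    println(rightOrder) // 4321
    println(sumWithLoop(xs) == sumWithFold(xs)) // true
  }
}
```

For an associative and commutative operation such as addition, both directions give the same answer, so the choice mostly matters when the operation or the order of evaluation is significant.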


Change Tracking and Internal Tables

Published 2020-05-11 by Kevin Feasel

Tim Weigel continues a series on change tracking:

In my last post, I showed you how to configure change tracking at the table level and how to get configuration information about change tracking from the database engine. We looked at sys.change_tracking_databases and sys.change_tracking_tables, and looked at some sample scripts that present the information in a more readable format.
Before moving on to working with change tracking, I’d like to show you a little bit about how SQL Server handles change tracking data under the hood. Let’s take a few minutes to talk about sys.internal_tables, sys.dm_tran_commit_table, and sys.syscommittab. These aren’t objects that most DBAs interact with on a routine basis, but they’re useful for understanding how change tracking does what it does.

Click through to learn more about these internal tables.


The Pains of Database Restoration

Published 2020-05-11 by Kevin Feasel

Stuart Moore covers some of the pains of database restoration in two posts. First, why dbatools’ Restore-DbaDatabase is as complicated as it is:

At first glance Restore-DbaDatabase looks like a slow, lumbering, complex beast. In reality it’s not that bad.
It’s the result of design decisions I took in wanting a solid versatile command that could cope with everything that people would want from it.
In this post, we’ll go through the main decisions/points of contention one by one

Stuart then covers the limitations of Restore-DbaDatabase:

Like all tools, Restore-DbaDatabase isn’t able to do everything that everyone wants it to. Certainly, at the moment I’d like it to write its own blog posts and fetch me a cold beer, but that doesn’t happen.
A lot of the below isn’t complaining about people asking for features. If we can do it, we will, and we’re keen to make this work for as many people in as many situations as possible
But quite a few requests over the years have been non starters for a number of reasons.

Read them both; they’re part of Stuart’s 31 Days of Backup and Restore with dbatools series.


Patching SQL Server in Docker Containers

Published 2020-05-11 by Kevin Feasel

Rob Farley takes us through updating SQL Server when it lives in a container:

Now, the thing with running SQL in containers is that the concept of downloading a patch file doesn’t work in the same way. If it were regular Linux, the commands would be very simple, such as ‘sudo yum update mssql-server’ in RHEL. But Docker doesn’t quite work the same way, as reflected by the Microsoft documentation which mentions Docker for installing but not in the Update section.

Rob then explains the process. Containers are cattle, not pets. Just make sure your data files live outside the container before you blow it away…


Don’t Use sys.dm_hadr_cluster_members for Quorum Info

Published 2020-05-11 by Kevin Feasel

Sean Gallardy explains a limitation of sys.dm_hadr_cluster_members:

I’ve now run across a few different instances where the monitoring for quorum was done via this DMV. On the surface, it seems like nothing would be wrong with using the “number_of_quorum_votes” column to check on the members of the cluster and see their voting status. However, this isn’t quite the case… you see there are various mechanisms that influence whether or not a member (or witness) has a vote and these continue to be expanded in each version of WSFC.

Click through for a short history lesson as well as some good advice on how to get this information accurately.


Determining Statistics Utilization

Published 2020-05-11 by Kevin Feasel

Deborah Melkin shows us how to see if a particular statistic is in use:

You know those tweets that you see once but can never find again? I remember seeing one a while ago where someone tweeted to #sqlhelp asking if the internal inserted and deleted tables had statistics or if they were like table variables, which didn’t.
This is a great question in general. But then it got me thinking – how do you prove this? I wanted to know the answer as well so I decided to look into this. And I went down the wrong sort of rabbit hole trying to figure this out. Eventually I talked to a friend about this and got pointed in the right direction…
And the answer to how you find which statistics are used is…?

Read on for the answer and several examples.


Using Postman with Power BI’s REST API

Published 2020-05-11 by Kevin Feasel

David Eldersveld takes us through the Power BI REST API:

Postman is a valuable tool to work with APIs, especially when testing and making ad hoc requests outside of an automated production solution. In terms of where a Power BI developer may find Postman useful, it sits somewhere between the documentation’s “Try It” functionality and a more production-worthy solution incorporating tools like Azure DevOps, Logic Apps/Power Automate, a Power BI custom connector, etc.
The ideas in this post extend an original post from Carl de Souza. Carl shows how to obtain an OAuth2 access token but does so with hardcoded values. Additional API requests use the token from the original response, but he also manually provides this token to those subsequent API calls.

David has a clever technique for getting the bearer token, so check it out.


Project Metamorphosis: Elastic Kafka Clusters

Published 2020-05-08 by Kevin Feasel

Jay Kreps explains what Confluent has been up to lately:

What is Project Metamorphosis?
Let me try to explain. I think there are two big shifts happening in the world of data right now, and Project Metamorphosis is an attempt to bring those two things together.
The first one, and the one that Confluent is known for, is the move to event streaming.
Event streams are a real revolution in how we think about and use data, and we think they are going to be at the core of one of the most important data platforms in a modern company. Our goal at Confluent is to build the infrastructure that makes that possible and help the world take advantage of it. That’s why we exist.
But event streaming isn’t the only paradigm shift we’re in the midst of. The other change comes from the movement to the cloud.

Click through for the high-level overview. I can see this competing even more directly with Kinesis and Event Hubs.


Technology Choices for Streaming Pipelines

Published 2020-05-08 by Kevin Feasel

The Hadoop in Real World team takes us through different tools available when working on streaming pipelines:

Businesses want to get insights as quickly as possible and do not want to wait for a day, like before, to bring up a report to understand what happened till yesterday. They require a more proactive approach that can help to act immediately when something significant happens and also to prevent the system from any faults/downtime before it occurs. Imagine you are buying some product from an e-retailer and you have gone till the point to make payment and something happened that caused the payment not to go through successfully. At that very moment, you are having a second thought about whether to buy the product now or later. Suppose, if the business is getting a report of this occurrence next day, it would not be of much use for them as the customer would have already bought it from somewhere or decided against it. This is where real-time events and insights come in. If it were a real-time report, the team would have called up the customer and made the purchase by offering some discounts, which in turn would have changed the mind of the customer.

Click through for a high-level discussion of these tools.
