
Author: Kevin Feasel

Understanding DNS for Developers

RJ Zaworski explains DNS for web developers:

DNS can use the familiar TCP/IP stack, but being part of a simple system, most DNS operations can also travel the wire on the Internet’s favorite roulette wheel: the User Datagram Protocol, UDP.

On a good day, UDP is fast, simple, and stripped bare of unnecessary niceties like delivery guarantees and congestion management. But a UDP message may also never be delivered, or it may be delivered twice. It may never get a response, which makes for fun client design–particularly coming from the relatively safe and well-adjusted world of HTTP. With TCP, you get an established connection and all kinds of accommodations when Things Inevitably Go Wrong. UDP? “Best effort” delivery. Which means a packet thrown over the fence with a prayer for a soft landing.

It’s a good read if you’re new to DNS.
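To make that "best effort" point concrete, here is a quick PowerShell sketch (mine, not from the article) which hand-builds a DNS A query for example.com, fires it at a resolver over UDP, and retries on timeout, because nothing guarantees that either the datagram or its reply ever arrives. The resolver address and retry count are arbitrary choices.

# 12-byte DNS header: ID 0x1234, flags 0x0100 (recursion desired), QDCOUNT = 1
[byte[]]$header   = 0x12,0x34, 0x01,0x00, 0x00,0x01, 0x00,0x00, 0x00,0x00, 0x00,0x00
# Question section: QNAME labels for example.com, QTYPE = A (1), QCLASS = IN (1)
[byte[]]$question = [byte[]](7) + [System.Text.Encoding]::ASCII.GetBytes('example') +
                    [byte[]](3) + [System.Text.Encoding]::ASCII.GetBytes('com') +
                    [byte[]](0, 0, 1, 0, 1)
[byte[]]$query    = $header + $question

$udp = New-Object System.Net.Sockets.UdpClient
$udp.Client.ReceiveTimeout = 2000      # two seconds; UDP itself promises nothing

foreach ($attempt in 1..3) {
    $null = $udp.Send($query, $query.Length, '8.8.8.8', 53)
    try {
        $remote = [System.Net.IPEndPoint]::new([System.Net.IPAddress]::Any, 0)
        $reply  = $udp.Receive([ref]$remote)
        "Attempt ${attempt}: got $($reply.Length) bytes back"
        break
    }
    catch {
        "Attempt ${attempt}: no reply, retrying"   # the query or the response was lost
    }
}
$udp.Close()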


Automating Command Line Sessions with PowerShell

James Livingston shows us how to redirect the standard input stream with .NET, using PowerShell as the example language:

Command Line Interface (CLI) tools can be very useful for interacting with certain applications. However, some CLIs do not let a user pass in parameters, which makes it difficult to automate. Instead, they lock a user into an interactive session and force the user to enter commands. Fortunately, some programming languages allow for a redirection of the standard input, output, and error of a running process. A developer co-worker of mine has been very successful doing this in Java, which got me thinking… And it turns out, using the .NET libraries, we can implement this functionality for any CLI in PowerShell.

Automating interactions with CLI programs is often not ideal, but when you’re stuck, it does the trick.
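As a rough sketch of the technique (not James's exact code; sqlcmd.exe is just a stand-in for any interactive CLI, and the commands fed to it are illustrative), the .NET System.Diagnostics.Process class lets PowerShell type into a tool's interactive prompt and capture whatever comes back:

$psi = New-Object System.Diagnostics.ProcessStartInfo
$psi.FileName               = 'sqlcmd.exe'   # any interactive CLI works here
$psi.UseShellExecute        = $false         # required before redirecting streams
$psi.RedirectStandardInput  = $true
$psi.RedirectStandardOutput = $true
$psi.RedirectStandardError  = $true

$proc = [System.Diagnostics.Process]::Start($psi)

# "Type" into the interactive session as if a user were at the keyboard
$proc.StandardInput.WriteLine('SELECT @@SERVERNAME;')
$proc.StandardInput.WriteLine('GO')
$proc.StandardInput.WriteLine('EXIT')
$proc.StandardInput.Close()

$output = $proc.StandardOutput.ReadToEnd()   # capture everything the tool printed
$proc.WaitForExit()
$output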


Micro Modules in PowerShell

Kevin Marquette shows how to create a micro module and explains why you might want one:

A micro module is very small in scope and often has a single function. Building a micro module is about getting back to the basics and keeping everything as simple as possible.

There is a lot of good advice out there on how to build a module. That guidance is there to assist you as your module grows in size. If we know that our module will not grow and we will not add any functions, we can take a different approach even though it may not conform fully to the community best practices.

There are a few things which differ from standard module best practices.
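As an illustration (the module name and function are mine, not Kevin's), a micro module can be as little as a single-function .psm1 plus a manifest generated by New-ModuleManifest:

# MyMicroModule\MyMicroModule.psm1 : the entire module is one function
function Get-Greeting {
    [CmdletBinding()]
    param([string]$Name = 'world')

    "Hello, $Name"
}
Export-ModuleMember -Function Get-Greeting

# Run once to produce MyMicroModule.psd1 alongside the .psm1
New-ModuleManifest -Path .\MyMicroModule\MyMicroModule.psd1 `
    -RootModule 'MyMicroModule.psm1' `
    -FunctionsToExport 'Get-Greeting' `
    -ModuleVersion '1.0.0'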


Error Messages Related to Temporal Tables

Mala Mahadevan digs into temporal tables:

Last month I was fortunate to have my first ever article published on Simple-Talk, among the best quality websites for SQL Server articles ever. During the process of writing this article I ran into several errors related to temporal tables that I had not seen before. Some of these are documented by Microsoft, some are fairly obvious to understand, and others are not. Below I summarize the list of errors you can possibly run into if you are using this really cool feature.

Click through for the list.
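For context on the feature itself, here is a minimal, hypothetical example of creating a system-versioned (temporal) table, run from PowerShell via Invoke-Sqlcmd; the table, database, and server names are placeholders, and none of this comes from Mala's article:

# DDL for a system-versioned temporal table: a primary key, two period columns,
# PERIOD FOR SYSTEM_TIME, and SYSTEM_VERSIONING pointing at a history table
$ddl = @"
CREATE TABLE dbo.Employee
(
    EmployeeID   int           NOT NULL PRIMARY KEY CLUSTERED,
    EmployeeName nvarchar(100) NOT NULL,
    ValidFrom    datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo      datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));
"@
Invoke-Sqlcmd -ServerInstance 'localhost' -Database 'TestDb' -Query $ddl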


Defining Downtime Down

Andy Mallon takes us through the notion of downtime:

There’s a lot of discussion about preventing downtime. As a DBA and IT professional, it’s my sworn duty to prevent downtime. I usually describe my job as a DBA as something along the lines of, “to make sure data is always available to the people and applications that need it, and never available to the people and applications that shouldn’t have it.” Preventing downtime is certainly important for that first part–but how the heck do you define downtime?

Andy asks more questions than he answers, but these are the types of questions that the technical side and the business side can get together on to define what constitutes downtime.


Reverting a Git Push

Stuart Moore takes us through backing out a commit in Git when you pushed to the wrong branch:

We’ve all done it. Working for ages tracking down that elusive bug in a project. Diligently committing away on our local repo as we make small changes. We’ve found the convoluted 50 lines of tortured logic, replaced it with 5 simple, easy to read lines of code, and all the tests have passed. So we push it back up to GitHub and wander off to grab a snack as a reward.

Halfway to the snacks, you suddenly have a nagging doubt in the back of your mind that you don’t remember starting a new branch before starting on the bug hunt.

Read on for the process.
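As a sketch of one possible recovery path (not necessarily the one Stuart describes, and assuming a single stray commit pushed to master; the branch name is made up), you can park the work on a proper branch and then revert the change on master without rewriting any published history:

git branch bugfix/tortured-logic    # the commits stay reachable on their own branch
git revert --no-edit HEAD           # add a commit to master that undoes the stray change
git push origin master              # publish the corrected master; history is not rewritten
git checkout bugfix/tortured-logic  # carry on where the work should have lived all along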


Trying Out the Data Migration Assistant

Dave Mason shares some thoughts on the Data Migration Assistant:

I recently took advantage of an opportunity to try Microsoft’s Data Migration Assistant. It was a good experience and I found the tool quite useful. As the documentation tells us, the DMA “helps you upgrade to a modern data platform by detecting compatibility issues that can impact database functionality in your new version of SQL Server or Azure SQL Database. DMA recommends performance and reliability improvements for your target environment and allows you to move your schema, data, and uncontained objects from your source server to your target server.” For my use case, I wanted to assess a SQL 2008 R2 environment with more than a hundred user databases for an on-premises upgrade to SQL 2017.

Dave takes us through an upgrade on three sample databases and then gives us some more messages from actual production databases.


Exactly-Once Writes From Kafka To S3

Konstantine Karantasis takes us through writing from a Kafka topic into S3:

When customers were asking for an S3 connector, there were already several Kafka-to-S3 solutions out there at the time, so we had to decide whether to adopt an existing S3 connector, modify the Kafka Connect HDFS connector (as some developers attempted to do) or write a new connector from scratch.

We knew that our users needed three things from the connector:
1. Integration with the Kafka Connect API: Connect’s scaling and fault tolerance capabilities were important to have, and users didn’t want yet another system that they’d need to learn how to use, deploy and monitor.
2. Exactly once: Users didn’t want to waste expensive compute cycles on deduplicating their data. And no one likes missing events.
3. No extra dependencies: Especially dependencies on additional datastores. Kafka clients and the S3 SDK libraries should be all you need to get events from Kafka to S3. Simplicity rules, especially in a distributed systems world where simple is often the key to being reliable.

When we considered the existing connectors, we noticed that none of them delivered the reliability and exactly once capabilities we wanted. They treat S3 like it’s another file system—though it isn’t really. For example, S3 lacks file appends, it is eventually consistent, and listing a bucket is often a very slow operation.

Click through for a dive into what Confluent did and how it works.
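Those three requirements show up directly in how the connector gets configured. Here is a hedged example of a standalone sink configuration; the bucket, topic, and sizing values are made up, while the property names are the connector's standard ones:

name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=2
topics=events
s3.bucket.name=my-example-bucket
s3.region=us-east-1
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
flush.size=1000
schema.compatibility=NONE

Roughly speaking, the exactly-once guarantee rests on deterministic partitioning and file naming: a retried upload rewrites the same S3 object rather than producing a duplicate.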


Spark Memory Management on EMR

Karunanithi Shanmugam gives us some tips on memory management for Spark in Amazon’s Elastic MapReduce:

Amazon EMR provides high-level information on how it sets the default values for Spark parameters in the release guide. These values are automatically set in the spark-defaults settings based on the core and task instance types in the cluster.

To use all the resources available in a cluster, set the maximizeResourceAllocation parameter to true. This EMR-specific option calculates the maximum compute and memory resources available for an executor on an instance in the core instance group. It then sets these parameters in the spark-defaults settings. Even with this setting, generally the default numbers are low and the application doesn’t use the full strength of the cluster. For example, the default for spark.default.parallelism is only 2 x the number of virtual cores available, though parallelism can be higher for a large cluster.

Spark on YARN can dynamically scale the number of executors used for a Spark application based on the workloads. Using Amazon EMR release version 4.4.0 and later, dynamic allocation is enabled by default (as described in the Spark documentation).

There’s a lot in here, much of which applies to Spark in general and not just EMR.
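For example (the values here are illustrative, not recommendations from the post), these settings can be supplied as EMR configuration classifications at cluster creation, either letting EMR size executors itself via maximizeResourceAllocation or overriding the defaults discussed above:

[
  {
    "Classification": "spark",
    "Properties": { "maximizeResourceAllocation": "true" }
  },
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.dynamicAllocation.enabled": "true",
      "spark.default.parallelism": "200"
    }
  }
]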


Measuring HDFS Cache Performance Gains

Guy Shilo tries out the HDFS centralized cache:

HDFS offers a caching mechanism that takes advantage of the data nodes’ memory. Blocks are loaded in memory and pinned there so that when a client requests those blocks, they can be served directly from memory, which is much faster than disk. There are some 3rd party products out there that do the same, but this option comes with Hadoop out of the box.

Hadoop has a special set of commands for managing this cache – the cacheadmin commands.

You must explicitly cache a directory or a file, and if you cache a directory, the caching is not recursive and subdirectories will not be cached automatically. The full documentation can be found here. I was curious to see if Cloudera has integrated cache commands into their Cloudera Manager, but was surprised to see that their documentation about it is basically a copy of the Apache Hadoop guide and you still have to use the command-line cacheadmin.

Click through to see how it performed in Guy’s scenario.
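As a quick, hypothetical example of what that looks like (the pool and path names are made up), you create a cache pool and then add a directive pointing at the path; keep in mind the note above that only files directly inside the directory get cached, not subdirectories:

hdfs cacheadmin -addPool hotdata
hdfs cacheadmin -addDirective -path /data/lookup_tables -pool hotdata
hdfs cacheadmin -listDirectives -stats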
