A micro module is very small in scope and often has a single function. Building a micro module is about getting back to the basics and keeping everything as simple as possible.
There is a lot of good advice out there on how to build a module. That guidance is there to assist you as your module grows in size. If we know that our module will not grow and we will not add any functions, we can take a different approach even though it may not conform fully to the community best practices.
There are a few things which differ from standard module best practices.
Last month I was fortunate to have my first ever article published on Simple-Talk, among the best quality website for sql server articles ever. During the process of writing this article I ran into several errors related to temporal tables that I have not seen before. Some of these are documented by Microsoft, some are fairly obvious to understand and others are not. Below I summarize the list of errors you can possibly run into if you are using this really cool feature.
Click through for the list.
There’s a lot of discussion about preventing downtime. As a DBA and IT professional, it’s my sworn duty to prevent downtime. I usually describe my job as DBA something along the lines of, “to make sure data is always available to the people and applications that need it, and never available to the people and applications that shouldn’t have it.” Preventing downtime is certainly important for that first part–but how the heck do you define downtime?
Andy asks more questions than provides answers, but these are the types of questions which the technical side and the business side can get together on to define what constitutes downtime.
We’ve all done it. Working for ages tracking down that elusive bug in a project. Diligently committing away on our local repo as we make small changes. We’ve found the convoluted 50 lines of tortured logic, replaced it with 5 simple easy to read lines of code and all the test have passed. So we push it backup to github and wander off to grab some a snack as a reward
Halfway to the snacks you suddenly have a nagging doubt in the back of your mind that you don’t remember starting a new branch before starting on the bug hunt.
Read on for the process.
I recently took advantage of an opportunity to try Mirosoft’s Data Migration Assistant. It was a good experience and I found the tool quite useful. As the documentation tells us, the DMA “helps you upgrade to a modern data platform by detecting compatibility issues that can impact database functionality in your new version of SQL Server or Azure SQL Database. DMA recommends performance and reliability improvements for your target environment and allows you to move your schema, data, and uncontained objects from your source server to your target server.” For my use case, I wanted to assess a SQL 2008 R2 environment with more than a hundred user databases for an on-premises upgrade to SQL 2017.
Dave takes us through an upgrade on three sample databases and then gives us some more messages from actual production databases.
When customers were asking for an S3 connector, there were already several Kafka-to-S3 solutions out there at the time, so we had to decide whether to adopt an existing S3 connector, modify the Kafka Connect HDFS connector (as some developers attempted to do) or write a new connector from scratch.
We knew that our users needed three things from the connector:
1. Integration with the Kafka Connect API: Connect’s scaling and fault tolerance capabilities were important to have, and users didn’t want yet another system that they’d need to learn how to use, deploy and monitor.
2. Exactly once: Users didn’t want to waste expensive compute cycles on deduplicating their data. And no one likes missing events.
3. No extra dependencies: Especially dependencies on additional datastores. Kafka clients and the S3 SDK libraries should be all you need to get events from Kafka to S3. Simplicity rules, especially in a distributed systems world where simple is often the key to being reliable.
When we considered the existing connectors, we noticed that none of them delivered the reliability and exactly once capabilities we wanted. They treat S3 like it’s another file system—though it isn’t really. For example, S3 lacks file appends, it is eventually consistent, and listing a bucket is often a very slow operation.
Click through for a dive into what Confluent did and how it works.
Amazon EMR provides high-level information on how it sets the default values for Spark parameters in the release guide. These values are automatically set in the spark-defaults settings based on the core and task instance types in the cluster.
To use all the resources available in a cluster, set the
maximizeResourceAllocationparameter to true. This EMR-specific option calculates the maximum compute and memory resources available for an executor on an instance in the core instance group. It then sets these parameters in the
spark-defaultssettings. Even with this setting, generally the default numbers are low and the application doesn’t use the full strength of the cluster. For example, the default for
spark.default.parallelismis only 2 x the number of virtual cores available, though parallelism can be higher for a large cluster.
Spark on YARN can dynamically scale the number of executors used for a Spark application based on the workloads. Using Amazon EMR release version 4.4.0 and later, dynamic allocation is enabled by default (as described in the Spark documentation).
There’s a lot in here, much of which applies to Spark in general and not just EMR.
HDFS offers a caching mechanism that takes advantage of the Data nodes memory. Blocks are loaded in memory and pinned there so that when a client requests those blocks they can be served directly from memory which is much faster than disk. There are some 3rd party products out there that does the same, but this option comes with Hadoop out of the box.
Hadoop has a special set of commands for managing this cache – the cacheadmin commands.
You must explicitly cache a directory or a file, and in case you cache a directory the caching is not recursive and sub directories will not be cached automatically. The full documentation can be found here. I was curious to see if Cloudera has integrated cache commands into their Cloudera manager, but was surprised to see that their documentation about it is basically a copy of the Apache hadoop guide and you still have to use the command line cacheadmin.
Click through to see how it performed in Guy’s scenario.
Now that we know, some basic definitions. It’s time we ask the main question! Why do I care?
There are many reasons you might wanna use Docker. I will give my perspective on why I started to learn about Docker.
I had to test my Kafka producers and consumers locally instead of deploying my code in DEV/QA even before I was sure things are working fine but also be sure that the same code, when deployed in other environments, should behave the same.
There are a few really good reasons for containers and testing is one of them.
In the Froid white paper, Microsoft talked about how they were working on fixing the function problem. When I read that white paper, my mind absolutely reeled – it’s awesome work, and I love working with a database doing that kind of cool stuff. Now that 2019 is getting closer to release, I’ve been spending more time with it. Most of the time, it works phenomenally well, and it’s the kind of improvement that will drive adoption to SQL Server 2019. Here, though, I’ve specifically picked a query that runs worse only to show you that not every query will be better.
To activate Froid, just switch the compatibility level to 150 (2019), and the query runs in 1 minute, 45 seconds, or almost twice as slow. Dig into the actual plan to understand why?
Read on to learn why.