
Day: September 11, 2020

Indexing S3 Data with CDP Data Hub

Eva Nahari, et al., show how to perform indexing and serving of S3 data in Cloudera Data Platform:

This blog post will present a simple “hello world” kind of example on how to get data that is stored in S3 indexed and served by an Apache Solr service hosted in a Data Discovery and Exploration cluster in CDP. For the curious: DDE is a pre-templated, Solr-optimized cluster deployment option in CDP, recently released in tech preview. We will only cover AWS and S3 environments in this blog. Azure and ADLS deployment options are also available in tech preview, but will be covered in a future blog post.

We will depict the simplest scenario to make it easy to get started. There are of course more advanced data pipeline setups and richer schemas possible, but this is a good starting point for a beginner.

Read on for the instructions.
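The post has the full walkthrough, but as a hedged illustration of what “indexed and served” means at the Solr level, here is a minimal Python sketch that indexes one document and queries it back through Solr’s standard update and select handlers. The host, collection name, and field names are placeholder assumptions, not values from the post.

```python
# Minimal index-then-query loop against a Solr endpoint.
# Assumptions: the Solr URL and collection below are placeholders, and the
# collection uses Solr's default managed schema (so the *_txt and *_s
# dynamic fields resolve to text and string types).
import requests

SOLR = "https://your-solr-host:8983/solr"  # placeholder for your DDE cluster's Solr URL
COLLECTION = "sampledocs"                  # placeholder for a collection you created

# Index one document through the JSON update handler and commit it.
doc = {"id": "1", "title_txt": "hello world", "source_s": "s3"}
resp = requests.post(f"{SOLR}/{COLLECTION}/update?commit=true", json=[doc], timeout=30)
resp.raise_for_status()

# Query it back to confirm the document is being served.
resp = requests.get(f"{SOLR}/{COLLECTION}/select",
                    params={"q": "title_txt:hello"}, timeout=30)
print(resp.json()["response"]["numFound"])  # expect 1
```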


A Review of KaDeck: Kafka Management Tool

Guy Shilo reviews KaDeck:

If you want the real web version then you need to register on their website and create a team first. Yes, KaDeck is aimed at teams of people working together. You create a team online and shortly you receive a mail with a team id and a secret key that you will use when running KaDeck web.

It seems that there is no host installation of KaDeck right now (RPM package, binaries, or the like), only a Docker image. This forces the user to run it in Docker or a Kubernetes cluster (or one of its commercial distributions, such as OpenShift). Another thing is that it checks your license online each time you start the container. If you want to use it offline, you have to go through a process of offline activation. You can reach it from the administration menu.

Click through for Guy’s thoughts on the product.


Building an HTML Report with Failed Agent Job Info

Garry Bargsley continues a series on alerting when SQL Agent jobs fail:

Welcome to Part 2 in the series about SQL Server Agent Job Failures. In this part you are going to learn how to build an HTML report with Failed Agent Job information and send the report via e-mail to the interested team(s).

Let’s get started putting the pieces together to build our report.

Click through for the details, as well as a full code sample in PowerShell.
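Garry’s sample is in PowerShell; purely as a hedged sketch of the same idea in Python, here is one way to pull recent failures out of msdb, shape them into an HTML table, and mail them. The server, SMTP host, and addresses are placeholder assumptions.

```python
# Hedged sketch: report SQL Agent job failures from the last 24 hours.
import pyodbc
import smtplib
from email.mime.text import MIMEText

QUERY = """
SELECT j.name,
       msdb.dbo.agent_datetime(h.run_date, h.run_time) AS run_datetime,
       h.message
FROM msdb.dbo.sysjobhistory AS h
    JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
WHERE h.step_id = 0     -- step 0 rows hold the overall job outcome
  AND h.run_status = 0  -- 0 = failed
  AND msdb.dbo.agent_datetime(h.run_date, h.run_time) > DATEADD(DAY, -1, GETDATE());
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=yourserver;Trusted_Connection=yes;"
)
rows = conn.cursor().execute(QUERY).fetchall()

# Build a simple HTML table from the failures.
body = "<table border='1'><tr><th>Job</th><th>Ran At</th><th>Message</th></tr>"
for name, ran_at, message in rows:
    body += f"<tr><td>{name}</td><td>{ran_at}</td><td>{message}</td></tr>"
body += "</table>"

msg = MIMEText(body, "html")
msg["Subject"] = "Failed SQL Agent jobs (last 24 hours)"
msg["From"], msg["To"] = "sql@yourdomain.com", "dba-team@yourdomain.com"
with smtplib.SMTP("smtp.yourdomain.com") as smtp:
    smtp.send_message(msg)
```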


Spark SQL in Delta Lake

Kundan Kumarr walks us through some of the basic SQL operations you can perform with Delta Lake in Apache Spark:

Nowadays Delta Lake is a buzzword in the Big Data world, especially among Spark developers, because it alleviates lots of issues found in the Big Data domain. Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. It is evolving day by day and adds cool features with every release. On 19th June 2020, Delta Lake version 0.7.0 was released, and this is the first release on Spark 3.x. This release includes important key features that can make a Spark developer’s work easy.

One of the interesting key features in this release is the support for metastore-defined tables and SQL DDL. So now we can define Delta tables in the Hive metastore and use the table name in all SQL operations. We can perform SQL DDL to create tables, insert into tables, explicitly alter the schema of the tables, and so on. So in this blog, we will learn how we can perform SQL DDL/DML/DQL in Delta Lake 0.7.0.

Click through for the examples.
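To give a feel for what those operations look like, here is a minimal PySpark sketch, assuming Spark 3.x with the delta-core 0.7.0 package on the classpath (e.g. via --packages io.delta:delta-core_2.12:0.7.0); the table name and rows are made up for illustration.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-sql-demo")
    # These two settings enable Delta's SQL and catalog support in 0.7.0.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .enableHiveSupport()  # so tables are defined in the Hive metastore
    .getOrCreate()
)

# DDL: define a Delta table by name in the metastore.
spark.sql("CREATE TABLE IF NOT EXISTS events (id INT, name STRING) USING DELTA")

# DML: insert and update rows through plain SQL.
spark.sql("INSERT INTO events VALUES (1, 'created'), (2, 'deleted')")
spark.sql("UPDATE events SET name = 'archived' WHERE id = 2")

# DQL: query the table back by name.
spark.sql("SELECT * FROM events").show()
```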


A Review of Distributed Availability Groups

Joey D’Antoni shares some thoughts on distributed Availability Groups in SQL Server 2016 and later:

I’m writing this post because I’ve been mired in configuring a bunch of distributed availability groups for a client, and while the feature is technically solid, the lack of tooling can make it a challenge to implement. Specifically, I’m implementing these distributed AGs (please don’t use the term DAG, as you’ll piss off Allan Hirt, but more importantly it’s used in Microsoft Exchange High Availability, so it’s taken) in Azure, which adds a couple of additional complications because of the need for load balancers. You should note this feature is Enterprise Edition only, and is only available starting with SQL Server 2016.

Read on for some of the positives around distributed AGs, as well as some negatives (mostly around the lack of tooling).
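Joey’s point about the lack of tooling means the setup is all T-SQL. As a hedged sketch (not Joey’s code), this is roughly what creating the distributed AG on the first AG’s primary replica looks like, here wrapped in pyodbc; the AG names, server, and listener URLs are placeholders.

```python
import pyodbc

# Hedged sketch of the distributed AG DDL; names and URLs are placeholders.
DDL = """
CREATE AVAILABILITY GROUP [distributedag]
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON
    'ag1' WITH (
        LISTENER_URL = 'tcp://ag1-listener.yourdomain.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC
    ),
    'ag2' WITH (
        LISTENER_URL = 'tcp://ag2-listener.yourdomain.com:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC
    );
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=ag1-primary;Trusted_Connection=yes;",
    autocommit=True,  # this DDL cannot run inside a user transaction
)
conn.cursor().execute(DDL)
```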


Starting and Stopping Things with dbatools

Mikey Bronowski continues a series on dbatools:

OK, let’s start. While doing some maintenance tasks, there is a need to start/stop/restart SQL services. Using SQL Server Configuration Manager is one way, or even straight from SSMS; however, working with multiple instances is easier with dbatools.

Mikey covers services, SQL Agent jobs, endpoints, traces, and Extended Events sessions, so check it out.


Automating Collection of SSIS Failure Data

Rob Sewell has a love for automation:

I have tried my best at all times to follow this advice over the last decade, and I am pretty happy that I have managed it.

– I use PowerShell (a lot!) to automate all sorts of routine tasks including migrating this blog
– I use Jupyter Notebooks to enable myself and others to automate Run Books, Training, Documentation, Demonstrations, Incident Response. You can find my notebooks here
– I use Azure DevOps to automate infrastructure creation and changes with Terraform, and delivery of changes to code, as well as unit testing.
– I use GitHub Actions to create this blog and publish the ADSNotebook module
– I use Chocolatey to install and update software
– I have used Desired State Configuration to ensure that infrastructure is as it is expected to be

At every point I am looking for a means to automate the thing that I am doing, because it is almost guaranteed that at some point after you have done a thing, there will be a need to do it again or to do it slightly differently.

Click through to see what Rob has for us in the way of collecting SSIS log data after job failures.
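Rob’s post shows his own approach; as a rough, hedged sketch of the underlying idea in Python, the SSISDB catalog views expose what you need to pull error messages for failed executions. The server name is a placeholder.

```python
import pyodbc

# Error messages for failed SSIS executions, straight from the SSISDB catalog.
QUERY = """
SELECT e.execution_id,
       e.package_name,
       e.start_time,
       m.message
FROM catalog.executions AS e
    JOIN catalog.event_messages AS m ON m.operation_id = e.execution_id
WHERE e.status = 4          -- 4 = failed execution
  AND m.message_type = 120  -- 120 = error message
ORDER BY e.start_time DESC;
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=yourserver;DATABASE=SSISDB;Trusted_Connection=yes;"
)
for execution_id, package, started, message in conn.cursor().execute(QUERY):
    print(f"[{execution_id}] {package} started {started}: {message}")
```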


Building an Azure Function to Automate CHECKDB

Arun Sirpal shows us how to build an Azure Function:

The title is a mouthful and so is this post. In the past I have linked to blog posts from Microsoft that say consistency checks for Azure SQL Database are the responsibility of Microsoft. (https://azure.microsoft.com/en-gb/blog/data-integrity-in-azure-sql-database/)

However, Paul Randal got me thinking with his thoughts on it (via his Insider email), which form the core of this post. If you desire to run DBCC CHECKDB against Azure SQL Database (which I know people do), how can you do this? There are many ways, but for this blog post, enter Azure Functions. There are many moving parts to this, but once set up and coded, it is a very satisfying experience. Let’s dig in. I am NOT going to copy and paste every little element of the high-level guide from Microsoft (there is no point in that), but I will show you the links you need to set up the relevant function app project; the tailored bits around CHECKDB form the bulk of this post.

This isn’t necessary to do, but if you want to learn how Azure Functions work, it’s a good example of working through the mechanics.
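As a hedged sketch of the core idea (not Arun’s exact code), a timer-triggered Azure Function in Python can run the check on a schedule. This would be the function’s __init__.py, with a function.json alongside it binding a timerTrigger schedule; the connection string app setting name is a placeholder.

```python
import logging
import os

import azure.functions as func
import pyodbc

def main(timer: func.TimerRequest) -> None:
    # autocommit, since DBCC commands should not run inside a transaction
    conn = pyodbc.connect(os.environ["SQLDB_CONNECTION_STRING"], autocommit=True)
    cursor = conn.cursor()
    # NO_INFOMSGS keeps the output to real problems; any corruption surfaces
    # as a SQL error, which pyodbc raises and the Functions runtime logs.
    cursor.execute("DBCC CHECKDB WITH NO_INFOMSGS;")
    logging.info("DBCC CHECKDB completed with no reported errors.")
```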
