An Introduction To Kafka

Kevin Feasel

2017-07-14

Hadoop

Prashant Sharma explains the basics of Apache Kafka:

Apache describes Kafka as a distributed streaming platform that lets us:

  1. Publish and subscribe to streams of records.

  2. Store streams of records in a fault-tolerant way.

  3. Process streams of records as they occur.

Kafka is probably the most generally interesting of the current Hadoop ecosystem, with Spark not too far behind.  By “generally interesting,” I mean in the sense that companies with no vested interest in Hadoop as a whole could still be excited by the prospect of Kafka.

Data Lake Tools For VS Code Updated

Jenny Jiang announces Azure Data Lake Tools for Visual Studio Code’s July update:

Local Debug enables you to debug your C# code behind, step through the code, and validate your script locally before submitting to ADLA.

  • Use command ADL: Start Local Run Service to start local run service and set a breakpoint in your code behind, then click command ADL: Local Debug to start local debug service. You can debug through the debug console and view parameter, variable, and call stack information.

Click through to see the other improvements.

Cloudera Enterprise 5.12 GA

Fred Koopmans announces that Cloudera Enterprise 5.12 is now generally available:

Data Science & Engineering

  • Cloudera Data Science Workbench enhancements include:

    • GPU Support: Cloudera Data Science Workbench now enables popular deep learning frameworks to run on GPUs, both on-premises and in the cloud.

    • Embedded Web UIs: Users can work with the Apache Spark Web UI for Spark sessions. Other interactive web applications like TensorBoard, Shiny, and Plotly now appear directly in the workbench.

    • Enhanced Job Scheduling: Cloudera Data Science Workbench users can now schedule jobs directly from external schedulers or orchestration systems via the new Jobs API.

Read on for more enhancements.

Using SMO On Linux

Richie Lee explains how to get the SQL Tool Service running on Linux:

So, to briefly sum up, to use SMO on Linux, you need to do the following:

  • Install .NET Core 2.0
  • Install PowerShell beta 2
  • Install SQL Tool Service

You can use PowerShell from the Terminal, but I prefer something like an IDE so this is optional:

  • Download Visual Studio Code

  • Install PowerShell plugin

  • Change settings file to point explicitly to PowerShell beta 2.

Read the whole thing.

TDE + AG = Higher CPU Utilization

Ginger Keys has an analysis stress testing CPU load when Transparent Data Encryption is on and a database is in an Availability Group:

Microsoft says that turning on TDE (Transparent Data Encryption) for a database will result in a 2-4% performance penalty, which is actually not too bad given the benefits of having your data more secure. There is even more of a performance hit when enabling cell level or column level encryption. When encrypting any of your databases, keep in mind that the tempdb database will also be encrypted. This could have a performance impact on your other non-encrypted databases on the same instance.

In a previous post I demonstrated how to add an encrypted database to an AlwaysOn group in SQL2016. In this article I will demonstrate the performance effects of having an encrypted database in your AlwaysOn Group compared to the same database not-encrypted.

The results aren’t surprising, though the magnitude of the results might be.

SSMS Default Reports

Kenneth Fisher talks about the default reports built into Management Studio:

As DBAs our stock in trade is information and there is certainly an impressive amount available. The diagnostic views are the most common place to get the information we need but every now and again it’s nice to get an organized/pretty view. To that end, you can write your own reports or you can use the default reports that Microsoft makes available through SSMS. There are reports at the Server, Database and Agent level.

The Disk Usage by Table report is on my go-to list.

Grooming The Error Log

Mark Wilkinson explains how to keep your SQL Server error logs in check:

We typically think of error logs as somewhere to go to find issues, but what if your error logs ARE the issue? Like most anything else in SQL Server, if you neglect your error logs you can run into trouble. Even on a low-traffic SQL Server instance, a bad piece of code, or a hardware issue, could easily fill your error logs, and with the introduction of Hekaton in SQL Server 2014, the SQL Server error log started getting a lot more data pumped into it than you might have been used to before. What this means for the DBA is that you can quickly start filling your main system drive (if your SQL install and error logs are in the default location) with massive error logs. So what questions should you be answering about error logs to make sure you don’t run into problems?

Read on to learn more.

Categories

July 2017
MTWTFSS
« Jun Aug »
 12
3456789
10111213141516
17181920212223
24252627282930
31