Day: January 8, 2020

Accessing S3 Data from Apache Spark

Divyansh Jain shows how we can connect to AWS’s S3 using Apache Spark:

Now, coming to the actual topic: how to read data from an S3 bucket into Spark. Well, it is not as easy as just adding Spark-core dependencies to your Spark project and using spark.read to read your data from the S3 bucket.

So, to read data from S3, these are the steps to follow:

This isn’t a built-in source, so there is a little bit of work to do, but it’s not that bad.
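The excerpt stops before the steps themselves, so here is a rough sketch of one common approach rather than Divyansh's exact method: pull in the hadoop-aws package, hand Spark the S3A credentials, and read from an s3a:// path. The function names, the app name, and the bucket path in the usage note are hypothetical, and the hadoop-aws version is left as a parameter because it needs to match the Hadoop build your Spark distribution uses.

```python
def s3a_settings(access_key: str, secret_key: str, hadoop_aws_version: str) -> dict:
    """Spark settings commonly needed before spark.read can resolve s3a:// paths."""
    return {
        # Fetch the S3A connector; the version should match Spark's Hadoop build.
        "spark.jars.packages": f"org.apache.hadoop:hadoop-aws:{hadoop_aws_version}",
        # The spark.hadoop. prefix forwards these keys to the Hadoop configuration.
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }


def read_csv_from_s3(path: str, settings: dict):
    """Start (or reuse) a SparkSession with the given settings and read a CSV."""
    # Imported lazily so s3a_settings can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("s3-read-sketch")  # hypothetical name
    for key, value in settings.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    return spark.read.csv(path, header=True)
```

Something like read_csv_from_s3("s3a://my-bucket/data.csv", s3a_settings(key, secret, version)) would then return a DataFrame, assuming the bucket exists and the credentials can read it.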

Creating a Custom Partitioner for Apache Kafka

Swapnil Gosavi walks us through the process of creating a custom partitioner in Apache Kafka:

Assume we are collecting data from different departments. All the departments are sending data to a single topic named department. I planned five partitions for the topic. But, I want two partitions dedicated to a specific department, named IT, and the remaining three partitions for the rest of the departments. How would you achieve this?

You can solve this requirement, and any other type of partitioning need, by implementing a custom partitioner.

This is quite useful when you don’t necessarily want topic explosion but you do want more than what the classic partitioner allows.
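To make the scenario concrete, here is a minimal sketch of just the partition-selection logic, written in Python so it runs on its own. It is not Swapnil's code: the function name and parameters are hypothetical, and zlib.crc32 stands in for whatever hash the real client would use. In an actual Kafka producer, this logic would sit inside a class implementing the Partitioner interface and be registered through the partitioner.class producer setting.

```python
import zlib


def choose_partition(department: str, record_key: bytes,
                     num_partitions: int = 5, reserved_for_it: int = 2) -> int:
    """Return a partition index, keeping the first partitions exclusive to IT."""
    if not 0 < reserved_for_it < num_partitions:
        raise ValueError("reserved_for_it must be between 1 and num_partitions - 1")
    # crc32 is an assumption standing in for the client's hash; it is stable across runs.
    bucket = zlib.crc32(record_key)
    if department == "IT":
        # With the defaults, IT records land on partition 0 or 1.
        return bucket % reserved_for_it
    # Everyone else shares the remaining partitions (2, 3, and 4 with the defaults).
    shared = num_partitions - reserved_for_it
    return reserved_for_it + bucket % shared
```

Hashing the record key inside each group keeps records with the same key on the same partition while still spreading load across the partitions that group owns.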

Delegating Authentication using Managed Service Accounts

Jamie Wick helps us solve the classic Kerberos double-hop problem:

If the Report Server service doesn’t have permission to delegate to the SQL Server, it will try to connect anonymously (step 4 in the diagram above), which results in this login error:

Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. Reason: Could not find a login matching the name provided. [CLIENT: <Client IP Address>]

Historically, report server and SQL Server services that needed the ability to delegate authentication to other servers were configured to run using an Active Directory user account. Enabling delegation on these accounts was simply a matter of setting the Trust level on the Delegation tab of the account’s properties (in Active Directory Users and Computers).

But Jamie is here to show us a better way.

Generating an Email List from Active Directory Users

James Livingston takes us through an interesting solution to a common problem:

If you’ve ever performed some impactful maintenance on a SQL Server, you probably notified users. If you’re great at documentation and already know exactly who to contact, this script isn’t for you. If you don’t have a user email list, this script will create it for you!

I used to manage 500 SQL Server instances and there were maintenance tasks and changes going on daily. I wrote this PowerShell script to automatically create an email list for me. It gathers the login information from an instance of SQL Server and then pulls each login’s email address from Active Directory.

Read on to see the script in action.
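James's script is PowerShell and talks to SQL Server and Active Directory directly. As a rough, self-contained illustration of the same two-stage idea, the sketch below takes a list of login names and a lookup function and returns a de-duplicated email list. Everything here is an assumption for illustration: the function name, the DOMAIN\account login format, and the lookup callable, which in practice would wrap a directory query.

```python
from typing import Callable, Iterable, List, Optional


def build_email_list(logins: Iterable[str],
                     lookup_email: Callable[[str], Optional[str]]) -> List[str]:
    """Map Windows logins (DOMAIN\\account) to unique email addresses."""
    emails: List[str] = []
    seen = set()
    for login in logins:
        _domain, separator, account = login.partition("\\")
        if not separator:
            # No domain prefix: treated as a SQL-authenticated login with no directory entry.
            continue
        email = lookup_email(account)  # hypothetical directory lookup
        if email and email.lower() not in seen:
            seen.add(email.lower())
            emails.append(email)
    return emails
```

For a quick trial, a plain dictionary's get method can stand in for the directory lookup.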

The Costs of Virtualization

David Klee points out that virtualization, configured correctly, should not harm SQL Server performance much:

A wonderful reader of my blog sent me a note (thanks Jess!) about a single-line note in the latest SQL Server release notes. The note is as follows:

Running SQL Server on a virtual machine will be slower than running natively because of the overhead of virtualization.

The question was simple. Why would Microsoft add this disclaimer? It was being used as a negative talking point towards SQL Server virtualization, and holding the DBA team back from getting the benefits of virtualization.

David gives us some rough numbers on what that means. Spoiler alert: if you set up your environment right, it’s not much.