Press "Enter" to skip to content

Author: Kevin Feasel

Accessing S3 Data from Apache Spark

Divyansh Jain shows how we can connect to AWS’s S3 using Apache Spark:

Now, coming to the actual topic that how to read data from S3 bucket to Spark. Well, it is not very easy to read S3 bucket by just adding Spark-core dependencies to your Spark project and use spark.read to read you data from S3 Bucket.

So, to read data from an S3, below are the steps to be followed:

This isn’t a built-in source, so there is a little bit of work to do, but it’s not that bad.

Comments closed

Delegating Authentication using Managed Service Accounts

Jamie Wick helps us solve the classic Kerberos double-hop problem:

If the Report Server service doesn’t have permission to delegate to the SQL Server, it will try to connect anonymously (step 4 in the diagram above). Which results in this login error:

Login failed for user ‘NT AUTHORITY\ANONYMOUS LOGON’. Reason: Could not find a login matching the name provided. [CLIENT: <Client IP Address>]

Historically report server and SQL server services, that needed the ability to delegate authentication to other servers, were configured to run using an Active Directory user account. Enabling delegation on these accounts was simply a matter of setting the Trust level on the Delegation tab of the account’s properties (with Active Directory Users & Computers).

But Jamie is here to show us a better way.

Comments closed

Creating a Custom Partitioner for Apache Kafka

Swapnil Gosavi walks us through the process of creating a custom partitioner in Apache Kafka:

Assume we are collecting data from different departments. All the departments are sending data to a single topic named department. I planned five partitions for the topic. But, I want two partitions dedicated to a specific department, named IT, and the remaining three partitions for the rest of the departments. How would you achieve this?

You can solve this requirement, and any other type of partitioning needs by implementing a custom partitioner.

This is quite useful when you don’t necessarily want topic explosion but you do want more than what the classic partitioner allows.

Comments closed

The Costs of Virtualization

David Klee points out that virtualization, configured correctly, should not harm SQL Server performance much:

A wonderful reader of my blog sent me a note (thanks Jess!) about a single line notation in the latest SQL Server release notes. The notes is as follows.

Running SQL Server on a virtual machine will be slower than running natively because of the overhead of virtualization.

The question was simple. Why would Microsoft add this disclaimer? It was being used as a negative talking point towards SQL Server virtualization, and holding the DBA team back from getting the benefits of virtualization.

David gives us some rough numbers on what that means. Spoiler alert: if you set up your environment right, it’s not much.

Comments closed

Generating an Email List from Active Directory Users

James Livingston takes us through an interesting solution to a common problem:

If you’ve ever performed some impactful maintenance on a SQL Server, you probably notified users. If you’re great at documentation and already know exactly who to contact, this script isn’t for you. If you don’t have a user email list, this script will create it for you!

I used to manage 500 SQL Server instances and there was daily maintenance\changes going on constantly. I wrote this PowerShell script to automatically create an email list for me. This PowerShell script gathers the login information from an instance of SQL Server and then pulls their email address from Active Directory.

Read on to see the script in action.

Comments closed

Controlling IoT Devices via Databricks

Saeed Barghi takes us through building an interesting solution:

A few weeks ago I did a talk at AI Bootcamp here in Melbourne on how we can build a serverless solution on Azure that would take us one step closer to powering industrial machines with AI, using the same technology stack that is typically used to deliver IoT analytics use cases. I demoed a solution that received data from an IoT device, in this case a crane, compared the data with the result of a machine learning model that has ran and written its predictions to a repository, in this case a CSV file, and then decided if any actions needs to be taken on the machine, e.g. slowing the crane down if the wind picks up.

This was a really interesting article.

Comments closed

Checks After a DBA Leaves

Thomas Rushton has some thoughts on a resiliency plan for DBA departures:

Transferring database ownership should be relatively straight-forward and risk-free. Except there are certain applications that depend on the database being owned by a sysadmin, or by a particular account (sa), or by their own service account, rather than checking for membership of the db_owner role… So make sure you test your changes.

Read the whole thing.

Comments closed

Truncating Paths in Powershell Prompts

Jeffery Hicks shares a method for shortening your displayed path in Powershell:

As you can see, I don’t have a lot of space left at my prompt. Usually, if I am staying in one location say for demos, I’ll create a PSDrive with a shorter name. I use the New-PSDriveHere command from the PSScriptTools module. But another option is to truncate the prompt. So instead of the long path you see here it might be C:\Users…\HelpDesk. I modified the default PowerShell prompt function to do just that.

Click through for the script and a few demos.

Comments closed

Using the OUTPUT Clause

Eduardo Pivaral takes us through the OUTPUT clause:

Even when the code is easy to read, but if you use this pattern over all your codebase, maintain it can become difficult if you have to change object names or implement it on another system.

T-SQL language provides the OUTPUT clause, that allows you to retrieve information from a DML statement in the same batch.

This is pretty useful for performance tuning in some scenarios, but also for simplifying multi-step processes.

Comments closed

On Self-Signed Certificates

Sean Gallardy isn’t too concerned about self-signed certificates in SQL Server:

This happens at startup and is completely transparent to both end users and administrators alike, apparently not so much to your security team utilizing the latest and greatest security tools available (yes, that’s sarcasm). One day you may get a notice that “SQL Server is using a self-signed certificate (issue #1) and that the key length/algorithm isn’t what our internal security team allows (issue #2). You must fix asap or be out of compliance!” Oh no, the dreaded compliance threat.

Sean is…not impressed with these issues and explains why.

Comments closed