Preventing Credential Compromise When Using AWS

Will Bengtson walks us through techniques Netflix uses to protect credentials in AWS:

Scope

In this post, we’ll discuss how to prevent or mitigate compromise of credentials due to certain classes of vulnerabilities such as Server Side Request Forgery (SSRF) and XML External Entity (XXE) injection. If an attacker has remote code execution (RCE) or local presence on the AWS server, these methods discussed will not prevent compromise. For more information on how the AWS services mentioned work, see the Background section at the end of this post.

Protecting Your Credentials

There are many ways that you can protect your AWS temporary credentials. The two methods covered here are:

  • Enforcing where API calls are allowed to originate from.

  • Protecting the EC2 Metadata service so that credentials cannot be retrieved via a vulnerability in an application such as Server Side Request Forgery (SSRF).

Read the whole thing if you’re an AWS user.
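
The first method in the quoted list, restricting where API calls may originate, is commonly expressed as an IAM policy that denies requests arriving from outside known address ranges. What follows is a minimal sketch under stated assumptions, not necessarily the approach Netflix describes: the CIDR range and the function name build_source_ip_deny_policy are hypothetical, while aws:SourceIp and NotIpAddress are standard IAM condition elements.

```python
import json

# Hypothetical address range (an RFC 5737 documentation block), standing in
# for the public IPs your own infrastructure actually uses.
ALLOWED_CIDRS = ["203.0.113.0/24"]


def build_source_ip_deny_policy(allowed_cidrs):
    """Build an IAM policy document that denies every call whose source IP
    falls outside the allowed ranges."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                # NotIpAddress matches when the caller's IP is in none of the ranges.
                "Condition": {"NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)}},
            }
        ],
    }


policy = build_source_ip_deny_policy(ALLOWED_CIDRS)
print(json.dumps(policy, indent=2))
```

A policy like this would be attached to the role whose temporary credentials you want to confine. The sketch ignores traffic that reaches AWS through VPC endpoints and calls that AWS services make on your behalf, both of which need separate handling in practice.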

Plotting Diagrams In R With nest() And map()

Sebastian Sauer shows how to display multiple ggplot2 diagrams together using facets as well as a combination of the nest() and map() functions:

One simple way is to plot several facets according to the grouping variable:

d %>% ggplot() + aes(x = hp, y = mpg) + geom_point() + facet_wrap(~ cyl)

Faceting is great, but it’s good to know the other technique as well.

Indexed View Matching

Erik Darling has a series of posts on indexed views, with the latest covering how the optimizer can match a DISTINCT query to an indexed view that was defined with GROUP BY:

There are a whole bunch of limitations in creating indexed views. One of them is that you can’t base the query on DISTINCT.

Fair enough, but you can do GROUP BY.

And what’s pretty cool is that the optimizer can match a query written to find distinct values to an indexed view with a group by.

Click through for the best example ever.

Continuing The Advent Of Code In T-SQL

Kevin Feasel

2018-12-06

T-SQL

Wayne Sheffield has a few more posts in the Advent of Code series.  His latest edition:

In Day 5, we find ourselves working with the polymers of the new Santa suit. A polymer (the input file), consists of units, represented by upper and lower case letters. Adjacent units of the same letter, but of different polarity (case), cancel each other out. This may lead to other units that can then cancel each other out. The goal is to reduce the polymer to as small as possible, and report back the reduced size.

Tasks:

  1. Perform a case-sensitive search/replace for each letter of the alphabet. The search is for a pair of the same letter, where one is upper case, and the other is lower case.
  2. Recursively perform this operation until the string can no longer be reduced.

In my opinion, the key part to this is that the operation needs to be performed recursively. I can think of only two ways to recursively perform an operation in SQL Server:

  1. A recursive common table expression (cte).
  2. Using a WHILE loop.

I don’t like using either of these mechanisms in SQL Server – they both perform operations in a “Row-By-Agonizing-Row” method, instead of a more set-based approach. However, set-based recursion usually performs extremely poorly. So, I’m going to use a while loop.

The recursion requirement does limit things a bit; otherwise I could see putting something together with the LEAD() window function.
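
To make the two quoted tasks concrete outside of T-SQL, here is a minimal Python sketch of the same idea: one search/replace pass per letter, repeated in a loop until nothing more can be removed. The function names react_once and reduce_polymer are illustrative assumptions, and the sample string is the short example from the puzzle description rather than a real input file.

```python
import string


def react_once(polymer: str) -> str:
    """One pass: remove adjacent pairs of the same letter in opposite case."""
    for letter in string.ascii_lowercase:
        upper = letter.upper()
        polymer = polymer.replace(letter + upper, "").replace(upper + letter, "")
    return polymer


def reduce_polymer(polymer: str) -> str:
    """Repeat passes until a pass removes nothing (the while-loop approach)."""
    while True:
        reduced = react_once(polymer)
        if len(reduced) == len(polymer):
            return reduced
        polymer = reduced


sample = "dabAcCaCBAcCcaDA"
print(reduce_polymer(sample))       # prints dabCBAcaDA
print(len(reduce_polymer(sample)))  # prints 10
```

The loop is needed because removing one pair can bring a new reactive pair together, which a single pass over the alphabet may already have moved past.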

Cross-Availability Group Login Management

David Fowler walks us through a problem about orphaned users and Availability Groups:

Now, I’m pretty sure that most of us will have been in the position where, after a fail-over we get inundated with calls, emails, Skype messages and carrier pigeon drops letting us know that so and so can no longer access the database.

When you look into it, you either find that the login never existed in the first place, so you create it or that it was there but the database user has become orphaned from it (happens when the login SID doesn’t match the SID of the database user, Adrian wrote about orphaned users in Dude where’s my access?).

You remap the orphaned user and everything is good again…  that is until the next time you failover and once again you’ll be hit with the same orphaned user problem.

Click through for the explanation and a permanent fix for this issue.

Connecting Power BI To Dockerized SQL Server

Chris Taylor shows us how to build a SQL Server on Linux Docker container and use it to supply data to a Power BI dashboard:

I (and many others) have done a series of docker blog posts over the last couple of years but they’ve all tended to revolve around spinning up a SQL Server 2017+ container for testing or demo purposes. This is only really the start; think of the bigger picture here: once you have your database environment, the world is your oyster.

This blog post will show how we can use SQL Server 2019 CTP2.1 running on Linux (Ubuntu) in a docker container as our data source for a Power BI environment in next to no time!

These steps show a very manual process for completing this setup, if it is something you are looking to do frequently then I suggest creating a Dockerfile and/or yml file and use docker-compose. This way you can have all your setup in one file and it will be a single statement to get your SQL Server 2019 environment up and running.

Read on for the demo.

Working With Missing Values In R

Kevin Feasel

2018-12-05

R

Anisa Dhana has a few examples of ways we can work with data containing missing values in R:

Imputation is a complex process that requires a good knowledge of your data. For example, it is crucial to know whether the data is missing at random or not before you impute it. I have read a nice tutorial which visualizes the missing data and helps to understand the type of missingness, and another post showing how to impute the data with the MICE package.

In this short post, I will focus on management of the missing data using the tidyverse package. Specifically, I will show how to manage missings in the long data format (i.e., more than one observation for id).

Anisa shows a few different techniques, depending upon what you need to do with the data.  I’d be cautious about using the mean in the second example and would typically prefer the median: it is much less sensitive to outliers, so filling gaps with it tends to distort a skewed distribution less than filling with the mean does.
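
A small Python sketch of the mean-versus-median point, using made-up numbers rather than anything from Anisa's post. The data and the helper name impute are hypothetical; the only library calls are mean and median from the standard statistics module.

```python
from statistics import mean, median


def impute(values, fill):
    """Replace missing entries (None) with the given fill value."""
    return [fill if v is None else v for v in values]


# Hypothetical measurements: mostly small values, one outlier, two gaps.
observed = [2.0, 3.0, 3.0, 4.0, 100.0, None, None]
present = [v for v in observed if v is not None]

mean_fill = mean(present)      # 22.4, dragged upward by the single outlier
median_fill = median(present)  # 3.0, close to the typical observation

print(impute(observed, mean_fill))
print(impute(observed, median_fill))
```

With the mean, both gaps are filled with a value that no typical observation resembles; with the median, the filled values sit where most of the data already is.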

Configuring Kafka Streams For Least Privilege

Gwen Shapira explains how we can assign minimal rights to Kafka Streams and KSQL users:

The principle of least privilege dictates that each user and application will have the minimal privileges required to do their job. When applied to Apache Kafka® and its Streams API, it usually means that each team and application will have read and write access only to a selected few relevant topics.

Organizations need to balance developer velocity and security, which means that each organization will likely have their own requirements and best practices for access control.

There are two simple patterns you can use to easily configure the right privileges for any Kafka Streams application—one provides tighter security, and the other is for more agile organizations. First, we’ll start with a bit of background on why configuring proper privileges for Kafka Streams applications was challenging in the past.

Read the whole thing; “granting everybody all rights” generally isn’t a good idea, no matter what your data platform of choice may be.

Working With Key-Value Pairs In Spark

Teena Vashist shows us a few of the functions available with Spark for working with key-value pairs:

1. Creating Key/Value Pair RDD: 
The pair RDD arranges the data of a row into two parts. The first part is the Key and the second part is the Value. In the below example, I used a parallelize method to create an RDD, and then I used the length method to create a Pair RDD. The key is the length of each word and the value is the word itself.

scala> val rdd = sc.parallelize(List("hello","world","good","morning"))
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:24
scala> val pairRdd = rdd.map(a => (a.length,a))
pairRdd: org.apache.spark.rdd.RDD[(Int, String)] = MapPartitionsRDD[1] at map at <console>:26
scala> pairRdd.collect().foreach(println)
(5,hello)
(5,world)
(4,good)
(7,morning)

Click through for more operations.  Spark is a bit less KV-centric than classic MapReduce jobs, but there are still plenty of places where you want to use them.
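
For readers without a Spark shell handy, the same keying idea can be sketched in plain Python. This is only an emulation with ordinary lists and dictionaries, not Spark code: it builds the same (length, word) pairs and then groups the values by key, which is roughly the kind of result a key-based operation such as groupByKey produces on a pair RDD.

```python
from collections import defaultdict

words = ["hello", "world", "good", "morning"]

# Key each word by its length, mirroring rdd.map(a => (a.length, a)).
pairs = [(len(word), word) for word in words]

# Group the values that share a key, using plain Python containers.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

print(pairs)          # [(5, 'hello'), (5, 'world'), (4, 'good'), (7, 'morning')]
print(dict(grouped))  # {5: ['hello', 'world'], 4: ['good'], 7: ['morning']}
```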

Getting Maintenance Plan Information From PowerShell

Shane O’Neill gives us the low-down on what we need to do in order to retrieve maintenance plan information from SQL Server using PowerShell:

It’s surprisingly difficult to get this information in SQL Server. In fact I was quite stuck trying to figure out how to get this information when I realized that the good people over at Brent Ozar Unlimited already do some checking on this for their sp_Blitz tool.

A quick look at the above code showed me that dbo.sysssispackages was what I was looking for. Combine this with:

  1. Some hack-y SQL for the frequency in human readable format, and
  2. Some even more hack-y SQL to join on the SQL Agent Job name

And we had pretty much the XML for the Maintenance Plan and the SQL Agent Job Schedule that they were against.

Shane has made the code available as well, so check it out if you have any maintenance plans you’re trying to understand and maybe even get away from.
