Press "Enter" to skip to content

Day: January 27, 2017

Securing MapR

Mitesh Shah provides some high-level information on how to secure a MapR cluster:

  • Security Best Practice #2:  Require Authentication for All Services.  While it’s important for ports to be accessible exclusively from the network segment(s) that require access, you need to go a step further to ensure that only specific users are authorized to access the services running on these ports.  All MapR services — regardless of their accessibility — should require authentication.  A good way to enforce this for MapR platform components is by turning on security.  Note that MapR is the only big data platform that allows for username/password-based authentication with the user registry of your choice, obviating the need for Kerberos and all the complexities that Kerberos brings (e.g., setting up and managing a KDC). MapR supports Kerberos, too, so environments that already have it running can use it with MapR if preferred.

There’s nothing here which is absolutely groundbreaking, but they are good practices.

Comments closed

Powershell On Windows 10 Bash

Max Trinidad goes all the way with installing an Ubuntu environment on Windows:

It’s a known fact, if you install PowerShell Open Source in Windows 10 Bash subsystem, that it won’t work correctly. As soon as start typing $PSVersionTable and press enter, the cursor goes to the top of the screen. And, you keep typing and it gets very ugly.

Now, what if I tell you, I found the way to run PowerShell Open Source without any of these issues. Just like running it like it was installed in a Linux environment. No issues with the cursor going crazy and able to page up and down.

There are quite a few steps here, but Max lays them out clearly.

Comments closed

Sampling Data Lake Data

Alex Whittles shows how to use U-SQL to sample data to read in Power BI:

The answer is sampling, we don’t bring in 100% of the data, but maybe 10%, or 1%, or even 0.01%, it depends how much you need to reduce your dataset. It is however critical to know how to sample data correctly in order to maintain a level of accuracy of data in your reports.

Option 1: Take the top x rows of data
Don’t do it. Ever. Just no.
What if the source data you’ve been given is pre-sorted by product or region, you’d end up with only data from products starting with ‘a’, which would give you some wildly unpredictable results.

Option 2: Take a random % sample
Now we’re talking. This option will take, for example 1 in every 100 rows of data, so it’s picking up an even distribution of data throughout the dataset. This seems a much better option, so how do we do it?

Read on for a couple of sampling methods.

Comments closed

TempDB And TDE

Bob Ward troubleshoots an oddity around sys.databases marking tempdb as encrypted even when no user databases are encrypted:

In my test I did not hit the breakpoint. And furthermore, you will notice that when you query sys.dm_database_encryption_keys, there is no row for tempdb at all.  So our debugger breakpoint proves that tempdb is not permanently encrypted. Instead, if ALL user databases have TDE disabled and you restart SQL Server, tempdb is no longer encrypted. So instead of using sys.databases, use sys.dm_database_encryption_keys to tell which databases are truly enabled for encryption. I then verified my findings in the source code. Basically, we only enable encryption for tempdb if 1) ALTER DATABASE enables any user db for TDE 2) When we startup a user database and have encryption enabled. I also verified the behavior with my colleagues in the Tiger Team (thank you Ravinder Vuppula). We will look at fixing the issue with sys.databases in the future (ironically as I said earlier it was never enabled for tempdb before SQL Server 2016).

Read on for Bob Ward’s patented Debugger Fun.  My takeaway from this is that sys.dm_database_encryption_keys is valid, whereas sys.databases’s is_encrypted column might not be.

Comments closed

Comparing Column Names

Jen McCown has a script to compare column names between tables to find case inconsistencies:

I’m reviewing the code for the upcoming Minion CheckDB, and one of the things we’re checking for is case consistency in column names. For example, if Table1 has a column named Col1, and Table2 has COL1, well that’s no good.

But, how do we easily find those mismatches on a system that’s not case sensitive? Easy: collations.

Click through for the script.

Comments closed

Bad Habits: A Full Listing

Aaron Bertrand has provided an index to his bad habits series:

Here is an ongoing list of articles that I consider to be along these lines – either promoting best practices or eradicating bad habits; not all are explicitly framed as a “bad habit,” but they do all represent in some way things I wish I observed less often. Some of my opinions are controversial, and many have evoked very passionate comment threads – so I recommend scrolling down for those, too.

It’s a pretty long list.

Comments closed

UDFs And Recompilations

Erik Darling shows how to diagnose a high recompilation problem:

So yeah, that function seems to get up to something once for every ID you pass in. Remember that in our STUFF… query, we grabbed the TOP 10 each time. In the XE session, each time we call the proc, the string splitting function compiles and executes code 10 times. Bummerino. That’s the life of a loop.

On SQL Server 2016 (and really, with any non-looping code), we can get around the constant compilations with a simple rewrite. In this case, I’m calling 2016’s STRING_SPLIT function instead of the MSTVF function.

Read on for more.

Comments closed