Meltdown Performance Effects On Cassandra

The folks at Instaclustr have done some analysis on how Meltdown has affected Cassandra performance on AWS:

In our Security Advisory published 8 January, we advised of up to 20% increase in CPU utilization and small increase in latency across managed clusters in AWS and GCP following the rollout of the patches to the cloud provider hypervisors. We have since observed a reversal of this impact in the weeks following the initial announcements. That is, these effects disappeared when further AWS and GCP patches were rolled out by the cloud providers.

We assessed the risk of the vulnerabilities to our environment as Low. Our clusters run as single tenant and customer access is limited to the application layer.  If a user were able to exploit either of the vulnerabilities they could only gain access to their own information.

In short, they saw a change early on, but subsequent patching has removed that performance degradation.  Read the whole thing for more details.

Hortonworks DataFlow 3.1 Released

George Vetticaden and Haimo Liu announce Hortonworks DataFlow version 3.1:

Apache Kafka 1.0 support with full integration with HDF Services – Kafka 1.0 provides important new features including more stringent message processing semantics with support for message headers and transactions, performance improvements and advanced security options.

  • Apache Ambari support for Kafka 1.0 – Install, configure, manage, upgrade, monitor, and secure Kafka 1.0 clusters with Ambari.

  • Apache Ranger support for Kafka 1.0 – Manage access control policies (ACLs) using resource or tag-based security for Kafka 1.0 clusters.

  • New NiFi and SAM processors for Kafka 1.0 – New processors in NiFi and Hortonworks Streaming Analytics Manager (SAM) support Kafka 1.0 features including message headers and transactions.

Click through for the list of top changes.

Library Paths In R

Stacia Varga troubleshoots an issue integrating Power BI with R:

As I was putting together an example of using an R script as a Power BI data source, I ran into some issues on my development machine that was frankly driving me crazy. When I tried to run the query in Power BI with my R script (that ran successfully in the IDE, by the way), I kept getting this message:

DataSource.Error: ADO.NET: R script error.
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
  namespace 'scales' 0.3.0 is being loaded, but >= 0.4.1 is required
Error: package or namespace load failed for 'rnoaa'
Execution halted

Stacia’s answer works as long as the .libPaths() results match expectations.  Another idea would be to set the R_LIBS_USER user-level environment variable to the desired starting directory and that should force the directory in the environment variable to be first when calling .libPaths().

Data Masking Prior To SQL Server 2016

Daniel Hutmacher shows how to roll your own data masking with SQL Server:

Dynamic data masking is a neat new feature in recent SQL Server versions that allows you to protect sensitive information from non-privileged users by masking it. But using a brute-force guessing attack, even a non-privileged user can guess the contents of a masked column. And if you’re on SQL Server 2014 or earlier, you won’t have the option of using data masking at all.

Read on to see how you can bypass dynamic data masking, and for an alternative approach that uses SQL Server column-level security instead.

Click through for the demo.

Bash For The Powershell-Minded

Mark Wilkinson has started a new series on Bash.  His first post is an introduction to the scripting language:

Bash (the Bourne Again Shell) was created in 1989 for the GNU Project as a free replacement for the Unix Bourne shell. Most modern Linux systems use Bash as their default command line shell, so if you have ever dropped to a command line on a Linux system, you have probably used Bash. Just like PowerShell, Bash is both a scripting language and a command shell/interpreter. So not only can you execute commands in an interactive shell session, but you can also write scripts that incorporate multiple commands.

Once you get your hands dirty with Bash you’ll notice a lot of features that were incorporated into PowerShell. Things like command substitution: $(Get-Date) were directly pulled from Bash $(date). Other features will look familiar as well, like the ability to pipe multiple commands together.

One thing you need to understand right away is that Bash is string based, not object based like PowerShell. This means you’ll find yourself doing a lot more string processing to get tasks done. Things like string splitting will be much more common. Bash does support objects, like arrays, but few if any commands output an array. As we go through this series you’ll see that this might not be as limiting as it sounds.

The best part about learning Bash is that you can then get into arguments about Bash vs ksh vs zsh.

Handling Permissions Changes With Powershell

Drew Furgiuele has a process to store and then re-run rights grants on SQL Server databases:

Permission requirements for these environments can change over time, just like the code and data going into your databases. It’s hard to track permissions because a database permission is much more than just a user principal; database objects often contain permission definitions for GRANT and DENY states, and users may belong in certain database roles in one environment, but not another. This isn’t a big deal… until it is: sooner or later your data and code drift will be different than production, or maybe some new change really breaks an environment. Then, you’ll be asked to restore these environments to either an earlier version, or, more likely, you’ll be asked to “refresh” these editions to what is currently in production.

You probably already have a process for this, but how are you handling maintaining differences in permissions between environments? Wouldn’t it be nice if you had a way to quickly evaluate, store, and then re-apply permissions as part of refresh? Even better, wouldn’t it be cool if you could do this for all your databases on a given instance? Or what about all your instances in a given environment?

You can, and you can do it pretty easily with PowerShell.

My one problem with Drew’s otherwise-excellent post is that he approved far too many entry visas in the opening GIF.  100% deny, 0 problems.

Virtualize Data Or Move It?

James Serra contrasts data virtualization with traditional ETL moving data to a warehouse:

Data virtualization integrates data from disparate sources, locations and formats, without replicating or moving the data, to create a single “virtual” data layer that delivers unified data services to support multiple applications and users.

Data movement is the process of extracting data from source systems and bringing it into the data warehouse and is commonly called ETL, which stands for extraction, transformation, and loading.

If you are building a data warehouse, should you move all the source data into the data warehouse, or should you create a virtualization layer on top of the source data and keep it where it is?

Read on for James’s thoughts.

Performance Problems With ADO.Net’s AddWithValue

Dan Guzman inveighs against using AddWithValue in ADO.Net:

The nastiness with AddWithValue is that ADO.NET infers the parameter definition from the supplied object value. Parameters in SQL Server are inherently strongly-typed, including the SQL Server data type, length, precision, and scale. Types in .NET don’t always map precisely to SQL Server types, and are sometimes ambiguous, so AddWithValue has to makes guesses about the intended parameter type.

The guesses AddWithValue makes can have huge implications when wrong because SQL Server uses well-defined data type precedence rules when expressions involve unlike data types; the value with the lower precedence is implicitly converted to the higher type. The implicit conversion itself isn’t particularly costly but is a major performance concern when it is the column value rather than the parameter value must be converted, especially in a WHERE or JOIN clause predicate. The implicit column value conversion can prevent indexes on the column from being used with an index seek (i.e. non-sargable expression), resulting in a full scan of every row in the table or index.

Read the whole thing.

Categories

February 2018
MTWTFSS
« Jan  
 1234
567891011
12131415161718
19202122232425
262728