Press "Enter" to skip to content

Month: August 2016

Failed Logins

Kevin Hill discusses failed logins:

We’ve all seen them.

Login failed for user ‘MyDomain\Bob’ (password issue)
Login failed for user ‘MyDomain\Nancy’ (default database issue)
Login failed for user ‘blah, blah, blah…’

But what about Login Failed for user ‘Insert Chinese characters here’, Reason: An attempt to logon using SQL Authentication failed.

Wait…nobody in the company has a username with Chinese characters.   And we don’t have SQL Authentication turned on….

I generally agree with Kevin’s assessment, but have one big point of contention:  he recommends turning off successful login logging.  I think that’s not a great thing to do, particularly for a company with a mature security team.  Think about this scenario:  if you see four or five failed login attempts for sa, and you don’t use sa in your environment, you know somebody’s trying something sneaky.  If you see four or five failed login attempts for sa and then a successful login attempt for sa, you know they succeeded.  If you don’t log successful login attempts, you lose that critical piece of information.
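
If you want to look for that pattern in your own environment, here’s a minimal sketch using xp_readerrorlog (the search terms are just examples; adjust to taste):

-- Scan the current SQL Server error log (0 = current log, 1 = SQL Server error log)
-- for failed login attempts mentioning sa.  The two search strings are ANDed together.
EXEC sys.xp_readerrorlog 0, 1, N'Login failed', N'sa';

-- If login auditing captures both failed and successful logins,
-- a success right after a run of failures shows up here.
EXEC sys.xp_readerrorlog 0, 1, N'Login succeeded', N'sa';

That failure-then-success pairing is exactly the signal you give up if successful logins aren’t being written to the log.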


Markov Chains

Sergey Bryl has an introductory-level post on what Markov chains are and how they work:

Using Markov chains allows us to switch from heuristic models to probabilistic ones. We can represent every customer journey (sequence of channels/touchpoints) as a chain in a directed Markov graph where each vertex is a possible state (channel/touchpoint) and the edges represent the probability of transition between the states (including conversion). By computing the model and estimating transition probabilities we can attribute every channel/touchpoint.

Let’s start with a simple example of the first-order or “memory-free” Markov graph for better understanding the concept. It is called “memory-free” because the probability of reaching one state depends only on the previous state visited.

Markov chains are great for behavior prediction and sentence formation.  This is part one of a series I will eagerly anticipate.  H/T R Bloggers.
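
Sergey builds his model in R, but just to make the memory-free idea concrete in SQL terms, here’s a rough sketch that estimates first-order transition probabilities from a hypothetical dbo.CustomerJourneys table of (CustomerID, StepNumber, Channel); the table and column names are mine, not his:

-- Estimate first-order ("memory-free") transition probabilities from journey data.
WITH Transitions AS
(
    SELECT
        Channel AS FromChannel,
        LEAD(Channel) OVER (PARTITION BY CustomerID ORDER BY StepNumber) AS ToChannel
    FROM dbo.CustomerJourneys
)
SELECT
    FromChannel,
    ToChannel,
    COUNT(*) * 1.0
        / SUM(COUNT(*)) OVER (PARTITION BY FromChannel) AS TransitionProbability
FROM Transitions
WHERE ToChannel IS NOT NULL
GROUP BY FromChannel, ToChannel
ORDER BY FromChannel, TransitionProbability DESC;

Each row is an edge in the directed Markov graph; the rows whose ToChannel is your conversion state are the ones the attribution model ultimately weighs.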


Developing Spark Applications In .NET

Kaarthik Sivashanmugam talks about Mobius, a Microsoft-driven .NET wrapper for Spark:

The C# language binding to Spark is similar to the Python and R bindings. In fact, Mobius follows the same design pattern and leverages the existing implementation of language binding components in Spark where applicable for consistency and reuse. The following picture shows the dependency between the .NET application and the C# API in Mobius, which internally depends on Spark’s public API in Scala and Java and extends PythonRDD from PySpark to implement CSharpRDD.

Looks like there’s some fuzziness on just how well F# is supported.  Still, this is very exciting as a way of bridging the gap for .NET developers.


Dropping Masking From A Column

Steve Jones shows how to drop Dynamic Data Masking from a column:

This is a quick one. As I experimented with Dynamic Data Masking for the Stairway to Dynamic Data Masking, and writing my Using SQL Compare with Dynamic Data Masking, I needed to remove masking from a column. I didn’t want to rebuild tables, and hoped there was an easy way to ALTER a column.

There is.

The more I’ve seen of DDM, the less I like it.  So I’m more a fan of scripts to remove it than scripts to add it…
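
For reference, both directions are single ALTER statements; a quick sketch against a hypothetical table (not Steve’s exact script):

-- Add a mask to an existing column without touching the data.
ALTER TABLE dbo.Customers
    ALTER COLUMN EmailAddress ADD MASKED WITH (FUNCTION = 'email()');

-- And remove it again, no table rebuild required.
ALTER TABLE dbo.Customers
    ALTER COLUMN EmailAddress DROP MASKED;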


Optimizing HBase In HDInsight

Ashish Thapliyal links to a 30-minute presentation on HBase optimization:

This session was presented by Nitin Verma (Sr. Software Engineer) and Pravin Mittal (Principal Engineering Manager) @ HBaseCon 2016. The session goes deeper into the success story of enabling a big internal customer on HDInsight HBase.

HBase design is a totally different mindset from relational design, so you have to unlearn a lot of habits when moving over to it.


ODBC Driver 13.1

The SQL Server Blog reports that the Microsoft ODBC Driver for SQL Server has been updated to version 13.1:

Always Encrypted

You can now use Always Encrypted with the Microsoft ODBC Driver 13.1 for SQL Server. Always Encrypted is a new SQL Server 2016 and Azure SQL Database security feature that prevents sensitive data from being seen in plaintext in a SQL instance. You can now transparently encrypt the data in the application, so that SQL Server or Azure SQL Database will only handle the encrypted data and not plaintext values. If a SQL instance or host machine is compromised, an attacker can only access ciphertext of your sensitive data. Use the ODBC Driver 13.1 to encrypt plaintext data and store the encrypted data in SQL Server 2016 or Azure SQL Database. Likewise, use the driver to decrypt your encrypted data.

Check out the full list of new features at the link above.
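
If you want to try Always Encrypted from an ODBC application, my understanding is that the switch lives in the connection string via the ColumnEncryption keyword; something along these lines, where the server, database, and authentication bits are placeholders:

Driver={ODBC Driver 13 for SQL Server};Server=myserver;Database=mydb;Trusted_Connection=yes;ColumnEncryption=Enabled;

The client still needs access to the column master key (for example, via the Windows certificate store) before the driver can decrypt anything.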


Reverse Engineering SSIS Packages

Ben Weissman shows how to use BimlOnline to reverse engineer an Integration Services package into its component Biml:

A few things to be aware of:

– Your file will be uploaded to and stored at BimlOnline so you may want to remove passwords etc.
– If you’re trying to figure out how to build a specific task in Biml but your file does way more than just that, consider creating (and uploading) a file that will only contain the task you’re looking for – this will keep the resulting Biml clean and easy to read.

This is extremely helpful for figuring out how to use third-party components with Biml.  If you want a local IDE, there’s always BimlStudio (which costs money).


Int To BigInt

Kendra Little walks through the process of expanding an int column into a bigint:

Sometimes you just can’t take the outage. In that case, you’ve got to proceed with your own wits, and your own code. This is tricky because changes are occurring to the table.

The solution typically looks like this:

  • Set up a way to track changes to the table – either triggers that duplicate off modifications or Change Data Capture (Enterprise Edition)

  • Create the new table with the new data type, set identity_insert on if needed

  • Insert data into the new table. This is typically done in small batches, so that you don’t overwhelm the log or impact performance too much. You may use a snapshot from the point at which you started tracking changes.

  • Start applying changed data to the new table

  • Make sure you’re cleaning up from the changed data you’re catching and not running out of space

  • Write scripts to compare data between the old and new tables to make sure you’re really in sync (possibly use a snapshot or a restored backup to compare a still point in time)

  • Cut over in a quick downtime at some point using renames, schema transfer, etc. If it’s an identity column, don’t forget to fix that up properly.

This method matches what I’ve done in zero downtime situations.
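
As a rough sketch of what that batched insert step can look like (the table, columns, and batch size here are all hypothetical):

-- Backfill dbo.Orders_New (OrderID bigint) from dbo.Orders (OrderID int) in small
-- batches so the transaction log and the concurrent workload can keep up.
-- IDENTITY_INSERT only matters if OrderID is an identity column on the new table.
SET IDENTITY_INSERT dbo.Orders_New ON;

DECLARE @LastId bigint = 0,
        @Rows   int    = 1;

WHILE @Rows > 0
BEGIN
    INSERT INTO dbo.Orders_New (OrderID, CustomerID, OrderDate)
    SELECT TOP (50000) OrderID, CustomerID, OrderDate
    FROM dbo.Orders
    WHERE OrderID > @LastId
    ORDER BY OrderID;

    SET @Rows = @@ROWCOUNT;

    SELECT @LastId = ISNULL(MAX(OrderID), @LastId)
    FROM dbo.Orders_New;
END;

SET IDENTITY_INSERT dbo.Orders_New OFF;

The change tracking and the final rename-based cutover are where the real care goes; the bulk copy itself is this mundane.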

Also see Aaron Bertrand’s article on the same topic:

In part 3 of this series, I showed two workarounds to avoid widening an IDENTITY column – one that simply buys you time, and another that abandons IDENTITY altogether. The former prevents you from having to deal with external dependencies such as foreign keys, but the latter still doesn’t address that issue. In this post, I wanted to detail the approach I would take if I absolutely needed to move to bigint, needed to minimize downtime, and had plenty of time for planning.

Because of all of the potential blockers and the need for minimal disruption, the approach might be seen as a little complex, and it only becomes more so if additional exotic features are being used (say, partitioning, In-Memory OLTP, or replication).

At a very high level, the approach is to create a set of shadow tables, where all the inserts are directed to a new copy of the table (with the larger data type), and the existence of the two sets of tables is as transparent as possible to the application and its users.

Those are two good posts on this topic.


Billing Migration: Choosing A Database Product

Jyoti Shandil, et al., explain how they chose a database product for Netflix’s billing system:

AWS RDS MySQL: Ideally we would have gone with MySQL RDS as our backend, considering Amazon does a great job in managing and upgrading relational database as a service, providing multi-AZ support for high availability. However, the main drawback to RDS was the storage limit of 6TB. Our requirement at the time, was closer to 10TB.

AWS Aurora: AWS Aurora would have met the storage needs, but it was in beta at that time.

PostgreSQL: PostgreSQL is a powerful open source, object-relational database system, but we did not have much in house expertise using PostgreSQL. In the DC, our primary backend databases were Oracle and MySQL. Moreover, choosing PostgreSQL would have eliminated the option of a seamless migration to Aurora in future, as Aurora is based on the MySQL engine.

From there, they also explain some technical issues they found in migrating data.  Read the whole thing.  If you’re coming into this series blind, part 1 and part 2 give more of an architectural overview of their billing system.


DacFx Wrapper

Ed Elliott has a new PowerShell module named DacFxed:

There is a solution? Well yes of course otherwise I wouldn’t have been writing this! DacFxed is a powershell module which:

  1. References the DacFx nuget package so updating to the latest version is simple
  2. Implements a hack (ooh) to allow contributors to be loaded from anywhere
  3. Is published to the powershell gallery so to use it you just do “Install-Module -Name DacFxed -Scope User -Force”
  4. Has Publish-Database, New-PublishProfile and Get-DatabaseChanges CmdLets

Cool right, now a couple of things to mention. Firstly this is of course open source and available: https://github.com/GoEddie/DacFxed

This is a nice tool to deploy dacpac files using PowerShell.  Check out the GitHub repo for more details.
