Press "Enter" to skip to content

Author: Kevin Feasel

Secure Strings in Powershell

Greg Moore shows how to build out secure strings in Powershell:

One of the more common problems I’ve faced with a particular client is setting up data extractions from outside sources such as SFTP servers. Since this client deals with PII data, it’s essential that this is done in as a secure manner as possible. For example, all connections need to be encrypted using protocols such as SFTP instead of FTP.

Securely connecting is particularly easy to accomplish if one uses a 3rd party module such as Posh-SSH. However, you are still stuck with the problem of logging into the remote SFTP server securely.

Keeping credentials secured is something which Powershell makes pretty easy. I can’t think of a reason why you’d ever need to keep credentials in plaintext here.

Comments closed

Estimated Execution Plans in Azure Data Studio

Dave Bland walks us through the “Explain” button in Azure Data Studio:

At first I was thinking maybe it is a short cut to help.  Nope, it is actually an easy way to get the estimated execution plan. The key is estimated execution plan. While in SQL Server Management Studio we have been able to easily get both the estimated or actual execution plan.

Just as a reminder, the actual execution plan requires the query to actually run. While the estimated plan will use statistics to generate the plan.  Generally, if the actual and estimated are capture very close to each other, you are more than likely not going to see a difference between the tow.

Now let’s talk about the Explain button.  It will return two things, the XML of the execution plan and a graphical representation of the execution plan.

There are a few differences between Azure Data Studio’s implementation of execution plans and SQL Server Management Studio’s.

Comments closed

The State of DevOps for Data Platform Professionals

Kendra Little summarizes the Accelerate: State of DevOps Report 2019 with a focus on what this means for data platform professionals:

While there are a ton of valuable insights in the report, in this post I will focus in on the findings which I believe are most relevant to those of us who work “close to a database.” There are three very interesting aspects of the research which hit close to home:

1. Speed and stability are not tradeoffs
2. Heavy change processes negatively impact speed and stability
3. Communities of practice are a common and successful tool to transform culture

Read on for Kendra’s detailed notes.

Comments closed

DBCC CHECKDB Error on Azure SQL Database

Arun Sirpal explains an error message on Azure SQL Database:

msg 7928, Level 16, State 1, Line 3
The database snapshot for online checks could not be created. Either the reason is given in a previous error or one of the underlying volumes does not support sparse files or alternate streams. Attempting to get exclusive access to run checks offline.
Msg 8921, Level 16, State 3, Line 3
Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.

Read on to see what this means, as well as what it means for you.

Comments closed

Comparing Performance: HBase1 vs HBase2

Surbhi Kochhar takes us through performance improvements between HBase version 1 and HBase version 2:

We are loading the YCSB dataset with 1000,000,000 records with each record 1KB in size, creating total 1TB of data. After loading, we wait for all compaction operations to finish before starting workload test. Each workload tested was run 3 times for 15min each and the throughput* measured. The average number is taken from 3 tests to produce the final number. 

The post argues that there’s an improvement, but when a majority of cases end up worse (even if just a little bit), I’m not sure it’s much of an improvement.

Comments closed

The Transaction Log in Delta Tables

Burak Yavuz, et al, explain how the transaction log works with Delta Tables in Apache Spark:

When a user creates a Delta Lake table, that table’s transaction log is automatically created in the _delta_log subdirectory. As he or she makes changes to that table, those changes are recorded as ordered, atomic commits in the transaction log. Each commit is written out as a JSON file, starting with 000000.json. Additional changes to the table generate subsequent JSON files in ascending numerical order so that the next commit is written out as 000001.json, the following as 000002.json, and so on.

It’s interesting that they chose JSON instead of a binary transaction log like relational databases use.

Comments closed

Spark for .NET Developers

Ed Elliott has a long-form post covering spark-dotnet:

The .NET driver is made up of two parts, and the first part is a Java JAR file which is loaded by Spark and then runs the .NET application. The second part of the .NET driver runs in the process and acts as a proxy between the .NET code and .NET Java classes (from the JAR file) which then translate the requests into Java requests in the Java VM which hosts Spark.

The .NET driver is added to a .NET program using NuGet and ships both the .NET library as well as two Java jars. One jar is for Spark 2.3 and one for Spark 2.4, and you do need to use the correct one on your installed version of Scala.

As much as I’ve enjoyed his series, getting it in a single-post format is great.

Comments closed

Storytelling with Power BI: Consistency

Mark Lelijveld continues a series on storytelling with Power BI:

In the below report you can easily click on a country on the left side to move to another page. When it comes to interactivity it is all done right! On the right top you can also filter on order date. Let’s say we apply a filter to only see the sales up to the end of 2013. This results in a sales amount of nearly $ 319K.

Now, Germany gets my attention. I want to see more and decide to navigate to the other page by clicking on Germany. Ending up at the other page, I see that the sales amount changes back to $2.3M. In other words, my filter is gone!

Much of the difference between adequacy and excellence with visualization is in this kind of polish.

Comments closed

Powershell Remoting in dbatools

Claudio Silva takes us through a change to several cmdlets in dbatools:

I wondered why and asked the Windows team if they could provide any insight. A colleague explained to me that I needed to change three things to make my remoting commands work on our network:

1. Use the FQDN on -ComputerName and/or -SqlInstance parameters
2. Use -UseSSL parameter on the New-PSSession command
3. Use -IncludePortInSPN parameter for the New-PsSessionOption command

Read the whole thing.

Comments closed

Azure Dedicated Hosts in Preview

Mine Tokus covers the benefit of Azure Dedicated Hosts:

Recently introducedAzure Dedicated Host Preview provides single-tenant physical servers that can host one or more virtual machines. With this new hosting model, physical server is dedicated to your organization and capacity isn’t shared with other customers. Physical server-level isolation helps to address security and compliance requirements, brings visibility and control over the server infrastructure and enables significant cost savings and licensing flexibility for SQL Server workloads on Azure VMs.

I think this might get some recalcitrant large companies to be willing to adopt cloud technologies.

Comments closed