Notebooks in Azure Databricks

Brad Llewellyn takes us through Azure Databricks notebooks:

Azure Databricks Notebooks support four programming languages: Python, Scala, SQL and R. However, selecting a language in this drop-down doesn’t limit us to only using that language. Instead, it sets the default language of the notebook. Every code block in the notebook is run independently and we can manually specify the language for each code block.

Before we get to the actual coding, we need to attach our new notebook to an existing cluster. As we said, Notebooks are nothing more than an interface for interactive code. The processing is all done on the underlying cluster.

Read on to learn how Databricks uses the notebook metaphor heavily in how you interact with it.

Reading and Writing CSV Files with spark-dotnet

Ed Elliott continues a series on Spark for .NET:

How do you read and write CSV files using the dotnet driver for Apache Spark?

I have a runnable example here:
https://github.com/GoEddie/dotnet-spark-examples

Specifically:
https://github.com/GoEddie/dotnet-spark-examples/tree/master/examples/split-csv

The quoted links will take you straight to the code, but click through to see Ed’s commentary.

Logging in Azure

Kevin Feasel

2019-07-16

Cloud

Rolf Tesmer has a detailed post covering how and what to log when using Azure for a modern data warehouse:

In my view – what often doesn’t get enough attention up front are the critical aspects of monitoring, auditing and availability. Thankfully, these are generally not too difficult to plug in at any point in the delivery cycle, but as with most things in the cloud there are just so many different options to consider!

So the purpose of this blog is to focus on the key areas of Azure Services Monitoring and Auditing for the Azure Modern Data Platform architecture.

Click through for examples from a number of different Azure services.

Drawing SSIS Packages as SVGs

Bartosz Ratajczyk continues a series on taking SSIS packages and generating SVGs from their control flows:

To make things harder, the layout of the sequences and tasks is not some nested XML structure. All of the elements have the same parent – <GraphLayout>, meaning all of them are at the same tree level. Also – there is no attribute showing where a particular object belongs. Almost. In the example with the sequences, I see two regularities:
– the outer container is placed later in the XML than the inner container
– the @Id attributes show the nesting of the objects
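The second regularity can be sketched in a few lines of Python. This is not Bartosz’s code: it assumes the Id values are backslash-separated paths (for example Package\Outer\Inner), and the element names and sample XML below are simplified stand-ins made up for illustration, without the namespaces a real package would carry.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified layout: every element is a direct child of
# <GraphLayout>, and each Id is assumed to be a backslash-separated path.
SAMPLE = r"""
<GraphLayout>
  <NodeLayout Id="Package\Outer\Inner\Task A" />
  <ContainerLayout Id="Package\Outer\Inner" />
  <ContainerLayout Id="Package\Outer" />
  <NodeLayout Id="Package\Task B" />
</GraphLayout>
"""

def nesting_from_ids(xml_text):
    """Map each Id to its parent Id (None for top-level objects)."""
    root = ET.fromstring(xml_text)
    ids = [el.get("Id") for el in root if el.get("Id")]
    known = set(ids)
    parents = {}
    for obj_id in ids:
        parent = obj_id.rsplit("\\", 1)[0] if "\\" in obj_id else None
        # Only keep the parent if it is itself drawn in the layout.
        parents[obj_id] = parent if parent in known else None
    return parents

def depth(obj_id, parents):
    """Count the containers that enclose the object."""
    d = 0
    while parents[obj_id] is not None:
        obj_id = parents[obj_id]
        d += 1
    return d

parents = nesting_from_ids(SAMPLE)
for obj_id in parents:
    # Indent each object's short name by its nesting depth.
    print("  " * depth(obj_id, parents) + obj_id.rsplit("\\", 1)[-1])
```

Because the tree is rebuilt purely from the Id strings, the flat order of the elements in the XML does not matter here.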

I’m not sure how often I’d use this in practice, but if you want to understand some of the internals of SSIS, this is an interesting series to follow.

Getting to DevOps

Kevin Feasel

2019-07-16

DevOps

Grant Fritchey takes us through some of the baby steps in getting started with DevOps:

However, easy by comparison doesn’t mean just simply easy. There’s a lot of work involved and making mistakes early in the process has repercussions for every later step.

For example, where do you put your code?

Yeah, yeah, I know. Source control. I mean, where in source control do you put the code? What do you call the project and solution? Is it in git, GitHub, Azure, or somewhere else? Choose wisely because every single step of automation you set up after this will be completely dependent on that first choice. Further, putting it in GitHub, for example, has repercussions for how you implement automation in Azure DevOps Pipelines.

But hey, no pressure.

Removing Text Between Delimiters in Power Query

Imke Feldmann has a new M function for us:

While there is a native function to fetch text between 2 delimiters in Power Query, there is no such function that removes the text instead. Therefore I’ve created a custom function Text.RemoveBetweenDelimiter. It even lets you choose to remove the delimiters themselves as well via the optional 4th parameter.
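For readers who think more easily outside of M, the behavior described above can be approximated in Python. This is a rough sketch, not Imke’s function: the name remove_between_delimiters and its parameters are hypothetical, it only handles the first start delimiter and the next end delimiter after it, and the remove_delimiters flag stands in for the optional parameter mentioned in the quote.

```python
def remove_between_delimiters(text, start, end, remove_delimiters=False):
    """Remove the text between the first `start` and the next `end`.

    Hypothetical helper: if either delimiter is missing, the text is
    returned unchanged.
    """
    begin = text.find(start)
    if begin == -1:
        return text
    stop = text.find(end, begin + len(start))
    if stop == -1:
        return text
    if remove_delimiters:
        # Drop the delimiters along with everything between them.
        return text[:begin] + text[stop + len(end):]
    # Keep both delimiters, drop only what sits between them.
    return text[:begin + len(start)] + text[stop:]

print(remove_between_delimiters("Total (net) amount", "(", ")"))
print(remove_between_delimiters("Total (net) amount", "(", ")", True))
```

Returning the input unchanged when a delimiter is missing is a design choice of this sketch; the original function may behave differently.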

Click through for the script and sample uses.

SQL Graph and Hierarchies

Kevin Feasel

2019-07-16

Graph

Hasan Savran shows how you can use SQL Server graph tables to handle hierarchical data:

The > determines the direction of the query. By pointing this arrow in the other direction, you can use the same relation table to find who manages an employee. For example, let’s rewrite this query to see who manages the East Manager:

SELECT emp.Name as Emp, emp2.Name as Mgr
FROM Hr as emp, ReportsTo as r, Hr as emp2 
WHERE MATCH(emp<-(r)-emp2) and emp.empid = 8
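The point about direction can be shown without SQL Server at all. In the Python sketch below, the names and ids are made up, and each edge is assumed to point from a manager to a report, matching the arrow from emp2 (Mgr) to emp in the quoted query. The same edge list answers both questions depending on which end you filter on.

```python
# Made-up sample data; ids and names are illustrative only.
names = {1: "Director", 8: "East Manager", 12: "East Sales Rep"}

# Each edge is (from_id, to_id), assumed to point from manager to report.
edges = [(1, 8), (8, 12)]

def managers_of(emp_id):
    """Follow edges backward: who points at this employee?"""
    return [names[src] for src, dst in edges if dst == emp_id]

def reports_of(mgr_id):
    """Follow edges forward: whom does this employee point at?"""
    return [names[dst] for src, dst in edges if src == mgr_id]

print(managers_of(8))
print(reports_of(8))
```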

There are a few relational approaches which work well for hierarchies, though they all come with tradeoffs in terms of read complexity, write complexity, and development complexity. Click through for Hasan’s take on the graph-based approach.

PowerShell and Windows Terminal Profiles

Jeffery Hicks shows how you can modify your Windows Terminal profile using PowerShell:

I recently updated my Windows 10 systems to the 1903 release. One of the reasons is that I wanted to try out the new Windows Terminal preview. You can find it in the Windows Store. This is bleeding edge stuff and far from complete but promises to be a great addition. Now you will be able to have all your command terminals in one tabbed application and easily be able to switch between them. As I said, this is far from being a finished and polished product. Right now, if you want to add a new profile, that is another terminal, you have to manually edit a JSON file. If you have VS Code installed, the file will open in that. Otherwise, I’m assuming you’ll get whatever application is associated with the .json extension.
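To give a feel for what that manual edit amounts to, here is a small Python sketch that adds a profile to a settings structure in memory. It is not Jeffery’s PowerShell approach, and the shape of the file is an assumption: a top-level "profiles" list whose entries have "guid", "name" and "commandline" keys. The add_profile helper and the sample values are hypothetical, so check your own file before adapting it.

```python
import json
import uuid

def add_profile(settings, name, commandline):
    """Append a new profile to an in-memory settings dict and return it.

    Assumes a top-level "profiles" list whose entries carry "guid",
    "name" and "commandline" keys.
    """
    profile = {
        "guid": "{" + str(uuid.uuid4()) + "}",  # fresh GUID in braces
        "name": name,
        "commandline": commandline,
    }
    settings.setdefault("profiles", []).append(profile)
    return profile

# Stand-in for the contents of the real settings file.
settings = json.loads('{"profiles": [{"name": "cmd", "commandline": "cmd.exe"}]}')
new_profile = add_profile(settings, "PowerShell Core", "pwsh.exe -NoLogo")
print(json.dumps(settings, indent=2))
```

The sketch deliberately works on a dictionary rather than the real file, so nothing on disk is touched until you decide to write the result back yourself.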

Read on for a PowerShell one-liner which lets you create a terminal profile.
