Author: Kevin Feasel

When Data Factory Flows Don’t

Emma Stewart points out an issue that might vex newcomers to Azure Data Factory:

The data within the Data Lake Store was organised into a Year and Month hierarchy for the folders, and each day's transactions were stored in a file which was named after the day within the relevant month folder. The task then was to create a pipeline which copies the dataset in the Data Lake Store over to the dbo.Orders table in Azure SQL DB every day within the scheduled period (Q1 2016).

After creating all the json scripts and deploying them (with no errors), I clicked on the ‘Monitor and Manage’ tile to monitor the activities, check everything was working as it should be and monitor the progress. After waiting for at least 10 minutes, I started to get frustrated.

Click through for the fix and an explanation.

Comments closed

Upgrading Cassandra To Version 3

Mikhail Chinkov has a process for upgrading Cassandra from version 2 to the latest release of 3:

At first sight it should be obvious: Cassandra is a distributed data store, and you're able to upgrade each node independently. But it's also a bit tricky, because Cassandra has so many concepts and moving parts. When introducing such a major change, you'll probably be anxious about not breaking anything.

Also, as with every DB upgrade, the most important outcome will be your app's behaviour. Support for older protocol versions might be removed in future releases. Storage might work in a way the application doesn't expect. There might be a lot of pitfalls. So, to start getting the benefits of the upgrade, we have to be 200% sure that the application works, and at the least that it won't work worse with the new database.

The whole process is straightforward, but there do seem to be a couple of places where you can shoot yourself in the foot.
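His post goes through the details, but the per-node shape of a rolling upgrade looks roughly like this (a sketch only: package names, versions, and service commands are placeholders, not Chinkov's exact steps):

    # On each node, one node at a time:
    nodetool drain                        # flush memtables and stop accepting new writes
    sudo service cassandra stop           # shut down the old version
    sudo apt-get install cassandra=3.0.x  # install the new binaries (pin your target version)
    # merge your settings into the new cassandra.yaml before restarting
    sudo service cassandra start          # bring the node back up on the new version
    nodetool upgradesstables              # rewrite SSTables into the new on-disk format

Each node should rejoin the cluster and report healthy before you move on to the next one.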

Comments closed

Availability Group Latency Reports

Sourabh Agarwal points out some new reports in Management Studio 17.4:

The Latency data collection functionality and the associated reports allow a database administrator to quickly discern the bottleneck in the log transport flow between the Primary and the Secondary replicas of an Availability Group. This feature does NOT answer the question “Is there latency in the Availability Group deployment?” but rather provides a way to understand why there is latency in the Availability Group deployment. This functionality provides a way to narrow down the potential cause of latency in an Availability Group deployment.

There are some things that this report doesn’t capture, but it does give us a bit more insight.

Comments closed

Updating @@SERVERNAME

Eitan Blumin has a script to change what you get when you reference @@SERVERNAME:

If, for whatever reason, the Windows Computer Name is changed after SQL Server is already installed, then @@SERVERNAME and the information in sysservers would not automatically reflect the change.
This means that @@SERVERNAME contains the incorrect value for the machine name.

Sometimes, and especially in production environments, the value in that global variable is important and is used as part of business processes.
And if @@SERVERNAME doesn’t reflect the actual server name, it could cause problems.
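The canonical fix (whether or not Eitan's script does exactly this) is to drop and re-add the local server entry and then restart the instance; a minimal sketch, with placeholder names:

    -- 'OldServerName' and 'NewServerName' are placeholders for illustration.
    EXEC sp_dropserver 'OldServerName';
    EXEC sp_addserver 'NewServerName', 'local';
    -- @@SERVERNAME only picks up the new value after the SQL Server service restarts.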

Read on for that script.

Comments closed

Joins And Parentheses

Shane O’Neill walks through different ways of grouping tables in a SQL query:

Asker: that’d be awesome if i can inner join two other tables instead of the table mentioned after FROM keyword
Me: …wait, what?
A: He’s asking
t1 left join t12
t1 left join t13
t12 inner join t13
M: em…it’s possible but it’s…iffy
A:  i wanna learn it.
do your magic

I’ve seen this in action before, but I rewrote the queries not to do this.  The problem is that as the query gets more complicated, it becomes much harder to diagram things mentally.  I don’t think I’ve seen a use yet that I couldn’t rewrite to be simpler.
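For reference, the shape being asked for looks something like this (table names are from the chat; the join columns are invented for illustration):

    -- t12 and t13 are joined to each other first, and that
    -- result is then left-joined back to t1.
    SELECT t1.id, t12.val AS val12, t13.val AS val13
    FROM t1
    LEFT JOIN (t12
        INNER JOIN t13
            ON t13.t12_id = t12.id)
        ON t12.t1_id = t1.id;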

Comments closed

Data Warehousing Versus Data Virtualization

Koos van Strien contrasts data virtualization with data warehouse automation (DWA):

From a certain viewpoint, one could state that Data Virtualization is focused on the way the world should work: when integrating data, one shouldn’t have to store it everywhere. Why not let the system decide when to store? For some, adopting this view might mean a paradigm shift: suddenly, the Data Warehouse isn’t the go-to integration point any more!

From this viewpoint, DWA is a tool “from the trenches”: after years of struggle and hard work to build our warehouses, we’ve developed some smart ways to automate our warehouse-building based on abstract models.

Worth reading the whole thing.

Comments closed

Using The Command Line To Migrate To Azure SQL Database

Arun Sirpal shows how to use SqlPackage.exe to migrate a database to Azure SQL Database:

I have moved many databases to Azure via different methods, but I recently came across a new way. Well, technically it’s not new; newly found, I should say. The migration was done via the command line, which is not exactly groundbreaking, but it’s nice to have another option.

The idea behind this is simple: create the bacpac via the command line using sqlpackage.exe with the action set to Export, then do an Import action into Azure.
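As a sketch, the two commands look something like this (server, database, and credential values are placeholders, not Arun's):

    rem Export the source database to a bacpac file
    SqlPackage.exe /Action:Export /SourceServerName:"SourceServer" /SourceDatabaseName:"SalesDB" /TargetFile:"C:\temp\SalesDB.bacpac"

    rem Import the bacpac into the Azure SQL Database logical server
    SqlPackage.exe /Action:Import /SourceFile:"C:\temp\SalesDB.bacpac" /TargetServerName:"yourserver.database.windows.net" /TargetDatabaseName:"SalesDB" /TargetUser:"youradminuser" /TargetPassword:"yourpassword"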

Read on for the demo.

Comments closed

Data Breaches And Knowledge-Based Authentication

Jeff Mlakar summarizes Troy Hunt’s recent congressional testimony:

Lastly, there is a lack of accountability for the breaches. If you collect data about others, you are responsible for it. Yet all too often, organizations discover years later that they suffered a massive data breach, and then proclaim to the press that they were hacked by evildoers and caught unprepared.

Then they progress through the stages of data breach grief:

  1. OMG I just read the news and found out we’ve been hacked

  2. Turns out it was 4 years ago

  3. Blame evil hackers while proclaiming innocence as a naive victim

  4. The media turns up the heat – time to blame some systems administrator

  5. Offer your customers credit monitoring

  6. Acceptance

  7. Wait until the next hack then GOTO step #1

It will be interesting to see what (if anything) comes out of this.

Comments closed

Estimating Used Car Prices

Kevin Jacobs wants to estimate the value of his car and shows how to set up a machine learning job to do this:

As you can see, I collected the brand (Peugeot 106), the type (1.0, 1.1, …), the color of the car (black, blue, …), the construction year of the car, the odometer of the car (which is the distance in kilometers (km) traveled with the car at this point in space and time), the asking price of the car (in Euros), the days until the MOT (Ministry of Transport test, a required periodical check-up of your car), and the horse power (HP) of the car. Feel free to use your own variables/units!

It’s an interesting example of how you can approach a real problem.

Comments closed

Introduction To Neural Nets

Ben Gorman has a two-part series introducing neural networks.  First, the basics behind neural networks:

We can solve both of the above issues by adding an extra layer to our perceptron model. We’ll construct a number of base models like the one above, but then we’ll feed the output of each base model as input into another perceptron. This model is in fact a vanilla neural network. Let’s see how it might work on some examples.
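In symbols (my notation, not necessarily Gorman's), that two-layer model computes something like

    \hat{y} = \sigma\left( w_2^\top \, \sigma(W_1 x + b_1) + b_2 \right)

where x is the input vector, W_1 and b_1 hold the weights and biases of the base perceptrons, sigma is the activation function applied elementwise, and w_2 and b_2 belong to the output perceptron.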

Then, he digs into the mathematics of backpropagation:

Our problem is one of binary classification. That means our network could have a single output node that predicts the probability that an incoming image represents stairs. However, we’ll choose to interpret the problem as a multi-class classification problem – one where our output layer has two nodes that represent “probability of stairs” and “probability of something else”. This is unnecessary, but it will give us insight into how we could extend the task to more classes. In the future, we may want to classify {“stairs pattern”, “floor pattern”, “ceiling pattern”, or “something else”}.

Our measure of success might be something like accuracy rate, but to implement backpropagation (the fitting procedure) we need to choose a convenient, differentiable loss function like cross entropy. We’ll touch on this more below.
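For the two-node output he describes, the cross-entropy loss on a single example would be (again, my notation rather than his)

    L = -\left( y_{\text{stairs}} \log \hat{y}_{\text{stairs}} + y_{\text{else}} \log \hat{y}_{\text{else}} \right)

where the y values are the one-hot true labels and the \hat{y} values are the predicted probabilities. Unlike accuracy, L is differentiable with respect to the weights, which is exactly what backpropagation requires.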

This is definitely a series to read after you’ve gotten your coffee.

Comments closed