Press "Enter" to skip to content

Month: March 2020

Characterizing and Optimizing a Serverless Workload

Adrian Colyer reviews an interesting paper:

Today’s paper analyses serverless workloads on Azure (the characterisation of those workloads is interesting in its own right), where users want fast function start times (avoiding cold starts), and the cloud provider wants to minimise resources consumed (costs). With fine-grained, usage based billing, resources used to keep function execution environments warm eat directly into margins. The authors demonstrate a policy combining keep-alive times for function runtimes with pre-warming, that dominates the currently popular strategy of simply keeping a function execution environment around for a period of time after it has been used, in the hope that it will be re-used. Using this policy, users see much fewer cold starts, while the cloud provider uses fewer resources. It’s the difference between the red (state-of-the-practice) and green (this paper) policies in the chart below. Win-win!

Very interesting.

Comments closed

Correlation in easystats

The easystats team announces a new R package:

The easystats project continues to grow with its more recent addition, a package devoted to correlations. Check-out its webpage here!

It’s lightweight, easy to use, and allows for the computation of many different kinds of correlations, such as partial correlations, Bayesian correlations, multilevel correlations, polychoric correlations, biweightpercentage bend or Sheperd’s Pi correlations (types of robust correlation), distance correlation (a type of non-linear correlation) and more, also allowing for combinations between them (for instance, Bayesian partial multilevel correlation).

I’d recommend reading the examples on the GitHub repo due to formatting. Looks quite interesting. H/T R-Bloggers.

Comments closed

Visualizing “Check All that Apply” Options

Stephanie Evergreen shows a couple of ways to visualize multi-select results:

Which means a bar chart, ordered greatest to least, is your alternative. But that can have many variations.

In this example, created by Dr. Sheila B. Robinson, she used 100% stacked bars for each survey item, to indicate that each item could have totaled 100% if all respondents checked that box. This is a nice way to show that, while the response options as a whole can’t add to 100%, each option on its own CAN. Plus, look at the cute icons.

Click through for several alternatives depending upon the story you’re trying to tell.

Comments closed

Connecting to Cosmos DB from the Gremlin Console

Hasan Savran shows how to connect to Cosmos DB’s Gremlin API via the Gremlin Console:

Graph Databases have been popular lately. You can use Azure Cosmos DB as your  Graph database source by selecting Gremlin API. Gremlin programing language is developed by Apache TinkerPop of the Apache Software Foundation. I will show you how to connect to Cosmos DB Gremlin API from TinkerPop Gremlin console.

     You can download the latest version of Gremlin console from here. The latest version was 3.4.6 when I wrote this post. I was able to connect to the Cosmos DB by using the versions 3.4.3 and 3.4.6.  You can run the console from Linux or Windows, I will focus on the Windows version here but Linux version should work the same way. You must have Java SDK 8 to run this console. Latest version of Java SDK does not work with this console.

There are a couple of configuration steps, but nothing crazy.

Comments closed

Trimming Characters with Power Query

Imke Feldmann shows off an optional parameter with Text.TrimStart() and Text.TrimEnd() in Power Query:

When cleaning dirty data, you might have used the Trim-function (Text.TrimStart or Text.TrimEnd) to delete leading or trailing whitespace from your strings. However, did you know that you could use these functions as well to delete any other characters as well from the start or end of a string? Trimming text with custom characters is pretty straightforward:

Interesting that it does not take a regular expression but rather a list of characters.

Comments closed

Tips and Traps with PowerShell 7

Jeffrey Hicks takes us through some of the tricky parts of migrating to Powershell 7:

A long established community best practice in PowerShell scripting is not using command and parameter aliases. In a cross-platform world, this is even more critical. You may have been in the habit of using Sort in your code in place of Sort-Object. I know I have. I didn’t mind bending the no alias rule abit because there was nothing cryptic about Sort.

But in the Linux world, sort is a native command. There is no PowerShell alias. If your code uses sort, on Linux it will call the native command which will most likely break your code.

Read on for several more hints.

Comments closed

Evaluating Regression Models in Azure ML

Dan Fitton continues a series on model evaluation with Azure Machine Learning:

The initial go-to metric for understanding a regression model is the R squared (or R2) value, also known as the coefficient of determination. R squared measures how well the model is fitted to the data – the goodness of fit. It indicates how much of the variation of y (the target) is explained by the variation in x (the features).

The measures are bog standard if you’ve worked with regressions before, and Dan does a good job explaining them.

Comments closed

Ensuring Ordering of Kafka Data

Zeke Dean takes us through an explanation of Kafka partitioning and message delivery behaviors:

Apache Kafka provides developers with a uniquely powerful, open source and versatile distributed streaming platform – but it also has some rather complex nuances to understand when trying to store and retrieve data in your preferred order.

Kafka captures streaming data by publishing records to a category or feed name called a topic. Kafka consumers can then subscribe to topics to retrieve that data. For each topic, the Kafka cluster creates and updates a partitioned log. Kafka send all messages from a particular producer to the same partition, storing each message in the order it arrives. Partitions therefore function as a structure commit log, containing sequences of records that are both ordered and immutable.

Click through for the explanation.

Comments closed

Cosmos DB Changes

Hasan Savran hits on a few changes to Cosmos DB:

Free Tier of Cosmos DB
     Couple of big changes came from Azure Cosmos Db team this week. First one is, free tier of Cosmos DB. In free tier, you get the first 400 Request Units and 5 GB of storage for free for the lifetime of the account. This is a great deal if you like to use Azure Cosmos DB in your new project. Give it a try, It’s free! All you need to do is click on Apply for Apply Free Tier Discount.

This is a big one, but there are a few other interesting changes in there as well.

Comments closed