Press "Enter" to skip to content

Day: April 2, 2018

Using Python Within R

David Smith points out new reticulate package:

With reticulate, you can:

  • Import objects from Python, automatically converted into their equivalent R types. (For example, Pandas data frames become R data.frame objects, and NumPy arrays become R matrix objects.)

  • Import Python modules, and call their functions from R

  • Source Python scripts from R

  • Interactively run Python commands from the R command line

  • Combine R code and Python code (and output) in R Markdown documents, as shown in the snippet below

The first thing that came to mind when reading this was the implementation of the keras package in R and how it calls out to TensorFlow (written in Python).  The ability to make R vs Python an “and” instead of an “or” proposition is quite powerful.

Comments closed

HDFS Erasure Coding

Ayush Tiwari explains how HDFS Erasure Coding lets us keep the same data durability while improving storage efficiency:

In this particular example, when you look at the codeword, actual data cells are 6(blue cells) & 3(red cells) are the parity cells which are simply obtained by multiplied our data cells to generation matrix.
Storage failure can be recovered by the multiplying inverse of generator matrix with the extended codewords as long as ‘k’ out of ‘k+m’ cells are available.
Therefore, here Data Durability is 3 as it can handle 2 simultaneous failure, Storage Efficiency is 67% (as we are using only 1 extra block i.e. 6/9) & only we need to store half number of cells as compared to original number of cells, we can conclude that we also have only 50% overhead in it.

It’s a good explanation of one of the biggest improvements to HDFS over the past several years.

Comments closed

Ignoring Case In Table.Distinct With Power Query

Cedric Charlier shows how to ignore case when running Table.Distinct in Power Query:

Note that at the top of the table, I’ve twice the value ADO.NET, once with uppercase and once lowercase for the “Net” part. If I try to apply a Table.Distinct, the two values will be considered as distinct and one of them won’t be removed.

Will it be an issue? If this your key and it’s part of a model where this key is part of one side of a one-to-many then it will be an issue.

Read on for the solution.

Comments closed

Using Cosmos DB For Graph Data

Jose Mendes has an introduction to the Cosmos DB graph engine:

Gremlin is the graph traversal language of Apache TinkerPop, an open source Graph Computing Framework. Gremlin allows the users to write complex queries to traverse their graphs by using a composed sequence of steps, with each step performing an operation on the data stream (further details here). There are 4 fundamental steps:

· transform: transform the objects in the stream

· filter: remove objects from the stream

· sideEffect: pass the object, but yield some side effect

· branch: decide which step to take

Click through for a quick example showing how to create and populate a graph.

Comments closed

Will It Bit?

Louis Davidson wants to see what he can cast to a bit type:

There are no other textual/alpha string values that will cast to a bit value, but the numeric values that will cast to a bit are voluminous (even some that are in string format). Consider the following eight statements:

SELECT CAST(100 AS bit);
SELECT CAST(-100 AS bit);
SELECT CAST(99999999999999999999999999999999999999 AS bit);
SELECT CAST(-99999999999999999999999999999999999999 AS bit);
SELECT CAST(88.999999 AS bit);
SELECT CAST('1' AS bit);
SELECT CAST('2' AS bit);
SELECT CAST('999999' AS bit);

Danged if they didn’t all work, and all return 1.

Check out what else Louis tries to cast to a bit type.

Comments closed

All About Power BI Licensing

Reza Rad has a guide for the perplexed when it comes to licensing Power BI:

The pricing table above may scare you off and you may immediately think of not going through the embedded path. However, I need to let you know that there are some scenarios which Power BI Embedded can be a much more cost-effective option than Pro. Here is an example:

Assume that you have 100 users for your Power BI solution. And your users are not connecting all at the same time to use Power BI reports. You may have the maximum of 300 page renders per hour for them if you use embedded. In such case, embedded for that scenario would cost you about $700 USD per month, where the Power BI Pro for 100 users would be $1000 USD per month. This means saving of $3,600 USD per year. This is an example scenario that Power BI Embedded can be more cost-effective than Pro.

Give this a careful reading if you’re looking to implement Power BI in your environment.

Comments closed