Category: Misc Languages

Pipes And More Pipes In R

Gabriel (de Selding?) has a tutorial on how to use the various pipes in R:

In F#, the pipe-forward operator |> is syntactic sugar for chained method calls. Or, stated more simply, it lets you pass an intermediate result onto the next function.

Remember that “chaining” means that you invoke multiple method calls. As each method returns an object, you can actually allow the calls to be chained together in a single statement, without needing variables to store the intermediate results.

In R, the pipe operator is, as you have already seen, %>%. If you’re not familiar with F#, you can think of this operator as being similar to the + in a ggplot2 statement. Its function is very similar to that of the F# operator you have just seen: it takes the output of one statement and makes it the input of the next statement. When describing it, you can think of it as a “THEN”.

Auto-recommended for the F# love, and a good tutorial to boot.
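
The “THEN” reading is not specific to R or F#. As a rough illustration only, here is a minimal Python sketch of the same idea. The pipe helper is a hypothetical name invented for this example; it is not part of magrittr, F#, or the Python standard library, and the sample numbers are made up.

```python
from functools import reduce


def pipe(value, *steps):
    """Feed value through each step in order: value THEN step 1 THEN step 2 ..."""
    return reduce(lambda acc, step: step(acc), steps, value)


numbers = [4, 9, 16]  # made-up sample data

# Nested calls have to be read inside-out.
nested = round(sum(map(lambda x: x ** 0.5, numbers)), 1)

# The piped form reads left to right, with no intermediate variables.
piped = pipe(
    numbers,
    lambda xs: map(lambda x: x ** 0.5, xs),  # take square roots
    sum,                                     # THEN add them up
    lambda total: round(total, 1),           # THEN round the total
)

print(nested, piped)
```

Both forms compute the same value; the piped version just lists the steps in the order they happen.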

John Mount has a few interesting notes on the topic:

Read on for the rest of his notes, too.

Error Handling In Scala

Manish Mishra gives a few examples of how to handle errors in Scala:

Try[T] is another construct for capturing success or failure scenarios. It returns a value in both cases. Put any expression in Try and it will return Success[T] if the expression is successfully evaluated and Failure[T] otherwise, meaning you are allowed to return the exception as a value. There is one restriction, however: in case of failure it will only return Throwable types:

import scala.util.Try

def validateZipCode(zipCode: String): Try[Int] = Try(zipCode.toInt)

But throwing an exception doesn’t make much sense here, since it is not much of a calculation, although we can use this example to understand the use case. If the given string is not a number, the result will be a failure. The value from the Try can be extracted in the same way as with Option. It can be matched

As you write more complicated Spark operations, handling errors becomes critical.

Azure Functions Basics

Vincent-Philippe Lauzon explains the basics of Azure Functions:

In general, serverless refers to an economic model where we pay for compute resources used as opposed to “servers”.

Wait…  isn’t that what the Cloud is about?

Well, yes, on a macro-scale it is, but serverless brings it to a micro-scale.

In the cloud we can provision a VM, for example, run it for 3 hours and pay for 3 hours.  But we can’t pay for 5 seconds of compute on a VM because it won’t have time to boot.

A lot of compute services have a “server-full” model.  In Azure, for instance, a Web App comes in a number of instances.  Each instance has a VM associated with it.  We do not manage that VM but we pay for its compute regardless of the number of requests it processes.

In a serverless model, we pay for micro-transactions.

This is the first part in a series and is aimed at giving a conceptual explanation.

Building Dynamic Row Headers With ML Services

Dave Mason tries to get around his RESULT SETS limitation when using SQL Server Machine Learning Services:

The columns in the data frame clearly have names, but SQL Server isn’t using them. The data frame columns have types in R too (more on this in a moment). Now that makes me wonder about the data types for the data returned by SQL. How is that determined? If SQL isn’t using the column names, can I assume it isn’t making use of the R column types either?

For a point of reference, let’s run some more R code to show the column names and types. As before, the rvest package is used to scrape a web page, with each HTML <table> found becoming a data frame in the “tables” list (line 3). A data frame of table metadata is created by calling data.frame(). The first parameter is a vector of column names (line 4), the second parameter is a vector of column classes (line 5), and the third parameter causes the row “names” to be incrementing digits (line 6).

This is a work in progress as Dave continues his series.

Basics Of Elasticsearch In .NET

Ivan Cesar gives us a brief tutorial of the Elasticsearch .NET API:

To be able to search something, we must store some data into ES. The term used is “indexing.”

The term “mapping” is used for mapping our data in the database to objects which will be serialized and stored in Elasticsearch. We will be using Entity Framework (EF) in this tutorial.

Generally, when using Elasticsearch, you are probably looking for a site-wide search engine solution. You will either use some sort of feed or digest, or Google-like search which returns all the results from various entities, such as users, blog entries, products, categories, events, etc.

These will probably not just be one table or entity in your database, but rather, you will want to aggregate diverse data and maybe extract or derive some common properties like title, description, date, author/owner, photo, and so on. Another thing is, you probably won’t do it in one query, but if you are using an ORM, you will have to write a separate query for each of those blog entries, users, products, categories, events, or something else.

Check out Ivan’s tutorial for several examples.  Elasticsearch is really good for text-based search and simple aggregations, but it probably shouldn’t be a primary data store for any data you really care about.

Working With CosmosDB

Derik Hammer has an introductory article showing how to work with CosmosDB to store and use document-style data:

Querying Cosmos DB is more powerful and versatile. The CreateDocumentQuery method is used to create an IQueryable<T> object, a member of System.Linq, which can output the query results. The ToList() method will output a List<T> object from the System.Collections.Generic namespace.

Derik also shows how to import the data into Power BI and visualize it.  It’s a nice article if you’ve never played with CosmosDB before.

Comparing Tree Graphs In SQL

Dmitriy Vlasov shows how to compare two trees in PL/SQL:

During the day, various changes are received by the accounting system from the design system. Production planning is based on the data from the accounting system. Conditions allow you to accept all the changes for the day and recalculate the product specification at night. However, as I wrote above, it is unclear how yesterday’s state of the product differs from today’s.

I would like to see what was removed from the tree and what was added to it, as well as which part or assembly replaced another one. For example, if an intermediate node was added to a tree branch, it would be wrong to assume that all the downstream elements were removed from their old places and added to new ones. They remained where they were; an intermediate node was simply inserted above them.  In addition, an element can ‘travel’ up and down only within one branch of the tree due to the specifics of the manufacturing process.

This is Oracle-specific; migrating it to another platform like SQL Server would take a bit of doing.
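
To make the inserted-node case concrete, here is a minimal Python sketch of one way such a comparison could work. It is not Dmitriy’s PL/SQL approach. It assumes each tree is an acyclic {child: parent} mapping with the root mapped to None, and the names diff_trees and ancestors and the sample parts are all hypothetical.

```python
def ancestors(tree, node):
    """Return the ancestors of node, nearest first (tree must be acyclic)."""
    chain = []
    parent = tree.get(node)
    while parent is not None:
        chain.append(parent)
        parent = tree.get(parent)
    return chain


def diff_trees(old, new):
    """Compare two trees given as {child: parent} dicts (root maps to None)."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    same_branch = []  # parent changed, but the old parent is still above the node
    reparented = []   # parent changed and the old parent is no longer above it
    for node in sorted(set(old) & set(new)):
        if old[node] == new[node]:
            continue
        if old[node] in ancestors(new, node):
            same_branch.append(node)
        else:
            reparented.append(node)
    return {
        "added": added,
        "removed": removed,
        "same_branch": same_branch,
        "reparented": reparented,
    }


# Hypothetical product structure: a bracket is inserted between assembly and bolt.
yesterday = {"product": None, "assembly": "product", "bolt": "assembly"}
today = {
    "product": None,
    "assembly": "product",
    "bracket": "assembly",
    "bolt": "bracket",
}

result = diff_trees(yesterday, today)
print(result)
```

In this sample the bolt is reported under same_branch rather than as removed and re-added, which matches the behavior the excerpt asks for when an intermediate node is inserted.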

Using XPath To Shred HTML

Shannon Lowder shows off the HTML Agility Pack project to help him parse the contents of webpages:

Let’s say we wanted the table.  We could use the XPath /html/body/table to retrieve it. We can also use XPath to refer to a collection.  Let’s say we wanted all the rows. We would use the XPath /html/body/table/tr. We would get a collection of three rows.  Notice the XPath looks a lot like a Linux or Windows folder path.  That’s the idea of XPath!

I would like to point out a couple of extra points.  First, XPath is case sensitive.  So if I had tried to use /html/body/table/TR, I would find no nodes.

Second, you can use “short hand” in your XPath queries.  //body/table/tr would get you to the same place /html/body/table/tr did.

This intro is part of a series Shannon has started on scraping data from websites.
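
The three XPath points above can be tried out with a short Python sketch. It uses the standard library’s xml.etree.ElementTree rather than the HTML Agility Pack, which is a .NET library, and the three-row page is made up for the example. ElementTree supports only a subset of XPath and evaluates paths relative to the root element, so ./body/table/tr stands in for /html/body/table/tr here.

```python
import xml.etree.ElementTree as ET

# Made-up three-row page, kept well-formed so the XML parser accepts it.
PAGE = """<html><body><table>
<tr><td>a</td></tr>
<tr><td>b</td></tr>
<tr><td>c</td></tr>
</table></body></html>"""

root = ET.fromstring(PAGE)  # root is the <html> element

# Path relative to <html>; plays the role of /html/body/table/tr.
rows = root.findall("./body/table/tr")

# ".//" searches all descendants, like the // shorthand.
rows_short = root.findall(".//body/table/tr")

# Tag names are case sensitive, so TR matches nothing in this document.
rows_upper = root.findall("./body/table/TR")

print(len(rows), len(rows_short), len(rows_upper))
```

Real-world HTML is often not well-formed XML, which is one reason to reach for a dedicated HTML parser instead of an XML one.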

Setting Data Types When Building Table Values In Power BI

Chris Webb shows how to use a parameter on the Table.AddColumn function in M to define specific data types:

In the last two screenshots the ABC123 icon in the column headers shows that they are set to use the Any data type; the columns returned by calling the function have lost their data types.

The key to solving this problem is using the optional fourth parameter of the Table.AddColumn() function, which allows you to set a data type for the column that function adds to a table. Alter the Invoked Custom Function step of the previous query to do this, setting the new column to be a table type like so:

Worth reading in its entirety.

Clippy Lives: In Scala

Akhil Vijayan explains Scala Clippy:

Now you may be wondering how these errors are identified and how we get advice related to them.

Simple: these are provided by the Scala community. If you visit their official website, Scala Clippy, you can find a “Contribute” tab. Under that, we can post our own errors. These errors are parsed first and, when that succeeds, we can add our advice, which will be reviewed and, if accepted, added to their database, which will in turn benefit others.

Take a close look at the screenshots; I missed it at first, but there’s helpful advice above the error message.
