Press "Enter" to skip to content

Category: Misc Languages

A Primer on Regular Expressions

Steven Sanderson provides a quick guide to regular expressions:

Regular expressions, often abbreviated as regex, are powerful tools used in programming to match and manipulate text patterns. While they might seem intimidating at first, regular expressions are incredibly useful for tasks like data validation, text parsing, and pattern matching. In this blog post, we’ll explore regular expressions in the context of R programming, breaking down the concepts step by step and providing practical examples along the way. By the end, you’ll have a solid understanding of regular expressions and be ready to apply them to your own projects.

This is an extremely powerful language which can take years (decades?) to master, especially considering that there are several regular expression syntaxes and they don’t all behave the same way. But still, I’ve found that the more familiar you are with regular expressions, the simpler certain classes of problem become.

Leave a Comment

Adding Count to a Grouped DataFrame in Spark

The Big Data in Real World team does some counting:

We want to group the dataset by Name and get a count to see the employee and the number of projects they are assigned to. In addition to that sub count, we also want to add a column with a total count like below.

One important thing to remember about Spark transformations is that they’re lazy: just because you ran df.groupBy(...).agg(...) doesn’t mean the new DataFrame exists yet, so until you call the show() action (or whatever), the original data is still there for the taking, which is how you can reference it again later in the chained statement.

Leave a Comment

Analyzing Big-O Notation in Polyglot Notebooks

Matt Eland brings me back to college:

Polyglot Notebooks is a great way of running interactive code experiments mixed together with rich markdown documentation.

In this short article I want to introduce you to the #!time magic command and show you how you can easily measure the execution time of a block of code.

This can be helpful for understanding the rough performance characteristics of a block of code inside of your Polyglot Notebook.

In fact, we’ll use this to explore the programming concepts behind Big O notation and how code performance changes based on the number of items.

I like this for two reasons. First, because a visual indicator of Big-O notation is helpful for students learning about the topic. Second, because that’s not the only thing you can do with the #time magic.

Leave a Comment

Importing Code into Polyglot Notebooks

Matt Eland brings some code to the party:

We’ve seen that Polyglot Notebooks allow you to mix together markdown and code (including C# code) in an interactive notebook and these notebooks allow you to share data between cells and between languages. However, frequently in programming you want to reference code that others have written without having to redefine everything yourself.

In this article we’ll explore how Polyglot Notebooks allows you to import dotnet code from stand-alone files, DLLs, and NuGet packages so your notebooks can take advantage of external code files and the same libraries that you can work with from your code in Visual Studio.

The syntax, by the way, is very similar to the F# Interactive (and the short-lived C# Interactive) tool, particularly #i and #r.

Leave a Comment

SandDance in Polyglot Notebooks

Matt Eland continues a series on dotnet Polyglot Notebooks:

As I’ve been doing more and more dotnet development in notebooks with Polyglot Notebooks I’ve found myself wanting more options to create rich data visualizations inside of VS Code the same way I could use Plotly for data visualization using Python and Jupyter Notebooks.

Thankfully, Microsoft SandDance exists and fills some of that gap in terms of doing rich data visualization from dotnet code.

In this article I’ll talk more about what SandDance is, show you how you can use it inside of a Polyglot Notebook in VS Code, and show you a simple way you can use it without needing a Polyglot Notebook.

Most of my experience with SandDance was with Azure Data Studio, but it’s nice to see this capability in notebooks as well.

Leave a Comment

Poisson Hidden Markov Models in SAS

Ji Shen shows off how to perform discrete time series in SAS:

The HMM procedure in SAS Viya supports hidden Markov models (HMMs) and other models embedded with HMM. PROC HMM supports finite HMM, Poisson HMM, Gaussian HMM, Gaussian mixture HMM, the regime-switching regression model, and the regime-switching autoregression model. This post introduces Poisson HMM, the latest addition to PROC HMM in the SAS Viya 2023.03 release.

Count time series is ill-suited for most traditional time series analysis techniques, which assume that the time series values are continuously distributed. This can present unique challenges for organizations that need to model and forecast them. As a popular discrete probability distribution to handle the count time series, the Poisson distribution or the mixed Poisson distribution might not always be suitable. This is because both assume that the events occur independently of each other and at a constant rate. In time series data, however, the occurrence of an event at one point in time might be related to the occurrence of an event at another point in time, and the rates at which events occur might vary over time.

HMM is a valuable tool that can handle overdispersion and serial dependence in the data. This makes it an effective solution for modeling and forecasting count time series. We will explain how the Poisson HMM can handle count time series by modeling different states by using distinct Poisson distributions while considering the probability of transitioning between them.

Read on for an overview of Hidden Markov Models (in general and the Poisson variation in particular) and some of the challenges you can run into when performing this test.

Leave a Comment

Variable Sharing in Polyglot Notebooks

Matt Eland performs a few swaps:

While Polyglot Notebooks certainly brings the dream of notebook development to dotnet, Polyglot is at its finest when you work with one language and then hand off data to the next language for additional processing.

In this article we’ll talk about sharing variables between kernels using Polyglot Notebooks and VS Code. We’ll explore the syntax and tooling that exists around these functionalities as well as the current limitation of sharing variables between kernels.

For simplicity, I’m going to avoid getting into SQL and KQL kernels in this article, but I plan on delving further into each of these specialized kernels in future articles.

Click through for an example using the best .NET language, as well as C#. Do read the whole thing, especially if you think about passing around discriminated unions or method-reach objects.

Leave a Comment

Integrating VBA and R Code

Steven Sanderson has become Dr. Moreau. Part 1 shows how to call R code from VBA:

This line defines a subroutine called “CallRnorm”. A subroutine is a block of code that can be executed repeatedly from any part of the code, and it starts with the “Sub” keyword followed by the subroutine name and any arguments in parentheses.

Part 2, as you might expect, covers the obverse:

Yesterday I posted on using VBA to execute R code that is written inside of the VBA script. So today, I will go over a simple example on executing an R script from VBA. So let’s get into the code and what it does.

First, let’s look at the Function called “Run_R_Script”. This function takes four arguments, where the first two are mandatory, and the last two are optional.

Comments closed

An Introduction to Polyglot Notebooks

Matt Eland walks us through a sample:

Polyglot Notebooks allows you to create notebooks composed of multiple cells. These cells can be either markdown cells for documentation or code cells containing code in either C#, F#, PowerShell, SQL, KQL, HTML, JavaScript, or mermaid markdown for diagramming.

This allows you to mix together rich documentation supported by little pieces of code that progressively expand upon an idea, tell a story, or otherwise provide insight or information to you as a developer.

Click through for the example. The thing I hadn’t realized—because I don’t really do this in Jupyter—is that you can share variables between languages. That’s a fairly useful feature when you want to do most of your work in one language but just happen to need a library using a different language.

Comments closed

Working with .NET Polyglot Notebooks

Matt Eland installs Polyglot notebooks in VSCode:

Polyglot Notebooks is a powerful new interactive notebook technology that lets you run experiments in your editor, mix together code and rich documentation, and write code in a variety of languages to accomplish your tasks.

This article will guide you through the setup process to get Polyglot Notebooks running on your machine so you can do local data science and analytics notebooks using dotnet.

In case you’re curious, my installation has the ability to create notebooks in F#, C#, HTML, JavaScript, KQL, Markdown, Mermaid, Powershell, and SQL. I’m not positive how many of those come with the extension itself and how many are additional kernels I got from installing other extensions.

Comments closed