Press "Enter" to skip to content

Day: December 13, 2023

Testing the Reproducibility of Random Numbers in STAN

Sebastian Sauer performs some tests:

Bayes models (using MCMC) build on drawing random numbers. By their very nature, random numbers are random. Unless they are not. As you may know, the random number fuctions in computers are purely deterministic.

However, in practice, some inpredictable behavior may still show up. The reason being simply that two computational environment must – in theory – being exactly identical in order to reproduce the same results. At least identical in every bit that influence random number generation.

Click through for the testbed and code.

Comments closed

Analyzing Shiny App Startup Times

Osheen MacOscar wants to know how long it takes to start up that Shiny app:

In the last blog I spoke about using Google Lighthouse to test the speed of web pages. I wanted to build upon that and use Lighthouse to test some Shiny apps.

To get a feel for Shiny’s performance in a Lighthouse analysis, I needed a lot of shiny apps that I could test and create a dataset from, so I used the entries to the 2021 Shiny app contest, which is a competition where people enter Shiny apps to be judged on technical merit and artistic achievement. I used the 2021 apps as there has unfortunately not been a competition since. A full list of the submissions can be found on the Posit Community website.

Read on to see what you can do with Lighthouse, as well as a few pain points around it.

Comments closed

Troubleshooting an Azure ML Deployment Locally

I have a new video:

In this video, I take us through the process of creating a local deployment of an Azure ML managed endpoint. We will cover requirements, why you might want to do this, and common problems you may run into along the way.

This was a fun video to make, especially in anticipating the sorts of problems that come up along the way. I won’t pretend that it’s comprehensive but it does hit several of the most common problems I see (or cause).

Comments closed

Continuing the Advent of Fabric

Tomaz Kastrun has been busy. On day 9, we build a custom environment:

Microsoft Fabric provides you with the capability to create a new environment, where you can select different Spark runtimes, configure your compute resources, and create a list of Python libraries (public or custom; from Conda or PyPI) to be installed. Custom environments behave the same way as any other environment and can be used and attached to your notebook or used on a workspace. Custom environments can also be attached to Spark job definitions.

On day 10, we have Spark job definitions:

An Apache Spark job definition is a single computational action, that is normally scheduled and triggered. In Microsoft Fabric (same as in Synapse), you could submit batch/streaming jobs to Spark clusters.

By uploading a binary file, or libraries in any of the languages (Java / Scala, R, Python), you can run any kind of logic (transformation, cleaning, ingest, ingress, …) to the data that is hosted and server to your lakehouse.

Day 11 introduces us to data science in Fabric:

We have looked into creating the lakehouse, checked the delta lake and delta tables, got some data into the lakehouse, and created a custom environment and Spark job definition. And now we need to see, how to start working with the data.

Day 12 builds an experiment:

We have started working with the data and now, we would like to create and submit the experiment. In this case, MLFlow will be used here.

Create a new experiment and give it a name. I have named my “Advent2023_Experiment_v3”.

Click through to catch up with Tomaz.

Comments closed

Debugging Stored Procedures

Erik Darling shares some tips:

Debugging, like error handling, is a design choice that is fairly easy to make at the outset of writing a stored procedure, and is usually a lot easier to get in there if you do it from the get-go.

The number of times I’ve had to go back and add debugging into something only to introduce debugging bugs is actually a bit tragic.

If you’ve never brought down a system with your monitoring process or introduced bugs via debugging code, you’re not living life to its fullest.

Comments closed

Microsoft Fabric and Dataverse Link

Teo Lachev sees some potential:

I might have identified at last a good case for Microsoft Fabric, but I’ll be in a better position to confirm after a POC with larger datasets. Dynamics Online, aka Dynamics 365, epitomizes the customer’s struggle to export their data hosted in SaaS cloud offerings for analytics or other purposes. Since unfortunately Microsoft doesn’t provide direct access to the Dynamics native storage (Azure SQL Database), which often could be the simplest and fastest solution, Dynamics has “compensated” throughout the years by introducing and sunsetting various options:

Click through for that table, as well as Teo’s thoughts on the possibility of Dataverse Link to Fabric as a viable solution.

Comments closed

B-Tree Indexes in Postgres

Henrietta Dombrovskaya continues a series on indexing in PostgreSQL:

In the previous article we learned that the most helpful indexes are indexes with the lowest selectivity, which means that each distinct value in an index corresponds to a small number of rows. The smallest number of rows is one, thereby, the most useful indexes are unique indexes.

The definition of a unique index states just that: an index is unique if for each indexed value there is exactly one matching row in the table. PostgreSQL automatically creates a unique index to support any primary key or unique constraint on a table.

Read on for more about unique indexes, compound indexes, and bitmaps.

Comments closed