Press "Enter" to skip to content

Day: October 21, 2022

Working with Transformer Models for Machine Translation

Stefania Cristina continues a series on transformer models. First up is plotting loss curves:

We have previously seen how to train the Transformer model for neural machine translation. Before moving on to inferencing the trained model, let us first explore how to modify the training code slightly, in order to be able to plot the training and validation loss curves that can be generated during the learning process. 

The training and validation loss values provide important pieces of information, because they give us better insight into how the learning performance is changing over the epochs, and help us diagnose any problems with learning that can lead to an underfit or an overfit model. They will also inform us about the epoch at which to use the trained model weights at the inferencing stage.
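
Plotting the curves is the easy part. As a minimal sketch (not Stefania's code; the loss lists below are placeholder values you would collect from your own training loop), the matplotlib side looks roughly like this:

```python
# Hypothetical sketch: plot per-epoch training/validation loss with matplotlib.
# Assumes two lists of floats collected during a custom training loop.
import matplotlib.pyplot as plt

train_loss = [4.2, 3.1, 2.4, 2.0, 1.8]   # placeholder values
val_loss = [4.4, 3.4, 2.9, 2.7, 2.8]     # placeholder values

epochs = range(1, len(train_loss) + 1)
plt.plot(epochs, train_loss, label="Training loss")
plt.plot(epochs, val_loss, label="Validation loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.title("Transformer training vs. validation loss")
plt.show()
```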

Then we get to try it out:

We have seen how to train the Transformer model on a dataset of English and German sentence pairs, as well as how to plot the training and validation loss curves in order to diagnose the model’s learning performance and decide at which epoch to inference the trained model. We are now ready to inference the trained Transformer model for the purpose of translating an input sentence.

In this tutorial, you will discover how to inference the trained Transformer model for neural machine translation. 

Click through for the results and to see exactly why there’s so much computational effort dumped into high-end trained models.
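
For a sense of what "inferencing" means mechanically, here is a hedged, framework-agnostic sketch of greedy decoding. Every name in it is a placeholder rather than the article's API; the real post works with its trained Keras Transformer and tokenizers:

```python
# Hypothetical sketch of greedy decoding for a trained seq2seq Transformer.
# `model(encoder_ids, decoder_ids)` is assumed to return next-token scores
# for every decoder position; tokenizer details are placeholders.

def greedy_translate(model, src_ids, start_id, end_id, max_len=50):
    """Generate target token ids one at a time, always taking the argmax."""
    out = [start_id]
    for _ in range(max_len):
        scores = model(src_ids, out)        # shape: (len(out), vocab_size)
        next_id = max(range(len(scores[-1])), key=lambda i: scores[-1][i])
        out.append(next_id)
        if next_id == end_id:               # stop once <eos> is produced
            break
    return out

# Tiny dummy "model" so the sketch runs end to end: always predicts token 2 (<eos>).
if __name__ == "__main__":
    dummy = lambda src, tgt: [[0.0, 0.1, 0.9, 0.2]] * len(tgt)
    print(greedy_translate(dummy, src_ids=[5, 6, 7], start_id=1, end_id=2))
```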


Shiny App Dockerfile Automation

Jamie Owen and Colin Gillespie don’t have time to write dockerfiles:

For creating a production deployment of a {shiny} application it is often useful to be able to provide a Docker image that contains all the dependencies for that application. Here we explore how one might go about automating the creation of a Dockerfile that will allow us to build such an image for a {shiny} application.

There are some neat tricks in here.
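
The post builds all of this in R. Purely to illustrate the shape of the automation (none of the following comes from the article; the rocker/shiny base image and the ./app layout are my assumptions), a sketch might scan the app for library() calls and write out a Dockerfile:

```python
# Hypothetical sketch: derive R package dependencies from a {shiny} app's
# source files and emit a simple Dockerfile. Not the approach from the post.
import re
from pathlib import Path

def find_r_packages(app_dir):
    """Collect package names from library()/require() calls in top-level .R files."""
    pattern = re.compile(r"(?:library|require)\(([A-Za-z0-9.]+)\)")
    pkgs = set()
    for path in Path(app_dir).glob("*.R"):
        pkgs.update(pattern.findall(path.read_text()))
    return sorted(pkgs)

def write_dockerfile(app_dir, out_path="Dockerfile"):
    pkg_vector = ", ".join(f'"{p}"' for p in find_r_packages(app_dir))
    lines = [
        "FROM rocker/shiny:latest",                        # assumed base image
        f"RUN R -e 'install.packages(c({pkg_vector}))'",   # install detected packages
        f"COPY {app_dir} /srv/shiny-server/app",
    ]
    Path(out_path).write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    write_dockerfile("app")   # assumes the Shiny app lives in ./app
```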


Managing R Secrets with .env Files

Thomas Williams has a secret:

You should never embed passwords or other “secrets” – sensitive data – in code. A better way is to put sensitive data into configuration, and load configuration from your code. Read on to find out how to do this in R Markdown (and Shiny).

Click through for one way to do this. Just make sure your .gitignore excludes .env files.
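
Thomas covers the R side; for comparison only, the same pattern in Python usually goes through the python-dotenv package (a minimal sketch, with a made-up variable name):

```python
# Hypothetical sketch: keep secrets in a .env file (excluded from git) and
# load them into the environment at runtime rather than hard-coding them.
# Requires: pip install python-dotenv
import os
from dotenv import load_dotenv

load_dotenv()                              # reads .env from the working directory

# .env might contain a line like:  DB_PASSWORD=not-in-source-control
db_password = os.getenv("DB_PASSWORD")
if db_password is None:
    raise RuntimeError("DB_PASSWORD is not set; check your .env file")
```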


Lock Escalation outside of Repeatable Read

Paul White continues a series on lock escalation:

When SQL Server reads data under locking read committed isolation (the default for non-cloud offerings) and chooses row-level locking granularity, it normally only locks one row at a time. The engine processes the row through parent operators before releasing the shared lock prior to locking and reading the next row.

When the engine chooses page-level locking granularity, it takes and holds a shared page lock while processing rows on the current page and then releases the lock right before acquiring a lock on the next page.

In general, SQL Server is careful to take and hold locks only when needed and only long enough to guarantee correct results, taking account of the current isolation level. When locking at row granularity, the engine can skip row locks entirely when it’s safe to do.

Read on to understand when SQL Server gets it wrong or when exigent factors alter the story.
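
If you want to poke at lock granularity yourself, one hedged way to do it is to hold a transaction open and watch sys.dm_tran_locks from a second session. This sketch uses Python and pyodbc; the connection string and dbo.SomeTable are placeholders, and it is not Paul's demo:

```python
# Hypothetical sketch: observe what locks a session is holding by reading
# sys.dm_tran_locks from a second connection.
import pyodbc

conn_str = ("DRIVER={ODBC Driver 18 for SQL Server};SERVER=.;DATABASE=TestDb;"
            "Trusted_Connection=yes;TrustServerCertificate=yes")

holder = pyodbc.connect(conn_str, autocommit=False)
spid = holder.execute("SELECT @@SPID;").fetchone()[0]

# Read under REPEATABLE READ so shared locks are held until commit and stay
# visible; under locking read committed they are released almost immediately,
# as Paul describes.
holder.execute("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;")
holder.execute("SELECT COUNT(*) FROM dbo.SomeTable;").fetchone()

watcher = pyodbc.connect(conn_str, autocommit=True)
rows = watcher.execute(
    """
    SELECT resource_type, request_mode, COUNT(*) AS locks
    FROM sys.dm_tran_locks
    WHERE request_session_id = ?
    GROUP BY resource_type, request_mode;
    """,
    spid,
).fetchall()

for r in rows:
    print(r.resource_type, r.request_mode, r.locks)

holder.rollback()
```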


Replicated Tables in Dedicated SQL Pools

Pedro Martinez explains the idea behind replicated tables in Azure Synapse Analytics dedicated SQL pools:

If you have ever used Azure Synapse Analytics dedicated SQL pool, you would know there are multiple table types to choose from for your workload. You might ask yourself, "when can I use the Replicated table type and how can I efficiently use it?"

This blog is going to talk in detail about the replicated table type, when to use it, and best practices for its usage. But before that, let’s start by understanding the different table types:

I’ve seen replicated tables get overused, so check out Pedro’s advice on how not to get burned with them.
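
For reference, the dedicated SQL pool DDL for a replicated table looks like the following. The statement itself is standard Synapse syntax; the Python/pyodbc wrapper, server name, and credentials are placeholders of mine, not Pedro's example:

```python
# Hypothetical sketch: create a small dimension table with REPLICATE
# distribution in a Synapse dedicated SQL pool.
import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;"   # placeholder workspace
    "DATABASE=mydedicatedpool;"
    "UID=sqladminuser;PWD=<secret>;"
)

ddl = """
CREATE TABLE dbo.DimProduct
(
    ProductKey   INT           NOT NULL,
    ProductName  NVARCHAR(100) NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,          -- full copy cached on every compute node
    CLUSTERED COLUMNSTORE INDEX
);
"""

with pyodbc.connect(conn_str, autocommit=True) as conn:
    conn.execute(ddl)
```

The usual guidance is that replicated tables fit small, slowly-changing dimension tables (the commonly cited threshold is around 2 GB compressed), which is exactly the kind of decision Pedro's post walks through.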


DOP Feedback in SQL Server 2022

Kate Smith points out a new feature in SQL Server 2022:

In SQL Server 2022, we introduced a new feature called DOP feedback. This feature will look at any parallel query and determine if it might perform better with a lower degree of parallelism than is currently being used. For example, perhaps 16 threads will perform better than 20 if there are a lot of waits on other threads. It will test out the new degree of parallelism and either decide that this was a good change and keep the 16 threads, or revert to the previous level of parallelism and go back to 20 threads. If the new degree of parallelism is good, then this optimization is persisted inside the query store and will be applied appropriately to a query for future executions.

Read on for an overview of how it works and what protections are in place to keep it from going completely bonkers. Well, more completely bonkers than what you already have.
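
Since Kate notes that accepted feedback is persisted in Query Store, the feature expects Query Store to be on, and switching it on is a database scoped configuration. A hedged sketch (Python and pyodbc with a placeholder connection string) looks like this:

```python
# Hypothetical sketch: enable DOP feedback for a database. Query Store is
# turned on as well, since accepted feedback is persisted there.
import pyodbc

conn_str = ("DRIVER={ODBC Driver 18 for SQL Server};SERVER=.;DATABASE=MyDb;"
            "Trusted_Connection=yes;TrustServerCertificate=yes")

statements = [
    "ALTER DATABASE CURRENT SET QUERY_STORE = ON;",
    "ALTER DATABASE SCOPED CONFIGURATION SET DOP_FEEDBACK = ON;",
]

with pyodbc.connect(conn_str, autocommit=True) as conn:
    for sql in statements:
        conn.execute(sql)
```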


CI/CD with Azure Synapse Link for SQL Server 2022

Kevin Chant gives us the whole story:

In this post I want to show you a complete CI/CD experience for Azure Synapse Link for SQL Server 2022 tables, which uses a YAML pipeline in Azure DevOps, including how to automatically stop and start the link in the pipeline.

In a previous post I showed an easier way to perform CI/CD for Azure Synapse Link for SQL Server 2022, where you only need to stop the link, update the SQL Server database, and afterwards start the link again.

However, the best CI/CD solutions are the ones where you do not do any manual work at all. This includes stopping and starting the link.

And that’s just what Kevin gives us.
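
Kevin's pipeline is YAML in Azure DevOps; purely to illustrate the orchestration order (stop the link, publish the database changes, start the link again), here is a rough sketch in which stop_link and start_link are hypothetical placeholders rather than any real Azure API or CLI call, and SqlPackage handles the dacpac deployment:

```python
# Hypothetical sketch of the deployment order for Azure Synapse Link for
# SQL Server 2022: stop the link, publish the database changes, restart it.
# stop_link/start_link are placeholders, not a real Azure SDK or CLI call.
import subprocess

def stop_link(link_name):
    print(f"stopping link connection {link_name} (placeholder)")

def start_link(link_name):
    print(f"starting link connection {link_name} (placeholder)")

def deploy_dacpac(dacpac_path, connection_string):
    # SqlPackage is the real deployment tool; the path and connection are placeholders.
    subprocess.run(
        [
            "SqlPackage",
            "/Action:Publish",
            f"/SourceFile:{dacpac_path}",
            f"/TargetConnectionString:{connection_string}",
        ],
        check=True,
    )

if __name__ == "__main__":
    stop_link("mylinkconnection")
    deploy_dacpac("MyDatabase.dacpac", "<target connection string>")
    start_link("mylinkconnection")
```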


Database Projects and Version Control

Olivier van Steenlandt helps you get database code into source control:

In this blog post, we will focus on how to get started with Database Projects and how to get this into Source Control (Azure Repos). So together we will create our first Database Project, import our database into the project and push it to the Azure Repository.

Before we can start, we need to make sure that we have the required tools installed; in this blog post, I will focus on Visual Studio. In order to create your first Database Project, you need to ensure that the SQL Server Data Tools extension for Visual Studio is installed.

This one I intended to post earlier in the week but it got away from me for a little bit. Do check it out.
