Author: Kevin Feasel

The Problem of Reproducibility in Neural Networks

Pete Warden explains a problem:

Last week I had a question from a colleague about reproducibility in TensorFlow, specifically in the 1.14 era. He wanted to be able to run the same training code multiple times and get exactly the same results, which on the surface doesn’t seem like an unreasonable expectation. Machine learning training is fundamentally a series of arithmetic operations applied repeatedly, so what makes getting the same results every time so hard? I had the same question when we first started TensorFlow, and I was lucky enough to learn some of the answers from the numerical programming experts on the team, so I want to share a bit of what I discovered.

Read on for that answer.
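As a small taste of why this is harder than it looks, here is a minimal Python sketch of one well-known ingredient: floating-point addition is not associative, so adding the same numbers in a different order can change the result. This is only an illustration and not necessarily the whole of Pete's answer; the `running_sum` helper and the sample values are my own.

```python
import math

def running_sum(values):
    """Add the values left to right, the way a simple loop would."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same three numbers, accumulated in two different orders.
values = [1e16, 1.0, -1e16]
reordered = [1e16, -1e16, 1.0]

print(running_sum(values))     # 0.0 (the 1.0 is lost when added to 1e16)
print(running_sum(reordered))  # 1.0 (the large terms cancel first)

# math.fsum tracks the lost low-order bits, so it is order-independent here.
print(math.fsum(values), math.fsum(reordered))  # 1.0 1.0
```

If the order of accumulation varies from run to run, as it can when work is split across many threads, tiny differences like this one can creep in and then compound over many training steps.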

SQL Server 2022 Query Store Hints

Matthew McGiffen takes a hint:

Another neat little feature in SQL Server 2022 is Query Store Hints. This is the ability to apply a query hint through Query Store rather than having to modify existing code or fiddle around with plan guides.


Query hints are a way to influence optimizer behaviour towards generating desired execution plans for a given query. The word “hint” is a bit of a misnomer as usually they mandate what you wish to happen.

Right. They’re ‘hints’ in the way that my wife ‘hints’ that I should take out the garbage.
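For a rough idea of what applying one looks like, here is a hedged Python sketch that only renders the T-SQL call to `sys.sp_query_store_set_hints`. The helper name `build_set_hint_statement` and the query ID 39 are hypothetical; you would look up the real ID in Query Store and run the statement through whatever client you normally use.

```python
def build_set_hint_statement(query_id: int, hint: str) -> str:
    """Render a T-SQL call that attaches a hint to a Query Store query.

    query_id is assumed to come from Query Store (e.g. sys.query_store_query).
    hint is the text that goes inside OPTION(...), such as "MAXDOP 1".
    """
    if query_id <= 0:
        raise ValueError("query_id must be a positive integer")
    # Double any single quotes so the hint is safe inside an N'...' literal.
    escaped = hint.replace("'", "''")
    return (
        "EXEC sys.sp_query_store_set_hints "
        f"@query_id = {query_id}, "
        f"@query_hints = N'OPTION({escaped})';"
    )

# Hypothetical query ID, purely for illustration.
statement = build_set_hint_statement(39, "MAXDOP 1")
print(statement)
```

The point of the feature is that the original query text never changes: the hint lives in Query Store and is matched to the query by its ID.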

Where Extended Events Go by Default

Tom Zika is curious:

Have you ever wondered where the .xel file is saved when you create a new Extended Event session and don’t specify the full path (just the file name)?

Like so: [image removed because you should go to Tom’s site and see it, ed.]

Well, so did I and here’s what I’ve found out.

Click through to learn where these files end up if you don’t specify anything.

Controlling Cosmos DB Time to Live

Rahul Mehta pulls out the stopwatch:

As Microsoft states, Azure Cosmos DB “is a fully managed NoSQL database service for building scalable, high-performance applications”. Cosmos DB is widely used for storing NoSQL data with options to create using different Core (SQL), MongoDB, Cassandra, Table, and using gremlin.

With wide usage, the content storage also increases, sometimes even in Gigabytes a day. With such content storage, retention and archival of data are one of the common ask from the customer. Today, we are going to talk about how to retain data and remove unnecessary data periodically from Azure Cosmos DB. Before we do that, we need to understand a storage concept called “Container”

Read on to learn about containers, as well as the built-in way to garbage collect data.
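To make the time-to-live rules a bit more concrete, here is a small Python model of how I understand them to work: the container's `DefaultTimeToLive` switches the feature on, an item's own `ttl` overrides the default, and -1 means "never expire". This is a sketch of the documented behaviour rather than SDK code, and the `is_expired` function and its parameter names are my own.

```python
from typing import Optional

def is_expired(now: int, last_modified: int,
               container_default_ttl: Optional[int],
               item_ttl: Optional[int] = None) -> bool:
    """Model whether an item is past its time to live.

    now and last_modified are epoch seconds (last_modified plays the
    role of the item's _ts property). None means the setting is absent.
    """
    # No default on the container: TTL is off, and item-level ttl is ignored.
    if container_default_ttl is None:
        return False
    # An item-level ttl, when present, overrides the container default.
    effective = item_ttl if item_ttl is not None else container_default_ttl
    # -1 means the item never expires.
    if effective == -1:
        return False
    return now - last_modified >= effective

# Container default of one hour; the item was last modified at t=0.
print(is_expired(now=5000, last_modified=0, container_default_ttl=3600))
# Same container, but this item opts out with ttl = -1.
print(is_expired(now=5000, last_modified=0, container_default_ttl=3600, item_ttl=-1))
```

Note that this models only when an item becomes eligible for removal; the actual cleanup happens in the background on the service side.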

Azure Synapse Analytics Updates for November 2022

Ryan Majidimehr has a bundle of updates for us:

We are always working to improve Azure Analytics Spark performance. We are making significant changes that will increase Spark performance by up to 77%.  

Based on our testing using the 1TB TPC-H industry benchmark, you’re likely to see up to 77% increased performance. While your workload may perform differently than the TPC-H benchmark, everyone is expected to see improved performance. These Spark performance improvements come from moving to the latest Azure v5 VMs which have improved CPU performance, increased temporary SSD throughput, and lastly higher remote storage IOPS.  

Click through for a whole bunch of updates.

Migrating Azure Analysis Services to Power BI Premium

Gilbert Quevauvilliers dumps AAS:

I thought it would be a good idea to walk through the steps when looking to migrate AAS to PBI.

In the past when I had to do this for clients it was a lot of manual steps and a lot of small things to get just right. This process is now seamless and awesome!

Reviewing Gilbert’s step-by-step process, yeah, this is easy, though watch out for the pitfalls Gilbert found.

Redshift Query Editor v2

Anusha Challa, et al., announce a new version of a Redshift query editor:

Amazon Redshift is a fast, fully managed, petabyte-scale cloud data warehouse. You have the flexibility to choose from provisioned and serverless compute modes. You can start loading and querying large datasets conveniently in Amazon Redshift using Amazon Redshift Query Editor v2, a web-based SQL client application.

It’s worth a try if you’re a Redshift user, though I’d imagine that frequent Redshift users have already sorted out their IDEs of choice.

Percentiles in KQL

Robert Cain continues a series on KQL:

Often we want to get data that is relative to other data. For example, we want a list of computers that have free space that is greater than the free space of other computers. We need to set a threshold, for example we want to return results where the free space is greater than 95% of the free space on other computers.

To do this, Kusto provides the percentile operator, along with its variants percentiles and percentiles_array.

Read on to see how it works. I do like the way that KQL handles percentile operations.
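If you want to see the idea outside of Kusto, here is a minimal Python sketch of the free-space scenario using a plain nearest-rank percentile. The machine names and numbers are made up, and KQL computes an estimate of its own, so treat this as an illustration of the concept rather than a reproduction of what `percentile` returns.

```python
import math

def percentile_nearest_rank(values, p):
    """Return the nearest-rank p-th percentile (p is an integer from 1 to 100)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(values)
    # Nearest rank: the smallest rank covering at least p percent of the values.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical data: 20 computers with 10, 20, ..., 200 GB free.
free_space_gb = {f"pc-{i:02d}": i * 10 for i in range(1, 21)}

threshold = percentile_nearest_rank(free_space_gb.values(), 95)
above = [name for name, gb in free_space_gb.items() if gb > threshold]

print(threshold)  # 190
print(above)      # ['pc-20']
```

The two steps, computing a threshold and then filtering against it, are the same shape as the scenario Robert describes.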

Text Concatenation and Dataset Refresh Errors

Chris Webb follows up on a prior post:

One of the most popular topics I’ve ever written about here is the dataset refresh errors you get with dynamic data sources and the Web.Contents function in Power BI (see here and here for example). Recently I learned about an interesting new detail regarding this problem that I thought was worth blogging about, and which may help someone trying to troubleshoot it.

Read on for the scenario, as well as what you can do to avoid an issue.
