Press "Enter" to skip to content

Month: December 2024

Comparing Positron to RStudio

Theo Roe performs a product comparison:

Positron is the new beta Data Science IDE from Posit. Though Posit have stressed that maintenance and development of RStudio will continue, I want to use this blog to explore if Positron is worth the switch. I’m coming at this from the R development side but there will of course be some nuances from other languages in use within Positron that require some thought.

Read on for Theo’s perspective. Knowing that it’s using the same underlying framework as Visual Studio Code, I kind of wish this were an extension for VS Code rather than a separate app.

1 Comment

Deployment Parameters in Azure AI Foundry

Tomaz Kastrun continues a series on Azure AI:

Give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant’s personality, tell it what it should and shouldn’t answer, and tell it how to format responses. There’s no token limit for this section, but it will be included with every API call, so it counts against the overall token limit

Click through for a description of each part of the deployment parameters section.

Comments closed

Mounding ADF Instances in Microsoft Fabric

Koen Verbeeck has an existing Azure Data Factory:

We recently started using Microsoft Fabric for our cloud data platform. However, we already have quite an estate of Azure data services running in our company, including a huge number of Azure Data Factory (ADF) pipelines. It seems cumbersome to migrate all those pipelines to Microsoft Fabric, especially because some features are not supported yet and ADF is the mature choice at the moment. We like the concept of Microsoft Fabric’s centralization, where everything is managed in one platform. Is there an option to manage ADF in Fabric?

Read on for the answer, but make sure to check out its limitations as well.

Comments closed

The Power of Rounding

Denny Cherry makes a change:

So, I ran across a problem with QuickBooks that involves some of the most basic math that we were all taught in elementary school: how to round numbers properly. You’d think that a company that makes accounting and invoicing software for a living would understand how rounding of numbers works. But based on the last hour of having to edit the data that gets send to QuickBooks from our internal system you’d be wrong.

Denny’s example is $3.18497736 and rounding happens after four decimal spots, so it’s $3.1849 or $3.1850. Denny expects $3.1850 and QuickBooks gives $3.1849.

In this case, Denny’s right. The part that confuses people is banker’s rounding, which has you round to the even number if your last digit is a 5. For example, if it were $3.18495 and you round to four spots after the decimal, that would be $3.1850. With $3.18485, it would round to $3.1848.

.NET uses banker’s rounding by default, which can confuse people unfamiliar with the concept. SQL Server, meanwhile, rounds the way that Denny expects: 5 or higher rounds up, 0-4 rounds down.

Comments closed

Deployment in Azure AI Foundry

Tomaz Kastrun continues a series on Azure AI:

When you are in Azure AI Foundry, on the left navigation bar, select “Model Catalog”.

For this demo, I will be selecting multimodal model “gpt-4” that can work with images and text.

Click “> Deploy” and select the deployment type and also customize the deployment details.

Tomaz has some step-by-step instructions, a bit of detail on deployment types, and a bit of info on how to consume the results.

Comments closed

Against ODBC Functions

Andy Brownsword stakes a claim:

It’s a function, but not quite as we know it. They’re surrounded by { curly braces } and prefixed with fn. The kind of thing I’d expect to see in PowerShell.

It turns out they were introduced in ODBC versions 1-3, with the latter being based on the SQL-92 specification. These were introduced in 1995 and 1992 respectively, which likely explains why I don’t see them used.

I’m thinking hard and don’t believe I’ve ever seen anyone use these in code I’ve maintained. Thus, I agree with Andy: you probably don’t need them.

Comments closed

Alternatives to Long IN() Lists in SQL Server

David Fowler still has a list:

This post comes off the back of my last, where I looked at issues caused by explicitly declaring a large number of values in an IN clause. The query processor ran out of internal resources and could not produce a query plan – When You’ve Managed to Confuse SQL With a Crazy IN Clause.

My suggestion was to put the values into a temp table and carry out a semi-join.

That got me thinking, which of the two methods would perform better.

I’m going to look at the two methods for different amounts of values and have a look at how they get on in terms of reads and total time.

Read on for the answer. If I’m surprised by anything in it, it’s that the threshold where temp tables out-perform the IN() clause is so high.

Comments closed

Unique Constraint Violations Cause Bloat in PostgreSQL

Josef Machytka shows something annoying:

The issue of table and index bloat due to failed inserts on unique constraints is well known and has been discussed in various articles across the internet. However, these discussions sometimes lack a clear, practical example with measurements to illustrate the impact. And despite the familiarity of this issue, we still frequently see this design pattern—or rather, anti-pattern—in real-world applications. Developers often rely on unique constraints to prevent duplicate values from being inserted into tables. While this approach is straightforward, versatile, and generally considered effective, in PostgreSQL, inserts that fail due to unique constraint violations unfortunately always lead to table and index bloat. And on high-traffic systems, this unnecessary bloat can significantly increase disk I/O and the frequency of autovacuum runs. In this article, we aim to highlight this problem once again and provide a straightforward example with measurements to illustrate it. We suggest simple improvement that can help mitigate this issue and reduce autovacuum workload and disk I/O.

I’ll sometimes create unique constraints in SQL Server and set the flag to ignore duplicates, most often in queue tables or situations where I know whatever gets inserted will always be the same, but that multiple agents could act to insert a record and the calling code doesn’t have any sort of protection. The idea that this pattern can hurt you in PostgreSQL seems weird to me.

Comments closed

Using Stored Procedures in Power BI

Annamarie Van Wyk makes a call:

The aim of this blog is to take you through one example of executing SQL stored procedures in Power BI. Similar to SQL Server Reporting Services (SSRS) we can create parameters that will be used to return a result set based on the output parameters that were selected.  The parameter selection will be available on the Power BI home page. The data used in his example is production line data. The production line is inside a factory where products are being packaged during a certain shift and at a certain production run date. It is literally a factory line with products being packed.

Read on for a simple answer, followed by how to parameterize the procedure call.

Comments closed