Press "Enter" to skip to content

December 12, 2023

Stepwise and Piecewise Regression in R

Steven Sanderson takes us through two regression techniques. First up is stepwise regression:

Stepwise regression is a powerful technique used to build predictive models by iteratively adding or removing variables based on statistical criteria. In R, this can be achieved using functions like step() or manually with forward and backward selection.
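
To make that concrete, here is a minimal sketch of both directions using step() on the built-in mtcars dataset; the dataset and model formulas are my own placeholders, not necessarily what Steven uses.

```r
# Minimal stepwise-selection sketch with AIC as the criterion
full_model <- lm(mpg ~ ., data = mtcars)  # all predictors
null_model <- lm(mpg ~ 1, data = mtcars)  # intercept only

# Backward elimination: start full, drop predictors while AIC improves
backward <- step(full_model, direction = "backward", trace = FALSE)

# Forward selection: start empty, add predictors up to the full formula
forward <- step(null_model, scope = formula(full_model),
                direction = "forward", trace = FALSE)

summary(backward)
```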

Piecewise regression follows:

Piecewise regression is a powerful technique that allows us to model distinct segments of a dataset with different linear relationships. It’s like fitting multiple straight lines to capture the nuances of different regions in your data. So, grab your virtual lab coat, and let’s get started.
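
As an illustration of the idea, here is a minimal sketch that fits two slopes around a single, manually chosen breakpoint; the simulated data and the breakpoint are invented for the example (packages like segmented can estimate breakpoints for you instead).

```r
# Simulate data with a kink at x = 5
set.seed(42)
x <- seq(0, 10, length.out = 100)
y <- ifelse(x < 5, 2 * x, 10 + 0.5 * (x - 5)) + rnorm(100, sd = 0.5)

# pmax(x - breakpoint, 0) adds a second slope that only applies past the break
breakpoint <- 5
fit <- lm(y ~ x + pmax(x - breakpoint, 0))

plot(x, y, pch = 16, col = "grey60")
lines(x, fitted(fit), col = "steelblue", lwd = 2)
```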

Read on for explanations of both techniques, as well as some visuals and potential pitfalls you might run into along the way.

Microsoft Fabric SQL Endpoints and REST API

Tomaz Kastrun continues a series on Microsoft Fabric. Day 6 covers the SQL Analytics endpoint:

The SQL analytics endpoint in a lakehouse is a SQL-based experience for lakehouse delta tables. Using standard T-SQL, you can write queries to analyze data in delta tables, create functions, procedures, and views, and even apply security over the objects. Some standard T-SQL functionality is missing, but the experience is otherwise the same.

Besides the SQL experience, you can always use the corresponding items in the workspace view of Lakehouse explorer, use SQL in notebooks, or simply use the SQL analytics endpoint.
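
As a taste of that experience, here is a hypothetical snippet you could run against the endpoint; the table, view, and user names are invented for illustration.

```sql
-- Plain T-SQL over a lakehouse delta table
SELECT TOP (10)
    SalesOrderID,
    SUM(LineTotal) AS OrderTotal
FROM dbo.FactSales
GROUP BY SalesOrderID
ORDER BY OrderTotal DESC;
GO

-- Views (and functions, procedures, object-level security) work as well
CREATE VIEW dbo.vw_TopOrders
AS
SELECT SalesOrderID, SUM(LineTotal) AS OrderTotal
FROM dbo.FactSales
GROUP BY SalesOrderID;
GO
GRANT SELECT ON dbo.vw_TopOrders TO [SomeUser];
```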

Day 7 looks at what subset of T-SQL syntax you can use against SQL Analytics endpoints:

You get the gist, and there are some other limitations: computed columns, indexed views, indexes of any kind, partitioned tables, triggers, user-defined types, sparse columns, surrogate keys, temporary tables, and many more. Essentially, all the commands that are not supported in distributed processing mode.

The biggest annoyance (!) is case sensitivity! Ugh. This shows that the SQL endpoint operates like an API on top of delta tables rather than a native SQL database: identifier casing has to match the underlying table metadata. So the first statement will work and the second will be gracefully terminated.
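
Tomaz's actual statements are in the post, but the gotcha looks something like this (my own invented table name): the endpoint's default collation is case-sensitive, so a query fails if the casing doesn't match the delta table's metadata exactly.

```sql
SELECT COUNT(*) FROM dbo.SalesData;  -- works: matches the table's actual casing
SELECT COUNT(*) FROM dbo.salesdata;  -- fails: Invalid object name 'dbo.salesdata'
```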

Day 8 covers the Lakehouse REST API:

Now that we have explored the lakehouse through the interface and workspaces, let's look today at how we can use the REST API. The Microsoft Fabric REST API defines a unified endpoint for operations.
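
For a sense of its shape, here is a minimal request to list workspaces, taken from the public Fabric REST API surface; acquiring the Microsoft Entra ID bearer token is elided.

```
GET https://api.fabric.microsoft.com/v1/workspaces
Authorization: Bearer <access-token>
```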

Advent of Code Day 3

Kevin Wilkie lives the struggle. Check out part one:

Then we do something very similar to what we did in Walking Through the Advent of Code Day 2: find the digits and the remaining text, and we're home free!

Notice, though, that the position we grab is found at the very end. We just find the string in the line and show where it is. Except that doesn't always work.
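
The "doesn't always work" part is worth spelling out. Here is my own illustration (not Kevin's code): CHARINDEX reports only the first occurrence, so a number that appears twice in a line gets the wrong position unless you resume the search past the first hit.

```sql
DECLARE @line varchar(50) = '..35..633..35..';

SELECT CHARINDEX('35', @line)    AS FirstMatch,   -- 3
       CHARINDEX('35', @line, 4) AS SecondMatch;  -- 12, by starting past the first hit
```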

And then there’s part 2:

Hopefully, by now, you’ve read my post Walking Through the Advent of Code Day 3. After looking through it, you’re probably thinking… “What have I gotten myself into? Is SQL the way to go with all of this? Can I back out now and learn a programming language that AoC can be done in “easily”?”

Click through to experience the pain or at least have a little bit of Schadenfreude in your life.

Error Handling in T-SQL Stored Procedures

Erik Darling intimates that some of our code might occasionally have errors or might experience circumstances in which not everything is in perfect alignment:

Okay, look, the best post about this is, of course, by Erland Sommarskog: Error and Transaction Handling in SQL Server

Just like Erland, it has three parts and three appendices. If you want to learn how to do a whole lot of things right, give yourself a couple days to read through and digest all that.

What I’m here to talk about is some of the stuff you should think about before implementing any error handling at all.

I agree with most of Erik's opinion here. My very mild disagreement is that I'll still protect against things like invalid parameters or logic errors (start date before end date) in the stored procedure; there's a sketch of what I mean after the list. I do that for three reasons:

  • Defense in depth isn't just a security principle; it's also a code practices principle.
  • The app gets things wrong, too. Sometimes an app dev accidentally sends parameters in the wrong order, and it's better to get an error early in development than to think everything works because the procedure call succeeded and to ship it that way.
  • Even if “the” app correctly handles inputs, there’s always a chance some other app or process will call this stored procedure and it might not have the same error handling code built in.
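
Here is a sketch of the kind of guard I mean; the procedure, table, and error numbers are hypothetical.

```sql
CREATE OR ALTER PROCEDURE dbo.GetSalesByDateRange
    @StartDate date,
    @EndDate   date
AS
BEGIN
    SET NOCOUNT ON;

    -- Fail fast on logic errors before doing any real work
    IF @StartDate IS NULL OR @EndDate IS NULL
        THROW 50000, N'@StartDate and @EndDate are both required.', 1;
    IF @StartDate > @EndDate
        THROW 50001, N'@StartDate must be on or before @EndDate.', 1;

    SELECT SaleDate, SUM(Amount) AS TotalAmount
    FROM dbo.Sales
    WHERE SaleDate >= @StartDate
      AND SaleDate <  DATEADD(DAY, 1, @EndDate)
    GROUP BY SaleDate;
END;
```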

Security in Microsoft Fabric

Alex Lisboa-Wright talks security:

In Fabric, the basic logical structure for data services is the Workspace, in which users can create Items, which are the various resources that perform all the data operations available in Fabric, such as Lakehouses, pipelines, machine learning models and so on. Each workspace is a self-contained data storage and development environment, whose user access is controlled by both workspace admins and member users. User access controls include options to manage users’ workspace roles, which determine the permissions assigned to each user. Security permissions can be managed on the workspace and item levels in the Fabric UI. MEID authentication can also be employed within Fabric, as connecting Fabric items to other Azure resources requires MEID. MEID’s Conditional Access feature can also be configured for use in Fabric (see this documentation for best practice for Fabric resources linking to other Azure services).

Read on to learn more. Fabric is a broad set of tools and technologies, so security is both important and decidedly non-trivial, even when you consider that it is a software-as-a-service offering and therefore doesn't expose much in the way of user-facing networking or infrastructure security.
