Month: October 2024

A Survey of Predictive Analytics Techniques

Akmal Chaudhri tries a bunch of things:

In this short article, we’ll explore loan approvals using a variety of tools and techniques. We’ll begin by analyzing loan data and applying Logistic Regression to predict loan outcomes. Building on this, we’ll integrate BERT for Natural Language Processing to enhance prediction accuracy. To interpret the predictions, we’ll use SHAP and LIME explanation frameworks, providing insights into feature importance and model behavior. Finally, we’ll explore the potential of Natural Language Processing through LangChain to automate loan predictions, using the power of conversational AI.

Click through for the notebook, as well as an overview of what the notebook includes. I don’t particularly like word clouds as the “solution” in the BERT example, though without real data to perform any sort of NLP, there’s not much you can meaningfully do.

Comments closed

Source Control Tips

Aamir Khan shares some tips on source control:

In software development, version control is an essential practice that helps you manage changes to code and collaborate effectively with team members. A well-organized repository not only streamlines the development process but also enhances productivity and minimizes errors. In this blog post, we’ll explore best practices for maintaining a clean and organized repository, including branch naming conventions and crafting effective commit messages.

The primary audience for this is software developers, but if you create and modify SQL assets (tables, stored procedures, views, functions, etc.), source control is a great thing for many reasons, and this still applies to you. And if you write PowerShell scripts because you’re “not a coder,” well, I have some shocking news for you.

Fixing Timeout Issues with Azure SQL Database

Reitse Eskens shares some knowledge:

The customer can connect to the Azure SQL database with SQL Server Management Studio (SSMS) but not with a specific client application.
When digging into the logs (all logs were activated for this database), nothing shows up for the specific login used by the client application. The application itself returns a connection error caused by a time-out.

The application resides outside of Azure and can’t use a VPN connection; the Azure SQL server has a specific firewall rule to allow incoming traffic from this specific IP address. Not a situation I’m really happy with, but it happens.

Read on for the solution. It was not one I had anticipated. But it did land in my “When in doubt, blame the network” policy.

The Importance of Monitoring in Microsoft Fabric

Marc Lelijveld flips a switch but also watches it:

A long time ago, I blogged about Power BI governance with topics like feature implementation in a phased approach and why you should consider disabling export to Excel. In this blog, I want to continue the governance topic by explaining why monitoring your tenant is important! This blog will also give you an overview of the various monitoring options you have out of the box, no matter what your role is: workspace, capacity, domain, or tenant administrator.

I encourage everyone, whether or not you are the service administrator, to go through this blog and consider from various angles how monitoring can help. I think it can be relevant for any Fabric / Power BI user to see all the capabilities it has to offer and to better understand possible restrictions that are set by your service administrator.

Read on for Marc’s argument, as well as plenty of examples of what you can do as far as monitoring goes.

Comparing Snowflake vs SQL Server E-Mail Configuration

Kevin Wilkie sends two e-mails:

Today, I want to talk about all the effort that goes into setting up the ability to email in SQL Server and Snowflake.

First is our old friend – SQL Server. I’ll leave this one to the experts at Microsoft. As has been the case over the last few years, they have some great documentation at Learn.Microsoft.com – especially when it comes to SQL Server.

I don’t know anything about sending e-mails via Snowflake (other than what Kevin mentions here), though I imagine a lot of the difference in complexity is that SQL Server allows arbitrary SMTP selection and requires an existing SMTP server.

RandomWalker 0.2.0 Release

Steven Sanderson makes an announcement:

In the ever-evolving landscape of R programming, packages continually refine their capabilities to meet the growing demands of data analysts and researchers. Today, we’re excited to announce the release of RandomWalker version 0.2.0, a minor update that brings significant enhancements to time series analysis and random walk simulations.

RandomWalker has been a go-to package for R users in finance, economics, and other fields dealing with time-dependent data. This latest release introduces new functions and improvements that promise to streamline workflows and provide deeper insights into time series data.

Read on to see what has changed.

Variable Types in Postman

Huyen Maithi talks variables:

Variables enable you to store and reuse values. Postman is a powerful API development tool that offers a feature known as environment variables. These variables help you work efficiently and collaborate with teammates in testing and development by making it easy to manage dynamic values across requests.

Click through for an overview of the types of variables you can create for Postman requests.
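
As a rough illustration of what "store and reuse values" means in practice, here is a minimal Python sketch of placeholder substitution. Postman references variables with double curly braces, such as {{base_url}}; everything else here (the resolve function, the variable names, the example URL) is hypothetical and is not how Postman itself is implemented.

```python
import re

# Matches {{name}} placeholders, the syntax Postman uses to reference variables.
PLACEHOLDER = re.compile(r"\{\{([^{}]+)\}\}")


def resolve(template: str, variables: dict) -> str:
    """Replace each {{name}} in template with its value from variables.

    Unknown names are left untouched; that is a choice made for this sketch.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in variables:
            return str(variables[name])
        return match.group(0)

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical "environment": one set of values reused across many requests.
dev_environment = {"base_url": "https://example.test", "user_id": 7}
request_url = resolve("{{base_url}}/users/{{user_id}}", dev_environment)
```

Swapping in a different dictionary (say, one for a test environment) changes every request that uses the placeholders without editing the requests themselves, which is the main appeal of the feature.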

Dynamically Start a Collection of Child Pipelines in Fabric Data Factory

Andy Leonard continues a series on Microsoft Fabric Data Factory:

In this post, I modify the dynamic parent pipeline from the previous post so that it can call several child pipelines. We will:

  • Clone the child pipeline (twice)
  • Copy the cloned child pipeline id values
  • Clone the dynamic parent pipeline from the previous post
  • Add and configure a pipeline variable for an array of child pipeline ids
  • Add and configure a ForEach
    • Move the “Invoke Pipeline (Preview)” activity
    • Configure the “ForEach”
    • Configure the “Invoke Pipeline (Preview)” Activity to Use “ForEach” Items
  • Test the execution of a dynamic collection of child pipelines

Andy’s got quite a bit in this post, so check it out.
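
The overall pattern in the list above (an array of child pipeline ids, a ForEach over that array, and one invocation per item) can be sketched conceptually in Python. This is not Fabric code: invoke_pipeline is a hypothetical stand-in for the “Invoke Pipeline (Preview)” activity, and the ids are placeholders.

```python
from typing import Callable, Iterable, List


def invoke_pipeline(pipeline_id: str) -> str:
    """Hypothetical stand-in for the Invoke Pipeline activity."""
    return f"started {pipeline_id}"


def run_children(
    child_pipeline_ids: Iterable[str],
    invoke: Callable[[str], str] = invoke_pipeline,
) -> List[str]:
    """Mimic a ForEach: call invoke once per id, in order, and collect results."""
    return [invoke(pipeline_id) for pipeline_id in child_pipeline_ids]


# Placeholder ids standing in for the pipeline variable that holds the array.
child_ids = ["child-id-1", "child-id-2", "child-id-3"]
results = run_children(child_ids)
```

The parent never hard-codes a child: adding or removing a child pipeline only means changing the array.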

Prod Data in Dev

Brent Ozar looks at survey results:

No matter which way you slice it, about half are letting developers work with data straight outta production. We’re not masking personally identifiable data before the developers get access to it.

It was the same story about 5 years ago when I asked the same question, and back then, about 2/3 of the time, developers were using production data as-is.

Brent covers some of the challenges involved, and I can add one more: the idea of environments gets really squishy when talking about data science. My development model still needs production data (unless the dev data has the same structural attributes and data distributions as prod), and I don’t really want to train different models in dev/test/prod because, even with the same default data, many algorithms are stochastic in nature: if I run it multiple times, I can end up with different results. And even if I can get the same results by re-running and using a consistent seed, that also introduces a structural instability because I’m relying on a specific seed.

In short, I agree with Brent: this is a tough nut to crack.
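
To make the point about stochastic algorithms and seeds concrete, here is a minimal sketch using only Python’s standard library. The train_slope function and the four data points are invented for illustration: it fits a single slope with stochastic gradient descent, where the random starting value and the shuffle order are the only sources of randomness.

```python
import random


def train_slope(data, seed=None, epochs=5, lr=0.01):
    """Fit y ~ w * x with stochastic gradient descent.

    The random initial weight and the per-epoch shuffle are the only
    sources of randomness, and both are driven by the seed.
    """
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            # Gradient of the squared error (w * x - y) ** 2 with respect to w.
            w -= lr * 2 * (w * x - y) * x
    return w


# Invented data that roughly follows y = 3x.
data = [(1.0, 3.1), (2.0, 5.8), (3.0, 9.2), (4.0, 11.9)]

same_seed_a = train_slope(data, seed=42)
same_seed_b = train_slope(data, seed=42)  # identical to same_seed_a
other_seed = train_slope(data, seed=7)    # close to, but not equal to, the above
```

Every run lands near a slope of 3, but only runs that share a seed agree exactly, so a model retrained in each environment is reproducible only as long as everyone pins the same seed.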

The Power of Pre-Attentive Attributes

Elena Drakulevska is seeing pink elephants:

In a world packed with data, how do you make sure your key points don’t get lost in the noise?

Enter the Pink Elephant Principle—a concept that makes sure your most important elements stand out, like a big pink elephant in the middle of a room. It’s impossible to ignore, and that’s exactly what you want for the critical parts of your report!

The irony of this is that, historically, “seeing pink elephants” describes a person so drunk that he’s hallucinating. Humor of the term aside, Elena drives home a very important principle: take advantage of pre-attentive attributes so that users see what’s important with the least cognitive effort.
