Press "Enter" to skip to content

Month: April 2021

Plotting XGBoost Trees with R

Andrew Treadway shows off a method to visualize the results of training an XGBoost model:

In this post, we’re going to cover how to plot XGBoost trees in R. XGBoost is a very popular machine learning algorithm, which is frequently used in Kaggle competitions and has many practical use cases.

Let’s start by loading the packages we’ll need. Note that plotting XGBoost trees requires the DiagrammeR package to be installed, so even if you have xgboost installed already, you’ll need to make sure you have DiagrammeR also.

Click through for the process. H/T R-Bloggers.

Comments closed

Validating IP Addresses using Powershell

Joey D’Antoni needs a good IP address:

I had a client ask me to write a PowerShell script (for an Azure Automation runbook) to automate the creation of firewall rules for an Azure SQL Database. As part of this process, I was planning on having to validate the IP addresses (a valid IP address is of the format x.x.x.x where x is an integer between 0 and 255) by having to write some code to split the IP address string into individual octets and figuring out if each value was a valid number. However, in doing research (searching Stack Overflow), I discovered PowerShell has a built-in IP address data type, which made my life way easier.

Read on to see how you can use this to your advantage.

Comments closed

Just Take My Money: Paying for Power BI Premium per User

Wolfgang Strasser shows how hard it can be to let someone take your money in return for goods or services:

Initially I told my customers, purchase the PPU license in the Microsoft 365 portal and thought that it should be an easy deal.. but hey – it’s licensing, Microsoft licensing involved…

The place where to buy the PPU add-on is not that obvious as it looks like.

Read on to learn how to upgrade to Premium per User if you already have a Professional license.

Comments closed

User-Defined Roles in SQL Server

Greg Larsen shows us how to create and use user-defined security roles in SQL Server:

When developing an application that accesses SQL Server, you need to set up a security model, so each user gets the access they need based on their job duties. Some applications require the same access for all users, while other application might require different security access rights based on the role a user has in the organization. The best practice for providing user access is to use role-based security. SQL Server provides user-defined Server and Database roles for defining security rules for different user security needs. This article will discuss how to use user-defined server and database roles to provide secure access to SQL Server resources.

This is something I think DBAs tend to under-utilize. Granted, a big part of why I think DBAs under-utilize it is that we often have Windows groups that we can use as the base for roles, but even so, it’s a good way to secure assets in the database.

Comments closed

Working with High Virtual Log Files

Chad Callihan explains the notion of Virtual Log Files and has a process to handle them when they multiply like rabbits:

Today, I want to go over what Virtual Log Files are and how to handle them if you have too many in your databases.

A SQL Server log file is made up of smaller files called Virtual Log Files (VLFs). As the log file grows, so will the count of VLFs. I haven’t seen or heard of a calculation that can be worked out to determine how many VLFs you should have or how many is too many for a database. I’ve heard that you shouldn’t have more than a few hundred. I’ve also heard to not worry about VLFs until you break 1000. If you check your databases and have thousands in a database, I would say it’s best to get that count lowered whether you’re seeing issues yet or not.

Read on to see how.

Comments closed

Random Sequences and Probabilities

Holger von Jouanne-Diedrich explains the results of a poll:

Some time ago I conducted a poll on LinkedIn that quickly went viral. I asked which of three different coin tossing sequences were more likely and I received exactly 1,592 votes! Nearly 48,000 people viewed it and more than 80 comments are under the post (you need a LinkedIn account to fully see it here: LinkedIn Coin Tossing Poll).

In this post I will give the solution with some background explanation, so read on!

Read on to understand why it’s just as likely that you’ll see a sequence, when flipping a coin, of H,H,H,H,H,H just as often as you’ll see H,T,H,T,H,T.

Comments closed

Troubleshooting Code Performance in R

Mira Celine Klein shows how to benchmark R code performance:

Let’s assume you have written some code, it’s working, it computes the results you need, but it is really slow. If you don’t want to get slowed down in your work, you have no other choice than improving the code’s performance. But how to start? The best approach is to find out where to start optimizing.

It is not always obvious which part of the code makes it so slow, or which of multiple alternatives is fastest. There is the risk to spending a lot of time optimizing the wrong part of the code. Fortunately, there are ways to systematically test how long a computation takes. An easy way is the function system.time. Just wrap your code into this function, and you will (in addition to the actual results of that code) get the time your code took to run.

But that’s not the only route—read on to learn about other techniques as well and see them in action.

Comments closed

Gaps and Islands in Dates

Aaron Bertrand shows off a great use for calendar tables in gap and island style queries:

In my previous article I revisited the concept of a calendar table, and explained some ways to use this data for business date calculations. This time, I wanted to explore how you can use the calendar table to simplify generating date ranges, and some query challenges this can help you simplify.

Click through for examples of the sorts of gap and island problems you can solve fairly easily with a calendar table. For an even simpler example, many BI reports want to see days even where there is no data, and a calendar table gives you that capability.

Comments closed