Day: March 13, 2018

XGBoost With Python

Fisseha Berhane looked at Extreme Gradient Boosting with R and now covers it in Python:

In both R and Python, the default base learners are trees (gbtree), but we can also specify gblinear for linear models and dart for both classification and regression problems.
In this post, I will optimize only three of the parameters shown above, and you can try optimizing the other parameters. You can see the list of parameters and their details on the website.

It’s hard to overstate just how valuable XGBoost is as an algorithm.
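
The excerpt doesn't name the three tuned parameters, so the sketch below assumes three common ones (`max_depth`, `learning_rate`, `n_estimators`, all hypothetical choices) and expands them into every combination, which is what an exhaustive grid search iterates over. It only builds candidate settings; training and scoring are left to xgboost.

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical grid: these three parameters and their values are assumptions,
# not necessarily the ones tuned in the original post.
PARAM_GRID: Dict[str, List] = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1, 0.3],
    "n_estimators": [100, 300],
}


def expand_grid(grid: Dict[str, List], booster: str = "gbtree") -> Iterator[Dict]:
    """Yield one parameter dict per combination of the grid's values."""
    keys = sorted(grid)  # fixed key order so the output is deterministic
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        # "gbtree" is xgboost's default booster; "gblinear" and "dart" are the alternatives.
        params["booster"] = booster
        yield params
```

Each dict could then be passed as `xgboost.XGBClassifier(**params)` and scored with cross-validation; scikit-learn's `GridSearchCV` automates the same loop. Note that `max_depth` applies to the tree-based boosters, not to `gblinear`.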

Domain, Range, And Codomain

Kevin Sookocheff explains the concepts of domain, range, and codomain:

That is, a function relates an input to an output. But not every input value has to work, and not every possible output value has to be produced. For example, you can imagine a function that only works for positive numbers, or a function that only returns natural numbers. To more clearly specify the types and values of a function's input and output, we use the terms domain, range, and codomain.

Speaking as simply as possible, we can define what can go into a function, and what can come out:

  • domain: what can go into a function

  • codomain: what may possibly come out of a function

  • range: what actually comes out of a function

Read on for more, including a couple of examples.  These are important concepts for learning functional programming.
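
A small Python sketch of the three terms, assuming a finite domain (the integers -3 through 3, chosen only for illustration) and letting the `int` return type hint stand in for the codomain:

```python
from typing import Callable, Iterable, Set


def image(f: Callable[[int], int], domain: Iterable[int]) -> Set[int]:
    """Return the range of f: the set of outputs it actually produces on the domain."""
    return {f(x) for x in domain}


def square(x: int) -> int:
    # The type hint declares the codomain: any int may come out.
    return x * x


# Domain: what can go in (here, the integers -3 through 3).
DOMAIN = range(-3, 4)

# Range: what actually comes out of square on that domain.
RANGE = image(square, DOMAIN)  # {0, 1, 4, 9}
```

The value 2 is in the codomain (it is an `int`), but it is not in the range, because no input in the domain squares to 2.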

Experimenting With The Data Professional Salary Survey

Mala Mahadevan investigates a potential correlation in the data professional salary survey:

The questions I was looking at are as below:
1. Is there any correlation between experience and number of hours worked?
2. Is there any correlation between experience and job duties/kinds of tasks performed?
3. Is there any correlation between experience and managing staff, i.e., do more people with experience take to management as a form of progress?

I am using this blog post to explore question 1.

Click through to see if there is a correlation between experience and hours worked.  One critique I have is that years of experience is not normally distributed:  there's a hard cutoff at 0, since nobody has negative experience.  So even though the observed values may look roughly like what a normal distribution would produce (and it doesn't really affect the analysis Mala did), that difference can be important in other analyses.
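
Pearson correlation assumes a roughly linear relationship and is sensitive to skewed data, so given the hard cutoff at 0, a rank-based Spearman correlation is a common alternative. The sketch below computes both in plain Python; it is not the method from Mala's post, and the inputs (say, years of experience and hours worked per week) would be your own.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

Because Spearman only looks at ordering, a bounded, skewed variable like years of experience doesn't distort it the way it can distort Pearson.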

Linking Azure VMs To An On-Prem Domain

Denny Cherry explains how to integrate Azure VMs with your existing Active Directory domain:

The first step is to put some domain controllers in Azure.  To do this, you'll need a site-to-site VPN between Azure and your on-premises environment.  If you have multiple on-premises sites, then you'll want to create a VPN between Azure and all of your on-premises environments.  If your Azure environment is hosted in multiple regions, then you'll want to create a mesh network where each on-premises site is VPNed into all of your vNets.  You'll probably also want your vNets VPNed to each other (peering your networks between sites may be an option as well, depending on how you've set things up).  If you have an extremely large number of users at your site, then ExpressRoute might be worth looking into instead of a site-to-site VPN.

Click through for the full process.
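
The mesh described above grows quickly. As back-of-the-envelope arithmetic (my own, not from Denny's post), if every on-premises site connects to every vNet and the vNets are fully meshed with each other, the number of links is sites * vNets plus vNets * (vNets - 1) / 2. The function name is hypothetical.

```python
def link_count(sites: int, vnets: int, mesh_vnets: bool = True) -> int:
    """Count links: each on-prem site to each vNet, plus an optional full vNet mesh.

    A site-to-vNet link is one site-to-site VPN connection; a vNet-to-vNet link
    could be a VPN or a peering, but either way it is counted once here.
    """
    if sites < 0 or vnets < 0:
        raise ValueError("counts must be non-negative")
    site_links = sites * vnets
    vnet_links = vnets * (vnets - 1) // 2 if mesh_vnets else 0
    return site_links + vnet_links
```

For example, three on-premises sites and four vNets give 12 site links plus 6 vNet links, or 18 in total.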

Automatically Restarting Telegraf On Windows

Tracy Boggiano has a quick PowerShell script to try starting Telegraf until it succeeds:

I've noticed on demo machines that sometimes Telegraf doesn't start on the first try, and this doesn't seem to happen on most of my production servers, but they have a lot more memory and CPU power. So I figured I would write a quick blog post and provide a way to get the service to start when the machine is rebooted. This is a known issue, and a user has offered a bounty to get it fixed, so if you know some Go and have time, please check out the issue on GitHub.

Click through for the script.
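
Tracy's script is PowerShell, but the retry idea itself is easy to sketch in Python: call a start function until it reports success, waiting between attempts and giving up after a fixed number of tries. The `start` callable, the attempt limit, and the delay are hypothetical; on Windows, assuming the service is named `telegraf`, `start` might wrap `sc.exe start telegraf` and then check the service status.

```python
import time
from typing import Callable


def start_with_retries(
    start: Callable[[], bool],
    max_attempts: int = 10,
    delay_seconds: float = 5.0,
) -> int:
    """Call start() until it returns True; return the attempt number that succeeded.

    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if start():
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # give the machine a moment before retrying
    raise RuntimeError(f"service did not start after {max_attempts} attempts")
```

Capping the attempts keeps a service that is genuinely broken from looping forever at boot.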

Execution Plans And GDPR

Grant Fritchey isn’t crazy when it comes to execution plans:

Now, when you save an execution plan out to a file, you're potentially transmitting PI data. It goes further. When you hard-code values, PI is not just in the query. Those PI values can also be stored throughout the plan in various properties.

So now you see what I mean when I say that the GDPR affects how we deal with execution plans. I’m not done yet.

Unfortunately, questions like the one Grant raises here won’t be answered until we see a few test cases in the European courts.
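
If you do need to share a plan, one mitigation is to scrub values before the file leaves your hands. The sketch below blanks a few showplan XML attributes that commonly carry query text or literal and parameter values (`StatementText`, `ParameterCompiledValue`, `ParameterRuntimeValue`, `ScalarString`, `ConstValue`). That attribute list is an assumption and not exhaustive, so treat this as a starting point rather than a guarantee.

```python
import xml.etree.ElementTree as ET

SHOWPLAN_NS = "http://schemas.microsoft.com/sqlserver/2004/07/showplan"
# Keep the showplan namespace as the default on output instead of "ns0:".
ET.register_namespace("", SHOWPLAN_NS)

# Attributes assumed to hold query text or literal/parameter values.
# Not exhaustive: review real plans before relying on this.
SENSITIVE_ATTRIBUTES = {
    "StatementText",
    "ParameterCompiledValue",
    "ParameterRuntimeValue",
    "ScalarString",
    "ConstValue",
}


def redact_plan(plan_xml: str, placeholder: str = "[REDACTED]") -> str:
    """Return the plan XML with sensitive attribute values replaced."""
    root = ET.fromstring(plan_xml)
    for element in root.iter():
        for name in list(element.attrib):
            if name in SENSITIVE_ATTRIBUTES:
                element.set(name, placeholder)
    return ET.tostring(root, encoding="unicode")
```

The redacted plan keeps its operators and structure, which is usually what someone helping with tuning needs to see.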

Calling Azure Cognitive Services From SSIS

Rolf Tesmer shows off how easy it is to call Azure Cognitive Services from SQL Server Integration Services:

My SQL SSIS package leverages the Translator Text API service.  For those who want to learn the secret sauce, I suggest checking here: https://azure.microsoft.com/en-us/services/cognitive-services/translator-text-api/

Essentially, this API is pretty simple:

  1. It accepts source text, source language, and target language.  (The API can translate to/from over 60 different languages.)

  2. You call the API with your request parameters plus your API key.

  3. The API will respond with the language translation of the source text you sent in.

  4. So simple, so fast, so effective!

Click through for the full post.  It really is simple.
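
For readers outside SSIS, the same request/response flow can be sketched in Python with only the standard library. This assumes the Translator Text API v3 `translate` endpoint, the `Ocp-Apim-Subscription-Key` header, and a JSON body like `[{"Text": "..."}]`; check the current Azure documentation, since endpoints, versions, and regional headers change. Any key you pass in is your own.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for Translator Text API v3; verify against current Azure docs.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(
    text: str, source: str, target: str, api_key: str
) -> urllib.request.Request:
    """Steps 1 and 2: package source text, languages, and API key into a POST request."""
    query = urllib.parse.urlencode({"api-version": "3.0", "from": source, "to": target})
    body = json.dumps([{"Text": text}]).encode("utf-8")
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def translate(request: urllib.request.Request) -> str:
    """Step 3: send the request and pull the translated text out of the JSON response."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # v3 returns one entry per input text, each with a list of translations.
    return payload[0]["translations"][0]["text"]
```

Building the request separately from sending it makes the payload easy to inspect or log without spending API calls.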

Anticipating Disk Growth

Adrian Buckman has a script which gives you an idea of what would happen if your databases all grew by some factor overnight:

The other day I got thinking about what would happen if all databases on a single instance grew out, every single one of them! But not just once: what if they all grew out three, four or five times overnight? What would things look like?

Well, I know the likelihood may be slim, but wouldn't it be nice just to see how many times things could grow before it all runs out of space?

For a bit of fun, I decided to write a query to see what the drive space would look like; it simulates database growth and then shows what drive space would be left after the total number of growths specified.

It’s a good idea to anticipate this kind of activity, though based on the companies I’ve worked for in the past, the answer would be “run out of disk really fast.”
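
Adrian's version is a T-SQL query. The simulation itself can be sketched in Python: for each file, apply its autogrowth setting N times (fixed growth adds the same amount each time, percentage growth compounds on the current size) and subtract the total from the free space on that file's drive. The `DataFile` type and the inputs are hypothetical; in SQL Server they would come from something like `sys.master_files`, where `growth` is a count of 8 KB pages unless `is_percent_growth` is 1.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class DataFile:
    drive: str
    size_mb: float
    growth: float  # MB per growth, or a percentage when is_percent is True
    is_percent: bool


def free_space_after_growths(
    files: Iterable[DataFile],
    free_mb_by_drive: Dict[str, float],
    growths: int,
) -> Dict[str, float]:
    """Return the free MB left on each drive after every file grows `growths` times."""
    remaining = dict(free_mb_by_drive)
    for f in files:
        size = f.size_mb
        for _ in range(growths):
            # Percentage growth compounds on the current size; fixed growth does not.
            increment = size * f.growth / 100 if f.is_percent else f.growth
            size += increment
            remaining[f.drive] -= increment
    return remaining
```

A negative value for a drive means it would run out of space before all of the simulated growths complete. This sketch ignores maximum file size caps.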

VS Code And Splatting PowerShell

Rob Sewell shows off how easy it is to splat PowerShell parameters with Visual Studio Code:

Well, you will know that when you call a PowerShell function, you can use IntelliSense to get the parameters and sometimes the parameter values as well. This can leave you with a very long single-line command on the screen.

It goes on and on and on, and while it is easy to type once, it is not so easy to see which values have been chosen. It is also not so easy to change the values.
Splatting the parameters makes the command much easier to read and also to alter.

To learn more about splatting, there's a whole section in PowerShell help on the topic (about_Splatting).
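
PowerShell splatting packs parameters into a hashtable and passes it with `@name` instead of `$name`. Python's closest analog, shown here purely as a comparison and not as PowerShell syntax, is unpacking a dict into keyword arguments with `**`. The `backup_database` function and its parameters are hypothetical.

```python
def backup_database(
    server: str,
    database: str,
    path: str,
    compress: bool = False,
    copy_only: bool = False,
) -> str:
    """Hypothetical command with many parameters; returns a description of the call."""
    flags = [name for name, on in (("compress", compress), ("copy_only", copy_only)) if on]
    return f"backup {database} on {server} to {path} [{', '.join(flags)}]"


# One parameter per line: easy to read, and easy to change a single value.
backup_params = {
    "server": "sql01",
    "database": "SalesDb",
    "path": "/backups/sql01",
    "compress": True,
    "copy_only": False,
}

# The dict's keys become keyword arguments, much like @backup_params in PowerShell.
result = backup_database(**backup_params)
```

Changing one value means editing one line of the dict rather than hunting through a long call.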
