Author: Kevin Feasel

Automated ML Pipelines with SAS

Sophia Rowland shows off SAS’s auto-ML action:

The dsAutoMl action does it all. It will explore your data, generate features, select features, create models, and autotune the hyper-parameters of those models. This action includes the four policies we have seen in my first two blogs: explorationPolicy, screenPolicy, transformationPolicy, and selectionPolicy. Please review my previous blogs if you need a refresher on the data exploration and cleaning process or feature generation and selection process. The dsAutoMl action builds on our prior discussions through model generation and autotuning. A data scientist can choose to build several models such as decision trees, random forests, gradient boosting models, and neural networks. In addition, the data scientist can control which objective function to optimize for and the number of K-folds to use. The output of the dsAutoMl action includes information about the features generated, information on the model pipelines generated, and an analytic store file for generating the features with new data.

This is an area where several companies are investing a lot of money, trying to simplify the process of training models.

Managing SQL Server Documentation with JSON

Phil Factor gives us the gloop:

Metadata extract files are handy for documentation, study, cataloguing and change-tracking. This type of file supplements source because it can record configuration, permissions, dependencies and documentation much more clearly. It is a good way of making a start with documenting your database.

Here is a sample of a JSON metadata file (from AdventureWorks 2016). It was generated using GloopCollectionOfObjects.sql, which is here on GitHub, and is being viewed in JSONBuddy. I use this format of JSON, a collection of documents representing SQL Server base objects (no parent objects), when I need to read the contents into MongoDB. The term ‘Gloop’ refers to a large query that, you’d have thought, would be better off as a procedure. Here is a typical sample of the output.

This is an interesting approach to documentation. I’m not totally buying into it, but that might just be due to my not having tried it.

Incremental Data Moves to Azure Blob Storage

Ginger Daniel continues a series on moving data incrementally from SQL Server to Azure Blob Storage:

In Part 1 of this series, we demonstrated how to copy a full SQL database table from a SQL Server database into an Azure Blob Storage account as a CSV file. My client needed data moved from their on-premises SQL Server database to Azure, and then needed the daily incremental data changes uploaded as well. This article will discuss how to upload the incremental data changes to Azure after the initial data load.

Click through for the process.
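
For a rough idea of what "incremental" can mean here, one common pattern is a high-water mark: remember the latest modification time you have already exported and pull only rows newer than that. The sketch below is not Ginger's process (the linked article has that). It uses an in-memory SQLite table as a stand-in for SQL Server, and the orders table, its columns, and the export_changes function are all hypothetical names made up for illustration. Uploading the resulting CSV text to Blob Storage is left out.

```python
import csv
import io
import sqlite3


def export_changes(conn, last_watermark):
    """Return (csv_text, new_watermark) for rows modified after last_watermark.

    Assumes a hypothetical 'orders' table with an ISO-8601 'modified_at' column.
    """
    rows = conn.execute(
        "SELECT id, name, modified_at FROM orders "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_watermark,),
    ).fetchall()

    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "name", "modified_at"])
    writer.writerows(rows)

    # Advance the watermark only if something new was found.
    new_watermark = rows[-1][2] if rows else last_watermark
    return buf.getvalue(), new_watermark


# Stand-in source table (SQLite here; the article's source is SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, modified_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "widget", "2019-11-01T08:00:00"),
        (2, "gadget", "2019-11-02T09:30:00"),
        (3, "gizmo", "2019-11-03T10:15:00"),
    ],
)

# Pretend the initial full load covered everything through November 1st.
first_csv, watermark = export_changes(conn, "2019-11-01T23:59:59")

# A second run with the new watermark finds nothing new to export.
second_csv, watermark_after = export_changes(conn, watermark)
```

The watermark has to be stored somewhere durable between runs, and this approach only works if every change reliably updates the modification column. Deletes need separate handling.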

Eager Spooling Against Indexes

Erik Darling finds an eager spool even when there is a good index to use:

But he did write about Eager Index Spools recently, and the post ended with the following statement:

Eager index spools are often a sign that a useful permanent index is missing from the database schema.

I’d like to show you a case where you may see an Eager Index Spool even when you have the index being spooled.

Click through for Erik’s demonstration.

SQL Undercover Inspector v2

Adrian Buckman announces version 2.0 of Undercover Inspector:

There is a new setting in the Settings table called ‘ReportDataDetailedSummary’. This setting is on or off (1 or 0) and controls the level of detail logged in the summary column. When set to 1, you will get granular detail of Warning/Advisory counts per server per module; setting it to 0 returns to the original way of logging, which was to summarize the entire report into a Warning count and an Advisory count.

There are a lot of changes in here.

The Difficulty of Tracking CPU Usage

Grant Fritchey digs into the difficulties of tracking CPU usage on machines:

There are a bunch of ways to look at processor usage. The simplest, and probably most common, is to use the Performance Monitor counters such as ‘% Processor Time’. Query this and you can get an average of the processor usage at a moment in time.

Ta-da! Fixed it. I thought you said this was hard, Grant.

Spoilers: that didn’t fix it.

Testing In-Browser Power BI Report Performance

Chris Webb gives us some tips on testing Power BI reports in a web browser:

It turns out that testing performance of a report in the browser is not as straightforward as it seems. In this post I’m going to describe some of the factors you have to take into account when doing this type of testing; in the next post I’ll go into more detail about how you actually measure report rendering times in the browser and how to see what happens when the report is rendered.

Click through for those factors.

Changes to EC2 Metadata Service

Praveen Sripati takes a look at changes to the AWS EC2 Instance Metadata Service following attacks against Capital One and dozens of other organizations:

Capital One Bank and 30 different organizations were hacked around the end of July. I wrote a blog post around the same time on how to recreate the hack in your own AWS account, along with a few mitigations. Now AWS has made a few changes to the AWS EC2 Instance Metadata Service (IMDS) in this area. An AWS re:Invent 2019 session on the same topic has also been planned for December 5th, 2019. Will update with the link once the recording of the session has been uploaded.

The old/existing approach is called IMDSv1 and the new one IMDSv2. Although IMDSv1 solves a few problems, such as avoiding the need to store access keys on the EC2 instance, it brought its own headaches, which led to the hacks.

Click through to see what these problems were and how they led to IMDSv2.
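
To make the difference concrete, here is a minimal sketch of the IMDSv2 request flow as AWS documents it: first a PUT to the token endpoint with a TTL header, then a GET that presents the returned token. With IMDSv1, the second step was a plain GET with no token at all. This is not code from Praveen's post; the function names are made up for illustration, and fetch_metadata only works when run on an EC2 instance.

```python
import urllib.request

# Link-local address of the EC2 Instance Metadata Service.
IMDS_BASE = "http://169.254.169.254/latest"


def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """IMDSv2 step 1: a PUT asking for a session token (TTL of 1 to 21600 seconds)."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """IMDSv2 step 2: a GET for a metadata path that presents the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path: str, timeout: float = 2.0) -> str:
    """Run both steps. Only succeeds on an EC2 instance, so it is not called here."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(
        build_metadata_request(path, token), timeout=timeout
    ) as resp:
        return resp.read().decode()


# Building the requests has no side effects, so this part runs anywhere.
token_request = build_token_request(ttl_seconds=60)
metadata_request = build_metadata_request("instance-id", token="example-token")
```

On an instance, fetch_metadata("instance-id") would return that instance's ID. The extra PUT matters because a typical server-side request forgery bug lets an attacker trigger a simple GET but not choose the HTTP method or add custom headers.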

Using PowerPoint to Create Power BI Layouts

Jon Fletcher has a good tip for snazzing up a Power BI dashboard:

First question, why bother with layouts?
Using layouts in Power BI allows a user to make their visuals stand out better, and the page looks more professional and appealing to its audience.

Second question, why PowerPoint?
The default page size in Power BI Desktop is 16:9 (this trick doesn’t work for other Power BI page sizes), which is identical to a PowerPoint slide.
Therefore whatever is designed in PowerPoint will fit onto a Power BI page perfectly. Also PowerPoint is very easy to use; most people are familiar with it.

Click through for an example. It’s easy to go overboard with this, but Jon does a good job of using a muted color so that the edges don’t overwhelm your eyes. I might knock it down a shade or two further from that, but regardless, this is a nice tip.
