Logistic Regression Defaults and sklearn

Giovanni Lanzani shares some thoughts on scikit-learn defaults for Logistic Regression:

If you read the post, you can see that the biggest problem with the choice is that, unless your data is regularized, you will train a model that probably under performs: you are unnecessarily penalizing it by making it learn less than what it could from the data.

The second problem with the default behavior of LogisticRegression is about choosing a regularization constant that is — in effect — a magic number (equal to 1.0). This hides the fact that the regularization constant should be tuned by hyperparameter search, and not set in advance without knowing how the data and problem looks like.

Knowledge is power. Also read the post Giovanni links to in order to learn more about the issue.

Azure Data Studio September Release

Alan Yu announces the September release of Azure Data Studio:

As we continue to bring over key features from SQL Server Management Studio, one highly requested feature was enabling SQL Server command line (SQLCMD) mode in our Query Editor. SQLCMD mode allows users to write and edit queries as SQLCMD scripts. In addition, users can also execute the SQLCMD scripts.

This feature is now possible in Azure Data Studio.

Looks like there were several good improvements this month.

The Value of Query Store

Erin Stellato has started a series on the benefits of Query Store:

The Query Store feature previewed in Azure SQL Database in summer 2015, was made generally available that fall, and was part of the SQL Server 2016 release the following summer.  Over the past four years (has it really been that long?!) I have devoted significant time to learning Query Store – not just understanding how it works from the bottom up, but also why it works the way it does.  I’ve also spent a lot of time sharing that information and helping customers understand and implement the feature, and then working with them to use the query store data to troubleshoot issues and stabilize performance.  During this time I have developed a deep appreciation for the Query Store feature, as its capabilities go far beyond its original marketing.  Is it perfect?  No.  But it’s a feature that Microsoft continues to invest in, and in this series of blog posts my aim is to help you understand why Query Store is a tool you need to leverage in your environment.

Read on for a high-level overview of how Query Store is useful.

Execution Plan Properties

Erik Darling reminds you to check out the properties of execution plan elements:

I read a lot of posts about query plans, and I rarely see people bring up the properties tab.

And I get it. The F4 button is right next to the F5 button. If you hit the wrong one, you might ruin everything.

But hear me out, dear reader. I care about you. I want your query plan reading experience to be better.

Erik provides sound advice here.

Troubleshooting AWS Database Migration Service Errors

Samir Behara takes us through troubleshooting AWS Database Migration Service issues:

For troubleshooting any issues with AWS DMS, it is necessary to have logs enabled. The DMS logs would typically give a better picture and helps find errors or warnings that would indicate the root cause of the failure. If the logs are not available there is nothing much you can do from a detailed troubleshooting analysis perspective. So basically next step is to turn on DMS logs and kick the job again and validate if the errors are captured in the logs.

If logs are not enabled, you need to set up a new task with logging enabled so if and when it errors out, you can take a look and troubleshoot the same.

I’ll save my full rant for another day, but I’m not that impressed with DMS. It could be a failing on my part, though.

Uploading to Blob Storage Archive Tier

Bub Pusateri has a helpful script for us:

Last year I wrote about how to upload data to Azure Blob Storage Archive Tier, and included a PowerShell script to do so. It’s something I use regularly, as I have hundreds of gigabytes of photos and videos safely (and cheaply!) stored in Azure Blob Storage using Archive Tier.

But read on for a recent announcement to make the process easier.

Shared Access Signatures

Arun Sirpal explains what an Azure Shared Access Signature is:

Using a Shared Access Signature (SAS) is usually the best way to control access rights to Azure storage resources (like a container for backups) without exposing the primary / secondary storage keys. It is based on a URI and this is what I want to look at today.

Read on to learn about the components which make up a Shared Access Signature.

Choosing the Right Words on Visuals

Elizabeth Ricks shares an example of where choosing the right words on a visual can make a huge difference:

I presumed the graph would depict cancellation rates for a set of products, with “Tier 2 with Promotion” at the top, representing the highest cancellation rate. When we get to the data, though, that’s not the case. Rather, the graph shows the inverse metric (retention rate) with Tier 2 + Promo as the bottom line with the lowest retention rate. Eventually I figured this out—but only because I spent time studying the data to make this determination! 

Click through for the initial visual as well as a couple of alternatives.

Python and R Data Reshaping

John Mount takes us through a couple of data shaping packages:

The advantages of data_algebra and cdata are:

– The user specifies their desired transform declaratively by example and in data. What one does is: work an example, and then write down what you want (we have a tutorial on this here).
– The transform systems can print what a transform is going to do. This makes reasoning about data transforms much easier.
– The transforms, as they themselves are written as data, can be easily shared between systems (such as R and Python).

Let’s re-work a small R cdata example, using the Python package data_algebra.

Click through for the example.

Dealing with Thousands of Databases

Andy Levy wraps up a Q&A series on dealing with thousands of databases:

When you started, did you know what your position was going to look like 1 month, 6 months, 1 year, 5 years from then? How accurate has that been so far?

I’ve only been at my current job for about 2 1/2 years, but I can speak to the shorter intervals. I’m going to be intentionally vague in spots here as I don’t want to disclose too much.

And if you’d like to hear Andy talk about migrating 8000 databases, Carlos Chacon and I interviewed Andy for the SQL Data Partners podcast.


September 2019
« Aug