
Day: September 11, 2019

R User Salaries By Country

Capri Granville shares a chart showing a box plot of salaries for professional R users by country:

An interesting analysis, done in R, of the salaries of R developers broken down by country, featuring the salary range and median salary.

The dataset consists of survey answers from nearly 90,000 respondents. About 5,000 of them reported using R for “extensive development work over the past year”. The first filter used reduces the dataset from 88,883 respondents to 5,048. The second filter excludes students, hobby programmers and former developers. This reduces the dataset to 4,047 respondents. The third filter excludes unemployed and retired respondents and the dataset is further reduced to 3,871 respondents. Finally, we exclude respondents from an unknown country and respondents with unknown or zero salary.
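The filter sequence described above can be sketched in Python. This is a minimal illustration on a handful of made-up records, not the original R analysis: the field names (`languages`, `main_branch`, `employment`, `country`, `salary`) and the category labels are hypothetical stand-ins for the survey's actual columns and answers.

```python
from statistics import median

# Toy records for illustration only; field names and labels are hypothetical.
respondents = [
    {"languages": ["R", "SQL"], "main_branch": "developer", "employment": "full-time", "country": "Germany", "salary": 70000},
    {"languages": ["R"], "main_branch": "student", "employment": "full-time", "country": "Germany", "salary": 20000},
    {"languages": ["Python"], "main_branch": "developer", "employment": "full-time", "country": "Germany", "salary": 80000},
    {"languages": ["R"], "main_branch": "developer", "employment": "retired", "country": "United States", "salary": 90000},
    {"languages": ["R"], "main_branch": "developer", "employment": "full-time", "country": None, "salary": 50000},
    {"languages": ["R"], "main_branch": "developer", "employment": "full-time", "country": "Germany", "salary": 0},
    {"languages": ["R"], "main_branch": "developer", "employment": "part-time", "country": "Germany", "salary": 50000},
    {"languages": ["R"], "main_branch": "developer", "employment": "full-time", "country": "United States", "salary": None},
    {"languages": ["R"], "main_branch": "developer", "employment": "full-time", "country": "United States", "salary": 100000},
]

EXCLUDED_BRANCHES = {"student", "hobbyist", "former developer"}
EXCLUDED_EMPLOYMENT = {"unemployed", "retired"}


def filter_r_professionals(rows):
    """Apply the four filters in the order the excerpt describes."""
    # Filter 1: keep only respondents who reported using R.
    rows = [r for r in rows if "R" in r["languages"]]
    # Filter 2: drop students, hobby programmers, and former developers.
    rows = [r for r in rows if r["main_branch"] not in EXCLUDED_BRANCHES]
    # Filter 3: drop unemployed and retired respondents.
    rows = [r for r in rows if r["employment"] not in EXCLUDED_EMPLOYMENT]
    # Filter 4: drop unknown countries and unknown or zero salaries.
    rows = [r for r in rows if r["country"] and r["salary"]]
    return rows


def median_salary_by_country(rows):
    """Group salaries by country and take the median of each group."""
    by_country = {}
    for r in rows:
        by_country.setdefault(r["country"], []).append(r["salary"])
    return {country: median(salaries) for country, salaries in by_country.items()}


kept = filter_r_professionals(respondents)
medians = median_salary_by_country(kept)
print(len(kept), medians)
```

Applying the filters one at a time, as the excerpt does, makes it easy to report how many respondents survive each step.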

Check out Tomaz Weiss’s detailed post which dives into these numbers for United States respondents.

Comments closed

Logistic Regression Defaults and sklearn

Giovanni Lanzani shares some thoughts on scikit-learn defaults for Logistic Regression:

If you read the post, you can see that the biggest problem with the choice is that, unless your data is regularized, you will train a model that probably underperforms: you are unnecessarily penalizing it by making it learn less than it could from the data.

The second problem with the default behavior of LogisticRegression is about choosing a regularization constant that is, in effect, a magic number (equal to 1.0). This hides the fact that the regularization constant should be tuned by hyperparameter search, and not set in advance without knowing what the data and problem look like.
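As a rough sketch of what tuning the constant could look like, the snippet below searches over `C` (scikit-learn's inverse regularization strength, which defaults to 1.0) rather than accepting the default. The synthetic dataset, the grid of candidate values, and the choice to standardize features first are assumptions made for illustration, not recommendations from Giovanni's post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Standardize features so the L2 penalty treats all coefficients comparably.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Candidate values for C are an arbitrary example grid; smaller C means
# stronger regularization.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 5-fold cross-validated search instead of relying on the default C=1.0.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

best_c = search.best_params_["logisticregression__C"]
print("best C:", best_c, "mean CV accuracy:", search.best_score_)
```

On real data the grid, the scoring metric, and the cross-validation scheme would all need to fit the problem at hand.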

Knowledge is power. Also read the post Giovanni links to in order to learn more about the issue.


Azure Data Studio September Release

Alan Yu announces the September release of Azure Data Studio:

As we continue to bring over key features from SQL Server Management Studio, one highly requested feature was enabling SQL Server command line (SQLCMD) mode in our Query Editor. SQLCMD mode allows users to write and edit queries as SQLCMD scripts. In addition, users can also execute the SQLCMD scripts.

This feature is now possible in Azure Data Studio.

Looks like there were several good improvements this month.


The Value of Query Store

Erin Stellato has started a series on the benefits of Query Store:

The Query Store feature previewed in Azure SQL Database in summer 2015, was made generally available that fall, and was part of the SQL Server 2016 release the following summer.  Over the past four years (has it really been that long?!) I have devoted significant time to learning Query Store – not just understanding how it works from the bottom up, but also why it works the way it does.  I’ve also spent a lot of time sharing that information and helping customers understand and implement the feature, and then working with them to use the query store data to troubleshoot issues and stabilize performance.  During this time I have developed a deep appreciation for the Query Store feature, as its capabilities go far beyond its original marketing.  Is it perfect?  No.  But it’s a feature that Microsoft continues to invest in, and in this series of blog posts my aim is to help you understand why Query Store is a tool you need to leverage in your environment.

Read on for a high-level overview of how Query Store is useful.


Troubleshooting AWS Database Migration Service Errors

Samir Behara takes us through troubleshooting AWS Database Migration Service issues:

For troubleshooting any issues with AWS DMS, it is necessary to have logs enabled. The DMS logs typically give a better picture and help find errors or warnings that indicate the root cause of the failure. If the logs are not available, there is not much you can do in the way of detailed troubleshooting. So the next step is to turn on DMS logs, kick off the job again, and validate whether the errors are captured in the logs.

If logs are not enabled, you need to set up a new task with logging enabled so that, if and when it errors out, you can take a look and troubleshoot it.
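For a sense of what "logging enabled" might look like in practice, here is a minimal sketch that builds the `Logging` portion of a DMS task settings document as JSON. The helper name `build_logging_settings` is hypothetical, and the component IDs and severity shown are only one possible choice; check the AWS DMS task settings documentation before relying on them.

```python
import json


def build_logging_settings(
    components=("SOURCE_UNLOAD", "TARGET_LOAD"),
    severity="LOGGER_SEVERITY_DEFAULT",
):
    """Return a JSON string with task logging switched on.

    Hypothetical helper: the component list and severity are example
    values, not a recommended configuration.
    """
    settings = {
        "Logging": {
            "EnableLogging": True,
            "LogComponents": [
                {"Id": component, "Severity": severity}
                for component in components
            ],
        }
    }
    return json.dumps(settings, indent=2)


settings_json = build_logging_settings()
print(settings_json)

# Assumption: a string like this would be supplied as the task settings
# when creating the task, for example through the ReplicationTaskSettings
# argument of boto3's DMS create_replication_task call.
```

Keeping the settings in a small function makes it easier to raise the severity for a single component when a specific phase of the migration is failing.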

I’ll save my full rant for another day, but I’m not that impressed with DMS. It could be a failing on my part, though.
