
Month: March 2020

Using the Tune Package in R for Hyperparameter Optimization

Abderrahim Lyoubi-Idrissi takes us through a Bayesian approach to tuning hyperparameters:

In contrast to the model parameters, which are discovered by the learning algorithm of the ML model, the so-called hyperparameters (HPs) are not learned during the modeling process but are specified prior to training.

Hyperparameter tuning is the task of finding the optimal hyperparameter(s) for a learning algorithm on a specific data set, with the ultimate aim of improving model performance.

Abderrahim contrasts two different methods here: Grid Search and Bayesian Optimization. Definitely an interesting read if you develop data science models.
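
To make the contrast concrete, here is a minimal Python sketch of the grid search half only. The linked post works in R with the tune package, so nothing below is Abderrahim's code, and Bayesian optimization is not shown. The objective `validation_error` and the parameter names `learning_rate` and `depth` are hypothetical stand-ins for a real train-and-validate step.

```python
from itertools import product


def validation_error(learning_rate, depth):
    # Hypothetical stand-in for "train a model, score it on held-out data".
    # By construction it is smallest at learning_rate=0.1, depth=4.
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 4) ** 2


def grid_search(objective, grid):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hyperparameters are fixed before training; the grid lists the candidates.
grid = {"learning_rate": [0.01, 0.1, 0.3], "depth": [2, 4, 6]}
best_params, best_score = grid_search(validation_error, grid)
```

The number of evaluations is the product of the grid sizes (nine here), so the cost grows quickly as hyperparameters are added. That growth is the usual motivation for Bayesian optimization, which uses earlier results to choose the next candidate.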


Removing Old Backups from Azure Blob Storage

Niko Neugebauer has some advice on how to clean up backups which live in Azure Blob Storage:

Continuing the topic of the Backups to Azure Blob Storage that I have kind of kicked off with the post Striping Backups to Azure Blob Storage, I want to touch on the important aspect of “keeping it clean” – thus deleting the old backups.
On a regular Windows Server this is a rather easy task if you are using a standard maintenance solution, such as Ola Hallengren’s Maintenance Solution or any other one. You can also use the regular SSMS maintenance (*cough* for whatever reason, unknown to me, that you might wish to *cough*), or you can easily set up a regular Windows Task Scheduler job with a command-line batch file, PowerShell, or whatever tool/script/language you like.

The situation is quite different with the Backup To URL functionality, which has been available for more than 6 years (and the good old SQL Server 2012 even got support for it in a certain Cumulative Update – SQL Server 2012 Service Pack 1 CU 2, to be more precise).

Niko goes through five different methods you can use, so check it out.
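
Whatever tool does the deleting, the core of any cleanup is a retention rule. The Python sketch below shows only that rule and is not one of Niko's five methods. The function name `select_expired`, the backup file names, and the 14-day retention period are all assumptions for illustration; listing and deleting the blobs is left to whichever storage client you use.

```python
from datetime import datetime, timedelta, timezone


def select_expired(blobs, retention_days, now=None):
    """Return the names of backups last modified before the retention cutoff.

    `blobs` is an iterable of (name, last_modified) pairs with timezone-aware
    datetimes; in practice these would come from a storage listing call.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [name for name, last_modified in blobs if last_modified < cutoff]


# Hypothetical listing, with a fixed "now" so the result is reproducible.
now = datetime(2020, 3, 31, tzinfo=timezone.utc)
blobs = [
    ("db_full_20200301.bak", datetime(2020, 3, 1, tzinfo=timezone.utc)),
    ("db_full_20200320.bak", datetime(2020, 3, 20, tzinfo=timezone.utc)),
    ("db_full_20200330.bak", datetime(2020, 3, 30, tzinfo=timezone.utc)),
]
expired = select_expired(blobs, retention_days=14, now=now)
```

Keeping the selection separate from the deletion makes it easy to log or review the list before anything is removed.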


Avoid Default String Lengths

Kenneth Fisher warns us against creating strings without specifying a length:

Every now and again I see someone get lazy and declare a string (CHAR, NCHAR, VARCHAR and NVARCHAR) without specifically declaring what the length is going to be. This can lead to some interesting problems. First of all it’s usually going to be a length of one.

I’d take this one step further and say avoid creating strings without specifying a length in products like ADO.NET as well—there, you won’t get the single-character length Kenneth mentions here, but you do get one plan per character length passed in, which does a great job of bloating the plan cache.


Using T-SQL to Generate a Login Migration Script

Thomas Rushton shows us how to migrate logins from one server to another using just T-SQL:

There are three things to look at:

1. Creating Logins from Windows accounts
2. Creating logins that are authenticated by SQL Server
3. Assigning membership of the appropriate server roles

Note that we’re just creating a snapshot of what’s there – we’re not aiming to keep these things in true synchronisation between servers. Note also that this script will only create accounts that don’t exist. If there’s an account already there with the same name, the script generated will not recreate it with new options.

We get the final script as well as a nice walkthrough of each component.
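
As a rough illustration of the "only create what doesn't exist" idea, here is a small Python sketch that emits guarded T-SQL for steps 1 and 3. Thomas's script is written in T-SQL itself, so this is a different approach, and the helper names and the `CONTOSO\sqladmin` login are hypothetical. Step 2 (SQL-authenticated logins) is left out because it also needs the password hash and SID carried across.

```python
def quote_name(name):
    # Bracket-quote an identifier, doubling any closing bracket.
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value):
    # Unicode string literal with single quotes doubled.
    return "N'" + value.replace("'", "''") + "'"


def windows_login_statement(login):
    """CREATE LOGIN guarded so that an existing login is left untouched."""
    return (
        f"IF NOT EXISTS (SELECT 1 FROM sys.server_principals "
        f"WHERE name = {quote_literal(login)})\n"
        f"    CREATE LOGIN {quote_name(login)} FROM WINDOWS;"
    )


def role_membership_statement(role, login):
    """Add the login to a server role (ALTER SERVER ROLE syntax)."""
    return f"ALTER SERVER ROLE {quote_name(role)} ADD MEMBER {quote_name(login)};"


# Hypothetical login and role, for illustration only.
script = "\n".join([
    windows_login_statement(r"CONTOSO\sqladmin"),
    role_membership_statement("sysadmin", r"CONTOSO\sqladmin"),
])
```

Because of the `IF NOT EXISTS` guard, running the generated script against a server that already has the login leaves that login's existing options alone, which matches the snapshot behaviour described in the quote.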


Using DAX’s GroupBy Function

Reza Rad takes us through the GroupBy function in DAX:

There are many different ways you can create aggregations in Power BI. You can do it in the source (using the database T-SQL language), or using the Group By operation in Power Query. You can also do it in DAX using some functions. One of the functions that can be used for grouping and aggregation is Group By. This article is about how to use Group By in DAX. Creating aggregations using DAX is a very useful skill because you can use it to create virtual tables in your measures and have better dynamic calculations in Power BI.

Click through for the explanation and an example.


Finding Below-Average Sales Per Hierarchy Level in Power BI

Soheil Bakhshi has an interesting problem to solve:

Now, the requirement is to show the above behaviour in a “Column Chart” (yes! visualising time series with column chart, that’s what the customer wants) and highlight the columns with values below average amount in Orange and leave the rest in default theme colour.

So, I need to create Measures to conditionally format the column chart. I also need to add a bit of intelligence in the measures to:

– Detect which hierarchy level I am in
– Calculate the average of sales for that particular hierarchy level
– Change the colour of the columns that are below the average amount

Let’s get it done!

Read on to see how you can do exactly this.
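
The logic behind those three steps can be sketched outside of DAX. The Python below is only an illustration of the calculation, not Soheil's measures: the hierarchy level is passed in explicitly rather than detected, and the data, the `column_colors` name, and the color values are assumptions.

```python
from statistics import mean

HIGHLIGHT = "orange"  # assumed color value for below-average columns
DEFAULT = None        # None stands for "leave the theme color alone"


def column_colors(sales_by_level, level):
    """Color each column at one hierarchy level against that level's average."""
    values = sales_by_level[level]
    average = mean(values.values())
    return {
        label: HIGHLIGHT if amount < average else DEFAULT
        for label, amount in values.items()
    }


# Hypothetical sales figures at two levels of a date hierarchy.
sales_by_level = {
    "year": {"2018": 900, "2019": 1100},
    "quarter": {"Q1": 200, "Q2": 300, "Q3": 250, "Q4": 350},
}
colors = column_colors(sales_by_level, "quarter")
```

The key point is that the average is recomputed per level, so a column is compared only with its peers at the same level of the hierarchy.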


All About Line Graphs

Mike Cisneros takes us through one of the most useful visuals out there:

A typical line graph will have continuous data along both the vertical (y-axis) and horizontal (x-axis) dimensions. The y-axis usually shows the value of whatever variable we are measuring; the x-axis is most often used to show when we measured it, either chronologically or based on some independent variable (e.g., as we rev our old car’s engine, we measure the decibel level at different RPM). 

While some line graphs do not use continuous data on the x-axis (particularly slopegraphs and parallel coordinates diagrams, which are specialized variants of line graphs), what we absolutely can’t use on our x-axis is data that doesn’t have any meaningful relationship among the categories shown. 

Read on for a lot of good information on a workhorse visual.


Schema Management for Spark Applications

Walaa Eldin Moustafa takes us through some of the things that LinkedIn has learned about schema management with Apache Spark:

At LinkedIn, the Hive Metastore is the source of truth catalog for all Hadoop data. The Hive Metastore is managed by Dali. Dali is a data access and processing platform that is integrated to compute engines and ETL pipelines at LinkedIn to ensure consistency and uniformity in the access and storage of data. Dali utilizes the Hive Metastore to store data formats, data locations, partition information, and table information. Among other features, Dali also manages the definition of SQL views, as well as storing and accessing those definitions from the Hive Metastore.

Read on for a good explanation of the how as well as the why.


Optimistic Concurrency in Cosmos DB

Hasan Savran takes us through optimistic concurrency with Cosmos DB:

To handle this problem, developers usually use a column like LastUpdateDt. You bring this column to the front end and post it back to the database with the updated model. If LastUpdateDt hasn’t changed, the update goes into the database. If LastUpdateDt has changed, that means somebody else updated this model and your code rejects the changes.

In this post, I will try to answer questions like “How do we do this in Azure Cosmos DB?” and “Do I need to do all that logic manually by using the Cosmos DB SDK?” I will use Cosmos DB’s REST API to demo how Cosmos DB handles Optimistic Concurrency Control automatically. If you have experience developing any REST API, you might be familiar with headers like If-Match or If-None-Match. These HTTP headers control what should be updated or not. You can also use them for caching: since they check whether an item has been updated, you can cache items until they change.

Hasan has a demo for us as well, so check it out.
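
The If-Match pattern is easy to see in miniature. The Python sketch below is an in-memory simulation and does not call Cosmos DB or its SDK; the `InMemoryStore` class, its method names, and the `order-1` item are hypothetical. It mirrors the behaviour Hasan describes: each item carries an ETag, and a replace that presents a stale ETag is rejected, the equivalent of an HTTP 412 Precondition Failed response.

```python
import uuid


class PreconditionFailed(Exception):
    """Plays the role of an HTTP 412 response."""


class InMemoryStore:
    def __init__(self):
        self._items = {}  # item id -> (etag, body)

    def upsert(self, item_id, body):
        # Every write stamps the item with a fresh ETag.
        etag = uuid.uuid4().hex
        self._items[item_id] = (etag, dict(body))
        return etag

    def read(self, item_id):
        etag, body = self._items[item_id]
        return etag, dict(body)

    def replace(self, item_id, body, if_match):
        # Reject the write if the caller's ETag is no longer current.
        current_etag, _ = self._items[item_id]
        if if_match != current_etag:
            raise PreconditionFailed(item_id)
        return self.upsert(item_id, body)


store = InMemoryStore()
store.upsert("order-1", {"status": "new"})

# Two clients read the same version of the item.
etag_a, doc_a = store.read("order-1")
etag_b, doc_b = store.read("order-1")

# Client A writes first, which changes the stored ETag.
doc_a["status"] = "paid"
store.replace("order-1", doc_a, if_match=etag_a)

# Client B still holds the old ETag, so its write is rejected.
doc_b["status"] = "cancelled"
try:
    store.replace("order-1", doc_b, if_match=etag_b)
    conflict = False
except PreconditionFailed:
    conflict = True
```

On a conflict, the usual next step is for the losing client to re-read the item, reapply its change to the fresh version, and retry with the new ETag.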
