
Day: November 7, 2019

Troubleshooting Tez Performance

Dmitry Tolpeko digs through Tez logs to figure out a performance issue:

Why did it take so long to run the job? Is there any way to improve its performance?

Tez Application Master Log
I am going to use the Tez AM log to investigate vertex performance and find possible bottlenecks.
Note that you can use the Timeline Server REST API to get statistics for Tez jobs, but the application master log is “event-driven”: it shows the exact order of all events and generally contains much more detail.
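For readers who want to try the REST route mentioned above, here is a minimal sketch of querying the YARN Timeline Server for the vertices of one Tez DAG. It assumes Timeline Server v1 with Tez publishing `TEZ_DAG_ID` and `TEZ_VERTEX_ID` entities; the host name, port, and DAG ID are placeholders, and the helper names `tez_vertices_url` and `fetch_vertices` are hypothetical, not part of any Tez or YARN client library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def tez_vertices_url(timeline_base, dag_id, limit=100):
    """Build a Timeline Server v1 URL listing the vertex entities of one DAG.

    Assumes Tez publishes vertices under the TEZ_VERTEX_ID entity type and
    that they can be filtered by their parent DAG with primaryFilter.
    """
    query = urlencode({"primaryFilter": f"TEZ_DAG_ID:{dag_id}", "limit": limit})
    return f"{timeline_base.rstrip('/')}/ws/v1/timeline/TEZ_VERTEX_ID?{query}"


def fetch_vertices(timeline_base, dag_id):
    """Fetch and decode the vertex entities (needs a reachable Timeline Server)."""
    with urlopen(tez_vertices_url(timeline_base, dag_id)) as response:
        payload = json.load(response)
    # Timeline Server v1 wraps its results in an "entities" list.
    return payload.get("entities", [])


# Placeholder host and DAG ID; substitute values from your own cluster.
url = tez_vertices_url("http://ats.example.com:8188", "dag_1_0001_1")
print(url)
```

This returns aggregate statistics only; as the excerpt notes, the application master log is still the place to look for the exact ordering of events.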

Click through for the process.


Preventing Overfitting in ML Models

Tom Jordan gives us four techniques to reduce the likelihood of overfitting in our models:

Dropout
This technique is used exclusively when training neural networks, so it isn’t applicable to all machine learning models; however, it can be used to produce extremely effective neural network models. At the start of each step in the training process, each sub-unit of the model, the neuron, has some probability of being included in that step. If it doesn’t make the cut, it is effectively deleted from the network for that step and then reintroduced on the next step.
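The per-step behaviour described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the article's code: the function name `dropout` and the `drop_prob` parameter are hypothetical, and rescaling the surviving activations by `1 / keep_prob` ("inverted dropout") is a common convention assumed here rather than something the excerpt states.

```python
import random


def dropout(activations, drop_prob, rng=random):
    """Apply dropout to one layer's activations for a single training step.

    Each neuron is kept with probability 1 - drop_prob; dropped neurons
    output 0.0 for this step only. Survivors are rescaled by 1 / keep_prob
    (an assumed convention) so the expected activation stays the same.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    output = []
    for value in activations:
        if rng.random() < keep_prob:
            output.append(value / keep_prob)  # neuron takes part in this step
        else:
            output.append(0.0)  # neuron is effectively deleted for this step
    return output


rng = random.Random(0)
layer = [0.5, 1.2, -0.3, 0.8]
# A fresh mask is drawn on every call, so a neuron dropped in one step
# can be back in the network on the next.
print(dropout(layer, 0.5, rng))
print(dropout(layer, 0.5, rng))
```

Dropout applies only during training; at inference time every neuron is used, which is why the rescaling during training is convenient.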

There are some good techniques here.


Troubleshooting Deadlocks using Extended Events

Jamie Wick helps us figure out what’s causing deadlocks:

Recently I started getting random alerts that a job on one of the SQL servers was failing because of a deadlock problem.

The source of the problem wasn’t immediately discernible, as there wasn’t any pattern to when the job was failing. Troubleshooting was further complicated by the database being written and maintained by a third-party vendor that encrypts all of its stored procedures.

So… How to find out what was causing the deadlock?

Extended Events are an ideal solution for this situation.

Read on to learn how.


Fun with Markdown in Azure Data Studio

Dave Bland takes us through some of the formatting options available in Azure Data Studio notebooks:

When working in a Notebook, you have two types of cells: text and code. The focus of this post is how to format the text cell. Of course, text goes into this cell, so that part is easy, and the text can say anything you would like. When we work with text in Word, there is a formatting toolbar that we can use to make it look the way we want. The text cells do not have this toolbar.

You might be asking: without the formatting toolbar, does that mean we can’t format the text? The answer is no. We can still format the text; we just need to do it slightly differently. Rather than use a toolbar, we need to use characters.

There’s a lot of power in Markdown.
