Author: Kevin Feasel

Testing Stock Market Efficiency with Compression Algorithms

Holger von Jouanne-Diedrich has a clever test:

One of the most fiercely fought debates in quantitative finance is whether the stock market (or financial markets in general) is (are) efficient, i.e. whether you can find patterns in them that can be profitably used.

If you want to learn about an ingenious method (that is already present in anyone’s computer) to approach that question, read on!

As soon as I saw the post, my Eugene Fama senses were tingling. The results are not surprising (at least, to anyone who got my reference in the prior sentence), but I did enjoy the rather clever approach to the question.
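
For a rough sense of how a compression-based test could work, here is a minimal sketch in Python. It is not necessarily the method from the linked post: the up/down encoding, the shuffle baseline, the helper names, and the synthetic series are all assumptions made for illustration. The idea is that if the order of returns carries a usable pattern, the series should compress noticeably better than shuffled copies of itself.

```python
import random
import zlib


def to_symbols(returns):
    """Encode each return as one byte: 1 for an up move, 0 otherwise (an assumed encoding)."""
    return bytes(1 if r > 0 else 0 for r in returns)


def compressed_size(symbols):
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(symbols, 9))


def shuffle_baseline(symbols, trials=20, seed=0):
    """Average compressed size over random shuffles of the same symbols."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(trials):
        shuffled = list(symbols)
        rng.shuffle(shuffled)
        sizes.append(compressed_size(bytes(shuffled)))
    return sum(sizes) / len(sizes)


def structure_ratio(returns):
    """Compressed size of the series relative to its shuffled copies.

    Near 1.0: the ordering adds nothing a compressor can use.
    Well below 1.0: the ordering contains a repeatable pattern.
    """
    symbols = to_symbols(returns)
    return compressed_size(symbols) / shuffle_baseline(symbols)


# Synthetic data only: a patternless series and a deliberately periodic one.
rng = random.Random(42)
random_returns = [rng.gauss(0, 0.01) for _ in range(5000)]
patterned_returns = [0.01 if i % 3 == 0 else -0.005 for i in range(5000)]

print("random series ratio:   ", round(structure_ratio(random_returns), 3))
print("patterned series ratio:", round(structure_ratio(patterned_returns), 3))
```

With real data, you would swap in a series of daily returns for the synthetic lists. A ratio that stays close to 1.0 is what the efficient-market side of the debate would expect.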

Environments in Azure ML

Luis Valencia explains what environments are in Azure ML:

An Environment defines Python packages, environment variables, and Docker settings that are used in machine learning experiments, including in data preparation, training, and deployment to a web service. An Environment is managed and versioned in an Azure Machine Learning Workspace. You can update an existing environment and retrieve a version to reuse. Environments are exclusive to the workspace they are created in and can’t be used across different workspaces.

In basic terms for a developer, it’s basically a Docker Image with all the needed dependencies (conda/pip packages) to run your experiment.

A friendly word of advice from some bad experiences: stick with the curated environments as much as you can. Those are easy and rarely fail. Building your own environments from Conda files is a possibility, but it’s an, err, probabilistic exercise as to whether your compute target will actually work or not.

Tools and Tips for Accessibility

Daron Yöndem shares insights:

Last week, as a new employee, I went through Microsoft’s internal employee learning portal and found the Accessibility 101 online course. To my surprise, the course did have a good amount of practical information and connected the concept of accessibility nicely to inclusion and diversity. In this post, I want to share a couple of the practical steps to help you step up your accessibility game. If you are where I was, I’m sure you will love these.

Click through for some easy ways to improve presentations and webpages. Most of this is a few minutes’ worth of effort but can pay dividends. On a side note, congrats to Daron for the Microsoft gig. I enjoyed working with him in the past and know he’ll do great there.

Pipelined Functions in PowerShell

Robert Cain continues a series on functions in PowerShell:

In my previous post, I covered the use of PowerShell Advanced Functions. I highly suggest you read it if you haven’t, it provides some foundational knowledge that will be important to understand for this post.

In this post, we’ll see how to pipeline enable your functions. Just like a cmdlet, you’ll be able to take input from the pipeline, work with it, then send it out your function back into the pipeline.

Making your code pipeline-friendly is especially important if you want others to use your functions, as that’s one of the biggest benefits of PowerShell as a language.

Measure Filters in Power BI

Marco Russo and Alberto Ferrari dive into a topic:

The first paragraph of this article needs to be a warning: the article itself is here for DAX and Power BI enthusiasts only. We are going to show a report that does not work, and then we explore how to fix the problem by performing a deep analysis of the queries generated by Power BI, finding the problem, and finally fixing it. The article contains a lot of references to advanced DAX concepts and the final solution is NOT a best practice. The value of the article is not in the specific solution. Rather, the important part is that a deep understanding of DAX and Power BI can help you obtain the right results, specifically when you have the feeling that you are faced with a bug because Power BI is acting strange. If you do not like DAX before reading this article, you will like it even less at the end. But if you love DAX, then chances are you will really enjoy the reading, even though it requires quite a lot of brain bandwidth. For sure, it took all of mine when I first encountered this behavior.

Break out the propeller hats before you dive in.

Scaling Hadoop Beyond 10,000 Nodes

Keqiu Hu, et al., take us through a problem of scale:

At LinkedIn, we use Hadoop as our backbone for big data analytics and machine learning. With an exponentially growing data volume, and the company heavily investing in machine learning and data science, we have been doubling our cluster size year over year to match the compute workload growth. Our largest cluster now has ~10,000 nodes, one of the largest (if not the largest) Hadoop clusters on the planet. Scaling Hadoop YARN has emerged as one of the most challenging tasks for our infrastructure over the years.

In this blog post, we will first discuss the YARN cluster slowdowns we observed as we approached 10,000 nodes and the fixes we developed for these slowdowns. Then, we will share the ways we proactively monitored for future performance degradations, including a now open-sourced tool we wrote called DynoYARN, which reliably forecasts performance for YARN clusters of arbitrary size. Finally, we will describe Robin, an in-house service which enables us to horizontally scale our clusters beyond 10,000 nodes.

Read on to learn about the problems they experienced and how they resolved them.

Order, Sort, Cluster, and Distribute in Hive

The Hadoop in Real World team gives us three methods (and one synonym) to organize results in Hive:

Hive provides 3 options to order or sort the result of records – order by, sort by, cluster by and distribute by. Which option you choose has performance implications. So it is important to understand the difference between the options and choose the right one for the use case at hand.

Click through for a high-level overview of the techniques.
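
To make the distinction concrete, here is a small Python simulation of what each clause does to a set of rows. It is a sketch of the semantics only, not Hive code: the function names are hypothetical, reducers are modeled as plain lists, and routing by `key % reducers` is an assumed stand-in for Hive's hash partitioning.

```python
def distribute_by(rows, key, reducers):
    """Route each row to a reducer by key; rows within a reducer stay unsorted."""
    buckets = [[] for _ in range(reducers)]
    for row in rows:
        # Assumption: integer keys, with modulo standing in for a hash function.
        buckets[key(row) % reducers].append(row)
    return buckets


def sort_by(buckets, key):
    """Sort inside each reducer; the combined output is not globally ordered."""
    return [sorted(bucket, key=key) for bucket in buckets]


def cluster_by(rows, key, reducers):
    """Shorthand for distribute_by followed by sort_by on the same key."""
    return sort_by(distribute_by(rows, key, reducers), key)


def order_by(rows, key):
    """Total ordering: every row passes through a single reducer."""
    return sorted(rows, key=key)


def identity(row):
    """Use the row itself as the key."""
    return row


rows = [5, 2, 8, 1, 4, 7, 3, 6]

print("distribute by:", distribute_by(rows, identity, 2))
print("cluster by:   ", cluster_by(rows, identity, 2))
print("order by:     ", order_by(rows, identity))
```

The performance trade-off falls out of the shapes: `order_by` funnels everything through one sorted list, while the other three let each reducer work on its own slice.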

Creating and Using Variables in DAX

Jeet Kainth takes us through the process of working with variables in DAX:

Variables can simplify your DAX code, help with debugging and help improve performance. To use variables in your DAX measure you need to declare a variable using the VAR keyword, give the variable a name, and then assign an expression to the variable. You can use multiple variables in a measure but when using variables you must use a RETURN statement to specify the final output to be returned.

Read on for a demonstration, as well as several examples of how you can use variables to make your DAX-writing life easier.
