Day: April 21, 2021

EMR Studio Now Generally Available

Shuang Li announces that Amazon EMR Studio is now in GA:

EMR Studio provides fully managed Jupyter notebooks, and tools like Spark UI and YARN Timeline Service to simplify debugging. EMR Studio uses AWS Single Sign-On and allows you to log in directly with your corporate credentials without signing in to the AWS Management Console. You can install custom kernels and libraries, collaborate with peers using code repositories such as GitHub and Bitbucket, and run parameterized notebooks as part of scheduled workflows using orchestration services like Apache Airflow and Amazon Managed Workflows for Apache Airflow (Amazon MWAA).

With EMR Studio, you can run notebook code on Amazon EMR on Amazon Elastic Compute Cloud (Amazon EC2) or Amazon EMR on Amazon Elastic Kubernetes Service (Amazon EKS), and take advantage of the performance-optimized EMR runtime for Apache Spark. You can set up EMR Studio to run applications on existing EMR clusters or create new clusters using AWS CloudFormation templates for Amazon EMR.

Click through for more information.

reduceByKey and aggregateByKey in Spark

The Hadoop in Real World team compares two functions against RDDs in Spark:

Let’s examine the aggregateByKey below. The first parameter, 0, is the initial value and also indicates the type of the output.

The first _+_ function is the map-side combine and the second _+_ function is the reduce-side combine. Both functions are the same in this case.

This is a demo-driven post, so check it out.
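
To make the map-side and reduce-side roles concrete, here is a minimal plain-Python model of the two operations. It is a sketch of the semantics only, not Spark code and not the code from the linked post: the helper names `aggregate_by_key` and `reduce_by_key`, the list-of-lists stand-in for partitions, and the sample data are all assumptions made for illustration.

```python
from operator import add


def aggregate_by_key(partitions, zero_value, seq_op, comb_op):
    """Hypothetical model of aggregateByKey over a list of partitions."""
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            # Map side: fold each value into the key's accumulator,
            # starting from zero_value the first time a key is seen.
            acc[key] = seq_op(acc.get(key, zero_value), value)
        per_partition.append(acc)

    merged = {}
    for acc in per_partition:
        for key, partial in acc.items():
            # Reduce side: merge per-partition accumulators for the same key.
            merged[key] = comb_op(merged[key], partial) if key in merged else partial
    return merged


def reduce_by_key(partitions, func):
    """Hypothetical model of reduceByKey: one function, no initial value."""
    merged = {}
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        for key, partial in local.items():
            merged[key] = func(merged[key], partial) if key in merged else partial
    return merged


# Two "partitions" of (key, value) pairs; the data is made up.
partitions = [[("a", 1), ("b", 2), ("a", 3)], [("a", 4), ("b", 5)]]

# Summing: both operations give the same answer.
sums_aggregate = aggregate_by_key(partitions, 0, add, add)
sums_reduce = reduce_by_key(partitions, add)

# Output type differs from the value type: only the aggregate form fits.
distinct = aggregate_by_key(
    partitions,
    frozenset(),
    lambda seen, value: seen | {value},  # map side: add a value to the set
    lambda left, right: left | right,    # reduce side: union two sets
)
```

For a plain sum the two forms are interchangeable. The initial value and the separate map-side function start to matter once the result type differs from the value type, as in the set-building call. In PySpark the corresponding calls would look like `rdd.reduceByKey(add)` and `rdd.aggregateByKey(0, add, add)`.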

Offloading Maintenance Operations

Taryn Pratt has a process for offloading maintenance operations onto another server:

Early on when I started working on the SQL Servers at Stack Overflow, we were taking daily backups. We had a handful of databases that were being restored for other processes, but the majority weren’t actively tested to ensure the backups were good. Since you never want to be in a situation where you need to restore a database and find it doesn’t work, my goal was to create a process to automatically restore our backups to a separate server, and then run DBCC CHECKDB on it.

This is a T-SQL-driven process and I appreciate that. If you want a PowerShell-driven process, Kevin Hill has you covered.
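
As a rough illustration of the restore-then-check idea, the sketch below builds the two T-SQL statements such a job would run for one database. It is not Taryn Pratt's process: the function names, the database name, and the backup path are hypothetical, and a real restore onto a different server would usually also need `MOVE` clauses to relocate data and log files.

```python
def quote_name(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text):
    """Build an N'...' string literal, doubling any single quote."""
    return "N'" + text.replace("'", "''") + "'"


def restore_and_check_script(database, backup_path):
    """Return T-SQL that restores one backup and then checks its integrity.

    Hypothetical helper: it only builds text, it does not connect to a server.
    """
    target = quote_name(database)
    statements = [
        # Restore over any existing copy on the secondary server.
        f"RESTORE DATABASE {target} FROM DISK = {quote_literal(backup_path)} "
        "WITH REPLACE, RECOVERY;",
        # Run the integrity check against the freshly restored copy.
        f"DBCC CHECKDB ({target}) WITH NO_INFOMSGS, ALL_ERRORMSGS;",
    ]
    return "\n".join(statements)


# Example with made-up names.
script = restore_and_check_script("Sales", r"D:\Backups\Sales.bak")
print(script)
```

Running the restore and the check on a separate server means the backup files are actually exercised and the cost of `DBCC CHECKDB` stays off the production instance.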

Unkillable Threads

Paul Randal gives us a supervillain origin story:

While I was teaching IEPTO2 last week, I was discussing why sometimes a thread cannot be terminated using the KILL command, and thought it would make a great topic for a post.

Some of you have likely seen a phenomenon called a non-yielding scheduler. This is where a thread is using the processor and doesn’t voluntarily yield after using more than the thread quantum (4 milliseconds, unchangeable). There’s a background task called the scheduler monitor that checks that progress is being made on the various schedulers inside SQL Server and issues a warning if it finds a problem.

Read on to learn more about how this can happen and what it means for you.

From Azure Analysis Services to Power BI PPU

Gilbert Quevauvilliers teases a new series:

I have been doing a lot of evaluation and investigation for organizations that are currently using Azure Analysis Services (AAS) and are looking to see if they can leverage Power BI Premium Per User (PPU).

In this series I am going to cover the details below, which I worked through to see if the migration was not only feasible but should be the new normal.

Looks like it will be an 11-parter, so we have some reading to look forward to.

Additional Common Query Patterns for Joins

Erik Darling continues a series with two more posts. First up is sorting lookups:

Most people see a lookup and think “add a covering index”, regardless of any further details. Then there they go, adding an index with 40 included columns to solve something that isn’t a problem.

You’ve got a bright future in government, kiddo.

In today’s post, we’re going to look at a few different things that might be going on with a lookup on the inside.

The next post is around pre-fetching lookups:

One sort of interesting point about prefetching is that it’s sensitive to parameter sniffing. Perhaps someday we’ll get Adaptive Prefetching.

Until then, let’s marvel at this underappreciated feature.

Check out both posts and prepare to be illuminated.
