Hortonworks And Cloudera To Merge

Ashley Stirrup analyzes the merger of the two largest Hadoop vendors:

Overall, this is great news for customers, the Hadoop ecosystem and the future of the market.  Both company’s customers can now sleep at night knowing that the pace of innovation from Cloudera 2.0 will continue and accelerate.  Combining the Cloudera and Hortonworks technologies means that instead of having to pick one stack or the other, now customers can have the best of both worlds. The statement from their press release “From the Edge to AI” really sums up how complementary some of the investments that Hortonworks made in IoT complement Cloudera’s investments in machine learning.  From an ecosystem and innovation perspective, we’ll see fewer competing Apache projects with much stronger investments.  This can only mean better experiences for any user of big data open source technologies.

At the same time, it’s no secret how much our world is changing with innovation coming in so many shapes and sizes.  This is the world that Cloudera 2.0 must navigate.  Today, winning in the cloud is quite simply a matter of survival.  That is just as true for the new Cloudera as it is for every single company in every industry in the world.  The difference is that Cloudera will be competing with a wide range of cloud-native companies both big and small that are experiencing explosive growth.  Carving out their place in this emerging world will be critical.

The company has so many of the right pieces including connectivity, computing, and machine learning.  Their challenge will be, making all of it simple to adopt in the cloud while continuing to generate business outcomes. Today we are seeing strong growth from cloud data warehouses like Amazon RedshiftSnowflakeAzure SQL Data Warehouse and Google Big Query.  Apache Spark and service players like Databricks and Qubole are also seeing strong growth.  Cloudera now has decisions to make on how they approach this ecosystem and they choose to compete with and who they choose to complement.

Rob Bearden on the Hortonworks side:

Cloudera has a like-minded approach to next generation data management and analytics solutions for hybrid deployments. Like Hortonworks, Cloudera believes data can drive high velocity business model transformations, and has innovated in ways that benefit the market and create new revenue opportunities. We are confident that our combined company will be ideally positioned to redefine the future of data as we extend our leadership and expand our offerings.

This transformational event will create benefits and growth opportunities for our stakeholders. Together with Cloudera, we will accelerate market development, fuel innovation and produce substantial benefits for our customers, partners, employees and the community.

By merging Cloudera’s investments in data warehousing and machine learning with Hortonworks’ investments in end-to-end data management, we are generating a winning combination, which will establish the standard for hybrid cloud data management.

Mike Olson on the Cloudera side:

We’re announcing the combination today, but we don’t expect the deal to close for several months. We’ll undergo the normal regulatory review that any merger of scale involving public companies gets, and the shareholders from both companies will have to meet and approve the deal.

Between now and the close date, we remain independent companies. Our customers are running our respective products. Our sales teams are working separate from each other with current and new customers to win more business and to make those customers successful. We’ll both continue to do that.

Customers who are running CDH, HDP and HDF are getting a new promise. Those product lines will each be supported and maintained for at least three years from the date our merger closes. Any customer who chooses either can be sure of a long-term future for the platform selected.

Guy Shilo isn’t quite as pleased:

On the business side, the new company will be a de facto monopoly, as those two are the largest Hadoop vendors in terms of market share. Less competition often leads to lack of incentive to innovate and rising prices. Let’s hope the joint company will not go this way and leverage its funds and power to improving their products and services.

On the technological side, it will be interesting to see the way CDH and HDP will go. Will they keep both products alive ? will they continue only one ? which of them will it be ? Will they take the HortonWorks approach that embraces the Hadoop open source community and its fast changing versions or the Cloudera more conservative approach ?

I am cautiously pessimistic about this.  Cloudera and Hortonworks combined for a huge amount of the Hadoop market (approximately 80% as of a couple of years ago).  There are several competitors in the broader market, but I thought that Cloudera and Hortonworks gave us two separate visions for different types of companies.

Analyzing Update Dates For R Packages

Tomaz Kastrun takes a look at CRAN package update dates:

So more updates are coming in autumn times. But the results of correlation:

cor(dd_ym2010)[2,3]

is still just 0.155, making it hard to draw any concrete conclusions. Adding year 2018 will skew the picture and add several outliers, as the fact that year 2018 is still a running year (as of writing this blog post).

Read on for a descriptive analysis of this data set.

Missing Backup Directory When Trying To Upgrade SQL Server

Lori Brown walks us through the solution to an error she experienced:

I was recently performing an in-place upgrade of SQL 2008 R2 to SQL 2014 on one of my client’s servers.  I have done a ton of successful and uneventful in-place upgrades and was surprised when the upgrade failed with the error message:  “Failed to create a new folder ‘X:\SQLBackups’. The specified path is invalid (for example, it is on an unmapped drive).”  This client had over the years changed from using a local drive for all backups to having backups sent to a network share.  So, the X drive really was no longer in existence.

Read on for the solution.

Parameter Sniffing Issues With Table Variables

Milos Radivojevic points out the downside to table variable deferred compilation in SQL Server 2019:

Since the actual number of rows is significantly greater than in the call with the parameter ‘White’, you can see here sort warnings because this time 1MB of memory grant was not sufficient for sorting. But, the execution plan is exactly the same as for the first call.

Prior to SQL Server 2019, the execution plan for the second query in this stored procedure was always the same, regardless of parameter used for the first invocation and thus plan generation. Since the table variable has cardinality of 1, all estimations and the execution plan will be the same. We can say, using a table variable in this stored procedure and passing it to the second query neutralizes parameter sniffing effect. That does not mean, this is good or bad for all executions (you saw sort warnings and they are always bad), but the plan was stable, it did not change even after failover or clearing cache. If you call this stored procedure usually with high selective parameters, you can consider this plan as a good plan.

In SQL Server 2019, since table variable could have different cardinality, this stored procedure is prone to parameter sniffing and depending on the first execution parameter, you can have different execution plans:

This is a natural outcome and something we’d work with just like we would with a temp table or regular table in a stored procedure.

Creating Minesweeper In Power BI

Philip Seamark has fun with a classic Windows game reimplemented in Power BI:

The latest addition to my recent series of DAX based games is the classic Minesweeper game.  This is the game where you are presented with a 9 x 9 matrix of squares.  There are 10 hidden mines and you can either step on a square or place a flag where you think there might be a mine.  If you are lucky enough not to step on a square that contains a mine, you will get clues that help you identify where the mines are.

Click here if you would like to see the final publish to web version.

Click here if you would like to download the PBIX version to go through the code.

Still better than the Windows 10 version.

Taking Action With Wait Stats

Aaron Bertrand lays out a course of action (or inaction) when dealing with the most common wait types in SQL Server:

I started going a little further than this, mapping out some of the more common wait types, and noting some of the properties they shared. Translated into questions a tuner might have about a wait type they are experiencing:

  • Can the wait type be solved at the query level?
  • Is the core symptom of the wait likely to be affecting other queries?
  • Is it likely you will need more information outside the context of a single query and the wait types it experienced in order to “solve” the problem?

When I set out to write this post, my goal was just to group the most common wait types together, and then start jotting notes about them relating to the above questions. Jason pulled the most common ones from the library, and then I drew some chicken scratch on a whiteboard, which I later tidied up a bit. This initial research led to a talk that Jason gave on the most recent TechOutbound SQL Cruise in Alaska. I’m kind of embarrassed that he put a talk together months before I could finish this post, so let’s just get on with it. Here are the top waits we see (which largely match Paul’s survey from 2014), my answers to the above questions, and some commentary on each:

Read on for the top 10 list.

Categories

October 2018
MTWTFSS
« Sep  
1234567
891011121314
15161718192021
22232425262728
293031