Today, we are launching Quantum, a high-performance serverless SQL engine, available on Qubole Data Platform, that simplifies SQL access by offering a true serverless deployment option to enable data analysts to query petabyte-scale volumes of data using ANSI-SQL.
Quantum allows teams to realize value from their data much more quickly, and because of its serverless nature, users pay only for queries they run. Data analysts can query object stores on AWS, Azure, Google Cloud, and Oracle Cloud in seconds to achieve faster time to market with far less IT management overhead.
Existing serverless SQL service offerings do not provide users with the ability to use a metastore of their choice. With Quantum, data teams can use their own custom metastore and start using Quantum without recreating schemas or table metadata.
Most existing Qubole customers already use a custom metastore in the cloud. So there’s virtually no ramp up time to reap the benefits of Quantum.
The technical overview is a bit too much marketing for my tastes, but this is a move worth watching.
The future is not clear for either Cloudera and MapR. While there are similarities in the two companies’ positions, there are big differences too.
Cloudera does not have a permanent CEO at the moment, and it still hasn’t shipped the new converged Hadoop distribution, dubbed Cloudera Data Platform (CDP), that will replace the old Cloudera and Hortonworks distributions. During its first quarter ended April 30, Cloudera said customers are holding off investing in the old Hadoop products since they know the new CDP is due by the end of the year. That fact led Cloudera to dramatically lower its revenue expectations for the year, which upset stockholders, who pushed Cloudera’s stock (NYSE: CLDR) down 40% the following day.
The way I’m phrasing it is that the Hadoop ecosystem is strong (with the successes of companies like Databricks and Confluent), but core Hadoop companies are struggling.
Rolf Tesmer explains that machine learning and DevOps aren’t oil and water (or maybe they are and we just need to stir harder):
In talking with various development teams, customers and DevOps engineers, a lot of the potential problems of meshing ML development into an enterprise DevOps process can be boiled down to a few different areas this aims to address…
– ML stack might be different from rest of the application stack
– Testing accuracy of ML model
– ML code is not always version controlled
– Hard to reproduce models (ie explainability)
– Need to re-write featurizing + scoring code into different languages
– Hard to track breaking changes
– Difficult to monitor models & determine when to retrain
So DevOps helps with this, right? Right?
Well er, some of them yes, but not all.
DevOps is not a panacea but it can solve certain types of problems well.
The gripes I hear about fully fixing dynamic SQL are:
– The syntax is hard to remember (setting up and calling parameters)
– It might lead to parameter sniffing issues
I can sympathize with both. Trading one problem for another problem generally isn’t something people get excited about.
But there are good reasons fully to fix it, so read on.
The noticeable thing about the behavior of the slicer is that the two matrices are showing only the brands and colors purchased by Amanda. Yet, the Color slicer is still showing all the colors, even though we know Amanda only purchased three colors: Grey, Silver and White.
The reason is that the matrices, like most Power BI visuals, hide rows if the measure they are showing produces a blank. Because Amanda did not buy any pink product, the value of Sales Amount for Pink results in a blank, therefore the matrix removes the pink color from its result. Prior to the May 2019 release of Power BI, slicers did not display this behavior because slicers did not have a measure to evaluate – they would only show a list of values from a column; Moreover, visual-level filters were not allowed in slicer visuals whereas they were available in other visuals such as charts, tables, and matrices.
Read on to see how to do this.
I haven’t written much about them yet (key emphasis there …) but AGs now being supported for containers in SQL Server 2019 is a big deal. Recently, SQL Server 2019 CTP 3.0 was released, but there’s a slight problem: if you try to deploy an AG with Kubernetes, you may see the following errors when trying to deploy the pods with the YAML that contains their definition. The services (i.e. instances of SQL Server) get created, but the pods do not.
Read on for the root cause and the solution.
The Columnstore Index Scan is not really an actual operator. You can encounter it in graphical execution plans in SSMS (and other tools), but if you look at the underlying XML of the execution plan, you will see that it is either an Index Scan or a Clustered Index Scan operator.
SQL Server currently supports three types of index storage: rowstore, columnstore, and memory-optimized. Indexes of each of those types can be the target of an Index Scan or Clustered Index Scan, as indicated by the Storage property. When the Storage property is RowStore or MemoryOptimized, then the normal icon for (clustered) index scan is use, but when Storage is ColumnStore than SSMS (and other tools) choose to show a different icon instead.
Click through for more details.
The reason for using hexagons is that it is still pretty simple, and when you rotate the chart by 60 degrees (or a multiple of 60 degrees) you still get the same visualization. For squares, rotations of 60 degrees don’t work, only multiples of 90 degrees work. Is it possible to find a tessellation such that smaller rotations, say 45 or 30 degrees, leave the chart unchanged? The answer is no. Octogonal tessellations don’t really exist, so the hexagon is an optimum.
Every time I see one of these, I think of old-timey strategy war games.
Today we are excited to announce the release of MLflow 1.0. Since its launch one year ago, MLflow has been deployed at thousands of organizations to manage their production machine learning workloads, and has become generally available on services like Managed MLflow on Databricks. The MLflow community has grown to over 100 contributors, and the MLflow PyPI package download rate has reached close to 600K times a month. The 1.0 release not only marks the maturity and stability of the APIs, but also adds a number of frequently requested features and improvements.
The release is publicly available starting today. Install MLflow 1.0 using PyPl, read our documentation to get started, and provide feedback on GitHub. Below we describe just a few of the new features in MLflow 1.0. Please refer to the release notes for a full list.
And it looks like they’re going to keep pushing on it from there.
To perform the steps below, I set up a single Ubuntu 16.04 machine on AWS EC2 using local storage. In real-life scenarios you will probably have all these components running on separate machines.
I started the instance in the public subnet of a VPC and then set up a security group to enable access from anywhere using SSH and TCP 5601 (for Kibana). Finally, I added a new elastic IP address and associated it with the running instance.
The example logs used for the tutorial are Apache access logs.
This is a great walkthrough on setup and basic configuration. If you don’t have something in place to manage logs, the ELK stack is fine.