Support subscription revenue during the quarter was up sharply from $13.1 million to $27.6 million, an increase of 110 percent compared to the first quarter of 2015, which was Hortonworks’ first quarter as a public company following an IPO in late 2014. Professional services accounted for $13.7 million in revenue, a 49 percent increase.
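As a quick sanity check on the quoted growth figure, here is a minimal Python sketch that recomputes the percentage from the rounded dollar amounts above. The helper name `percent_change` is purely illustrative, and because the inputs are rounded to the nearest $0.1 million, the result lands close to the reported 110 percent rather than exactly on it.

```python
def percent_change(old: float, new: float) -> float:
    """Return the percentage change going from old to new."""
    return (new - old) / old * 100.0


# Rounded figures quoted in the excerpt, in millions of dollars.
support_q1_2015 = 13.1
support_q1_2016 = 27.6

growth = percent_change(support_q1_2015, support_q1_2016)
print(f"Support subscription growth: {growth:.1f}%")
```

With these rounded inputs the growth comes out just under 111 percent, which is consistent with the reported figure having been calculated from unrounded revenue numbers.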
Hortonworks holds about 40% of the Hadoop market, with Cloudera holding another 40%.
Finally, I’ve nailed down this topic, and hopefully this will be my last post dedicated to SQL Server installations on Windows Core.
In this post I will show how easy it is to install SQL Server from a remote computer without remoting into the server and without any GUI, just by using a simple command line.
I admit that setting up installation as a scheduled task on the remote machine is not something that ever came to mind before.
Walking through this, we just need to create a secure string for our password and then use the Set-AzureRmSqlServer cmdlet, passing the secure string to the -SqlAdministratorPassword argument. Easy as that, and we don’t even need to know what the previous password was. With this in mind, I also want to call out that you can only change the password and not the admin login name. While this is not such a big deal, be aware that once you have an admin login name, you are stuck with it.
Mike promises that his next blog post won’t take a month to publish. Here’s hoping he’s right.
Even though they’ve become prominent in the past few years, they have a long history. The first notebooks were available in packages like Mathematica and Matlab, used primarily in academia. More recently they’ve started getting traction in the Python community with IPython Notebook. Today there are many notebooks to choose from: Jupyter (successor to the IPython Notebook), R Markdown, Apache Zeppelin, Spark Notebook, Databricks Cloud, and more. There are kernels/backends for multiple languages, such as Python, Julia, Scala, SQL, and others.
Traditionally, notebooks have been used to document research and make results reproducible, simply by rerunning the notebook on source data. But why would one choose to use a notebook instead of a favorite IDE or command line? There are many limitations in the current browser-based notebook implementations that prevent them from offering a comfortable environment to develop code, but what they do offer is an environment for exploration, collaboration, and visualization.
Back In The Day, developers and infrastructure staff used runbooks to make sure they listed and hit all of the steps in an operation. I don’t really know of a notebook which integrates directly with SQL Server, but Jupyter is probably the best-known cross-platform notebook.
Those are the basics. Peeling back the onion reveals other distinct differences, further strengthening the case for a Hadoop-RDBMS coexistence strategy. RDBMS has the backing of the biggest names in the software industry, and as such has fostered an install base of IT talent probably second to none. RDBMS integrate very well with other systems, and represent a very mature technology with venerable, 40-year-old roots. RDBMS are baked into the very fabric of just about every mid- to large-sized IT organization in the world. Believe it: RDBMS aren’t going away any time soon, nor should they.
Relational databases have a strong mathematical footing which provides unparalleled data integrity. Hadoop has a strong mathematical footing which provides near-linear scale out. The key is knowing the problem you need to solve and how to integrate the results.
Letting SQL Server change data types automatically can seriously hurt performance. Because a conversion has to be run against the column value in every row, you can’t get an index seek. Instead, you’re forced to use a scan. I can demonstrate this pretty simply. Here’s a script that sets up a test table with three columns and three indexes and tosses a couple of rows in:
You might get lucky and have the database engine realize that it doesn’t need to give you a horrible execution plan, but it’s sound advice to ensure that data types match on joins and filters.
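To make the seek-versus-scan point concrete, here is a rough conceptual sketch in Python. It is not the original post’s T-SQL script and it does not model SQL Server’s optimizer; it only assumes an index behaves like a sorted list of keys. The names `index_keys`, `seek`, and `scan_with_conversion` are hypothetical. When the search value has the same type as the keys, a binary search works. When every key has to be converted first, the sort order no longer helps and every entry gets touched.

```python
from bisect import bisect_left, bisect_right

# Hypothetical "index": values of a string-typed column kept in string sort order.
# Note that string order differs from numeric order: "1", "10", "100", "1000", "101", ...
index_keys = sorted(str(n) for n in range(1, 1001))


def seek(keys, value):
    """Matching types: binary search over the sorted keys, like an index seek."""
    lo = bisect_left(keys, value)
    hi = bisect_right(keys, value)
    return keys[lo:hi]


def scan_with_conversion(keys, value):
    """Mismatched types: convert every key before comparing, like an index scan."""
    conversions = 0
    matches = []
    for key in keys:
        conversions += 1          # one conversion per row, no way to skip ahead
        if int(key) == value:
            matches.append(key)
    return matches, conversions


print(seek(index_keys, "42"))
print(scan_with_conversion(index_keys, 42))
```

Both calls find the same row, but the second one performs a conversion for each of the 1,000 keys, which is the same reason matching data types on joins and filters matters.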
The first one disappears when my session is over, but the latter two persist until the SQL Server is restarted.
Why would you ever do the latter two? Say you need to share data between sessions or between different applications, or you need staging tables for a data warehouse, or you just want faster tables that live on local SSDs in a cluster (as opposed to slower shared storage), or you wanna build a really crappy caching tier.
Brent also talks about stored procedures.
A question came up in class today about easily seeing the degree of parallelism for parallel query plans, so I’ve updated my waiting tasks script to pull in the dop field from sys.dm_exec_query_memory_grants. Here it is for your use.
This is a good one to have in your grab bag of scripts.
For an early- and mid-stage startup, a monolithic database is absolutely the appropriate architecture choice. With a small team and a small company, a single shared database made it simple to get started. Moving fast meant being able to make rapid changes across the entire system. A shared database made it very easy to join data between different tables, and it made transactions across multiple tables possible. These are pretty convenient.
As we have gotten larger, those benefits have become liabilities. It has become a single point of failure, where issues with the shared database can bring down nearly all of our applications. It has become a performance bottleneck, where long-running operations from one application can slow down others. Finally, and most importantly, the shared database has become a coupling point between teams, slowing down our ability to make changes.
I have my misgivings (as you’d expect from a database snob), particularly because I value highly the benefits of normalization and see sharded systems as a step backwards in that regard. But even with that said, there are absolutely benefits to slicing out orthogonal sections of data; the point of disagreement is in those places in which two teams’ entities and attributes overlap.
I may blog about that solution in the future, but with the Future of SharePoint event rapidly coming up, my BI Focal collaborator, Jason Himmelstein, convinced me that there was something more interesting that we could do with this. How about near-real-time monitoring of Twitter conversations for the event? All of the pieces were in place.
We rolled up our sleeves, and in relatively short order, had a solution. Jason has written about the experience on his SharePoint Longhorn blog, and he has included the videos that we put together, so I can be a little less detailed in this post.