Hortonworks Revenue Growth

Kevin Feasel

2016-05-05

Hadoop

Alex Woodie reports that Hortonworks has seen its revenue grow 85% 1Q YOY:

Support subscription revenue during the quarter was up sharply from $13.1 million to $27.6 million, an increase of 110 percent compared to the first quarter of 2015, which was Hortonworks’ first quarter as a public company following an IPO in late 2014. Professional services revenue accounted for $13.7 million in revenue, a 49 percent increase.

Hortonworks holds about 40% of the Hadoop market share, with Cloudera holding another 40%.

Remote Server Installation Using Powershell

Slava Murygin gives tips on using Powershell and task scheduler to schedule remote SQL Server installations:

Finally I’ve nailed down that topic and hopefully that will be my last post dedicated to SQL Server installations on Windows Core.

In this post I will show how it is easy to install SQL Server from a remote computer without remoting to a server, without any GUI, just by using simple command line.

I admit that setting up installation as a scheduled task on the remote machine is not something that ever came to mind before.

Azure SQL Database Management With Powershell

Mike Fal shows a few administration steps with Azure SQL Database, including resetting an admin password:

Walking through this, we just need to create a secure string for our password and then use the Set-AzureRmSqlServer cmdlet and pass the secure string to -SqlAdministratorPassword argument. Easy as that and we don’t even need to know what the previous password was. With this in mind, I also want to call out that you can only change the password and not the admin login name. While this is not such a big deal, be aware that once you have an admin login name, you are stuck with it.

Mike promises that his next blog post won’t take a month to publish.  Here’s hoping he’s right.

Data Science Notebooks

Dan Osipov discusses data science notebooks:

Even though they’ve become prominent in the past few years, they have a long history. First notebooks were available in packages like Mathematica andMatlab, used primarily in academia. More recently they’ve started getting traction in Python community with iPython Notebook. Today there are many notebooks to choose from: Jupyter (successor to the iPython Notebook), R Markdown, Apache Zeppelin,Spark Notebook, Databricks Cloud, and more. There are kernels/backends to multiple languages, such as Python, Julia, Scala, SQL, and others.

Traditionally, notebooks have been used to document research and make results reproducible, simply by rerunning the notebook on source data. But why would one want to choose to use a notebook instead of a favorite IDE or command line? There are many limitations in the current browser based notebook implementations that prevent them from offering a comfortable environment to develop code, but what they do offer is an environment for exploration, collaboration, and visualization.

Back In The Day, developers and infrastructure staff used runbooks to make sure they listed and hit all of the steps in an operation.  I don’t really know of one which integrates directly with SQL Server, but Jupyter is probably the best-known cross-platform notebook.

Hadoop And SQL Server Are Complements

Kevin Feasel

2016-05-05

Hadoop

Jim Scott explains that Hadoop and relational databases solve different problems:

That’s the basics. Peeling back the onion more reveals other distinct differences, further making the case more strongly for a Hadoop-RDBMS coexistence strategy. RDBMS has the backing of the biggest names in the software industry, and as such has fostered an install base of IT talent probably second to none. RDBMS integrate very well with other systems, and represent a very mature technology having venerable, 40-year old roots. RDBMS are baked into the very fabric of just about every mid-to large sized IT organization in the world. Believe it – RDBMS aren’t going away any time soon, nor should they.

Relational databases have a strong mathematical footing which provides unparalleled data integrity.  Hadoop has a strong mathematical footing which provides near-linear scale out.  The key is knowing the problem you need to solve and how to integrate the results.

Implicit Conversion (Sometimes) Harms Performance

Grant Fritchey looks at implicit conversion and the havoc it can wreak:

Letting SQL Server change data types automatically can seriously impact performance in a negative way. Because a calculation has to be run on each column, you can’t get an index seek. Instead, you’re forced to use a scan. I can demonstrate this pretty simply. Here’s a script that sets up a test table with three columns and three indexes and tosses a couple of rows in:

You might get lucky and have the database engine realize that it doesn’t need to give you a horrible execution plan, but it’s sound advice to ensure that data types match on joins and filters.

“Permanent” Temp Tables

Brent Ozar shows two ways of creating “permanent” temp tables:

The first one disappears when my session is over, but the latter two persist until the SQL Server is restarted.

Why would you ever do the latter two? Say you need to share data between sessions, or between different applications, or staging tables for a data warehouse, or just faster tables that live on local SSDs in a cluster (as opposed to slower shared storage), or you wanna build a really crappy caching tier.

Brent also talks about stored procedures.

Waiting Tasks Script Update

Paul Randal has updated his waiting tasks script to include degree of parallelism:

A question came up in class today about easily seeing the degree of parallelism for parallel query plans, so I’ve updated my waiting tasks script to pull in the dop field from sys.dm_exec_query_memory_grants. Here it is for your use.

Enjoy!

This is a good one to have in your grab bag of scripts.

Categories

May 2016
MTWTFSS
« Apr Jun »
 1
2345678
9101112131415
16171819202122
23242526272829
3031