
Month: May 2016

Hortonworks Revenue Growth

Alex Woodie reports that Hortonworks saw its first-quarter revenue grow 85% year over year:

Support subscription revenue during the quarter was up sharply from $13.1 million to $27.6 million, an increase of 110 percent compared to the first quarter of 2015, which was Hortonworks’ first quarter as a public company following an IPO in late 2014. Professional services revenue accounted for $13.7 million in revenue, a 49 percent increase.

Hortonworks holds about 40% of the Hadoop market, with Cloudera holding another 40%.

Remote Server Installation Using PowerShell

Slava Murygin gives tips on using PowerShell and the Task Scheduler to schedule remote SQL Server installations:

Finally I’ve nailed down that topic and hopefully that will be my last post dedicated to SQL Server installations on Windows Core.

In this post I will show how it is easy to install SQL Server from a remote computer without remoting to a server, without any GUI, just by using simple command line.

I admit that setting up the installation as a scheduled task on the remote machine is not something that had ever come to mind before.
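
Slava's actual scripts are in his post.  As a rough, hypothetical sketch of the general technique (the server name, media path, and setup arguments below are all placeholders, and it assumes PowerShell remoting is enabled on the target), the idea looks something like this:

```powershell
# A rough, hypothetical sketch (not Slava's actual script): register a one-time
# scheduled task on the remote server that runs SQL Server setup unattended.
$targetServer = 'SQLCORE01'                      # placeholder server name
$setupExe     = 'D:\SQLMedia\setup.exe'          # placeholder path to install media
$setupArgs    = '/Q /IACCEPTSQLSERVERLICENSETERMS /ACTION=Install /ConfigurationFile=D:\SQLMedia\ConfigurationFile.ini'

Invoke-Command -ComputerName $targetServer -ScriptBlock {
    param($exe, $arguments)

    # Build a task that fires once, a couple of minutes from now, running as SYSTEM.
    $action  = New-ScheduledTaskAction -Execute $exe -Argument $arguments
    $trigger = New-ScheduledTaskTrigger -Once -At (Get-Date).AddMinutes(2)

    Register-ScheduledTask -TaskName 'Install SQL Server' -Action $action `
        -Trigger $trigger -User 'SYSTEM' -RunLevel Highest
} -ArgumentList $setupExe, $setupArgs
```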

Azure SQL Database Management With PowerShell

Mike Fal shows a few administration steps with Azure SQL Database, including resetting an admin password:

Walking through this, we just need to create a secure string for our password and then use the Set-AzureRmSqlServer cmdlet and pass the secure string to -SqlAdministratorPassword argument. Easy as that and we don’t even need to know what the previous password was. With this in mind, I also want to call out that you can only change the password and not the admin login name. While this is not such a big deal, be aware that once you have an admin login name, you are stuck with it.

Mike promises that his next blog post won’t take a month to publish.  Here’s hoping he’s right.
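
The cmdlet call itself is compact.  Here's a minimal sketch of what Mike describes, assuming you've already signed in with Login-AzureRmAccount; the resource group, server name, and password below are placeholders:

```powershell
# Minimal sketch of resetting the SQL admin password on an Azure SQL logical
# server using the AzureRM module (current at the time).  All values are placeholders.
$newPassword = ConvertTo-SecureString -String 'N3w-Str0ng-P@ssw0rd!' -AsPlainText -Force

Set-AzureRmSqlServer -ResourceGroupName 'DemoResourceGroup' `
    -ServerName 'demo-sql-server' `
    -SqlAdministratorPassword $newPassword
```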

Data Science Notebooks

Dan Osipov discusses data science notebooks:

Even though they’ve become prominent in the past few years, they have a long history. First notebooks were available in packages like Mathematica and Matlab, used primarily in academia. More recently they’ve started getting traction in Python community with iPython Notebook. Today there are many notebooks to choose from: Jupyter (successor to the iPython Notebook), R Markdown, Apache Zeppelin, Spark Notebook, Databricks Cloud, and more. There are kernels/backends to multiple languages, such as Python, Julia, Scala, SQL, and others.

Traditionally, notebooks have been used to document research and make results reproducible, simply by rerunning the notebook on source data. But why would one want to choose to use a notebook instead of a favorite IDE or command line? There are many limitations in the current browser based notebook implementations that prevent them from offering a comfortable environment to develop code, but what they do offer is an environment for exploration, collaboration, and visualization.

Back In The Day, developers and infrastructure staff used runbooks to make sure they listed and hit all of the steps in an operation.  I don’t really know of a notebook which integrates directly with SQL Server, but Jupyter is probably the best-known cross-platform option.

Hadoop And SQL Server Are Complements

Jim Scott explains that Hadoop and relational databases solve different problems:

That’s the basics. Peeling back the onion more reveals other distinct differences, further making the case more strongly for a Hadoop-RDBMS coexistence strategy. RDBMS has the backing of the biggest names in the software industry, and as such has fostered an install base of IT talent probably second to none. RDBMS integrate very well with other systems, and represent a very mature technology having venerable, 40-year old roots. RDBMS are baked into the very fabric of just about every mid-to large sized IT organization in the world. Believe it – RDBMS aren’t going away any time soon, nor should they.

Relational databases have a strong mathematical footing which provides unparalleled data integrity.  Hadoop has a strong mathematical footing which provides near-linear scale out.  The key is knowing the problem you need to solve and how to integrate the results.

Implicit Conversion (Sometimes) Harms Performance

Grant Fritchey looks at implicit conversion and the havoc it can wreak:

Letting SQL Server change data types automatically can seriously impact performance in a negative way. Because a calculation has to be run on each column, you can’t get an index seek. Instead, you’re forced to use a scan. I can demonstrate this pretty simply. Here’s a script that sets up a test table with three columns and three indexes and tosses a couple of rows in:

You might get lucky and have the database engine realize that it doesn’t need to give you a horrible execution plan, but it’s sound advice to ensure that data types match on joins and filters.
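
Grant's full repro script is in his post.  As a simplified, hypothetical stand-in (the table, index, and instance names are made up), the shape of the problem looks something like this; the scan-versus-seek difference shows up when you compare the two SELECT statements' execution plans:

```powershell
# A simplified, hypothetical repro of the implicit conversion problem (not
# Grant's actual script).  Uses Invoke-Sqlcmd from the SqlServer module against
# a local instance; compare the execution plans of the two SELECTs to see the
# scan versus the seek.
$demo = @"
CREATE TABLE dbo.ConversionTest (
    ID            INT IDENTITY(1,1) PRIMARY KEY,
    AccountNumber VARCHAR(20) NOT NULL
);
CREATE INDEX IX_ConversionTest_AccountNumber ON dbo.ConversionTest (AccountNumber);
INSERT INTO dbo.ConversionTest (AccountNumber) VALUES ('12345'), ('67890');

-- INT outranks VARCHAR in data type precedence, so AccountNumber is implicitly
-- converted on every row and the optimizer scans the index instead of seeking.
SELECT ID FROM dbo.ConversionTest WHERE AccountNumber = 12345;

-- Matching the column's data type allows an index seek.
SELECT ID FROM dbo.ConversionTest WHERE AccountNumber = '12345';

DROP TABLE dbo.ConversionTest;
"@

Invoke-Sqlcmd -ServerInstance 'localhost' -Database 'tempdb' -Query $demo
```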

“Permanent” Temp Tables

Brent Ozar shows two ways of creating “permanent” temp tables:

The first one disappears when my session is over, but the latter two persist until the SQL Server is restarted.

Why would you ever do the latter two? Say you need to share data between sessions, or between different applications, or staging tables for a data warehouse, or just faster tables that live on local SSDs in a cluster (as opposed to slower shared storage), or you wanna build a really crappy caching tier.

Brent also talks about stored procedures.
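
Brent's demo scripts are in his post.  As a hypothetical illustration of the general idea, here's the contrast between a session-scoped temp table and a table created directly in tempdb, which hangs around until the instance restarts.  Because each Invoke-Sqlcmd call below opens its own session, the second call can still see the tempdb table but not the # table:

```powershell
# A hypothetical illustration (not Brent's exact scripts).  Each Invoke-Sqlcmd
# call is a separate session, which makes the scoping difference easy to see.
$instance = 'localhost'   # placeholder instance name

# Session 1: the # table lives only for this session; the table created with a
# three-part name directly in tempdb sticks around until the instance restarts.
Invoke-Sqlcmd -ServerInstance $instance -Query @"
CREATE TABLE #SessionScoped (ID INT);
CREATE TABLE tempdb.dbo.AlmostPermanent (ID INT);
INSERT INTO tempdb.dbo.AlmostPermanent (ID) VALUES (1);
"@

# Session 2: the tempdb table is still visible...
Invoke-Sqlcmd -ServerInstance $instance -Query 'SELECT ID FROM tempdb.dbo.AlmostPermanent;'

# ...but the # temp table from session 1 is gone, so this call throws an error.
Invoke-Sqlcmd -ServerInstance $instance -Query 'SELECT ID FROM #SessionScoped;'

# Clean up explicitly rather than waiting for a restart.
Invoke-Sqlcmd -ServerInstance $instance -Query 'DROP TABLE tempdb.dbo.AlmostPermanent;'
```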

Unsharing The Database

Randy Shoup talks about scaling by breaking apart a shared database:

For an early- and mid-stage startup, a monolithic database is absolutely the appropriate architecture choice. With a small team and a small company, a single shared database made it simple to get started. Moving fast meant being able to make rapid changes across the entire system. A shared database made it very easy to join data between different tables, and it made transactions across multiple tables possible. These are pretty convenient.

As we have gotten larger, those benefits have become liabilities. It has become a single point of failure, where issues with the shared database can bring down nearly all of our applications. It has become a performance bottleneck, where long-running operations from one application can slow down others. Finally, and most importantly, the shared database has become a coupling point between teams, slowing down our ability to make changes.

I have my misgivings (as you’d expect from a database snob), particularly because I highly value the benefits of normalization and see sharded systems as a step backwards in that regard.  But even with that said, there are absolutely benefits to slicing out orthogonal sections of data; the point of disagreement is in those places where two teams’ entities and attributes overlap.

Using Flow + Power BI For Reporting

John White combines Microsoft Flow with Power BI to generate real-time reports:

I may blog about that solution in the future, but with the Future of SharePoint event rapidly coming up, my BI Focal fellow collaborator, Jason Himmelstein convinced me that there was something more interesting that we could do with this. How about near real time monitoring of Twitter conversations for the event? All of the pieces were in place.

We rolled up our sleeves, and in relatively short order, had a solution. Jason has written about the experience on his SharePoint Longhorn blog, and he has included the videos that we put together, so I can be a little less detailed in this post.

Today I learned about Microsoft Flow.  Looks a lot like Zapier.
