Category: Cloud

Automatic Approval For Data Lake Analytics

Yan Li reports that Azure Data Lake Analytics no longer requires waiting for approval:

We’re happy to announce that we’ve made it much faster to get started with the Data Lake Store and Analytics services starting today. Before today, when you tried to sign up for these services you had to go through an approval process that introduced a delay of at least one hour.

Now, you no longer have to wait for approval, and you can simply create an account immediately.

Yan also has some “getting started” links to help you out, now that you don’t have to wait for an account.

Azure SQL Database Supports JSON

Jovan Popovic reports that Azure SQL Database now has full JSON support:

JSON is available in all service tiers (basic, standard, and premium) but only in the new SQL Database V12. You can see a quick introduction here or more details on the Getting Started page. You can also find code samples that use JSON functions in Azure SQL Database in the official SQL Server/Azure SQL Database GitHub repository.

Note that the OPENJSON function requires database compatibility level 130. If all functions work except OPENJSON, you need to set the database to the latest compatibility level.

It will be interesting to see adoption of JSON within Azure SQL Database.  I could see it being a bit more likely due to DocumentDB.

HBase Performance Tips

Ashish Thapliyal has nine tips for optimizing HBase performance:

Do your row keys look like 1, 2, 3… or 00000001, 00000002, 00000003, or do you have a row key that starts with a date-time (starting with the year)? If you answered yes, the bad news is that HBase will not scale for you. You have many options to improve HBase performance, but nothing will compensate for bad row key design.

When row keys are in sorted order, all the writes go to the same region and the other regions sit idle doing nothing. You will see one of your nodes heavily stressed trying to cope with all the writes, whereas the other nodes are thanking you for not giving them enough work. So, always salt your keys by adding random numbers or characters to the row key prefix.

If you are using Phoenix on top of HBase, Phoenix provides a way to transparently salt the row key with a salting byte for a particular table. You need to specify this at table creation time by setting the table property “SALT_BUCKETS”. Typical practice is to set SALT_BUCKETS = number of region servers.
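To make the salting idea concrete, here is a minimal sketch in Python. It is not Phoenix's or HBase's implementation: the function names are hypothetical, and it assumes a hash-derived prefix rather than a purely random one, so that the same row key always lands in the same bucket and can still be looked up directly.

```python
import hashlib


def salt_bucket(row_key: str, buckets: int) -> int:
    """Map a row key to a stable bucket number in [0, buckets)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def salted_key(row_key: str, buckets: int) -> str:
    """Prefix the row key with its zero-padded bucket number."""
    width = len(str(buckets - 1))
    return f"{salt_bucket(row_key, buckets):0{width}d}-{row_key}"


if __name__ == "__main__":
    # Sequential keys no longer share a common sorted prefix,
    # so writes spread across regions instead of hitting one.
    for n in range(1, 6):
        print(salted_key(f"{n:08d}", buckets=8))
```

The trade-off is that range scans over the original key order now have to fan out across every bucket, which is one reason to keep the bucket count close to the number of region servers.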

I think the biggest one is to design your data structures correctly.  This is particularly important if you’re coming at it from a relational background and are thinking in terms of what makes relational databases fast.

Stripe Those Azure Disks!

Jens Vestergaard shows you how to create striped disks for Azure VMs:

As displayed in the above screenshots, a single Azure Standard Storage VHD gives you (as promised) about 500 IOPS. Striping eight (8) of those will give you roughly eight (8) times the IOPS, but apparently not the same increase in throughput [MB/s]. Still, the setup is better off after than before!

Do mind that there are three levels of storage performance: P10, P20 and P30. For more information, read this.

I did this recently and can confirm that there’s a huge difference between using one virtual disk versus even three or four, and Windows Storage Spaces makes it easy to expose them as one combined mount point.
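As a rough back-of-the-envelope check on the numbers Jens quotes, the expected IOPS of a striped set can be sketched as below. The function and the optional per-VM cap are assumptions for illustration; real results depend on VM size, block size, and caching, and throughput in MB/s will not necessarily scale the same way.

```python
def striped_iops(disk_count, iops_per_disk, vm_iops_cap=None):
    """Estimate aggregate IOPS for a striped set of identical disks.

    vm_iops_cap is a hypothetical per-VM ceiling; pass None to ignore it.
    """
    total = disk_count * iops_per_disk
    if vm_iops_cap is None:
        return total
    return min(total, vm_iops_cap)


if __name__ == "__main__":
    # Eight standard disks at roughly 500 IOPS each.
    print(striped_iops(8, 500))  # 4000
```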

The Joy Of Hyperparameters

Koos van Strien shows how to tune hyperparameters using Azure ML:

Today, we’ll focus on tuning the model’s properties. We won’t discuss the details of all properties (you can easily look that up in the docs); instead, we’ll look at how to test for different parameter combinations inside Azure ML Studio.

As soon as you click on an untrained model inside your experiment, you’ll be presented with some parameters – or, in ML parlance, hyperparameters – you can tweak.

Parameter tuning is pretty easy using Azure ML.
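Outside of the Studio UI, the underlying idea of trying different parameter combinations is just a sweep over a grid. Here is a minimal, generic sketch in Python; it is not Azure ML's API, and the parameter names and the toy scoring function are hypothetical stand-ins for "train the model and evaluate it on validation data."

```python
from itertools import product


def grid_sweep(param_grid, score):
    """Score every combination in param_grid; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(learning_rate, num_trees):
    # Hypothetical stand-in for a validation metric (higher is better).
    # It peaks at learning_rate=0.1 and num_trees=100.
    return -((learning_rate - 0.1) ** 2) - ((num_trees - 100) / 1000) ** 2


if __name__ == "__main__":
    grid = {"learning_rate": [0.01, 0.1, 0.5], "num_trees": [50, 100, 200]}
    best_params, best_score = grid_sweep(grid, toy_score)
    print(best_params)
```

The number of combinations grows multiplicatively with each parameter you add, which is why sweeps over a random sample of the grid are a common alternative.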

Self-Paced HDInsight Training

Ashish Thapliyal introduces three edX courses on HDInsight:

Implementing Real-Time Analysis with Hadoop in Azure HDInsight

In this four-week course, you’ll learn how to implement low-latency and streaming Big Data solutions using Hadoop technologies like HBase, Storm, and Spark on Microsoft Azure HDInsight.

Course Syllabus

Use HBase to implement low-latency NoSQL data stores.
Use Storm to implement real-time streaming analytics solutions.
Use Spark for high-performance interactive data analysis.

These are free courses on edX.  I personally wouldn’t bother getting the certificate, but hey, it’s your money.

Benchmarking Azure SQL Database Wait Stats

John Sterrett explains wait stats and which stats are most important for Azure SQL Database:

With an instance of SQL Server, whether on IaaS or on-premises, you would want to focus on all the waits that are occurring in your instance because the resources are dedicated to you.

In database as a service (DaaS), Microsoft gives you a special DMV that makes troubleshooting performance in Azure easier than with any competitor.  This feature is the sys.dm_db_wait_stats DMV.  This DMV allows us specifically to get the details behind why our queries are waiting within our database and not the shared environment.  Once again, it is worth repeating: wait statistics for our database in a shared environment.

Click through for a stored procedure John has provided to collect wait stats in a Waits schema.
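Once you have collected snapshots, the usual next step is to compare two of them. Here is a minimal sketch in Python, assuming each snapshot has been loaded as a mapping from wait_type to cumulative wait_time_ms; the function name and the sample numbers are hypothetical and this is not John's procedure.

```python
def wait_stats_delta(before, after):
    """Return (wait_type, added_wait_ms) pairs between two snapshots.

    Each snapshot maps wait_type -> cumulative wait_time_ms.
    Results are sorted with the largest increase first.
    """
    delta = {}
    for wait_type, total_ms in after.items():
        diff = total_ms - before.get(wait_type, 0)
        # Counters that went backwards (for example after a reset)
        # are skipped rather than reported as negative waits.
        if diff > 0:
            delta[wait_type] = diff
    return sorted(delta.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    snapshot_1 = {"PAGEIOLATCH_SH": 1200, "WRITELOG": 300}
    snapshot_2 = {"PAGEIOLATCH_SH": 1900, "WRITELOG": 350, "SOS_SCHEDULER_YIELD": 40}
    for wait_type, added_ms in wait_stats_delta(snapshot_1, snapshot_2):
        print(wait_type, added_ms)
```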

Spark Usage Scenarios

Rimma Nehme has several usage scenarios for Spark on Azure:

For data scientists, we provide out-of-the-box integration with Jupyter (IPython), the most popular open source notebook in the world. Unlike other managed Spark offerings that might require you to install your own notebooks, we worked with the Jupyter OSS community to enhance the kernel to allow Spark execution through a REST endpoint.

We co-led “Project Livy” with Cloudera and other organizations to create an open source Apache licensed REST web service that makes Spark a more robust back-end for running interactive notebooks.  As a result, Jupyter notebooks are now accessible within HDInsight out-of-the-box. In this scenario, we can use all of the services in Azure mentioned above with Spark with a full notebook experience to author compelling narratives and create data science collaborative spaces. Jupyter is a multi-lingual REPL on steroids. Jupyter notebook provides a collection of tools for scientific computing using powerful interactive shells that combine code execution with the creation of a live computational document. These notebook files can contain arbitrary text, mathematical formulas, input code, results, graphics, videos and any other kind of media that a modern web browser is capable of displaying. So, whether you’re absolutely new to R or Python or SQL or do some serious parallel/technical computing, the Jupyter Notebook in Azure is a great choice.
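To give a sense of what "Spark execution through a REST endpoint" means, here is a small Python sketch that builds (but does not send) a Livy request to run a statement in an existing session. The base URL is a placeholder, the function name is hypothetical, and authentication is omitted; it assumes a session was already created with a POST to the sessions endpoint.

```python
import json
import urllib.request


def livy_statement_request(base_url, session_id, code):
    """Build a POST request that submits code to a Livy session."""
    url = f"{base_url.rstrip('/')}/sessions/{session_id}/statements"
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint; substitute your own cluster's Livy URL.
    request = livy_statement_request(
        "https://example.invalid/livy", 0, "sc.parallelize(range(100)).sum()"
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

A notebook kernel does essentially this on your behalf, then polls the statement until the result comes back.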

If you could only learn one new thing in 2016, Spark probably should be that thing.  Also, I probably should agitate a bit more about wanting Spark support within Polybase…

U-SQL

Ginger Grant has a quick intro on U-SQL:

In my previous series on Stream Analytics, I wrote some U-SQL. That U-SQL didn’t look much different than ANSI SQL, which is sort of the point of porting the functionality to a different yet familiar language. Another application which heavily uses U-SQL is Azure Data Lake. Data Lake stores its data in HDInsight, but you don’t need to write Hive to query the data, as U-SQL will do it. Like Hive, U-SQL can be used to create a schema on top of some data, and then query it.

For example, to write a query on this CSV file stored in a Data Lake, I would need to create the data definition for the data, then I could easily write a statement to query it.

I’m interested in seeing how much adoption we see in this language.

Azure SQL Data Warehouse Plans

Grant Fritchey shows how to build an execution plan for an Azure SQL Data Warehouse query:

So now we just save this as a .sqlplan file and open it in SSMS, right?

Nope!

See, that’s not a regular execution plan, at all. Instead, it’s a D-SQL plan. It’s not the same as our old execution plans. You can’t open it as a graphical plan (and no, not even in that very popular 3rd party tool, I tried). You will have to learn how to read these plans differently because, well, they are different.

That’s an unfortunate outcome.  Reading is hard…
