Learning JSON

Kevin Feasel

2016-08-17

JSON

Jason Brimhall wants to learn a bit of JSON:

Let’s just get this out there right now – I suck at JSON. I suck at XML. The idea of querying a non-normalized document to get the data is not very endearing to me. It is for that reason that I have written utilities or scripts to help generate my XML shredding scripts – as can be seen here.

Knowing that I have this allergy to features similar to XML, I need to build up some resistance to the allergy through a little learning and a little practice. Based on that, my plan is pretty simple:

  1. Read up on JSON

  2. Find some tutorials on JSON

  3. Practice using the feature

  4. Potentially do something destructive with JSON

I’m not particularly excited about JSON support in SQL Server 2016 but the fact that it is there, combined with the fact that so many developers love JSON means that it’s a good idea to learn how to integrate, if only to figure out when it’s a bad idea to parse JSON within your very expensive SQL Server instances.

R Or M?

Kevin Feasel

2016-08-17

Power BI, R

Ryan Wade gives a few scenarios in which R might be a better language choice than M for Power BI integration:

When referring to what can be done in iOS, Apple often say that there is an “app” for that. Likewise, when R developers refer to what can be done in R, we often say that there is a “package” for that. For instance:

· If one needs to scrap data from the web there are packages for that (rvest, rcurl, and others)

· If one needs to make complicated transformations to their data there are packages for that (dplyr, tidyr, lubrdiate, stringr, and others)

I like the F#-ness of M, but I admit that I’m happy there’s some fairly close R integration within Power BI, as that means there’s one fewer language I need to learn right now…

DBCC CHECKDB’s NOINDEX Option

Kendra Little explains the NOINDEX option of DBCC CHECKDB and how it relates to PHYSICAL_ONLY:

PHYSICAL_ONLY looks at page structures and makes sure something isn’t obviously wrong. If you have the database’s page verification option set to CHECKSUM (the default since SQL Server 2005, and a very good thing), it runs a checksum on the page and compares it to a checksum recorded in the header when the page was last modified.

When you use the PHYSICAL_ONLY option, you’re telling SQL Server to skip logical checks. It won’t do things like make sure the data in a clustered and nonclustered index on a table is consistent with one another.

I like the analogy, as well as the how-to guide on messing up your database.  But try not to do that to a production database…

Redirecting Linked Servers

Slava Murygin shows how to point a linked server to a new host:

In my example my linked server has an IP addressed name “192.168.1.103” and pointed to the same SQL Server host.

If we want to redirect that linked server to another SQL Server host we can do following:

April Fool’s Day joke, anybody?

SparkSession

Kevin Feasel

2016-08-16

Spark

Jules Damji shows off SparkSession:

Beyond a time-bounded interaction, SparkSession provides a single point of entry to interact with underlying Spark functionality and allows programming Spark with DataFrame and Dataset APIs. Most importantly, it curbs the number of concepts and constructs a developer has to juggle while interacting with Spark.

In this blog and its accompanying Databricks notebook, we will explore SparkSession functionality in Spark 2.0.

This looks to be an easier method for integrating various parts of Spark in one user session.  Read the whole thing.

Concurrency In Hadoop Using ZooKeeper

Kevin Feasel

2016-08-16

Hadoop

Garima Dosi discusses an architecture using ZooKeeper to introduce some limited protections for concurrent access in HDFS:

The ZooKeeper nodes topology as per the design looks like this. ZooKeeper works like a filesystem starting with a root directory followed with several nodes (analogous to folders) and finally the data nodes (analogous to files). The circles in the image represent the name of a property/folder that we are trying to maintain and the rounded boxes are the values/files for those properties/folders.

So, the image above shows that the “global version” is 100 and there are 10 & 20 read requests being executed on versions 98 and 99 respectively and since there is a write request in progress, no other write request would be taken up until it completes.

This feels a little overly complicated to me.

Analytic Tool Usage

Alex Woodie notes the increased popularity of Python for data analysis:

According to the results of the 2016 survey, R is the preferred tool for 42% of analytics professionals, followed by SAS at 39% and Python at 20%. While Python’s placing may at first appear to relegate the language to Bronze Medal status, it’s the delta here that really matters.

It’s interesting to see the breakdowns of who uses which language, comparing across industry, education, work experience, and geographic lines.

Locality

Kyle Kingsbury explains that sequential, serializable, and strictly serializable consistency models cannot provide locality:

We often speak of locality as a property of subhistories for a particular object x: “H|x is strictly serializable, but H is not”. This is a strange thing to say indeed, because the transactions in H may not meaningfully exist in H|x. What does it mean to run [(A enq y 1) (A enq x 1)] on x alone? If we restrict ourselves to those transactions that doapply to a single object, we find that those transactions still serialize in the full history.

So in a sense, locality is about the scope of legal operations. If we take single enqueue and dequeue operations over two queues x and y, the space of operations on the composite system of x and y is just the union of operations on x and those on y. Linearizability can also encompass transactions, so long as they are restricted to a single system at a time. Our single-key, multi-operation transactions still satisfied strict serializability even in the composite system. However, the space of transactions on a composite system is more than the union of transactions on each system independently. It’s their product.

Here’s the part where I pretend that of course I understand what Kyle wrote…  Seriously, though, this is a very interesting read.

Min And Max Server Memory

Kevin Hill explains the minimum and maximum server memory options in SQL Server:

Min Server Memory seems to get the most bad information spread around.   SQL Server does NOT automatically grab memory up to the Min setting when it starts.  However, once it gets there, it doesn’t give back.

Back to the car analogy…if you start up and head off down the road at 20 mph, you are above the default (0), but not at the max (100+, depending on the car and the tires…).  If you set the cruise control, you can accelerate up and down above 20, but you won’t go below that unless you hit the brakes.

I do like the car analogy to his post.

VirtualBox Setup

Kevin Feasel

2016-08-16

Hadoop

Jon Morisi has notes on downloading and preparing the Hortonworks sandbox VM via VirtualBox:

Lessons learned:

Use wget for downloads that repeatedly drop the connection.
Don’t try to run VirtualBox inside a VM.
Enable VTx in the BIOS to run VirtualBox.

This post is mostly about troubles in getting the VM and preparing the software.  Definitely take his advice on enabling VTx in your BIOS.

Categories

August 2018
MTWTFSS
« Jul  
 12345
6789101112
13141516171819
20212223242526
2728293031