After all the previous functions has been loaded, just type the following one-liner:
Process-PSObjectToSQL -SQLDataObj $SQLData;
This sample script code can serve as a startup Template to load data into SQL Server.
This sample SQL data load will fail. Here’s when the Try/Catch/Finally will work for you in trapping what went wrong. Adding the necessary code to provide that additional information to troubleshoot and fix the problem.
When you run a query, SQL Server estimates how “expensive” it is in a fake costing unit, let’s call it Estimated QueryBucks.
If a query’s Estimated QueryBucks is over the “Cost Threshold for Parallelism” setting in SQL Server, it qualifies to potentially use multiple processors to run the query.
The number of processors it can use is defined by the instance level “Max Degree of Parallelism” setting.
When writing TSQL, you can specify maxdop for individual statements as a query hint, to say that if that query qualifies to go parallel, it should use the number of processors specified in the hint and ignore the server level setting. (You could use this to make it use more processors, or to never go parallel.)
Read the whole thing, or watch/listen to the video.
Numbers suggest that Storm has come a long way in terms of performance but it still has room go faster. Here are some of the broad areas that should improve performance in future:
An effort to rewrite much of Storm’s Clojure code in Java is underway. Profiling has shown many hotspots in Clojure code.
Better scheduling of workers. Yahoo is experimenting with a Load Aware Scheduler for Storm to be smarter about the way in which topologies are scheduled on the cluster.
Based on microbenchmarking and discussions with other Storm developers there appears potential for streamlining the internal queueing for faster message transfer.
Operator coalescing (executing consecutive spouts/bolts in a single thread when possible) is another area that reduces intertask messaging and improve throughput.
Even with these potential improvements, Storm has come a long way—their benchmarks show around 5x throughput and a tiny fraction of the latency of Storm 0.9.1.
Table-Valued Parameters (TVPs)
TVP support allows a client application to send parameterized data to the server more efficiently by sending multiple rows to the server with a single call. You can use the JDBC Driver 6.0 to encapsulate rows of data in a client application and send the data to the server in a single parameterized command.
There are a couple of interesting features in this driver which could help your Hadoop cluster play nice with SQL Server.
Head rush moment: SQL may use information from histograms outside of the index it chooses for cardinality estimation.
Back to earth: If you just create a bunch of statistics instead of indexes, you’re (at best) using your Clustered Index for everything (which is still bad), or you’re using a HEAP for everything (which is usually worse). You’re still generally better off creating good indexes for your workload. They’ll get statistics objects created and associated with them, and if SQL thinks another column is interesting, it will create a single column statistics object for it, as long as you haven’t turned off auto create stats.
Sure, you can put on your black cloak and goat mask and create some multi-column or filtered statistics, but in the words of a wise man (Doug), you end up with more stats to maintain and understanding query behavior gets more difficult.
Erik argues that at the margin, keeping existing statistics up to date tends to be a better solution than creating new statistics.
The most important precursor to doing this is you will need a licensed installation of Mist. Full stop. The assemblies we’re going to use have security built into them to tell whether they are licensed and you cannot use the assemblies shipped with BidsHelper or BimlExpress as they’ve hardwired to the specific apps.
Ast, what is that? Abstract Syntax Tree – it’s a compiler theory thing.
I’m interested in seeing where this series goes.
If tabular data is already in memory, what’s the point of having a cache at all? Memory is memory, right? Both are in main memory and access speed is the same, right?
Good question! Yes, access speed is the same. However, there are other benefits to a cache store.
For example, even though the data is already in memory, queries against the tabular model can still be slow… very slow even… usually in cases where the execution of the query is bound to the single threaded formula engine. To be sure, this is not a tabular specific problem… formula engine bound queries can be (and are commonly) found in both technologies and the issue (usually) speaks more to the deign of the model and/or the way the query is written (be that DAX for tabular or MDX in multidimensional). That said, performance problems related to FE-bound queries can be terribly difficult to resolve as it usually requires a redesign of the data model and rewrite of the query or measure(s) involved.
Bill points out the limitations of this solution, but within those limitations this looks like it could be a huge time-saver for end users.
Third, how many instances can or should you run for your application? Is there an upper limit for the number of instances and, similarly, for the parallelism of your application? In a nutshell, the parallelism of a Kafka Streams application — similar to the parallelism of Kafka — is primarily determined by the number of partitions of the input topic(s) from which your application is reading. For example, if your application reads from a single topic that has 10 partitions, then you can run up to 10 instances of your applications (note that you can run further instances but these will be idle). In summary, the number of topic partitions is the upper limit for the parallelism of your Kafka Streams application and thus for the number of running instances of your application. Note: A scaling/parallelism caveat here is that the balance of the processing work between application instances depends on how well data messages are balanced between partitions.
Check it out. Kafka Streams is a potential alternative to Spark Streaming and Storm for real-time (for some definition of “real-time”) distributed computing.
Weighting is used to make sure samples reproduce the underlying characteristics of the population they are drawn from. For instance, in the UK 51.6% of voters are women, of these 22.5% are above 65 years of age, 23.1% have a higher education and so on. The people selected to make part of a sample may be recruited to ensure they match these proportions.
Within the context of weighting there are two major sampling procedures: Quota and stratified sampling. The relevant difference between them is the latter uses some sort of randomization device while with the former the interviewer decides whether or not to interview a person with certain characteristics. YouGov presumably applied Quota Sampling for the Brexit survey, and the gender, age and education weights they applied are shown in the following image.
Read the whole thing.
The example on the right adds more value over the one on the left. The example on the right uses a column chart instead of a slicer. The benefit of this is you can communicate more information to the user than you can with the static slicer. In this case I am displaying the total value of each of the scenarios which means it is easy to see the relative difference between them. And this all happens while still providing slicer capabilities thanks to the powerful cross filter feature in Power BI. This is what I mean by adding value by thinking outside your old paradigms.
Definitely read the comments on this one; they are full of great questions and suggestions.