If your application requires a precise LD value, this heuristic isn’t for you, but the estimates are typically within about 0.05 of the true distance, which is more than enough accuracy for such tasks as:
Confirming suspected near-duplication.
Estimating how much two documents vary.
Filtering through large numbers of documents to look for a near-match to some substantial block of text.
The estimation process is pretty interesting. Worth a read.
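The linked post's estimation heuristic isn't reproduced here, but for context, this is a minimal sketch of the exact Levenshtein distance that the heuristic approximates (the normalized form in [0, 1] is what the ~0.05 accuracy claim refers to; function names are mine, not from the original post):

```python
def levenshtein(a: str, b: str) -> int:
    """Exact Levenshtein distance via dynamic programming, O(len(a) * len(b))."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalized_distance(a: str, b: str) -> float:
    """Distance scaled to [0, 1] by the longer string's length."""
    return levenshtein(a, b) / max(len(a), len(b), 1)

print(levenshtein("kitten", "sitting"))  # 3
```

The quadratic cost of the exact computation is exactly why a fast estimate is attractive when filtering large document collections.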
Note that the parameters of xgboost used here fall into three categories:
- nthread (number of threads used, here 8 = the number of cores in my laptop)
- max.depth (of tree)
Learning task parameters
- objective: type of learning task (softmax for multiclass classification)
- num_class: needed for the “softmax” algorithm: how many classes to predict?
Command Line Parameters
- nround: number of rounds for boosting
Read the whole thing.
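For reference, these parameters map onto xgboost's Python API roughly as follows. This is a hedged sketch: the R-style names translate (max.depth becomes max_depth, nround becomes num_boost_round), and the specific values for max_depth and num_class below are illustrative, not from the original post:

```python
# How the parameters above would look in xgboost's Python interface.
# (R's max.depth -> max_depth; nround -> num_boost_round.)
params = {
    "nthread": 8,                  # number of threads (8 cores in the author's laptop)
    "max_depth": 6,                # maximum depth of each tree (illustrative value)
    "objective": "multi:softmax",  # learning task: multiclass classification via softmax
    "num_class": 5,                # required by softmax: number of classes (illustrative)
}
num_boost_round = 10               # nround: number of boosting rounds (illustrative)

# With a real dataset this would be trained as:
# import xgboost as xgb
# dtrain = xgb.DMatrix(X, label=y)
# model = xgb.train(params, dtrain, num_boost_round=num_boost_round)
```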
With all of these changes to the OS, which setting should we use for SQL Server? In general, for all of these operating systems, I recommend that TCP Chimney Offload be disabled – because you can see odd connectivity problems in any other state. Notice in the above quote that Microsoft says that this feature is best used for applications with long-lived connections that transfer large amounts of data – hopefully your OLTP database is performing lots of short-lived connections and they are not transferring large amounts of data (if they are, I can help you with that!).
Definitely worth a read.
GET STARTED IN 4 STEPS
1) Download the HDP Sandbox as a VM image (VMware, VirtualBox, or Docker).
2) Set up and start the VM image.
3) Try a Sandbox tutorial: check out the list of free tutorials, or jump directly into a Hello to HDP hands-on tutorial.
4) Need more help? Visit the Hortonworks Community Connection (HCC) and interact directly with the community and our development team.
It looks like they’ve bumped up the RAM requirements to 8 GB and have added new tutorials.
This one’s a bit trickier, but let’s walk through it. We’re getting data from the Posts table where the Tags column equals “<sql-server>” and selecting every column from both the Posts and PostTags tables. We can tell because no properties are specified in the Select. Even though this statement looks more complex, it’s only three lines and reads somewhat like a SQL statement. But it’s really a LINQ (Language Integrated Query) statement, specifically a LINQ to Entities statement. This LINQ statement will be translated into this SQL statement:
Read the whole thing.
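The LINQ code itself is in the linked post, but the filter-then-join semantics it describes can be mimicked in plain Python. This is an illustrative sketch with toy dicts standing in for the Posts and PostTags tables; the column names beyond Tags are my assumptions:

```python
# Toy rows standing in for the Posts and PostTags tables (columns illustrative).
posts = [
    {"Id": 1, "Tags": "<sql-server>", "Title": "Query tuning"},
    {"Id": 2, "Tags": "<linq>",       "Title": "LINQ basics"},
]
post_tags = [
    {"PostId": 1, "TagId": 10},
    {"PostId": 2, "TagId": 20},
]

# Filter Posts on Tags, join to PostTags, and keep every column from both
# tables -- the equivalent of selecting the whole entities in LINQ rather
# than naming specific properties.
results = [
    {**p, **pt}
    for p in posts
    if p["Tags"] == "<sql-server>"
    for pt in post_tags
    if pt["PostId"] == p["Id"]
]
print(results)
```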
The datetime returned by this query is in UTC. My query returns 9/19/2016 7:43:03 PM.
If I go into the properties of my SSAS database, I can see this same info, but the timezone conversion has already been done for me (this server is in Central time zone).
I think that on net, that’s the best way to do it: store everything in UTC and use the presentation layer to convert those to local times.
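The store-in-UTC, convert-at-presentation pattern looks roughly like this in Python (zoneinfo, Python 3.9+), using the timestamp from the query above; Central time was on daylight saving (UTC-5) on that date:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Store timestamps in UTC...
created_utc = datetime(2016, 9, 19, 19, 43, 3, tzinfo=timezone.utc)

# ...and convert only in the presentation layer.
central = created_utc.astimezone(ZoneInfo("America/Chicago"))
print(central)  # 2016-09-19 14:43:03-05:00 (CDT, since DST was in effect)
```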
Huh! I just found out that, while migrating from SQL Server 2005 to SQL Server 2014 (SP2 installed), one of my PowerShell scripts (one I’ve been using for a long time) that uses SMO to truncate tables no longer works. When running it against a SQL Server 2014 database, I get an error:
“..this property is not available on SQL Server 2014.”
To my surprise, I ran the same PowerShell script against SQL Server 2016 and it works fine.
That seems rather odd. If this affects you, vote up his UserVoice item.
Well, Paul told me this wasn’t the case. Now when Paul tells me something I believe him, but I also like to run tests. So I decided to use sys.fn_PhysLocCracker(%%physloc%%). %%physloc%% returns a varbinary that gives you the location of the row. When passed to sys.fn_PhysLocCracker(%%physloc%%), it returns the database file, the page in the file, and the slot number where the row can be found. So to start, I create a table with an identity(1,1) column and run 20 inserts, one at a time, checking row locations each time. This is to confirm I’m right about this part.
Clicking through is worth it for the hypnotizing animated GIFs.
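As an aside, the %%physloc%% value is widely described as an 8-byte, little-endian structure: a 4-byte page ID, a 2-byte file ID, and a 2-byte slot number. Assuming that layout, here is a sketch of the decoding that sys.fn_PhysLocCracker performs, in Python:

```python
import struct

def crack_physloc(physloc: bytes) -> dict:
    """Decode an 8-byte %%physloc%% value, assuming the commonly documented
    little-endian layout: 4-byte page ID, 2-byte file ID, 2-byte slot number."""
    page_id, file_id, slot = struct.unpack("<IHH", physloc)
    return {"file": file_id, "page": page_id, "slot": slot}

# 0x1E00000001000000 -> file 1, page 30, slot 0
print(crack_physloc(bytes.fromhex("1E00000001000000")))
```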
A simple but effective setting in SQL Server Management Studio is using custom colours to identify which server you are about to execute a query on. It’s simple to set up, but not everyone who uses SSMS is aware of it, so I thought I’d quickly run through the steps here.
This is a nice visual way of figuring out you’re in production before you run that truncate table script.
The information shown here is the DSQL (Distributed SQL) plan. When you send a SQL query to SQL Data Warehouse, the Control node processes the query, converts the code to DSQL, and then sends the commands to run on each of the compute nodes.
The returned query plan depicts sequential SQL statements; when the query runs it may involve parallelized operations, so some of the sequential statements shown may run at the same time. More information can be found at the following URL https://msdn.microsoft.com/en-us/library/mt631615.aspx.
Arun also looks at running a simple Power BI report off of Azure SQL Data Warehouse; click through for that.