Divolte Collector is a scalable and performant application for collecting clickstream data and publishing it to a sink, such as Kafka, HDFS or S3. Divolte has been developed by GoDataDriven and made available to the public under the Apache 2.0 open source license.
Click through for the example.
We’re enabling the plugin to work as both a source and a sink. In the NEO4J_streams_sink_topic_cypher_friends item, we’re writing a Cypher query. In this query, we’re merging Person nodes. The plugin gives us a variable named event, which we can use to pull out the properties we need. When we MERGE nodes, it creates them only if they do not already exist. Finally, it creates a relationship between the two nodes.
This sink configuration is how we’ll turn a stream of records from Kafka into an ever-growing and changing graph. The rest of the configuration handles our connection to a Confluent Cloud instance, where all of our event streaming will be managed for us. If you’re trying this out for yourself, make sure to replace API_KEY with the values that Confluent Cloud gives you when you generate an API access key.
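To make that concrete, here’s a rough sketch of what such a sink entry might look like as a docker-compose environment setting; the event property names (person1, person2) are invented for illustration:

    NEO4J_streams_sink_topic_cypher_friends: "
      MERGE (p1:Person { name: event.person1 })
      MERGE (p2:Person { name: event.person2 })
      MERGE (p1)-[:KNOWS]->(p2)
    "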
Click through for the example.
Following up on our announcement of the SQL Server 2019 release candidate last week, we’re announcing that a release candidate refresh for SQL Server 2019 is now available to download. This refresh adds the bits for Big Data Clusters in SQL Server 2019.
Back in July, we announced the preview of Big Data Clusters in SQL Server 2019, and since then we’ve seen our customers actively bringing their big data analytical workloads to SQL Server 2019 to operationalize their AI and machine learning projects.
Read on for more.
I always try to impress upon people that SQL injection isn’t necessarily about vandalizing or trashing data in some way.
Often it’s about getting data. One great way to figure out how difficult it might be to get that data is to figure out who you’re logged in as.
There’s a somewhat easy way to figure out if you’re logged in as sa.
Wanna see it?
Of course you do.
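The linked post has the full demo. As a minimal sketch of one common approach (not necessarily the one in the post), T-SQL exposes the current login and its role membership directly:

    -- Returns 1 if the current login is in the sysadmin fixed server role
    SELECT IS_SRVROLEMEMBER(N'sysadmin') AS is_sysadmin;

    -- Or just see which login you are
    SELECT SUSER_SNAME() AS current_login;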
Well, not exactly, but it’s definitely like that. The default power setting is “Balanced,” which means that during periods of lower activity, the clock speeds of your CPUs are reduced to conserve power and save your battery.
Apparently all Windows installations think they are on laptops. SPOILER ALERT: your database servers are probably not laptops.
Jeff has a T-SQL script to fix this. Unfortunately, it won’t fix the other power-based performance killer: power settings in BIOS.
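If you’d rather fix the Windows side from an elevated command prompt, a quick sketch using the built-in powercfg utility (the GUID is the well-known identifier for the High Performance plan):

    :: List the available power schemes and see which one is active
    powercfg /list

    :: Switch to the built-in High Performance plan
    powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c

The BIOS-level settings, as noted, still require a trip into the BIOS/UEFI setup itself.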
Even though this query only touches two different data sources, it is a good way to analyze the queries sent to the data sources. To track these queries, I used the built-in Performance Analyzer of Power BI Desktop, which can be enabled on the “View” tab. It gives you detailed information about the performance of the report, including the actual SQL queries (under “Direct query”) that were executed against the data sources. The plain-text queries can also be copied using the “Copy queries” link at the bottom.
Read on for the queries and for Gerhard’s analysis.
This has coincidentally caused an issue when using Windows Task Scheduler to schedule the synchronization process, especially if we use a SAS (Shared Access Signature) token, which can be quite long. What happens then is that we have a command longer than Windows Task Scheduler allows, and the task fails with a very unhelpful error message:
Task Scheduler failed to execute task "\AzureBlobStorageSync". Attempting to restart. Additional Data: Error Value: 2147942487.
Click through to see how Randolph fixed this problem, a fix which itself created a new problem for him to solve.
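For reference, error value 2147942487 is 0x80070057, the generic Win32 “the parameter is incorrect” code. One common workaround for over-long task commands (a sketch, not necessarily Randolph’s fix, and assuming an AzCopy-based sync) is to move the whole command line into a small batch file and have the task launch that instead:

    @echo off
    rem sync-blobs.cmd - wraps the long sync command so the scheduled task stays short.
    rem The container URL and SAS token below are placeholders; substitute your own.
    azcopy sync "C:\Backups" "https://myaccount.blob.core.windows.net/backups?<SAS_TOKEN>" --delete-destination=true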
I have the impression that CSelCalcColumnInInterval “fails” if the predicate doesn’t fall within any of the histogram intervals. The estimation logic then chooses to try the CSelCalcAscendingKeyFilter calculator (a reference to the “ascending key problem”) if the predicate is specifically higher than the last histogram interval.
Josh includes a couple of demos as well, so check them out.
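If you want to watch the optimizer pick a calculator yourself, here’s a sketch using the undocumented trace flags 3604 and 2363 (new cardinality estimator only; the dbo.Orders table and its date column are hypothetical):

    -- TF 3604 redirects optimizer output to the client; TF 2363 prints
    -- selectivity computation details, including the calculator chosen.
    SELECT COUNT(*)
    FROM dbo.Orders
    WHERE OrderDate > '2030-01-01'  -- a value beyond the last histogram step
    OPTION (QUERYTRACEON 3604, QUERYTRACEON 2363, RECOMPILE);

Look for the “Plan for computation” lines in the Messages output to see which CSelCalc calculator was used.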
There are some pretty common mistakes people make (myself included!). The most common ones I have seen recently are having a semicolon in JAVA_HOME/SPARK_HOME/HADOOP_HOME, or having HADOOP_HOME not point to a directory with a bin folder which contains winutils.
To help, I have written a small PowerShell script that a) validates that the setup is correct and then b) runs one of the Spark examples to prove that everything is set up correctly.
Click through for the script.
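The real script is behind the link; as a rough sketch of just the validation half, something like this catches the two mistakes mentioned above:

    # Check the three environment variables for the common failure modes.
    foreach ($name in 'JAVA_HOME', 'SPARK_HOME', 'HADOOP_HOME') {
        $value = [Environment]::GetEnvironmentVariable($name)
        if (-not $value) { Write-Warning "$name is not set"; continue }
        if ($value -like '*;*') { Write-Warning "$name contains a semicolon: $value" }
        if (-not (Test-Path $value)) { Write-Warning "$name points to a missing directory: $value" }
    }

    # HADOOP_HOME must contain bin\winutils.exe for Spark to work on Windows.
    $winutils = Join-Path $env:HADOOP_HOME 'bin\winutils.exe'
    if (-not (Test-Path $winutils)) { Write-Warning "winutils.exe not found at $winutils" }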
The idea is: one may want to eliminate use of the Python language call stack in the case of “tail calls” (a function call whose result is not used by the calling function, but instead immediately returned). Tail call elimination can both speed up programs and cut down on the overhead of maintaining intermediate stack frames and environments that will never be used again.
Click through for John’s riff on the topic.
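As a self-contained sketch of one classic workaround (not necessarily John’s approach), a trampoline replaces tail calls with a driver loop, so the stack never grows:

    # A minimal trampoline: a tail call returns a thunk (zero-argument
    # function) instead of recursing; the driver calls thunks until a
    # non-callable value comes back.
    def trampoline(fn, *args):
        result = fn(*args)
        while callable(result):
            result = result()
        return result

    def factorial(n, acc=1):
        if n <= 1:
            return acc
        # Return a thunk rather than making the tail call directly.
        return lambda: factorial(n - 1, acc * n)

    print(trampoline(factorial, 10000) > 0)  # True, with no RecursionError

This trades a bit of readability for bounded stack depth, which is the usual bargain when the language itself won’t eliminate tail calls.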