Building Graph Tables

Tomaz Kastrun uses a set of e-mails as his SQL Server 2017 graph table data source:

To put the graph database to the test, I took a bunch of emails from a particular MVP SQL Server distribution list (content will not be shown and all the names will be anonymized). From my Gmail account, I downloaded some 90 MiB of emails in mbox file format. With some Python scripting, only the FROM and SUBJECT fields were extracted:

    writer.writerow(['from','subject'])
    for index, message in enumerate(mailbox.mbox(infile)):
        content = get_content(message)
        row = [
            message['from'].strip('>').split('<')[-1],
            decode_header(message['subject'])[0][0],
            "|"
        ]
        writer.writerow(row)
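The excerpt above relies on names defined elsewhere in the original script (infile, writer, get_content). As a rough, self-contained sketch of the same idea, the version below reads an mbox file and writes the sender address and decoded subject to a CSV file. The function names subject_text and extract_from_subject are made up for illustration, and using email.utils.parseaddr in place of the strip/split approach is a swapped-in choice, not the author's method.

```python
import csv
import mailbox
from email.header import decode_header
from email.utils import parseaddr


def subject_text(raw):
    """Decode a possibly RFC 2047-encoded Subject header into a str."""
    if raw is None:
        return ""
    parts = []
    for chunk, charset in decode_header(str(raw)):
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode(charset or "utf-8", errors="replace")
            except LookupError:
                # Unknown charset label: fall back to UTF-8 with replacement.
                chunk = chunk.decode("utf-8", errors="replace")
        parts.append(chunk)
    return "".join(parts)


def extract_from_subject(mbox_path, csv_path):
    """Write one (from, subject) row per message; return the message count."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["from", "subject"])
        for message in mailbox.mbox(mbox_path):
            # parseaddr returns (display_name, address); keep only the address.
            sender = parseaddr(str(message["from"] or ""))[1]
            writer.writerow([sender, subject_text(message["subject"])])
            count += 1
    return count
```

Calling extract_from_subject("mail.mbox", "mail.csv") (both paths are placeholders) would produce a two-column CSV ready for loading into a staging table.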

This post mostly walks you through loading the data, but at the end you can see how easy it is to find who replied to whose e-mails.
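To make the "who replied to whom" idea concrete outside the database, here is a small Python sketch that derives reply edges from (from, subject) rows. It is not the graph-table query from the post. It assumes that rows arrive in chronological order, that replies carry a "RE:"-style subject prefix, and that the first non-reply message with a given subject started the thread. The sample addresses and subjects are invented.

```python
import re
from collections import defaultdict

# Hypothetical sample rows in (from, subject) form, oldest first.
rows = [
    ("alice@example.com", "Graph tables in SQL Server 2017"),
    ("bob@example.com", "RE: Graph tables in SQL Server 2017"),
    ("carol@example.com", "Re: RE: Graph tables in SQL Server 2017"),
    ("bob@example.com", "Columnstore question"),
    ("alice@example.com", "RE: Columnstore question"),
]

# One or more leading "RE:", "FW:" or "FWD:" markers, case-insensitive.
REPLY_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalize(subject):
    """Strip reply/forward prefixes so a thread shares one key."""
    return REPLY_PREFIX.sub("", subject).strip().lower()


def reply_edges(rows):
    """Count (replier, thread starter) pairs found in the rows."""
    starters = {}
    edges = defaultdict(int)
    for sender, subject in rows:
        key = normalize(subject)
        if REPLY_PREFIX.match(subject) is None:
            # First message seen with this subject starts the thread.
            starters.setdefault(key, sender)
        elif key in starters and starters[key] != sender:
            edges[(sender, starters[key])] += 1
    return dict(edges)


edges = reply_edges(rows)
for (replier, starter), n in sorted(edges.items()):
    print(f"{replier} replied to {starter} ({n}x)")
```

Each (replier, starter) pair corresponds to one edge between two person nodes. Replies whose original message is missing from the data are simply skipped in this sketch.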

