jisc.ac.uk
Value and benefits of text mining | Jisc
https://www.jisc.ac.uk/reports/value-and-benefits-of-text-mining
Learning and research resources. We’re supporting institutions with the growing need for robust research data management. Janet is our world-class, high-speed network dedicated to the UK research and education community. For further education, higher education and research will help us to prioritise our R&D. 2 Text mining: UKFHE and beyond.
digitalpebble.blogspot.com
DigitalPebble's Blog: NUTCH FIGHT! 1.7 vs 2.2.1
http://digitalpebble.blogspot.com/2013/09/nutch-fight-17-vs-221.html
Monday, 16 September 2013. NUTCH FIGHT! 1.7 vs 2.2.1. We've had releases in the Nutch 2.x branch for over a year now. As I described in a. The main difference with the 1.x branch is the use of Apache Gora as a storage abstraction layer, which allows the use of various flavours of NoSQL databases such as HBase, Cassandra or Accumulo as backends. We have measured the performance of Nutch 1.7 against 2.2.1 (HBase and Cassandra) using 3 million URLs from the CommonCrawl project. These URLs were. It is important to note that ...
michaelnielsen.org
How to crawl a quarter billion webpages in 40 hours | DDI
http://www.michaelnielsen.org/ddi/how-to-crawl-a-quarter-billion-webpages-in-40-hours
Michael’s main blog. How to crawl a quarter billion webpages in 40 hours. By Michael Nielsen on August 10, 2012. More precisely, I crawled 250,113,669 pages for just under 580 dollars in 39 hours and 25 minutes, using 20 Amazon EC2 machine instances. What does it mean to crawl a non-trivial fraction of the web? By Googler Jeff Dean. As of November 2010 Google was indexing “tens of billions of pages”. (Note that the number of urls. Claimed to index 120 billion pages. Here’s the basic architecture:. And th...
internetoffline.org
Links to collections | Internet Offline
https://internetoffline.org/groups/content/links-collections
Created Sat, 04/13/2013 - 22:00. Group by Jason Skomorowski. For now a rough list of things to get us started, with guidelines for mirroring where available. We'll work together in the website group to come up with a good structure before this becomes too unwieldy. 81Tb scraped from all over the web, might well be enough to be useful as a reference and not just a research tool. About the Data Set. Global Village Construction Set. Plans for building useful machines.
bit-player.org
600613 | bit-player
http://bit-player.org/2014/600613
An amateur’s outlook on computation and mathematics. Traffic Jams in Javascript. Sunshine In = Earthshine Out. Dancing with the Spheres. Statistical mechanics of magnet balls. World3, the public beta. Bertrand Russell, Donald Trump, and Archimedes. The 39th Root of 92. Getting to Know Julia. Where’s My Petabyte Disk Drive? The Bug That Ate Thursday. Carnival of Mathematics #130. Ten years of bit-playing. Deep Dreaming with Every Card I Write. Geotargeted by the NY Times. The Carnival Is Coming to Town.
smerity.com
Smerity.com: About Me
http://smerity.com/abme.html
My name is Stephen Merity, though I'm most commonly referred to as Smerity. I'm a senior research scientist working on deep learning in San Francisco with MetaMind. I've been lucky enough to work with fascinating people and groups over the years including Google Sydney, Freelancer.com, the Schwa Lab at the University of Sydney, the team at Grok Learning, the non-profit Common Crawl, and IACS @ Harvard. You can read my full history in my resume. Stephen Merity and James R. Curran.
uni-weimar.de
Bauhaus-Universität Weimar: Home
https://www.uni-weimar.de/en/media/institutes/digital-bauhaus-lab/home
Visualization and Visual Analytics. Prof Dr. Bernd Fröhlich. Prof Dr. Eva Hornecker. Prof Dr. Carsten Könke. Prof Dr. Stefan Lucks. Prof Dr. Volker Rodehorst. Prof Dr. Benno Stein. Prof Dr. Charles Wüthrich. Analyze Big Data on the Betaweb cluster. 18 billion web pages of the CommonCrawl.
fightswithbytes.com
php | Fights With Bytes
http://www.fightswithbytes.com/category/php
Sifting for nuggets of info in data ocean. Sample wordcount streaming job using PHP on Commoncrawl dataset. April 5, 2013. The easiest way to start working on the Commoncrawl dataset is probably to use Amazon’s own Hadoop framework, called Elastic MapReduce. To use it you need to sign in to amazonaws.com services, and be aware that EMR is not free. The mapper/reducer scripts plus output files have to be stored on your own Amazon S3. $count) { // tab-delimited echo "$word\t$count\n"; } ?> Go to Elastic Map Redu...
fightswithbytes.com
commoncrawl | Fights With Bytes
http://www.fightswithbytes.com/tag/commoncrawl
Sifting for nuggets of info in data ocean. Sample wordcount streaming job using PHP on Commoncrawl dataset. April 5, 2013. The easiest way to start working on the Commoncrawl dataset is probably to use Amazon’s own Hadoop framework, called Elastic MapReduce. To use it you need to sign in to amazonaws.com services, and be aware that EMR is not free. The mapper/reducer scripts plus output files have to be stored on your own Amazon S3. $count) { // tab-delimited echo "$word\t$count\n"; } ?> Go to Elastic Map Redu...
SOCIAL ENGAGEMENT