digitalpebble.com

Home - DigitalPebble Ltd

DigitalPebble Ltd is a consultancy specialised in web crawling, natural language processing, search and machine learning. Our expertise is based on open source solutions, such as Apache Nutch, StormCrawler, GATE, ElasticSearch or SOLR.

http://www.digitalpebble.com/

OVERVIEW OF digitalpebble.com

TRAFFIC RANK

>1,000,000

REVIEWS

0

PAGES IN THIS WEBSITE

3

TRAFFIC RANK FOR DIGITALPEBBLE.COM

TODAY'S RATING

>1,000,000

TRAFFIC RANK - AVERAGE PER MONTH

BEST MONTH

December

AVERAGE PER DAY OF THE WEEK

HIGHEST TRAFFIC ON

Saturday

CUSTOMER REVIEWS

Average Rating: 4.0 out of 5 with 16 reviews

5 star: 7
4 star: 4
3 star: 4
2 star: 0
1 star: 1

WEBSITE PREVIEW

LOAD TIME

0.6 seconds

CONTACTS AT DIGITALPEBBLE.COM

DIGITALPEBBLE LTD

JULIEN NIOCHE

16 COD●●●●●●N ROAD

BR●●OL , BS7 8ET

GB

44.7●●●●5585

JU●●●●@DIGITALPEBBLE.COM

DOMAIN REGISTRATION INFORMATION

REGISTERED: 2005 May 02
UPDATED: 2014 April 10
EXPIRATION: EXPIRED

DOMAIN AGE

20 YEARS, 0 MONTHS, 6 DAYS

NAME SERVERS

1: ns1.eapps.com
2: ns2.eapps.com
3: ns5.eapps.com
4: ns6.eapps.com

REGISTRAR

ENOM, INC.

WHOIS : whois.enom.com

REFERRED : http://www.enom.com

CONTENT

PAGES IN THIS WEBSITE

3

SSL

EXTERNAL LINKS

29

SITE IP

192.30.252.154

LOAD TIME

0.559 sec

SCORE

6.2

PAGE TITLE

Home - DigitalPebble Ltd | digitalpebble.com Reviews

<META> DESCRIPTION

DigitalPebble Ltd is a consultancy specialised in web crawling, natural language processing, search and machine learning. Our expertise is based on open source solutions, such as Apache Nutch, StormCrawler, GATE, ElasticSearch or SOLR.

<META> KEYWORDS

1: web crawl
2: consultant
3: consultancy
4: consulting
5: information extraction
6: information retrieval
7: search
8: NLP
9: IE
10: nutch

KEYWORDS ON PAGE

services, clients, digitalpebble ltd, gate, or apache solr, or apache storm, our clients

SERVER

GitHub.com

CONTENT-TYPE

utf-8

GOOGLE PREVIEW

Home - DigitalPebble Ltd | digitalpebble.com Reviews

https://digitalpebble.com

DigitalPebble Ltd is a consultancy specialised in web crawling, natural language processing, search and machine learning. Our expertise is based on open source solutions, such as Apache Nutch, StormCrawler, GATE, ElasticSearch or SOLR.

INTERNAL PAGES

digitalpebble.com

1

Open Source Solutions for Text Engineering

http://digitalpebble.com/solutions.html

Our expertise is pretty unique in that it covers a wide range of activities related to document engineering: web crawling. We provide consulting services and custom development using leading open source projects such as Apache Nutch. DigitalPebble's director, Julien Nioche, is a member of the Apache Software Foundation and a long-standing committer. Julien is a contributor and committer on several other open source projects as well as a conference speaker. We'd be happy to help!

2

Open Source Solutions for Text Engineering

http://digitalpebble.com/references.html

Design and implementation of a WARC. Strategy review and design for low latency scalable fetching of web pages using Storm-Crawler. Low latency web scraper for job adverts based on Storm-Crawler. Design and implementation of crawlers for shopping sites based on Storm-Crawler. Customisation and recommendations on best practices for Apache Nutch. For low latency scalable fetching of web pages. Customization of Apache Nutch. For crawling images using Amazon Web Services. Customization of Apache Nutch. Imple...

3

Open Source Solutions for Text Engineering

http://digitalpebble.com/index.html

DigitalPebble is a consultancy and solution provider specialising in web crawling, natural language processing, document retrieval and information extraction. We advise, evaluate and implement solutions based on leading open source software such as Apache Nutch. We aim to combine open source tools to provide efficient, reliable and low cost made-to-order solutions. Not only do we have an extensive knowledge of open source software, we are also active contributors and provide some of the resources.

TOTAL PAGES IN THIS WEBSITE

3

LINKS TO THIS WEBSITE

digitalpebble.blogspot.com

DigitalPebble's Blog: NUTCH FIGHT! 1.7 vs 2.2.1

http://digitalpebble.blogspot.com/2013/09/nutch-fight-17-vs-221.html

Monday, 16 September 2013. 1.7 vs 2.2.1. We've had releases in the Nutch 2.x branch for over a year now. As I described in a. The main difference with the 1.x branch is the use of Apache Gora as a storage abstraction layer, which allows the use of various flavours of NoSQL databases such as HBase, Cassandra or Accumulo as backends. We have measured the performance of Nutch 1.7 against 2.2.1 (HBase and Cassandra) using 3 million URLs from the CommonCrawl project. These URLs were. It is important to note that ...

digitalpebble.blogspot.com

DigitalPebble's Blog: What's new in Storm-Crawler 0.5

http://digitalpebble.blogspot.com/2015/06/whats-new-in-storm-crawler-05.html

Friday, 5 June 2015. What's new in Storm-Crawler 0.5. We've just released version 0.5 of Storm-Crawler, just over three months after the previous one. As you can read below, we've been pretty busy! The project got some great contributions from new users and is seeing an increase in adoption, which is very encouraging. One of the main improvements provided in the new release is the introduction of a Metadata object, which replaces the Map<String,String[]>. This is now the one we use by default, the one...

mevivs.wordpress.com

Hector-Kundera | Vivek Mishra's Blog

https://mevivs.wordpress.com/2011/02/12/hector-kundera

Vivek Mishra's Blog. JPA Compliant& Annotation Based:. Makes it an entity class. Assign ColumnFamily type and name. The email address. */. The country. */. The registered. */. The name. */. Instantiates a new author. Author() { / must have a default constructor. Defines column family and keyspace of given entity. Configuration conf = new Configuration();. ConfgetEntityManager( unit-name );. Persistence and Search Using EntityManager :. String key = System. Key, "a@a.org", "India", new. AObj, aObj db);.

github.com

GitHub - DigitalPebble/storm-crawler: Web crawler SDK based on Apache Storm

https://github.com/DigitalPebble/storm-crawler

Web crawler SDK based on Apache Storm. Use Git or checkout with SVN using the web URL. Aug 23, 2016. Flush BulkProcessor before closing connection. Failed to load latest commit information. Jul 19, 2016. AbstractHttpProtocol : added utility class to get agent string from conf. Jul 21, 2016. Aug 23, 2016. Ignore OSX system files. Jan 29, 2016. Update .travis.yml. Jul 1, 2016. Added LICENSE and NOTICE; fixed license headers in files. Sep 5, 2014. May 25, 2016. Uped version of archetype in readme. This will...

github.com

DigitalPebble Ltd · GitHub

https://github.com/DigitalPebble

http://www.digitalpebble.com. github@digitalpebble.com. Web crawler SDK based on Apache Storm. Aug 23, 2016. WARC resources for StormCrawler. Jul 22, 2016. Behemoth is an open source platform for large scale document analysis based on Apache Hadoop. Apr 26, 2016. Azazello is an open source platform for large scale document analysis based on Apache Spark. Apr 20, 2016. Mirror of Apache Storm. Mar 16, 2016. A set of reusable Java components that implement functionality common to any web crawler. GATE Pr...

digitalpebble.blogspot.com

DigitalPebble's Blog: What's new in Storm-Crawler 0.4

http://digitalpebble.blogspot.com/2015/01/whats-new-in-storm-crawler-04.html

Wednesday, 28 January 2015. What's new in Storm-Crawler 0.4. We've recently released the version 0.4 of. Which is a collection of resources for building low-latency, large scale web crawlers with. The project has been really active in the last few months, thanks partly to our 2 fantastic new committers (Jake Dodd and Gui Forget) and as a result contains some important changes and improvements. Reorganisation of the code. That can be used to index documents with ElasticSearch. Stream, which is meant to be...

digitalpebble.blogspot.com

DigitalPebble's Blog: DigitalPebble is hiring!

http://digitalpebble.blogspot.com/2013/06/digitalpebble-is-hiring.html

Wednesday, 5 June 2013. We are looking for a candidate with the following skills and expertise :. Experience in web crawling, ideally with Apache Nutch. Storm, Hadoop and related technologies. Interest in text processing, NLP and ML. Good social and presentation skills. Good spoken and written English, knowledge of other languages would be a plus. Taste for challenges and problem solving. More details on our activities can be found on our website. The position is in Bristol, UK. Posted by Julien Nioche.

digitalpebble.blogspot.com

DigitalPebble's Blog: Nutch training course

http://digitalpebble.blogspot.com/2013/07/nutch-training-course.html

Monday, 29 July 2013. We are planning to run a 2-day training course on Apache Nutch on the 24/25 October 2013. It will take place in Bristol, UK (the exact venue will be announced later). The course has been put on hold for now. Please do get in touch if you are interested and I will keep you updated as soon as we reach a sufficient number of attendees. Note that the demonstrations and exercises will be based on a Linux OS. The program given here is an indication only and might change slightly. The pr...

digitalpebble.blogspot.com

DigitalPebble's Blog: January 2015

http://digitalpebble.blogspot.com/2015_01_01_archive.html

Wednesday, 28 January 2015. What's new in Storm-Crawler 0.4. We've recently released the version 0.4 of. Which is a collection of resources for building low-latency, large scale web crawlers with. The project has been really active in the last few months, thanks partly to our 2 fantastic new committers (Jake Dodd and Gui Forget) and as a result contains some important changes and improvements. Reorganisation of the code. That can be used to index documents with ElasticSearch. Stream, which is meant to be...

TOTAL LINKS TO THIS WEBSITE

29

OTHER SITES

digitalpeasant.blogspot.com

The Digital Peasant

For (Almost) All you gaming needs, I will help you as much as I can. Please take the time to visit: www.alteriw.net www.alterops.net www.dumboratsuk.blogspot.com. Friday, 29 April 2011. Minecraft Auto Updater - Works for offline servers. Http:/ www.mediafire.com/? Then you can connect to OFFLINE. Servers. The www.ds9clan.co.uk. Offline server IP is:. Wednesday, 27 April 2011. Download link: http:/ www.multiupload.com/7GUY56WXKU. Monday, 25 April 2011. Wow, That long? I think I should have light red.

digitalpeasant.org

Bird Pix

digitalpeasants.com

digital peasants unite

Of, relating to, or resembling a digit, especially a finger. Operated or done with the fingers: a digital switch. Expressed in numerical form, especially for use by a computer. Of or relating to a device that can read, write, or store information that is represented in numerical form. See Usage Note at virtual. Using or giving a reading in digits: a digital clock. A country person; a rustic. An uncouth, crude, or ill-bred person; a boor. From Old French paisant. Country, from Late Latin pāgēnsis. Such da...

digitalpebble.blogspot.com

DigitalPebble's Blog

Friday, 23 March 2018. Grafana StormCrawler metrics v4. The Grafana dashboard for StormCrawler is a good starting point for monitoring the behaviour of your StormCrawler topology. This is typically used with Elasticsearch as a storage backend for the metrics generated by Storm but should work with any other Storm-compatible backend like Graphite or CloudWatch. To add SOLR as a datasource but to my knowledge, this is not yet available). The latest version (4) brings the following changes. In the graph ab...

digitalpebble.com

Home - DigitalPebble Ltd

DigitalPebble is a consultancy and solution provider specialising in web crawling, natural language processing, document retrieval and information extraction. We advise, evaluate and implement solutions based on leading open source software such as Apache Nutch. We aim to combine open source tools to provide efficient, reliable and low cost made-to-order solutions. Not only do we have an extensive knowledge of open source software, we are also active contributors and provide some of the resources.

digitalpebbles.wordpress.com

digitalpebbles | The Best Tips On The Web For Print & Design Professionals

The Best Tips On The Web For Print and Design Professionals. Stay updated via RSS. Error: Twitter did not respond. Please wait a few minutes and refresh this page. Follow DigitalPebbles' Blog via Email. Enter your email address to follow this blog and receive notifications of new posts by email. Your Guide to the Social Media Jungle. Instagram Adds Hashtag and Profile Links in Bio. March 24, 2018. How to Get Started With Messenger Bots. March 23, 2018. The Big Event: The Journey, Episode 22. 912Graphics’...

digitalpec.com

www.digitalpec.com

digitalpecos.com

www.digitalpecos.com

digitalpecs.com

Speaks4me(tm) - Speaking through pictures(tm)

Sorry, you don't appear to have frame support. Go here instead - Speaks4me(tm) - Speaking through pictures(tm).

digitalpecsbook.com

Speaks4me(tm) - Speaking through pictures(tm)

Sorry, you don't appear to have frame support. Go here instead - Speaks4me(tm) - Speaking through pictures(tm).