what could nodesearch be?

A de-centralized search engine based on a kind of peer-to-peer and clustering architecture.

a contradictory observation

Search is central to how the Internet is used, yet nowadays Internet search is owned by a few private companies. This goes against the essence of the Internet: an open network based on the free exchange of data. Search is the main way this astounding amount of data is classified, organised and made usable, and yet this use of free data is controlled by a few companies.

an open architecture paradigm

The Internet, as it was invented and as it expanded, is an open architecture: anyone can host a part of it (that is what happens in peer-to-peer networks). Moreover, most of our landlines are connected to the Internet around the clock, so why not use our *boxes 1)? Why can't we host our own search engine in a de-centralized way?

what is websearch?

Search engines are evaluated on 3 main factors:

  • relevancy: the ability of the search engine to find what the user expects
  • velocity: the ability to search a large amount of data in a very short time
  • depth: the number of indexed pages and the ability to answer very specific queries

Risks

Regarding relevancy

Even if this search engine could have 1000 times more computing power than any company could afford, its relevancy would have to rest on a strong algorithm, which is the most critical point… but PageRank patterns won't last forever ;-)

Crawling isn't a problem: with a de-centralized architecture, indexing power is not an issue.
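As a rough illustration of how crawling work could be spread over many boxes, here is a minimal sketch that assigns each website to exactly one node using rendezvous (highest random weight) hashing. Nothing here is an existing nodesearch design: the function name `assign_crawler`, the choice of hashing on the host name, and the node names are all assumptions made for the example.

```python
import hashlib
from typing import List
from urllib.parse import urlparse


def assign_crawler(url: str, node_ids: List[str]) -> str:
    """Return the node that should crawl `url` (hypothetical helper).

    Every node scores the URL's host name; the highest score wins.
    All nodes compute the same answer without talking to each other.
    """
    host = urlparse(url).netloc.lower()

    def score(node_id: str) -> str:
        # Assumption: SHA-256 of "node|host" is used as the weight.
        return hashlib.sha256(f"{node_id}|{host}".encode("utf-8")).hexdigest()

    return max(node_ids, key=score)


if __name__ == "__main__":
    boxes = ["freebox", "livebox", "neufbox"]  # illustrative node names
    for page in ("http://example.org/a", "http://example.com/b"):
        print(page, "->", assign_crawler(page, boxes))
```

With this scheme, all pages of one site go to the same node, and when a box that does not own a site leaves the network, that site's assignment does not change.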

A de-centralized system is vulnerable: contributors could tamper with it to cheat on the results.

Regarding velocity

According to the Pareto rule, the most commonly searched terms should be served faster than from the nearest search engine datacenter, because the results should be cached on the user's local network or on a neighbor's… But what happens to the long tail? Even with the recent development of many lightweight server solutions such as APE, nodejs, etc., how would the architecture hold up under stress?
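The lookup order described above (local cache first, then the neighbors, then the rest of the network for the long tail) can be sketched as follows. This is only a thought experiment in code: the `Node` class, its `search` method and the `network_lookup` callback are hypothetical names, and the slow path is left as a plain function because how the long tail would actually be answered is the open question.

```python
from typing import Callable, Dict, List, Tuple


class Node:
    """A hypothetical nodesearch peer with a small cache of query results."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, List[str]] = {}
        self.neighbors: List["Node"] = []

    def search(
        self, query: str, network_lookup: Callable[[str], List[str]]
    ) -> Tuple[List[str], str]:
        """Return (results, source), source being where the answer came from."""
        # 1. Popular queries: already cached on this box.
        if query in self.cache:
            return self.cache[query], "local"
        # 2. Otherwise ask the neighbors' caches.
        for peer in self.neighbors:
            if query in peer.cache:
                self.cache[query] = peer.cache[query]
                return self.cache[query], "neighbor"
        # 3. Long tail: fall back to the (slow) wider network.
        results = network_lookup(query)
        self.cache[query] = results
        return results, "network"


if __name__ == "__main__":
    home, nearby = Node("home"), Node("nearby")
    home.neighbors.append(nearby)
    nearby.cache["weather"] = ["weather.example"]
    slow = lambda q: [f"result for {q}"]  # stand-in for the wider network
    print(home.search("weather", slow))
    print(home.search("weather", slow))
    print(home.search("some rare query", slow))
```

Only the third step touches the wider network, so its cost is what decides whether the long tail stays usable under load.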

Regarding community

This project can't be efficient on a small scale: it has to be sustained by a lot of contributors (hosters) before it starts giving results. This means that it has to be promoted by a well-established open source community (Mozilla, Wikipedia…).

Should this search engine display a few ads to sustain itself? That would go against its initial vocation of improving the open web.

Regarding privacy

More and more, search algorithms use users' feedback to improve themselves. They even record usage to give more personalized answers. Should this private data flow into the wild of a peer-to-peer network? Is it safer distributed among volunteer users than aggregated inside a big capitalist "don't be evil" company?

a few resources

is this for real?

No. This is more a thought or a concept than a project; I am not that ambitious. The idea is worth thinking about, no? If anyone reading this has started some adventure going this way, I'd gladly help.

1) in France, each provider has its own box: freebox, neufbox, livebox, bbox, cbox
projects/nodesearch/index.txt · Last modified: 2011/06/18 01:41 by gaspard
 
 