====== what could nodesearch be? ======

A decentralized search engine based on a kind of peer-to-peer and clustering architecture.

===== a contradictory observation =====

Search is central to how the Internet is used, yet nowadays Internet search is owned by a few private companies. This goes against the essence of the Internet: an **open** network based on the free exchange of data. Search is the main way this astounding amount of data gets classified, organised and made usable, and this use of (free) data is owned by a few companies.

===== an open architecture paradigm =====

The Internet, as it was invented and as it expanded, is an open architecture: anyone can host a part of it (that's what happens in peer-to-peer networks). Moreover, most of our home lines are connected to the Internet 24 hours a day. Why not use our *boxes((in France, each provider has its own box: [[g>freebox]], [[g>neufbox]], [[g>livebox]], [[g>bbox]], [[g>cbox]]...))? Why can't we host our own search engine in a decentralized way?

===== what is websearch? =====

Search engines are evaluated on 3 main factors:

  * relevancy: how accurately the search engine finds what the user expects
  * velocity: the ability to search a large amount of data in a very short time
  * depth: the number of indexed pages and the ability to answer very specific queries

===== Risks =====

==== Regarding relevancy ====

Even if this search engine could have 1000 times more computing power than any company could afford, its relevancy would still have to rest on a strong algorithm, which is the most critical point... but PageRank-style patterns won't last forever ;-)

Crawling isn't a problem: with a decentralized architecture, indexing power is not an issue.

A decentralized system is vulnerable, though: contributors could tamper with it to skew the results.
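
One possible way to limit that kind of cheating, sketched here purely as an illustration, is to ask several peers the same query and keep only the URLs that enough of them agree on. The function name ''quorum_results'', the ''quorum'' parameter and the example URLs are all hypothetical, nothing here comes from an existing project, and a colluding majority of peers would still win.

```python
from collections import Counter


def quorum_results(peer_answers, quorum):
    """Keep the URLs reported by at least `quorum` distinct peers.

    `peer_answers` is a list of result lists, one per peer (hypothetical format).
    """
    votes = Counter()
    for answer in peer_answers:
        # A peer gets one vote per URL, even if it repeats the URL.
        votes.update(set(answer))
    agreed = [(url, count) for url, count in votes.items() if count >= quorum]
    # Most agreed-upon first; ties broken alphabetically for a stable order.
    agreed.sort(key=lambda item: (-item[1], item[0]))
    return [url for url, _ in agreed]


# Four hypothetical peers answer the same query; two of them push a spam URL.
answers = [
    ["http://a.example/", "http://b.example/"],
    ["http://a.example/", "http://b.example/", "http://spam.example/"],
    ["http://a.example/", "http://spam.example/", "http://spam.example/"],
    ["http://a.example/", "http://b.example/"],
]
print(quorum_results(answers, quorum=3))
```

With a quorum of 3 out of 4 peers, the spam URL (backed by only two peers) is dropped. The price is extra network round trips per query, which weighs on velocity.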

==== Regarding velocity ====

By the Pareto principle, results for the most commonly searched terms should come back faster than from a search engine's nearest datacenter, because they would already be cached on the user's local network or on a neighbor's. But what happens to the long tail? Even with the recent development of many lightweight server solutions such as APE, Node.js, etc., how would the architecture hold up under stress?

==== Regarding community ====

This project can't be efficient on a small scale: it needs a lot of contributors (hosts) before it starts giving results, which means it has to be promoted by a well-established open source community (Mozilla, Wikipedia...).

Should this search engine display a few ads to sustain itself? That would go against its original purpose of improving the open web.

==== Regarding privacy ====

More and more, search algorithms use user feedback to improve themselves. They even record usage patterns to give more personalized answers. Should such private data flow into the wild of a peer-to-peer network? Is it safer distributed among volunteer users than aggregated inside a big //don't be evil// capitalist company?

===== a few resources =====

  * Emmanuel Benazera's research: [[http://2009.rmll.info/An-Open-Source-Architecture-for.html|An Open Architecture for search]]
  * Tim Berners-Lee expressed his wish to build [[http://lists.w3.org/Archives/Public/www-tag/2010Dec/0006.html|some kind of P2P HTTP protocol]]
  * Update: looks like the [[http://www.seeks-project.info/site/|Seeks Project]] is something like that

===== is this for real? =====

No. This is more a thought or a concept than a project; I am not that ambitious. The idea is worth thinking about, no? If anyone reading this has started an adventure along these lines, I'd gladly help.