|
Is there a solution for a distributed grep? Here's the story: I have a bunch of web servers and want to query their application logs (I'm using Tomcat, if it matters). I don't want to copy the files to common storage: they are too big, and both network and storage are too expensive, so I want to keep them on the web servers themselves. So even Hadoop + Hive or similar solutions won't fly.
Thanks! |
|
You need to apply the master-worker pattern: send a "grep" task to a master, which in turn forwards it to remote workers, each running on one of the web server machines whose logs you want to grep. Each worker greps its local log and sends back the results so that they can be assembled. The Terracotta Master Worker framework works exactly this way: http://forge.terracotta.org/releases/projects/tim-messaging/docs/mw-guide.html Also, you could take a look at GridGain, which implements a MapReduce-like framework for distributed computations: http://www.gridgain.com |
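To make the pattern concrete, here is a minimal sketch of the fan-out and collect steps in Python. It does not use the Terracotta or GridGain APIs. It assumes the master can reach each web server over passwordless ssh and that grep is installed there. The function names (`build_command`, `grep_host`, `distributed_grep`) are made up for illustration.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(host, pattern, log_path):
    """Build the ssh command that runs grep on one remote host.

    ssh joins its arguments into a single string for the remote shell,
    so the pattern and the path are quoted to survive that second parse.
    """
    return ["ssh", host, "grep", "-H", "--",
            shlex.quote(pattern), shlex.quote(log_path)]


def grep_host(host, pattern, log_path, run=subprocess.run):
    """Worker step: grep one host's local log and tag each hit with the host."""
    result = run(build_command(host, pattern, log_path),
                 capture_output=True, text=True)
    # No match (grep exit status 1) simply yields empty output.
    # Connection failures are not reported in this sketch.
    return [f"{host}: {line}" for line in result.stdout.splitlines()]


def distributed_grep(hosts, pattern, log_path, run=subprocess.run, max_workers=8):
    """Master step: send the task to every host in parallel, then assemble."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_host = pool.map(
            lambda host: grep_host(host, pattern, log_path, run), hosts)
        # pool.map yields results in the same order as `hosts`.
        return [line for lines in per_host for line in lines]
```

A call such as `distributed_grep(["web1", "web2"], "ERROR", "/path/to/app.log")` (host names and path are placeholders) would return the matching lines from every server, each prefixed with its host name. Only the matches cross the network; the logs stay where they are.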
|
There are tools that let you run the same command on multiple hosts over ssh and collect the results. One such tool is PSSH (parallel-ssh). Running the same grep on multiple machines (listed in
|
|
http://www.gnu.org/software/parallel/man.html#example__parallel_grep |