Massive Data Processing/Design and Implementation of Big Data Applications/Tweet Stream Processing: differences between revisions

Line 112:
 
The following bolt projects the full JSON tweet so that only the text of the message is communicated to the next bolt, which then processes those texts.
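
The projection step can be sketched in plain Python, independently of Storm. This is only an illustration under assumptions: the function name <code>project_text</code> is hypothetical, and the message is assumed to live in the tweet's <code>text</code> field.

```python
import json


def project_text(raw_tweet):
    """Return only the message text of a JSON-encoded tweet.

    Hypothetical helper: the name and the use of the 'text' field
    are assumptions made for illustration.
    """
    try:
        tweet = json.loads(raw_tweet)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON: nothing to project
        return ''
    if not isinstance(tweet, dict):
        return ''
    text = tweet.get('text', '')
    # Guard against a non-string 'text' value
    return text if isinstance(text, str) else ''


if __name__ == '__main__':
    sample = '{"id": 1, "text": "hello storm", "lang": "en"}'
    print(project_text(sample))
```

Returning an empty string for malformed input keeps the downstream bolt's contract simple: it always receives a string, even when the incoming tweet cannot be parsed.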
 
[[File:Image-med6.png|centro|Proposed Tweet Processing Topology]]
 
==Implementation==
Line 161 ⟶ 163:
 
====Python Bolts====
There are Python libraries that implement Storm's protocol, and they are quite easy to use. The code below is the one used for the language cleaner bolt. In that code, <code>tup</code> is the incoming tuple and <code>emit</code> sends the outgoing tuple.
 
<syntaxhighlight lang="python">
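import storm

# cleanText is assumed to be defined elsewhere in the project; it is not shown here.

class StreamFilterBolt(storm.BasicBolt):
    def process(self, tup):
        # tup.values[0] holds the tweet text sent by the previous bolt
        try:
            text = cleanText(tup.values[0])
        except Exception:
            # If cleaning fails, emit an empty string instead of failing the tuple
            text = ''
        storm.emit([text])

StreamFilterBolt().run()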
</syntaxhighlight>
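
The bolt above calls <code>cleanText</code>, which is not shown in this section. As a rough sketch only, and not the function actually used in the project, a cleaner of this kind might strip URLs, user mentions and hashtag markers before normalizing whitespace and case. The regular expressions and the cleaning rules below are assumptions.

```python
import re

# Assumed cleaning rules, for illustration only
URL_RE = re.compile(r'https?://\S+')
MENTION_RE = re.compile(r'@\w+')
SPACE_RE = re.compile(r'\s+')


def cleanText(text):
    """Hypothetical cleaner: drop URLs and mentions, keep hashtag words."""
    text = URL_RE.sub(' ', text)       # remove links
    text = MENTION_RE.sub(' ', text)   # remove @user mentions
    text = text.replace('#', '')       # keep the hashtag word, drop the marker
    # Collapse runs of whitespace, trim, and lowercase
    return SPACE_RE.sub(' ', text).strip().lower()


if __name__ == '__main__':
    print(cleanText('Check THIS out @bob http://t.co/abc #Storm'))
```

With this sketch, a non-string input raises a <code>TypeError</code>, which the <code>try</code> block in the bolt turns into an empty output string.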
 
==Evaluation==
Considering the requirements of this project, which favor scalability and fault tolerance over latency, I evaluated Storm's ability to distribute processing and to make a massive stream scalable to process. The main result of this project is a foundation for many bolts and spouts designed specifically for the Web Observatory project. It also helps the software development process, because each bolt is a black box that may be implemented by different people in different languages.
 
 
 
===Load on Bolts===
For simple Twitter jobs, Storm managed to distribute the work quite well