Dataflow monitoring in LHCb

David Svantesson (Institutionen för teknisk fysik) ; R. Schwemmer ; G. Liu ; N. Neufeld
Journal of Physics: Conference Series. International Conference on Computing in High Energy and Nuclear Physics, CHEP 2010; Taipei; Taiwan; 18 October 2010 through 22 October 2010 (1742-6588). Vol. 331 (2011), PART 2, p. Art. no. 22036.
[Conference paper, peer-reviewed]

The LHCb data flow starts with the collection of event fragments from more than 300 read-out boards at a rate of 1 MHz. These data are moved through a large switching network of more than 50 routers to an event-filter farm of up to 1500 servers. Accepted events are sent through a dedicated network to storage collection nodes, which concatenate them into files and transfer these to mass storage. At nominal conditions, more than 30 million packets enter and leave the network every second. Precise monitoring of this data flow, down to the level of individual packet counters, is essential to trace rare but systematic sources of data loss. We have developed a comprehensive monitoring framework that verifies the data flow at every level using a variety of standard tools and protocols, such as sFlow and SNMP, together with custom software based on the LHCb Experiment Control System framework. This paper starts from an analysis of the data flow and the hardware and software layers involved. From this analysis it derives the architecture of the monitoring system and finally presents its implementation.
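To illustrate what checking the data flow at the level of packet counters can involve, the following is a minimal sketch. It is not the authors' implementation. It assumes that the transmit counter at one end of a link and the receive counter at the other end are polled at the same moments, and that both are 32-bit SNMP Counter32 values, which wrap modulo 2^32. The function names `counter_delta` and `link_loss` are hypothetical.

```python
# Hypothetical sketch: detect packet loss on a link by comparing the increment
# of the sender's TX counter with the increment of the receiver's RX counter.
# Assumes SNMP Counter32 semantics (values wrap modulo 2**32) and that both
# counters were sampled at the same two points in time.

COUNTER32_MODULUS = 2**32


def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Increment of a wrapping counter between two polls.

    Correct only if the counter wrapped at most once between the polls.
    """
    return (curr - prev) % modulus


def link_loss(tx_prev: int, tx_curr: int,
              rx_prev: int, rx_curr: int,
              modulus: int = COUNTER32_MODULUS) -> int:
    """Packets counted as sent by one end but not as received by the other."""
    sent = counter_delta(tx_prev, tx_curr, modulus)
    received = counter_delta(rx_prev, rx_curr, modulus)
    return sent - received


if __name__ == "__main__":
    # The TX counter wraps past 2**32 between the two polls: 796 packets sent,
    # 790 received, so 6 packets were lost on this link in the interval.
    print(link_loss(4294967000, 500, 100, 890))
```

In practice the two ends are rarely polled at exactly the same instant, so small transient differences are expected. The polling interval must also stay well below the counter's wrap time. For example, at a hypothetical 1 million packets per second on one port, a 32-bit counter wraps about every 72 minutes. The 64-bit IF-MIB counters (for example `ifHCInUcastPkts`) avoid this problem.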

