Runtime Mechanism

This article does not cover the overall spout/bolt structure of a topology in detail. It focuses on how running a Storm/JStorm topology differs from running a traditional Java program. There are in fact very few differences, and they show up mainly during initialization. The goal is to help developers troubleshoot topology problems without having to understand Storm's internals.

A topology consists of multiple spout threads and bolt threads distributed across several workers (separate JVM processes). A single worker may run threads belonging to several different spouts and bolts at the same time.
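One consequence of sharing a worker is that static fields are shared by every spout and bolt thread in that worker's JVM, but not across workers, so static state must be thread-safe. The plain-Java sketch below has no Storm dependency; CountingBolt and runWorker are hypothetical names, and ordinary threads stand in for bolt threads inside one worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Simulates one worker process: each Thread stands in for one bolt thread.
// No Storm classes are used; all names are hypothetical.
class SharedStaticDemo {

    static class CountingBolt {
        // Static: one copy per JVM (per worker), shared by all instances.
        // AtomicLong keeps concurrent increments from being lost.
        static final AtomicLong PROCESSED = new AtomicLong();

        void execute(String message) {
            PROCESSED.incrementAndGet();
        }
    }

    /** Runs `threads` bolt instances in parallel, each handling `perThread` messages. */
    static long runWorker(int threads, int perThread) {
        CountingBolt.PROCESSED.set(0);
        List<Thread> running = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            CountingBolt bolt = new CountingBolt(); // separate instance per thread
            Thread th = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    bolt.execute("msg-" + i);
                }
            });
            running.add(th);
            th.start();
        }
        for (Thread th : running) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // All instances incremented the same static counter.
        return CountingBolt.PROCESSED.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorker(4, 10_000)); // 40000
    }
}
```

With a plain long field instead of AtomicLong, the total could come out lower than expected, which is the kind of problem that only appears once several bolt threads land in the same worker.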

Differences from Regular Java Programs

main Method

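The main method runs only once, in the process that submits the topology (typically started with the storm jar command, often on the nimbus machine), and never in the workers. Therefore, apart from the Storm configuration submitted with the topology, any program setup done in main, such as loading a Spring configuration, does not take effect in the spouts and bolts running on the workers.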
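Because main never runs in the workers, program settings have to travel inside the topology configuration and be read back in the bolt's prepare method. The sketch below illustrates that pattern in plain Java. The Java-serialization round trip only stands in for "shipping to another process" and is not how Storm encodes its configuration; the key myapp.spring.config and the class names are hypothetical.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Sketch of passing settings from main to a bolt via the topology config.
// All names are hypothetical; no Storm classes are used.
class ConfigPassingDemo {

    // Runs only in the submitting process, like a topology's main method.
    static Map<String, Object> buildConf() {
        Map<String, Object> conf = new HashMap<>();
        conf.put("myapp.spring.config", "classpath:app-context.xml"); // hypothetical key
        return conf;
    }

    // Stand-in for shipping the config to a worker in another JVM.
    @SuppressWarnings("unchecked")
    static Map<String, Object> shipToWorker(Map<String, Object> conf) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new HashMap<>(conf));
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Map<String, Object>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical bolt: reads its settings in prepare, inside the worker.
    static class MyBolt {
        private String springConfig;

        void prepare(Map<String, Object> conf) {
            springConfig = (String) conf.get("myapp.spring.config");
        }

        String springConfig() { return springConfig; }
    }

    public static void main(String[] args) {
        MyBolt bolt = new MyBolt();
        bolt.prepare(shipToWorker(buildConf()));
        System.out.println(bolt.springConfig()); // classpath:app-context.xml
    }
}
```

In a real topology, the equivalent is putting the key into the configuration map passed to StormSubmitter.submitTopology and reading it from the configuration map that Storm hands to the bolt's prepare method; keeping such values to simple types like strings and numbers is the safest choice.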

bolt

The main structure of a bolt consists of three parts: prepare, execute, and cleanup.

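prepare runs once when the bolt is initialized in its worker, execute runs once for each incoming message (tuple), and cleanup runs once before the bolt exits. In cluster mode, however, Storm does not guarantee that cleanup is called, because worker processes can be killed forcibly.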
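The call order can be sketched with a simplified driver. The SimpleBolt interface below is hypothetical and only mirrors the three phases; Storm's real bolt interfaces take extra arguments (the topology configuration, a TopologyContext, an OutputCollector, and a Tuple).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified lifecycle sketch; the interface is hypothetical, not Storm's API.
class LifecycleDemo {

    interface SimpleBolt {
        void prepare();
        void execute(String message);
        void cleanup();
    }

    // Records every call so the order can be inspected.
    static class RecordingBolt implements SimpleBolt {
        final List<String> calls = new ArrayList<>();

        @Override public void prepare() { calls.add("prepare"); }
        @Override public void execute(String message) { calls.add("execute:" + message); }
        @Override public void cleanup() { calls.add("cleanup"); }
    }

    // prepare once, execute once per message, cleanup once at the end.
    static void run(SimpleBolt bolt, List<String> messages) {
        bolt.prepare();
        try {
            for (String m : messages) {
                bolt.execute(m);
            }
        } finally {
            bolt.cleanup(); // a killed worker would never reach this line
        }
    }

    public static void main(String[] args) {
        RecordingBolt bolt = new RecordingBolt();
        run(bolt, Arrays.asList("a", "b"));
        System.out.println(bolt.calls); // [prepare, execute:a, execute:b, cleanup]
    }
}
```

Since a real worker can be killed before cleanup runs, state that must not be lost is better flushed during execute than only in cleanup.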

Initialization work such as setting up encryption or loading Spring should therefore be done in the prepare method. When several bolts need Spring, have them load the same configuration to avoid hard-to-diagnose problems.
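One way to make bolts in the same worker share a single context is a per-JVM cache keyed by configuration location and filled lazily from prepare. This is a sketch under assumptions, not Spring's API: AppContext is a hypothetical stand-in for an expensive context such as a Spring application context, and contextFor is a hypothetical helper.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: load each context at most once per worker JVM and share it.
// AppContext is a hypothetical stand-in, not a Spring class.
class SharedContextDemo {

    static final AtomicInteger LOADS = new AtomicInteger(); // counts expensive loads

    static final class AppContext {
        final String location;

        AppContext(String location) {
            this.location = location;
            LOADS.incrementAndGet();
        }
    }

    private static final Map<String, AppContext> CONTEXTS = new ConcurrentHashMap<>();

    // computeIfAbsent runs the loader at most once per key, even when
    // several bolt threads call it concurrently from prepare.
    static AppContext contextFor(String location) {
        return CONTEXTS.computeIfAbsent(location, AppContext::new);
    }

    static class BoltA {
        AppContext ctx;
        void prepare() { ctx = contextFor("classpath:app-context.xml"); }
    }

    static class BoltB {
        AppContext ctx;
        void prepare() { ctx = contextFor("classpath:app-context.xml"); }
    }

    public static void main(String[] args) {
        BoltA a = new BoltA();
        BoltB b = new BoltB();
        a.prepare();
        b.prepare();
        System.out.println(a.ctx == b.ctx); // true
        System.out.println(LOADS.get());    // 1
    }
}
```

If two bolts asked for different locations, the cache would hold two separate contexts, which is exactly the situation the advice above suggests avoiding.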

Serialization

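Anything that crosses a process boundary must be serializable. This includes every value that is emitted, since tuples travel over the network, and the spout and bolt objects themselves, which are serialized when the topology is submitted and rebuilt in the workers, so all of their non-transient fields must be serializable as well. Static fields are not serialized at all; static initializers simply run again in each worker JVM.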
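A common way to satisfy this is to keep only serializable settings in bolt fields, mark non-serializable resources transient, and create them in prepare. The plain-Java sketch below simulates submission with a Java-serialization round trip; EnrichBolt and Connection are hypothetical names. Had a non-null Connection been stored in a non-transient field before submission, serialization would have failed with NotSerializableException.

```java
import java.io.*;

// Sketch: what survives when a bolt object is serialized and rebuilt.
// All names are hypothetical; no Storm classes are used.
class TransientFieldDemo {

    // Stand-in for a resource that is not serializable (e.g. a DB client).
    static final class Connection {
        final String target;
        Connection(String target) { this.target = target; }
    }

    static class EnrichBolt implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String target;       // serializable: travels with the bolt
        private transient Connection conn; // skipped by serialization

        EnrichBolt(String target) { this.target = target; }

        // Runs in the worker, after deserialization.
        void prepare() { conn = new Connection(target); }

        Connection connection() { return conn; }
        String target() { return target; }
    }

    // Simulates submission: serialize, then deserialize a copy.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        EnrichBolt shipped = roundTrip(new EnrichBolt("db-host"));
        System.out.println(shipped.connection() == null); // true before prepare
        shipped.prepare();
        System.out.println(shipped.connection().target);  // db-host
    }
}
```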

By default Storm serializes with Kryo, which requires classes to have a no-argument constructor. If adding one is not possible, set topology.fall.back.on.java.serialization: true so that Storm falls back to Java's built-in serialization; such classes must then implement java.io.Serializable.
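The no-argument constructor matters because a serializer that rebuilds objects by instantiating them reflectively has nothing to call when a class offers only parameterized constructors. The sketch below illustrates that failure with plain reflection; it is not Kryo's code, and WithNoArg, WithoutNoArg and canInstantiate are hypothetical names.

```java
// Illustration only: instantiate classes through their no-arg constructor,
// the way a reflection-based serializer might. Not Kryo's implementation.
class NoArgConstructorDemo {

    static class WithNoArg {
        String sensor;
        double value;

        WithNoArg() { } // lets a serializer create an empty instance

        WithNoArg(String sensor, double value) {
            this.sensor = sensor;
            this.value = value;
        }
    }

    static class WithoutNoArg {
        final String sensor;

        WithoutNoArg(String sensor) { this.sensor = sensor; } // only constructor
    }

    // True if the class can be instantiated via a no-argument constructor.
    static boolean canInstantiate(Class<?> type) {
        try {
            type.getDeclaredConstructor().newInstance();
            return true;
        } catch (NoSuchMethodException e) {
            return false; // no no-arg constructor exists
        } catch (ReflectiveOperationException e) {
            return false; // constructor exists but could not be invoked
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiate(WithNoArg.class));    // true
        System.out.println(canInstantiate(WithoutNoArg.class)); // false
    }
}
```

Java's built-in serialization does not go through the class's own constructors in this way, which is why the fallback works for a class like WithoutNoArg once it implements java.io.Serializable.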