bayesmix/collectors

Collectors

class BaseCollector

Abstract base class for a collector that contains a chain in Protobuf form.

This is the abstract base class for a data collector, or collector for short. A collector stores a sequence of Google Protocol Buffers (Protobuf) objects, also known as messages, by serializing them. The stored data can later be retrieved and deserialized into a Protobuf object, either in C++ or in another programming language.

Within this library, collectors save the state of the Markov chain generated by a Gibbs sampling algorithm at each iteration. The state includes the allocation and unique-value vectors, as well as other relevant or convenient values: the clusters’ cardinalities, the mixing state, and the iteration number. The skeleton of the corresponding AlgorithmState message is described in the proto/algorithm_state.proto file.

A collector lets separate scripts exchange the stored information, which they could not otherwise share. Saving the whole Markov chain also makes it possible to run subsequent, separate analyses on it, for which detailed information about the MCMC run is useful.

Two classes derive from this one: FileCollector, which writes states to a binary file, and MemoryCollector, which stores states in the computer’s memory. Please refer to their respective files for more information about them.

Subclassed by FileCollector, MemoryCollector

Public Functions

virtual void start_collecting() = 0

Initializes collector.

virtual void finish_collecting() = 0

Closes collector.

inline bool get_next_state(google::protobuf::Message *const out)

Reads the next state and deserializes it into the pointer out.

virtual void collect(const google::protobuf::Message &state) = 0

Writes the given state to the collector.

virtual void reset() = 0

Resets the collector to the beginning of the chain.

inline unsigned int get_size() const

Returns the number of stored states.
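The member functions listed above imply a write phase (start_collecting, collect, finish_collecting) followed by a read phase (reset, then get_next_state until it returns false). The sketch below illustrates that life cycle with standard-library stand-ins only. ToyState, ToyCollector, ToyVectorCollector, the next_state hook, and the size and curr_iter counters are all hypothetical names chosen for this example; they are not taken from the bayesmix sources, and a real collector would exchange google::protobuf::Message objects instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a Protobuf message: it mimics only the two
// operations the sketch needs (serialize to bytes, parse from bytes).
struct ToyState {
  int iteration_num = 0;
  std::string SerializeAsString() const {
    return std::to_string(iteration_num);
  }
  bool ParseFromString(const std::string &bytes) {
    try {
      iteration_num = std::stoi(bytes);
    } catch (...) {
      return false;  // bytes did not encode an integer
    }
    return true;
  }
};

// Hypothetical interface with the same member function names as above.
class ToyCollector {
 public:
  virtual ~ToyCollector() = default;
  virtual void start_collecting() = 0;
  virtual void finish_collecting() = 0;
  virtual void collect(const ToyState &state) = 0;
  virtual void reset() = 0;

  // Non-virtual reader: returns false once every stored state was read.
  bool get_next_state(ToyState *const out) {
    if (curr_iter >= size) return false;
    ++curr_iter;
    return next_state(out);
  }
  unsigned int get_size() const { return size; }

 protected:
  // Assumed storage-specific hook; the name is invented for this sketch.
  virtual bool next_state(ToyState *const out) = 0;
  unsigned int size = 0;       // number of stored states
  unsigned int curr_iter = 0;  // number of states already read
};

// Simplest possible storage, present only so that the sketch can run.
class ToyVectorCollector : public ToyCollector {
 public:
  void start_collecting() override {}
  void finish_collecting() override {}
  void collect(const ToyState &state) override {
    chain.push_back(state.SerializeAsString());
    ++size;
  }
  void reset() override { curr_iter = 0; }

 protected:
  bool next_state(ToyState *const out) override {
    // curr_iter was already advanced, so the wanted state is one behind it.
    return out->ParseFromString(chain[curr_iter - 1]);
  }

 private:
  std::vector<std::string> chain;
};

// Write phase: one state per iteration of a fake sampler.
inline void run_fake_sampler(ToyCollector &coll, int n_iter) {
  coll.start_collecting();
  for (int i = 1; i <= n_iter; ++i) {
    ToyState state;
    state.iteration_num = i;
    coll.collect(state);
  }
  coll.finish_collecting();
}

// Read phase: rewind, then walk the chain until it is exhausted.
inline int sum_iterations(ToyCollector &coll) {
  coll.reset();
  int total = 0;
  ToyState state;
  while (coll.get_next_state(&state)) total += state.iteration_num;
  return total;
}
```

Because the two free functions only see the abstract interface, the same calling code would work unchanged with a file-backed or a memory-backed collector.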

class FileCollector : public BaseCollector

Class for a collector that writes (and reads) its content to a file.

An instance of FileCollector saves a sequence of Protobuf objects to a file and can read them back, returning them one by one. When writing, each object is simply serialized into bytes. When reading, for efficiency’s sake, a chunk of ‘chunk_size’ objects is read and deserialized into a buffer asynchronously. Once the whole buffer has been consumed, it is erased and refilled with the next chunk of objects.
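The chunked reading strategy described above can be illustrated with a minimal standard-library sketch. It is not the FileCollector implementation: write_record, ChunkedReader, and the 4-byte length-prefix framing are assumptions made for this example, records are plain byte strings rather than Protobuf messages, and the buffer is refilled synchronously rather than asynchronously.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <fstream>
#include <string>
#include <utility>

// Appends one record as a 4-byte length prefix (native byte order)
// followed by the raw bytes. This framing is an assumption of the sketch.
inline void write_record(std::ofstream &out, const std::string &bytes) {
  const std::uint32_t len = static_cast<std::uint32_t>(bytes.size());
  out.write(reinterpret_cast<const char *>(&len), sizeof(len));
  out.write(bytes.data(), static_cast<std::streamsize>(len));
}

// Hypothetical reader that keeps at most chunk_size records in memory.
class ChunkedReader {
 public:
  ChunkedReader(const std::string &path, std::size_t chunk_size)
      : in_(path, std::ios::binary), chunk_size_(chunk_size) {}

  // Moves the next record into *out; returns false at the end of the file.
  bool next(std::string *out) {
    if (buffer_.empty()) refill();
    if (buffer_.empty()) return false;
    *out = std::move(buffer_.front());
    buffer_.pop_front();
    return true;
  }

  // Number of refills that loaded at least one record.
  std::size_t refills() const { return refills_; }

 private:
  // Reads up to chunk_size_ records from the file into the buffer.
  void refill() {
    for (std::size_t i = 0; i < chunk_size_; ++i) {
      std::uint32_t len = 0;
      if (!in_.read(reinterpret_cast<char *>(&len), sizeof(len))) break;
      std::string bytes(len, '\0');
      if (!in_.read(&bytes[0], static_cast<std::streamsize>(len))) break;
      buffer_.push_back(std::move(bytes));
    }
    if (!buffer_.empty()) ++refills_;
  }

  std::ifstream in_;
  std::size_t chunk_size_;
  std::deque<std::string> buffer_;
  std::size_t refills_ = 0;
};
```

In this sketch, a larger chunk size means fewer refills at the cost of a larger buffer held in memory.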

Public Functions

virtual void start_collecting() override

Initializes collector.

virtual void finish_collecting() override

Closes collector.

virtual void collect(const google::protobuf::Message &state) override

Writes the given state to the collector.

virtual void reset() override

Resets the collector to the beginning of the chain.

class MemoryCollector : public BaseCollector

Class for a collector that writes its content into memory.

An instance of MemoryCollector saves a sequence of Protobuf objects by storing byte-serialized objects into a deque of strings. When reading, the objects are simply deserialized from the deque.
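As a rough illustration of the storage scheme just described, the sketch below keeps serialized states in a std::deque of strings and supports both sequential and positional reads. ToyChain is a hypothetical class written for this example: it handles raw byte strings instead of Protobuf messages, and its zero-based indexing and bounds checking in get_state are assumptions rather than documented MemoryCollector behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <string>

// Hypothetical in-memory chain: each element holds the bytes of one state.
class ToyChain {
 public:
  // Stores an already serialized state at the end of the chain.
  void collect(const std::string &serialized) {
    chain_.push_back(serialized);
  }

  // Sequential read: returns false once every state has been visited.
  bool get_next_state(std::string *out) {
    if (cursor_ >= chain_.size()) return false;
    *out = chain_[cursor_++];
    return true;
  }

  // Positional read (zero-based here); throws std::out_of_range if i is
  // not a valid position. It does not move the sequential cursor.
  void get_state(const unsigned int i, std::string *out) const {
    *out = chain_.at(i);
  }

  // Rewinds the sequential cursor to the beginning of the chain.
  void reset() { cursor_ = 0; }

  unsigned int get_size() const {
    return static_cast<unsigned int>(chain_.size());
  }

 private:
  std::deque<std::string> chain_;
  std::size_t cursor_ = 0;
};
```

A deque gives constant-time appends and indexed access without relocating previously stored strings as the chain grows.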

Public Functions

inline virtual void start_collecting() override

Initializes collector.

inline virtual void finish_collecting() override

Closes collector.

virtual void collect(const google::protobuf::Message &state) override

Writes the given state to the collector.

virtual void reset() override

Resets the collector to the beginning of the chain.

void get_state(const unsigned int i, google::protobuf::Message *out)

Writes the i-th state in the collector to the given message pointer.

template<typename MsgType>
inline void write_to_file(const std::string &outfile)

Templatized utility for writing states directly to file.

template<typename MsgType>
inline void read_from_file(const std::string &infile)

Templatized utility for reading states directly from a file.