Kea 1.5.0
Holds communication state between the two HA peers.
#include <communication_state.h>
Public Member Functions | |
| CommunicationState (const asiolink::IOServicePtr &io_service, const HAConfigPtr &config) | |
| Constructor. | |
| virtual | ~CommunicationState () |
| Destructor. | |
| virtual void | analyzeMessage (const boost::shared_ptr< dhcp::Pkt > &message)=0 |
| Checks if the DHCP message appears to be unanswered. | |
| bool | clockSkewShouldTerminate () const |
| Indicates whether the HA service should enter "terminated" state as a result of the clock skew exceeding maximum value. | |
| bool | clockSkewShouldWarn () |
| Indicates whether the HA service should issue a warning about high clock skew between the active servers. | |
| virtual bool | failureDetected () const =0 |
| Checks if the partner failure has been detected based on the DHCP traffic analysis. | |
| int64_t | getDurationInMillisecs () const |
| Returns duration between the poke time and current time. | |
| int | getPartnerState () const |
| Returns last known state of the partner. | |
| bool | isCommunicationInterrupted () const |
| Checks if communication with the partner is interrupted. | |
| bool | isHeartbeatRunning () const |
| Checks if recurring heartbeat is running. | |
| std::string | logFormatClockSkew () const |
| Returns current clock skew value in the logger friendly format. | |
| void | poke () |
| Pokes the communication state. | |
| void | setPartnerState (const std::string &state) |
| Sets partner state. | |
| void | setPartnerTime (const std::string &time_text) |
| Provide partner's notion of time so the new clock skew can be calculated. | |
| void | startHeartbeat (const long interval, const boost::function< void()> &heartbeat_impl) |
| Starts recurring heartbeat (public interface). | |
| void | stopHeartbeat () |
| Stops recurring heartbeat. | |
Protected Member Functions | |
| virtual void | clearUnackedClients ()=0 |
| Removes information about clients which the partner server failed to respond to. | |
| bool | isClockSkewGreater (const long seconds) const |
| Checks if the clock skew is greater than the specified number of seconds. | |
| void | startHeartbeatInternal (const long interval=0, const boost::function< void()> &heartbeat_impl=0) |
| Starts recurring heartbeat. | |
Protected Attributes | |
| boost::posix_time::time_duration | clock_skew_ |
| Clock skew between the active servers. | |
| HAConfigPtr | config_ |
| High availability configuration. | |
| boost::function< void()> | heartbeat_impl_ |
| Pointer to the function providing heartbeat implementation. | |
| long | interval_ |
| Interval specified for the heartbeat. | |
| asiolink::IOServicePtr | io_service_ |
| Pointer to the common IO service instance. | |
| boost::posix_time::ptime | last_clock_skew_warn_ |
| Holds a time when last warning about too high clock skew was issued. | |
| int | partner_state_ |
| Last known state of the partner server. | |
| boost::posix_time::ptime | poke_time_ |
| Last poke time. | |
| asiolink::IntervalTimerPtr | timer_ |
| Interval timer triggering heartbeat commands. | |
Holds communication state between the two HA peers.
The HA service constantly monitors the state of the connection between the two peers. If the connection is lost it is an indicator that the partner server may be down and failover actions should be triggered.
Any command successfully sent over the control channel is an indicator that the connection is healthy. The most common command sent over the control channel is a lease update. If the DHCP traffic is heavy, the number of generated lease updates is sufficient to determine whether the connection is healthy or not. There is no need to send heartbeat commands in this case. However, if the DHCP traffic is low there is a need to send heartbeat commands to the partner at the specified rate to keep up-to-date information about the state of the connection.
This class uses an interval timer to run heartbeat commands over the control channel. The implementation of the heartbeat is external to this class and is provided via the CommunicationState::startHeartbeat method. This implementation is required to run the poke method when it receives a successful response to the heartbeat command. It must also run poke when a lease update is successful.
The poke method sets the "last poke time" to the current time, thus indicating that the connection is healthy. The getDurationInMillisecs method is used to check for how long the server hasn't been able to communicate with the partner. This duration is simply the time elapsed since the last successful poke. If this duration becomes greater than the configured threshold, the server assumes that the communication with the partner is interrupted.
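The poke bookkeeping described above can be sketched as follows. This is a minimal illustration and not the Kea implementation: the class name `PokeTracker`, its members, and the use of `std::chrono` instead of Boost are assumptions made for the example, and the current time is passed in explicitly so the behavior is easy to check.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the poke bookkeeping; not the Kea class.
class PokeTracker {
public:
    using Clock = std::chrono::steady_clock;

    // max_response_delay_ms plays the role of the configured threshold.
    explicit PokeTracker(int64_t max_response_delay_ms,
                         Clock::time_point now = Clock::now())
        : max_response_delay_ms_(max_response_delay_ms), poke_time_(now) {
        assert(max_response_delay_ms >= 0);
    }

    // Record a successful exchange (heartbeat response or lease update).
    void poke(Clock::time_point now = Clock::now()) { poke_time_ = now; }

    // Milliseconds elapsed since the last poke.
    int64_t durationInMillisecs(Clock::time_point now = Clock::now()) const {
        return std::chrono::duration_cast<std::chrono::milliseconds>(
                   now - poke_time_).count();
    }

    // True once the elapsed time is greater than the threshold.
    bool communicationInterrupted(Clock::time_point now = Clock::now()) const {
        return durationInMillisecs(now) > max_response_delay_ms_;
    }

private:
    int64_t max_response_delay_ms_;
    Clock::time_point poke_time_;
};
```

A caller would invoke `poke()` from the heartbeat and lease update success paths and poll `communicationInterrupted()` periodically.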
The derivations of this class provide DHCPv4 and DHCPv6 specific mechanisms for detecting server failures based on the analysis of the received DHCP messages, i.e. how long the clients have been trying to communicate with the partner and message types they sent. In particular, the increased number of Rebind messages may indicate issues with the DHCP server.
This class is also used to monitor the clock skew between the active servers. Maintaining a reasonably low clock skew is essential for the HA service to function properly. This class calculates the clock skew by comparing the local time of the server with the time returned by the partner in response to a heartbeat command. Depending on whether this value exceeds certain thresholds, CommunicationState::clockSkewShouldWarn and CommunicationState::clockSkewShouldTerminate indicate whether the HA service should continue to operate normally, should start issuing warnings about high clock skew, or should enter the "terminated" state and refuse to operate further until the clocks are synchronized. Leaving the "terminated" state requires administrative intervention and a restart of the HA service.
Definition at line 74 of file communication_state.h.
| isc::ha::CommunicationState::CommunicationState | ( | const asiolink::IOServicePtr & | io_service, |
| const HAConfigPtr & | config | ||
| ) |
Constructor.
| io_service | pointer to the common IO service instance. |
| config | pointer to the HA configuration. |
Definition at line 44 of file communication_state.cc.
| isc::ha::CommunicationState::~CommunicationState | ( | ) |
virtual |
Destructor.
Stops scheduled heartbeat.
Definition at line 52 of file communication_state.cc.
References stopHeartbeat().
| virtual void isc::ha::CommunicationState::analyzeMessage | ( | const boost::shared_ptr< dhcp::Pkt > & | message | ) |
pure virtual |
Checks if the DHCP message appears to be unanswered.
This method is used to provide the communication state with a received DHCP message directed to the HA partner, to detect if the partner fails to answer DHCP messages directed to it. The DHCPv4 and DHCPv6 specific derivations implement this functionality.
This check is orthogonal to the heartbeat mechanism and is usually triggered after several consecutive heartbeats go unanswered.
The general approach to server failure detection is based on the analysis of the "secs" field value (DHCPv4) and "elapsed time" option value (DHCPv6). They indicate for how long the client has been trying to complete the DHCP transaction. If these values exceed a configured threshold, the client is considered to fail to communicate with the server. This fact is recorded by this object. If the number of distinct clients failing to communicate with the partner exceeds a configured maximum value, this server considers the partner to be offline. In this case, this server will most likely start serving clients which would normally be served by the partner.
All information gathered by this method is cleared when the poke method is invoked.
| message | DHCP message to be analyzed. This must be a message which belongs to the partner, i.e. the caller must select only the messages belonging to the partner, discarding all others, prior to calling this method. |
Implemented in isc::ha::CommunicationState6, and isc::ha::CommunicationState4.
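The failure detection approach described above can be sketched in a protocol-independent way. This is a simplified illustration, not the code of CommunicationState4 or CommunicationState6: the class `UnackedClientTracker`, the string client identifier, and the `max_ack_delay_ms` parameter name are assumptions, and the elapsed time is assumed to have been extracted from the "secs" field or the "elapsed time" option already. The zero case follows the failureDetected description, where max-unacked-clients set to 0 always yields true.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical tracker of clients the partner appears not to answer.
class UnackedClientTracker {
public:
    UnackedClientTracker(uint32_t max_ack_delay_ms, uint32_t max_unacked_clients)
        : max_ack_delay_ms_(max_ack_delay_ms),
          max_unacked_clients_(max_unacked_clients) {}

    // Record the client if it has been retrying for longer than the threshold.
    void analyze(const std::string& client_id, uint32_t elapsed_ms) {
        assert(!client_id.empty());
        if (elapsed_ms > max_ack_delay_ms_) {
            unacked_.insert(client_id);  // a set keeps the clients distinct
        }
    }

    // Partner is considered offline when too many distinct clients are stuck.
    // A maximum of 0 disables the analysis, so failure is always reported.
    bool failureDetected() const {
        return max_unacked_clients_ == 0 || unacked_.size() > max_unacked_clients_;
    }

    std::size_t unackedCount() const { return unacked_.size(); }

    // Called when communication is (re)established, as poke() does.
    void clear() { unacked_.clear(); }

private:
    uint32_t max_ack_delay_ms_;
    uint32_t max_unacked_clients_;
    std::set<std::string> unacked_;
};
```

Keeping a set rather than a counter means that retransmissions from one client are counted once, which matches the "distinct clients" wording above.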
| virtual void isc::ha::CommunicationState::clearUnackedClients | ( | ) |
protectedpure virtual |
Removes information about clients which the partner server failed to respond to.
This information is cleared by CommunicationState::poke. The derivations of this class must provide DHCPv4 and DHCPv6 specific implementations of this method. The poke method is called to indicate that the connection has been successfully (re)established. Therefore the client counters are reset and the failure detection procedure starts over.
See CommunicationState::analyzeMessage for details.
Implemented in isc::ha::CommunicationState6, and isc::ha::CommunicationState4.
Referenced by poke().
| bool isc::ha::CommunicationState::clockSkewShouldTerminate | ( | ) | const |
Indicates whether the HA service should enter "terminated" state as a result of the clock skew exceeding maximum value.
If the clocks on the active servers are not synchronized (for example, in response to the warning message triggered by clockSkewShouldWarn) and the clocks drift further, the clock skew may exceed another threshold which should cause the HA service to enter the "terminated" state. In this state the servers still respond to DHCP clients normally, but they will neither send lease updates nor heartbeats. In this case, the administrator must correct the problem (synchronize the clocks) and restart the service. This method indicates whether the service should terminate or not.
Currently, the terminal threshold for the clock skew is hardcoded to 60 seconds. In the future it may become configurable.
Definition at line 206 of file communication_state.cc.
References isClockSkewGreater().
| bool isc::ha::CommunicationState::clockSkewShouldWarn | ( | ) |
Indicates whether the HA service should issue a warning about high clock skew between the active servers.
The HA service monitors the clock skew between the active servers. The clock skew is calculated from the local time and the time returned by the partner in response to a heartbeat. When clock skew exceeds a certain threshold the HA service starts issuing a warning message. This method returns true if the HA service should issue this message.
Currently, the warning threshold for the clock skew is hardcoded to 30 seconds. In the future it may become configurable.
This method is called for each heartbeat. If we issue a warning for each heartbeat it may flood logs with those messages. This method provides a gating mechanism which prevents the HA service from logging the warning more often than every 60 seconds. If the last warning was issued less than 60 seconds ago this method will return false even if the clock skew exceeds the 30 seconds threshold. The correction of the clock skew will reset the gating counter.
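The warning gate can be sketched together with the terminal threshold described for clockSkewShouldTerminate. This is an illustrative model only: the class `ClockSkewMonitor`, its constant names, and the use of plain second counts instead of Boost time types are assumptions, while the 30 second and 60 second values come from the text above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical model of the clock skew checks; not the Kea class.
class ClockSkewMonitor {
public:
    static constexpr int64_t WARN_THRESHOLD_SECS = 30;
    static constexpr int64_t TERMINATE_THRESHOLD_SECS = 60;
    static constexpr int64_t WARN_INTERVAL_SECS = 60;

    // Skew may be negative or positive depending on which clock is ahead.
    void setSkew(int64_t skew_secs) { skew_secs_ = skew_secs; }

    // Compares the absolute skew with a positive number of seconds.
    bool isSkewGreater(int64_t seconds) const {
        assert(seconds > 0);
        return std::llabs(skew_secs_) > seconds;
    }

    // Returns true at most once per WARN_INTERVAL_SECS while the skew is high.
    bool shouldWarn(int64_t now_secs) {
        if (!isSkewGreater(WARN_THRESHOLD_SECS)) {
            warned_ = false;  // skew corrected: reset the gate
            return false;
        }
        if (warned_ && (now_secs - last_warn_secs_) < WARN_INTERVAL_SECS) {
            return false;     // last warning is too recent
        }
        warned_ = true;
        last_warn_secs_ = now_secs;
        return true;
    }

    bool shouldTerminate() const {
        return isSkewGreater(TERMINATE_THRESHOLD_SECS);
    }

private:
    int64_t skew_secs_ = 0;
    int64_t last_warn_secs_ = 0;
    bool warned_ = false;
};
```

Calling `shouldWarn` on every heartbeat then yields at most one warning per 60 seconds, and a corrected skew makes the next excess warn immediately.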
Definition at line 179 of file communication_state.cc.
References isClockSkewGreater(), and last_clock_skew_warn_.
| virtual bool isc::ha::CommunicationState::failureDetected | ( | ) | const |
pure virtual |
Checks if the partner failure has been detected based on the DHCP traffic analysis.
In the special case when max-unacked-clients is set to 0 this method always returns true. Note that max-unacked-clients set to 0 means that failure detection is not really performed. Returning true in that case simplifies the code of the HAService, which doesn't need to check whether the failure detection is enabled. It simply calls this method in the 'communications interrupted' situation to check if the server should be transitioned to the 'partner-down' state.
Implemented in isc::ha::CommunicationState6, and isc::ha::CommunicationState4.
| int64_t isc::ha::CommunicationState::getDurationInMillisecs | ( | ) | const |
Returns duration between the poke time and current time.
Definition at line 167 of file communication_state.cc.
References poke_time_.
Referenced by isCommunicationInterrupted().
| int isc::ha::CommunicationState::getPartnerState | ( | ) | const |
inline |
Returns last known state of the partner.
Definition at line 92 of file communication_state.h.
References partner_state_.
| bool isc::ha::CommunicationState::isClockSkewGreater | ( | const long | seconds | ) | const |
protected |
Checks if the clock skew is greater than the specified number of seconds.
| seconds | a positive value to compare the clock skew with. |
Definition at line 212 of file communication_state.cc.
References clock_skew_.
Referenced by clockSkewShouldTerminate(), and clockSkewShouldWarn().
| bool isc::ha::CommunicationState::isCommunicationInterrupted | ( | ) | const |
Checks if communication with the partner is interrupted.
This method checks if the communication with the partner appears to be interrupted. This is the case when the time since the last successful communication is longer than the configured max-response-delay value.
Definition at line 174 of file communication_state.cc.
References config_, and getDurationInMillisecs().
| bool isc::ha::CommunicationState::isHeartbeatRunning | ( | ) | const |
inline |
Checks if recurring heartbeat is running.
Definition at line 129 of file communication_state.h.
References timer_.
| std::string isc::ha::CommunicationState::logFormatClockSkew | ( | ) | const |
Returns current clock skew value in the logger friendly format.
Definition at line 226 of file communication_state.cc.
References clock_skew_.
| void isc::ha::CommunicationState::poke | ( | ) |
Pokes the communication state.
Sets the last poke time to current time. If the heartbeat timer has been scheduled, it is reset (starts over measuring the time to the next heartbeat).
Definition at line 138 of file communication_state.cc.
References clearUnackedClients(), poke_time_, startHeartbeatInternal(), and timer_.
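The interaction between poke and the heartbeat timer can be modelled without an IO service. The sketch below is a simulation under stated assumptions, not the Kea code: `HeartbeatScheduler` is a hypothetical class, time advances only through explicit `advance()` calls, and `std::function` stands in for `boost::function`. It shows that a poke restarts the wait for the next heartbeat.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical, manually driven model of a recurring heartbeat timer.
class HeartbeatScheduler {
public:
    // Start (or restart) the recurring heartbeat.
    void start(long interval_ms, std::function<void()> heartbeat_impl) {
        assert(interval_ms > 0 && heartbeat_impl);
        interval_ms_ = interval_ms;
        impl_ = std::move(heartbeat_impl);
        elapsed_ms_ = 0;
    }

    void stop() {
        impl_ = nullptr;
        interval_ms_ = 0;
        elapsed_ms_ = 0;
    }

    bool running() const { return static_cast<bool>(impl_); }

    // A successful exchange makes the timer start over.
    void poke() { elapsed_ms_ = 0; }

    // Simulate the passage of time and fire due heartbeats.
    void advance(long ms) {
        if (!running()) {
            return;
        }
        elapsed_ms_ += ms;
        while (running() && elapsed_ms_ >= interval_ms_) {
            elapsed_ms_ -= interval_ms_;
            const auto callback = impl_;  // copy: callback may call stop()
            callback();
        }
    }

private:
    long interval_ms_ = 0;
    long elapsed_ms_ = 0;
    std::function<void()> impl_;
};
```

With heavy lease update traffic, `poke()` is called more often than the interval elapses, so no heartbeat is sent, which matches the behavior described in the class overview.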
| void isc::ha::CommunicationState::setPartnerState | ( | const std::string & | state | ) |
Sets partner state.
| state | new partner's state in a textual form. Supported values are those returned in response to a ha-heartbeat command. |
| BadValue | if unsupported state value was provided. |
Definition at line 57 of file communication_state.cc.
References isc::ha::HA_HOT_STANDBY_ST, isc::ha::HA_LOAD_BALANCING_ST, isc::ha::HA_PARTNER_DOWN_ST, isc::ha::HA_READY_ST, isc::ha::HA_SYNCING_ST, isc::ha::HA_TERMINATED_ST, isc::ha::HA_UNAVAILABLE_ST, isc::ha::HA_WAITING_ST, isc_throw, and partner_state_.
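Converting the textual state into an internal value can be sketched as a table lookup that rejects unknown names. Everything in this example is an assumption made for illustration: the enum `PartnerState`, the function `parsePartnerState`, and the exact state strings, which are guessed from the HA_*_ST identifiers listed above. `std::invalid_argument` stands in for the BadValue exception.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical enum mirroring the HA_*_ST constants referenced above.
enum class PartnerState {
    HotStandby, LoadBalancing, PartnerDown, Ready,
    Syncing, Terminated, Waiting, Unavailable
};

// Map a textual state (assumed spelling) to the enum, or throw.
PartnerState parsePartnerState(const std::string& state) {
    static const std::map<std::string, PartnerState> names = {
        {"hot-standby", PartnerState::HotStandby},
        {"load-balancing", PartnerState::LoadBalancing},
        {"partner-down", PartnerState::PartnerDown},
        {"ready", PartnerState::Ready},
        {"syncing", PartnerState::Syncing},
        {"terminated", PartnerState::Terminated},
        {"waiting", PartnerState::Waiting},
        {"unavailable", PartnerState::Unavailable}
    };
    const auto it = names.find(state);
    if (it == names.end()) {
        throw std::invalid_argument("unsupported HA state: " + state);
    }
    return it->second;
}
```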
| void isc::ha::CommunicationState::setPartnerTime | ( | const std::string & | time_text | ) |
Provide partner's notion of time so the new clock skew can be calculated.
| time_text | Partner's time received in response to a heartbeat. The time must be provided in the RFC 1123 format. |
| isc::http::HttpTimeConversionError | if the time format is invalid. |
Definition at line 218 of file communication_state.cc.
References clock_skew_, isc::http::HttpDateTime::fromRfc1123(), and isc::http::HttpDateTime::getPtime().
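Deriving a clock skew from an RFC 1123 timestamp can be sketched with the standard library alone. This is not how Kea does it (the references above show it relies on isc::http::HttpDateTime); the functions `daysFromCivil`, `parseRfc1123` and `clockSkewSecs` are hypothetical, the sign convention (partner time minus local time) is an assumption, and the parser does not validate the weekday or field ranges.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>

// Days since 1970-01-01 for a Gregorian date (month in 1..12).
int64_t daysFromCivil(int64_t year, int month, int day) {
    year -= month <= 2;
    const int64_t era = (year >= 0 ? year : year - 399) / 400;
    const int64_t yoe = year - era * 400;
    const int64_t doy = (153 * (month + (month > 2 ? -3 : 9)) + 2) / 5 + day - 1;
    const int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

// Parse e.g. "Sun, 06 Nov 1994 08:49:37 GMT" into seconds since the epoch.
int64_t parseRfc1123(const std::string& text) {
    static const char* const kMonths[12] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    char wday[4] = {0};
    char mon[4] = {0};
    int day = 0, year = 0, hh = 0, mm = 0, ss = 0, consumed = -1;
    const int fields = std::sscanf(text.c_str(), "%3s, %d %3s %d %d:%d:%d GMT%n",
                                   wday, &day, mon, &year, &hh, &mm, &ss,
                                   &consumed);
    int month = 0;
    for (int i = 0; i < 12; ++i) {
        if (std::strcmp(mon, kMonths[i]) == 0) {
            month = i + 1;
        }
    }
    // consumed stays -1 unless the whole pattern, including "GMT", matched.
    if (fields != 7 || consumed != static_cast<int>(text.size()) || month == 0) {
        throw std::invalid_argument("invalid RFC 1123 time: " + text);
    }
    return daysFromCivil(year, month, day) * 86400 + hh * 3600 + mm * 60 + ss;
}

// Skew in seconds: positive when the partner's clock is ahead (assumed sign).
int64_t clockSkewSecs(const std::string& partner_time, int64_t local_epoch_secs) {
    return parseRfc1123(partner_time) - local_epoch_secs;
}
```

The magnitude of the returned value is what the warning and termination thresholds would be compared against.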
| void isc::ha::CommunicationState::startHeartbeat | ( | const long | interval, |
| const boost::function< void()> & | heartbeat_impl | ||
| ) |
Starts recurring heartbeat (public interface).
| interval | heartbeat interval in milliseconds. |
| heartbeat_impl | pointer to the heartbeat implementation function. |
Definition at line 81 of file communication_state.cc.
References startHeartbeatInternal().
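| void isc::ha::CommunicationState::startHeartbeatInternal | ( | const long | interval = 0, |
| const boost::function< void()> & | heartbeat_impl = 0 | ||
| ) |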
protected |
Starts recurring heartbeat.
| interval | heartbeat interval in milliseconds. |
| heartbeat_impl | pointer to the heartbeat implementation function. |
Definition at line 87 of file communication_state.cc.
References heartbeat_impl_, interval_, io_service_, isc_throw, and timer_.
Referenced by poke(), and startHeartbeat().
| void isc::ha::CommunicationState::stopHeartbeat | ( | ) |
Stops recurring heartbeat.
Definition at line 128 of file communication_state.cc.
References heartbeat_impl_, interval_, and timer_.
Referenced by ~CommunicationState().
| boost::posix_time::time_duration isc::ha::CommunicationState::clock_skew_ |
protected |
Clock skew between the active servers.
Definition at line 315 of file communication_state.h.
Referenced by isClockSkewGreater(), logFormatClockSkew(), and setPartnerTime().
| HAConfigPtr isc::ha::CommunicationState::config_ |
protected |
High availability configuration.
Definition at line 295 of file communication_state.h.
Referenced by isc::ha::CommunicationState4::analyzeMessage(), isc::ha::CommunicationState6::analyzeMessage(), isc::ha::CommunicationState4::failureDetected(), isc::ha::CommunicationState6::failureDetected(), and isCommunicationInterrupted().
| boost::function< void()> isc::ha::CommunicationState::heartbeat_impl_ |
protected |
Pointer to the function providing heartbeat implementation.
Definition at line 307 of file communication_state.h.
Referenced by startHeartbeatInternal(), and stopHeartbeat().
| long isc::ha::CommunicationState::interval_ |
protected |
Interval specified for the heartbeat.
Definition at line 301 of file communication_state.h.
Referenced by startHeartbeatInternal(), and stopHeartbeat().
| asiolink::IOServicePtr isc::ha::CommunicationState::io_service_ |
protected |
Pointer to the common IO service instance.
Definition at line 292 of file communication_state.h.
Referenced by startHeartbeatInternal().
| boost::posix_time::ptime isc::ha::CommunicationState::last_clock_skew_warn_ |
protected |
Holds a time when last warning about too high clock skew was issued.
Definition at line 319 of file communication_state.h.
Referenced by clockSkewShouldWarn().
| int isc::ha::CommunicationState::partner_state_ |
protected |
Last known state of the partner server.
Negative value means that the partner's state is unknown.
Definition at line 312 of file communication_state.h.
Referenced by getPartnerState(), and setPartnerState().
| boost::posix_time::ptime isc::ha::CommunicationState::poke_time_ |
protected |
Last poke time.
Definition at line 304 of file communication_state.h.
Referenced by getDurationInMillisecs(), and poke().
| asiolink::IntervalTimerPtr isc::ha::CommunicationState::timer_ |
protected |
Interval timer triggering heartbeat commands.
Definition at line 298 of file communication_state.h.
Referenced by isHeartbeatRunning(), poke(), startHeartbeatInternal(), and stopHeartbeat().