Ok, quick review:
1. Two servers set up in a localnode configuration.
2. heartbeat/ldirectord is used.
3. Connections (ssh in this case) are accepted by both real
servers. (Telnet was also tested, just to make sure it wasn't an
ssh issue.)
4. Connection state information is being multicast as expected to the
backup director. The connection table is updated as expected.
5. Heartbeat works exactly as expected (IP failover, ldirectord
takeover).
6. Connections initiated to the real server on the backup director
fail over as expected, no matter how many failovers happen back
and forth.
7. Everything works beautifully, except the problem . . .
Problem:
Connections initiated to the real server on the master director do not
fail over when the director is failed (heartbeat is shut down).
Note: Only connections that are opened to the master director's real
server (remember: localnode) fail. If a server has a connection open to
it, and later becomes the master director, the connection stays alive
even if that server is later failed back to the backup state.
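
To compare what each director holds before and after a failover, the
connection listings from the two machines can be diffed. Below is a
minimal sketch, not a definitive tool. It assumes the listing (for
example from `ipvsadm -L -n -c`) has six whitespace-separated columns:
protocol, expire, state, source, virtual, destination. The function
names and the sample addresses are hypothetical and only for
illustration.

```python
def parse_conn_table(text):
    """Return a set of (proto, source, virtual, destination) tuples.

    Assumes six whitespace-separated columns per data row:
    proto, expire, state, source, virtual, destination.
    """
    entries = set()
    for line in text.splitlines():
        fields = line.split()
        # Skip banner and header lines: data rows start with TCP or UDP.
        if len(fields) != 6 or fields[0] not in ("TCP", "UDP"):
            continue
        proto, _expire, _state, source, virtual, dest = fields
        entries.add((proto, source, virtual, dest))
    return entries


def missing_on_backup(master_text, backup_text):
    """Connections listed on the master but absent on the backup."""
    return parse_conn_table(master_text) - parse_conn_table(backup_text)


# Hypothetical sample listings (documentation-range addresses).
MASTER_SAMPLE = """\
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:52  ESTABLISHED 192.0.2.10:40001   192.0.2.100:22     192.0.2.1:22
TCP 14:58  ESTABLISHED 192.0.2.11:40002   192.0.2.100:22     192.0.2.2:22
"""

BACKUP_SAMPLE = """\
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:58  ESTABLISHED 192.0.2.11:40002   192.0.2.100:22     192.0.2.2:22
"""

if __name__ == "__main__":
    for entry in sorted(missing_on_backup(MASTER_SAMPLE, BACKUP_SAMPLE)):
        print("not on backup:", entry)
```

Running the same comparison in both directions, once before and once
after shutting down heartbeat, would show whether the entries for the
master's local real server ever reach the backup's table.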
I started from scratch on two clean servers with a basic
configuration, and the same problem exists.
My configurations are in an earlier post. I've since trimmed them down
to remove extraneous settings while trying to track down the problem,
to no avail. Feel free to ask for them if you can't find the earlier
post and I'll post them again, but I assure you there are no changes
out of the ordinary, nothing that should cause this. The
configurations are identical on both servers (I even cleared them out
and rsynced to be sure).
Any questions, comments, suggestions are appreciated. Thanks!
--
Sal Tepedino <stepedino@xxxxxxxxxxxxxx>