Yeah, that’s the way it always is with these questions: for any particular example, it’s easy enough to see how the standard state-machine approaches could be extended to cover it. After all, you can always bolt on another state and another transition for any specific case you’ve already thought of.
In practice, though, the problem is the gross proliferation of states required to deal with every possible eventuality (and all the developer time needed to create and maintain all those states). It’s really hard to think of everything, and even harder to code up responses for everything you can think of.
To deal with that, you need some level of more general sensors and behaviors. Sensors should include high-level concepts like “unexpected event” and “aggression”, and some serious time would have to be put into these. But, for example, any animation could be tagged with objects that are expected to be there, and if those objects are suddenly not there, it could fire the unexpected-event sensor. Similarly, any attack (including fus-ro-dah) aimed in a character’s direction could trigger the aggression sensor.
Then, your state machine can respond to these very general sensors. While in the eating state, there should be a bail-out on the unexpected-event sensor that makes the character stop eating and look up with an expression of surprise. An aggression event should jump to the “hey, what’s the big idea?!?” type of response. Either of these would stop the eating, which is a big step up.
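Here’s a rough sketch of what I mean, in Python. Everything in it (the `Animation` tag set, the `sense` and `step` functions, the state and event names) is made up for illustration and isn’t from any real engine; it just shows general sensors feeding bail-out transitions.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Event(Enum):
    UNEXPECTED = auto()
    AGGRESSION = auto()


class State(Enum):
    EATING = auto()
    SURPRISED = auto()
    CONFRONTING = auto()


@dataclass
class Animation:
    # Hypothetical tagging scheme: objects this animation expects to be present.
    name: str
    expected_objects: set = field(default_factory=set)


def sense(animation, visible_objects, incoming_attacks):
    """Return the general events raised this tick."""
    events = []
    # Any tagged object that has gone missing fires the unexpected-event sensor.
    if animation.expected_objects - visible_objects:
        events.append(Event.UNEXPECTED)
    # Any attack aimed at the character fires the aggression sensor.
    if incoming_attacks:
        events.append(Event.AGGRESSION)
    return events


# Bail-out transitions keyed on (current state, event).
TRANSITIONS = {
    (State.EATING, Event.AGGRESSION): State.CONFRONTING,
    (State.EATING, Event.UNEXPECTED): State.SURPRISED,
}


def step(state, events):
    # Aggression is checked first so it wins over mere surprise.
    for event in (Event.AGGRESSION, Event.UNEXPECTED):
        if event in events and (state, event) in TRANSITIONS:
            return TRANSITIONS[(state, event)]
    return state


eating = Animation("eat", {"plate", "table"})
# The plate has been blasted away, so the character stops eating.
new_state = step(State.EATING, sense(eating, {"table"}, []))
```

The point is that the eating state never needs to know what happened to the plate; it only needs to know that something it expected is gone.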
Hierarchical state machines (HFSMs) can help prune the work of this sort of thing.
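As a minimal sketch of why the hierarchy helps (again with invented names, and a toy `State` class rather than any particular library): the bail-outs are written once on a parent state, and every child inherits them unless it overrides them.

```python
class State:
    """A state that defers unhandled events to its parent."""

    def __init__(self, name, parent=None, handlers=None):
        self.name = name
        self.parent = parent
        self.handlers = handlers or {}

    def handle(self, event):
        # Walk up the hierarchy until some state knows what to do.
        state = self
        while state is not None:
            if event in state.handlers:
                return state.handlers[event]
            state = state.parent
        return self.name  # nobody handles it, so stay put


# The general bail-outs live on the parent only.
idle = State("idle", handlers={"unexpected": "surprised",
                               "aggression": "confronting"})
eating = State("eating", parent=idle)
sitting = State("sitting", parent=idle)
# A child can still override a single response (a heavy sleeper, say).
sleeping = State("sleeping", parent=idle, handlers={"unexpected": "sleeping"})
```

Adding a tenth idle activity then costs one line, not a fresh set of transitions.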
It still only gets you so far, though, IMHO. At some point the industry is going to move away from state machines entirely, and switch to something like goal-oriented planners. These provide some actual intelligence, rather than mere scripted behaviors, and are really the only way to deal with unexpected events in a rich environment.
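To give a flavor of the planner idea, here’s a toy sketch. The action list, the fact names, and the plain breadth-first search are all my own simplifications; real goal-oriented planners typically add action costs and a smarter search. The character is given a goal and works out the steps itself from whatever the world currently looks like.

```python
from collections import deque

# Each action: (name, preconditions, facts added, facts removed).
# These actions and facts are hypothetical, purely for illustration.
ACTIONS = [
    ("stand_up", {"seated"}, {"standing"}, {"seated"}),
    ("find_food", {"standing"}, {"has_food"}, set()),
    ("sit_down", {"standing"}, {"seated"}, {"standing"}),
    ("eat", {"seated", "has_food"}, {"fed"}, {"has_food"}),
]


def plan(start, goal):
    """Breadth-first search for a shortest action sequence reaching goal."""
    start = frozenset(start)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        facts, steps = queue.popleft()
        if goal <= facts:
            return steps
        for name, pre, add, remove in ACTIONS:
            if pre <= facts:
                nxt = frozenset((facts - remove) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None  # no sequence of actions reaches the goal


# Normal dinner: the food is right there.
print(plan({"seated", "has_food"}, {"fed"}))
# Someone shouted the plate off the table: replan from the new world state.
print(plan({"seated"}, {"fed"}))
```

Nobody had to script a “food got knocked away” case here; the second plan falls out of the same actions once the world state changes.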