Difference is obviously there but nothing prevents you from undirected conditioning of long range dependencies. There’s no need to chain anything.
The problem from a math standpoint is that it’s an intractable exercise. The moment you start relaxing the joint opt problem you’ll end up at a similar place.
Difference is obviously there but nothing prevents you from undirected conditioning of long range dependencies. There’s no need to chain anything.
The problem from a math standpoint is that it’s an intractable exercise. The moment you start relaxing the joint opt problem you’ll end up at a similar place.