Stannum/blog/

Request for software: byte-oriented merge with moves

2021-11-13 Permalink

Most merge tools use line-oriented comparison algorithms, and employ heuristics to detect moved lines, if at all.

Instead, the comparison should compute the shortest sequence of operations required to transform one input into the other, where the permitted operations include character deletion, insertion, replacement, and a substring move:

This problem is known to be NP-hard though. However, good practical approximations exist. In particular one such algorithm is described in The String Edit Distance Matching Problem with Moves by G. Cormode and S. Muthukrishnan.

The algorithm works by constructing an edit-sensitive parsing tree in O(N log* N) time. Such a tree is tolerant to local edits of the string under the above operations. Consequently a metric between two strings can be approximated by comparing the cardinalities of node labels of their parsing trees.

It should be possible to efficiently traverse such two trees and emit a sequence of plausible edit operations that would transform one of them into the other. Then the two sequences between ‘left’ and ‘right’ in a 3-way merge can be compared and merged.