I would say neither.
I think it's more due to the muxing tool.
So MTX could make sure that when the input file changes, the timestamps of the audio tracks are also adjusted to the timestamp of the video.
But I know that MTX already has a mega heuristic and these additional specifications would inflate the heuristic even further.
There was one thing I did differently from Rocky during the test: I did not demux the video stream, so MTX has to read the M2TS file itself to get the video frames.
However, the video is 100% identical to what DGDemux would have extracted. (Great proof that MTX handles multi-M2TS segments really well, at least for the video track.)
My plan for a workaround is the following:
The timestamps at which a new M2TS file begins are known, and cE also sets 100% correct timestamps in the chapters.
After the file is created, the audio timestamps could be shifted a little.
I don't mean ALL timestamps here, just the timestamps from the first audio frame in this cluster.
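To make the lookup concrete, here is a rough sketch of how the affected audio frames could be found. It assumes the segment start times (for example the chapter timestamps from cE) and the audio frame timestamps are already available as sorted lists in milliseconds; the function name and the sample values are made up for illustration, and neither MTX nor cE offers this call.

```python
from bisect import bisect_left

def first_audio_frame_after(boundaries_ms, audio_ts_ms):
    """Map each segment boundary to the index of the first audio frame
    whose timestamp is at or after that boundary.

    Both lists are assumed to be sorted and expressed in milliseconds.
    """
    result = {}
    for boundary in boundaries_ms:
        i = bisect_left(audio_ts_ms, boundary)
        if i < len(audio_ts_ms):  # skip boundaries past the last frame
            result[boundary] = i
    return result

# Hypothetical values: two segments, the second one starting at 1000 ms.
frames = first_audio_frame_after([0, 1000], [0, 32, 64, 992, 1004, 1036])
```

Only the frames returned here would be touched; every other audio timestamp stays as MTX wrote it.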
In our case, the timestamps only need to be shortened to 4 ms. This can be done very quickly, as only two bytes in the MKV need to be overwritten.
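As a sketch of what such a patch could look like: in Matroska, a SimpleBlock or Block starts with the track number, followed by a 2-byte signed big-endian timestamp relative to the cluster timestamp, so I am assuming those are the two bytes in question. The offset of that field has to be located beforehand (it is simply passed in here), and the sample bytes are invented. With the default timestamp scale, one unit equals 1 ms.

```python
import struct

def patch_block_timestamp(buf, ts_offset, new_rel_ts):
    """Overwrite the 2-byte relative timestamp of one block in place.

    buf        -- bytearray holding the (part of the) MKV file
    ts_offset  -- offset of the timestamp field, assumed to be known
    new_rel_ts -- new timestamp relative to the cluster, signed 16 bit
    Returns the old relative timestamp.
    """
    (old,) = struct.unpack_from(">h", buf, ts_offset)
    struct.pack_into(">h", buf, ts_offset, new_rel_ts)
    return old

# Invented SimpleBlock header: track 2 (0x82), relative timestamp 8, flags 0x80.
block = bytearray(b"\x82\x00\x08\x80payload")
old_ts = patch_block_timestamp(block, 1, 4)
```

Since the field keeps its size, nothing else in the file moves and no element lengths need to be rewritten. On a real file the same two bytes could be written by opening the file in "r+b" mode and seeking to the offset.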