Today, I would like to blog about the differences between the various types of identical matches in SDL Trados Studio 2014. Similar principles apply to earlier versions of SDL Trados.
Target audience: Newbies to CAT tools and those who are mystified by the various types of matches.
100% Match
A segment in the new document being analysed that is identical to the source segment of a translation unit (TU) in the TM, that is, a segment from a document that has already been translated and confirmed in the TM.
Repetition
A segment that is repeated within the new document being analysed. It is not available in the TM. When you have a new file to translate, and there is a segment that is new (not included in your TM) but it is repeated in the new file, the first instance of it will be a no-match, and the remaining ones will be repetitions. A repetition can be found in the analysis even when the analysis has been done against a brand new, empty TM.
Repetitions and no matches are the only categories that can appear when you analyse a document against an empty translation memory.
Context Match
To be a context match, the TM segment must be a 100% match for the document segment, and both must be preceded by the same segment (in the TM and in the document). If there is no preceding segment, other context information is used instead, for example the fact that the segment is a document header. This is possible because every segment is stored along with its preceding TU and document structure information.
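The rule above can be sketched in a few lines of Python. This is only an illustration of the logic, not how SDL Trados actually stores or looks up its data: the TM is assumed to be a simple list of (preceding segment, segment) pairs, with `None` standing in for "no preceding segment", and the function name `classify` is made up for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical TM layout for illustration only:
# each entry is (preceding source segment or None, source segment).
TMEntry = Tuple[Optional[str], str]


def classify(segment: str, previous: Optional[str], tm: List[TMEntry]) -> str:
    """Return 'Context match', '100% match' or 'No match' for one segment."""
    # Collect the stored preceding segment of every TU identical to this segment.
    contexts = [prev for prev, source in tm if source == segment]
    if not contexts:
        return "No match"
    # Identical text AND identical preceding segment -> context match.
    if previous in contexts:
        return "Context match"
    return "100% match"


tm = [(None, "Welcome."), ("Welcome.", "Click OK.")]
print(classify("Click OK.", "Welcome.", tm))       # same text, same predecessor
print(classify("Click OK.", "Something else.", tm))  # same text, different predecessor
print(classify("Brand new text.", "Click OK.", tm))
```

Fuzzy matches and repetitions are deliberately left out here to keep the context rule visible.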
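Graphical representation of the above (flowcharts not reproduced here):
Type A: analysis against a new, empty TM
Type B: analysis against an existing, populated TM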
Other optional components in the analysis: Locked segments, cross-file repetitions, internal fuzzy match leverage.
Example of Type A (new, empty TM)
A document consists of 5 segments, say S1 to S5.
Each segment consists of 10 words. Total word count = 50.
S1 = S3 = S5 (identical segments).
Analysis
No match: S1 (first instance of the repeated segment), S2, S4 (30 words)
Repetitions: S3, S5 (same as S1) (20 words)
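The Type A figures can be reproduced with a short Python sketch. It assumes a simplified model in which a segment counts as a repetition whenever its exact text has already appeared earlier in the document; the function name and the placeholder segment text are invented for the example.

```python
from collections import Counter


def analyse_against_empty_tm(segments):
    """Count words per category when the TM is empty (simplified model)."""
    seen = set()
    words = Counter()
    for text in segments:
        count = len(text.split())
        if text in seen:
            # Identical text already occurred in this document.
            words["Repetitions"] += count
        else:
            # First occurrence: nothing in the TM, so it is a no match.
            seen.add(text)
            words["No match"] += count
    return dict(words)


def make_segment(label):
    # Build a placeholder segment of exactly ten words.
    return " ".join(f"{label}{i}" for i in range(10))


a, b, c = make_segment("a"), make_segment("b"), make_segment("c")
document = [a, b, a, c, a]  # S1 = S3 = S5
print(analyse_against_empty_tm(document))
# {'No match': 30, 'Repetitions': 20}
```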
Example of Type B (existing populated TM)
A new document consists of 10 segments
Existing TM consists of eight translation units (TUs)
Scenario:
Analysis
S1 = Context match (first segment in the current document and also in the document from which TU1 was produced. In a real-life scenario, this could have been TU20 if the 20th TU was the first segment of a document).
S2 = No match (1st instance of a repetition)
S3 = No match
S4 = 100% match (same as TU2)
S5 = Repetition (same as S2)
S6 = Fuzzy match (90% match of TU5)
S7 & S8 = Context match (100% matches of TU2 & TU3, in the same order)
S9 = 100% match (same as TU6)
S10 = No match (the best TM match scores below the minimum match value, which is set to 70% by default and not 50%)
Word count analysis
Total no. of words: 100
* Even though we have a 55% match, it will be reflected as a ‘New/No match’ segment due to the default minimum match value of 70%.
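The effect of the minimum match value can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the function name is hypothetical, the score is taken as a whole percentage, and context matches and the individual fuzzy bands of the real analysis report are ignored.

```python
def match_category(score: int, minimum_match: int = 70) -> str:
    """Map a TM match score (0-100) to a simplified analysis category."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score == 100:
        return "100% match"
    if score >= minimum_match:
        return "Fuzzy match"
    # Anything below the threshold is reported as new text.
    return "No match"


print(match_category(90))                    # Fuzzy match
print(match_category(55))                    # No match with the default 70% threshold
print(match_category(55, minimum_match=50))  # Fuzzy match once the threshold is lowered
```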
4 Comments
Nice article. Can you give a definition of what a segment is? Is it a paragraph, sentence, group of words etc.? I’ve been trying to find out but can’t. Thanks!
Hello!
A segment is a piece of text that is treated as a single unit for the purposes of finding a match in a translation memory. Usually a segment corresponds to a sentence. Translation memories store translations by segment, and segments are also the unit used for matching the text being translated against stored translations. In a translation editor, each segment is displayed on a separate row. SDL Trados splits text into segments based on segmentation rules. Typical segment delimiters for English are the full stop (.), question mark (?), exclamation mark (!) and colon (:). Simply put, SDL Trados will consider any text which is followed by a full stop (excluding known abbreviations) as a segment.
Hope that helps! 🙂
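As a rough illustration of the segmentation rule described in the reply above, here is a minimal Python sketch. It is not the actual SDL Trados segmentation engine: the abbreviation list and the function name are made up for the example, and a boundary is assumed wherever one of the four delimiters is followed by whitespace or the end of the text.

```python
import re

# Hypothetical abbreviation list; real segmentation rules are far more extensive.
KNOWN_ABBREVIATIONS = {"e.g.", "i.e.", "Mr.", "Dr.", "etc."}


def split_segments(text):
    """Split text after . ? ! : unless the last word is a known abbreviation."""
    segments = []
    start = 0
    for match in re.finditer(r"[.?!:](?=\s|$)", text):
        candidate = text[start:match.end()].strip()
        if not candidate:
            continue
        if candidate.split()[-1] in KNOWN_ABBREVIATIONS:
            continue  # "Dr." and friends do not end a segment
        segments.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)  # trailing text without a delimiter
    return segments


print(split_segments("Dr. Smith arrived. Is he late? No!"))
# ['Dr. Smith arrived.', 'Is he late?', 'No!']
```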
This clears up all the doubts. The flowchart explains the concepts really well.
Without understanding this, it's almost impossible to understand the analysis report.
Great job Ashok.
Thanks Kiran, glad you found it useful. 🙂