


One of the best-performing approaches in Statistical Machine Translation (SMT) today is the phrase-based model, which takes contiguous word sequences as translation units. The fundamental data structure in phrase-based models is a table of phrase pairs with associated scores, which may come from probability distributions. Several methods have been proposed to extract phrase pairs from a parallel corpus, including probabilistic models, pattern mining, matrix factorization, heuristic-based methods, MBR-based methods, and model-based methods. Most commonly, however, this table is acquired from word alignments by exhaustively enumerating all phrase pairs, up to a certain length, that are consistent with the alignment. In studying the relationship between machine translation and word alignments or the phrase table, researchers have sought better translation performance through at least two independent lines of research. One line has studied different ways of intervening in word alignments to obtain better translation results. Some researchers have shown that translation quality depends on word alignment quality. Another group holds the opposite view: significant decreases in alignment error rate (AER) do not always result in significant increases in translation quality. Recently, a systematic study pointed out that alignments can improve translation performance, depending on the SMT system and the type of corpus used.

In recent years, researchers have conducted several studies that evaluate machine translation quality in terms of the relationship between word alignments and the phrase table. However, existing methods usually employ ad hoc heuristics without theoretical support. So far, no work has provided a formula describing the relationship among word alignments, the phrase table, and machine translation performance. In this paper, on the one hand, we formulate such a relationship to estimate the number of extracted phrase pairs given one or more word alignment points. On the other hand, we propose a corpus-motivated pruning technique to prune the default large phrase table. Experiments show that the derived formula is feasible: it can be used not only to predict the size of the phrase table, but also as a valuable reference for investigating the relationship between translation performance and phrase tables built from different word alignment links. The corpus-motivated pruning results show that nearly 98% of phrases can be removed without any significant loss in translation quality.
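
To make the notion of phrase pairs "consistent with the alignment" concrete, the following is a minimal sketch of alignment-based phrase extraction. It is a simplified illustration, not the extraction procedure or the formula of this paper: the function name `extract_phrase_pairs` and the parameter `max_len` are hypothetical, and the sketch assumes that only the tightest target span is kept for each source span (it does not extend target spans over unaligned boundary words, as full extractors typically do).

```python
from typing import List, Set, Tuple


def extract_phrase_pairs(
    src: List[str],
    tgt: List[str],
    alignment: Set[Tuple[int, int]],
    max_len: int = 7,
) -> Set[Tuple[str, str]]:
    """Enumerate phrase pairs consistent with a word alignment.

    `alignment` holds (source_index, target_index) links.
    Simplifying assumption: only the tightest target span is kept.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue  # a phrase pair needs at least one link
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue  # target side exceeds the length limit
            # Consistency: no word in the target span [j1, j2] may be
            # linked to a source word outside [i1, i2].
            violated = any(
                j1 <= j <= j2 and not (i1 <= i <= i2)
                for (i, j) in alignment
            )
            if violated:
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


if __name__ == "__main__":
    src = ["das", "Haus", "ist", "klein"]
    tgt = ["the", "house", "is", "small"]
    links = {(0, 0), (1, 1), (2, 2), (3, 3)}
    for pair in sorted(extract_phrase_pairs(src, tgt, links)):
        print(pair)
```

Even in this simplified form, the number of extracted pairs is determined by which alignment links are present, which is the dependency between word alignment and phrase table size discussed above.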
