BLEU score
What is the BLEU score?
The BLEU score, short for Bilingual Evaluation Understudy, is a widely used metric for assessing the quality of machine translation systems. It measures how closely machine-generated text matches one or more human-written reference texts by comparing n-grams (contiguous sequences of words) in the candidate translation against those in the references. The score ranges from 0 to 1, with higher scores indicating closer similarity to the reference text. Despite its widespread use, BLEU has limitations: it does not account for the semantic meaning of the text, and it can reward overly literal translations. Nonetheless, it remains a standard benchmarking tool in natural language processing.
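To make the mechanics above concrete, here is a minimal from-scratch sketch of sentence-level BLEU with uniform n-gram weights and a brevity penalty. It is a simplified illustration, not a reference implementation: production tools such as NLTK or sacreBLEU add smoothing and multi-reference support, and the example sentences are invented.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: clipped n-gram precisions up to
    max_n, combined by geometric mean, scaled by a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clipped counts: a candidate n-gram is credited at most as many
        # times as it appears in the reference ("modified precision").
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if overlap == 0:
            return 0.0  # any zero precision drives the geometric mean to 0
        precisions.append(overlap / total)
    # Brevity penalty discourages candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

candidate = "the quick brown fox jumps over the lazy dog".split()
reference = "the fast brown fox jumped over the lazy dog".split()
print(f"BLEU: {bleu(candidate, reference):.3f}")  # ~0.369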
Examples
- Machine Translation: When evaluating a new machine translation model, a company might use the BLEU score to compare the model's English-to-French translations against a set of professionally translated French reference texts (a sketch of such a comparison follows this list).
- Text Summarization: In a research project, scientists might use the BLEU score to assess the quality of automatically generated summaries of scientific articles by comparing them to summaries written by experts.
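The snippet below sketches how the machine translation comparison above might look in practice with NLTK's corpus_bleu. The French sentence pairs are invented stand-ins for real model outputs and professional references; assuming NLTK is installed, corpus_bleu aggregates n-gram statistics over the whole evaluation set rather than averaging per-sentence scores.

```python
# Requires: pip install nltk
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Hypothetical model outputs and professional reference translations,
# tokenized by whitespace for simplicity.
hypotheses = [
    "le chat est sur le tapis".split(),
    "il fait beau aujourd'hui".split(),
]
references = [  # each hypothesis may have several references
    ["le chat est assis sur le tapis".split()],
    ["il fait tres beau aujourd'hui".split()],
]

smoothing = SmoothingFunction().method1  # avoids zero scores on short texts
score = corpus_bleu(references, hypotheses, smoothing_function=smoothing)
print(f"Corpus BLEU: {score:.3f}")
```

Corpus-level aggregation is the standard way BLEU is reported, since per-sentence scores are noisy and often zero for short sentences without smoothing.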
Additional Information
- The BLEU score was introduced by IBM researchers (Papineni et al.) in 2002 and has since become a standard metric in natural language processing.
- While useful, BLEU scores should be interpreted with caution and supplemented with human evaluations for a comprehensive assessment.