Self-supervised learning
What is self-supervised learning?
In artificial intelligence, self-supervised learning is a paradigm in which models learn from the data itself, without manually labeled examples. This approach exploits the inherent structure and relationships within the data to generate supervisory signals. For instance, in natural language processing, a model might be trained to predict the next word in a sentence, learning linguistic patterns in the process. In computer vision, a model might be trained to colorize black-and-white images, learning about textures and object boundaries. Self-supervised learning is particularly valuable because it reduces the need for large, costly, and time-consuming labeled datasets, enabling robust AI systems to be built from widely available unlabeled data.
More formally, self-supervised learning is a type of machine learning in which the model is trained to predict part of its input from other parts of its input, without using external labels.
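The key idea in the definition above is that the "labels" are carved out of the raw data itself. As a minimal illustration (the function name is hypothetical, not from any library), the sketch below turns an unlabeled sentence into next-word prediction pairs of the kind a language model could be trained on:

```python
# Hypothetical helper: derive self-supervised (context, target) training
# pairs from raw, unlabeled text. No external labels are involved --
# each "label" is simply the next word in the input itself.

def next_word_pairs(text):
    """Turn a raw sentence into (context, target) prediction pairs."""
    words = text.split()
    # Every prefix of the sentence predicts the word that follows it.
    return [(words[:i], words[i]) for i in range(1, len(words))]

pairs = next_word_pairs("the cat sat on the mat")
for context, target in pairs:
    print(" ".join(context), "->", target)
```

A real training pipeline would feed such pairs to a neural network, but the point here is only that the supervisory signal comes for free from the data's own structure.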
Examples
- Predicting masked words in sentences: Models like BERT (Bidirectional Encoder Representations from Transformers) are trained to predict missing words, which helps them understand language context and semantics.
- Image colorization: Algorithms are trained to add color to black-and-white images. This task helps models learn about object textures, edges, and color distributions without needing labeled data.
Additional Information
- Reduces the need for labeled data, making training less resource-intensive.
- Enables learning from large-scale datasets, improving model performance and generalization.