YAGO is a large semantic knowledge base that combines information from Wikipedia, WordNet, and GeoNames. It extracts facts automatically and links them into a semantic network with over 10 million entities and 120 million facts.
YAGO was developed at the Max Planck Institute for Informatics in Germany. It combines information automatically extracted from several sources, including Wikipedia, WordNet, and GeoNames.
Using natural language processing and data mining techniques, YAGO extracts entities and facts from the unstructured text of Wikipedia. It can detect definitions of concepts, hierarchical is-a relations between classes, structured attributes, and factual relations about entities. The extracted information is then passed through a consistency check and cleaned to remove redundancies and contradictions.
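The cleaning step can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not YAGO's actual procedure: the facts, the relation names, and the rule that `bornIn` allows only one value per entity are all hypothetical. Exact duplicates are dropped as redundancies, and facts that give conflicting values for a single-valued relation are set aside as contradictions.

```python
from collections import defaultdict

# Hypothetical extracted facts as (subject, relation, object) triples.
raw_facts = [
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Albert_Einstein", "bornIn", "Ulm"),      # exact duplicate (redundancy)
    ("Albert_Einstein", "bornIn", "Munich"),   # conflicts with the first fact
    ("Albert_Einstein", "hasOccupation", "Physicist"),
    ("Albert_Einstein", "hasOccupation", "Professor"),
]

# Assumption: relations listed here allow at most one object per subject.
FUNCTIONAL_RELATIONS = {"bornIn"}


def clean_facts(facts, functional_relations):
    """Drop duplicates, then set aside facts that violate a functional relation."""
    unique = list(dict.fromkeys(facts))  # de-duplicate, preserving order

    # Collect every object seen for each (subject, relation) pair.
    objects_by_key = defaultdict(set)
    for subj, rel, obj in unique:
        objects_by_key[(subj, rel)].add(obj)

    kept, rejected = [], []
    for fact in unique:
        subj, rel, _ = fact
        conflicting = (
            rel in functional_relations and len(objects_by_key[(subj, rel)]) > 1
        )
        (rejected if conflicting else kept).append(fact)
    return kept, rejected


kept, rejected = clean_facts(raw_facts, FUNCTIONAL_RELATIONS)
print("kept:", kept)
print("rejected:", rejected)
```

In this sketch both conflicting birthplace facts are rejected rather than one being chosen, since nothing in the toy data says which source is more reliable.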
The key benefit of YAGO is that it integrates all of this information into one large semantic knowledge graph. This allows complex semantic queries to find facts and connections that span multiple domains and levels of abstraction. The facts and connections are encoded in RDF, and entities are linked to WordNet and GeoNames identifiers.
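To make the idea of a query spanning domains and levels of abstraction concrete, here is a small Python sketch. It models RDF-style facts as subject-predicate-object triples and answers a question that needs both a class hierarchy and geographic containment. The identifiers and relation names are simplified for illustration and are not actual YAGO identifiers; a real deployment would use an RDF store rather than a Python set.

```python
# Illustrative triples; the identifiers are simplified, not actual YAGO IRIs.
triples = {
    ("Albert_Einstein", "type", "Physicist"),
    ("Marie_Curie", "type", "Chemist"),
    ("Physicist", "subClassOf", "Scientist"),
    ("Chemist", "subClassOf", "Scientist"),
    ("Scientist", "subClassOf", "Person"),
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Marie_Curie", "bornIn", "Warsaw"),
    ("Ulm", "locatedIn", "Germany"),
    ("Warsaw", "locatedIn", "Poland"),
}


def objects(subject, relation):
    """All o such that (subject, relation, o) is in the graph."""
    return {o for s, r, o in triples if s == subject and r == relation}


def superclasses(cls):
    """The class itself plus everything reachable through subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(objects(current, "subClassOf"))
    return seen


def instances_of(cls):
    """Entities whose direct type is cls or any subclass of it."""
    return {s for s, r, o in triples if r == "type" and cls in superclasses(o)}


def scientists_born_in(country):
    """Combine the class hierarchy with geographic containment."""
    return sorted(
        person
        for person in instances_of("Scientist")
        for city in objects(person, "bornIn")
        if country in objects(city, "locatedIn")
    )


print(scientists_born_in("Germany"))  # ['Albert_Einstein']
```

The query never mentions "Physicist" or "Ulm": the answer follows from chaining the taxonomy (Physicist is a kind of Scientist) with the geography (Ulm is in Germany), which is the kind of connection an integrated graph makes possible.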
As of 2022, YAGO contains over 10 million entities and 120 million facts. It supports querying in SPARQL and is published as linked open data that can be explored in a web browser. YAGO data has been used for applications in question answering, semantic search, and data mining.
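As a rough sketch of what SPARQL access can look like from a program, the Python snippet below builds an HTTP GET request in the style of the SPARQL 1.1 Protocol using only the standard library. The endpoint URL is a placeholder, and the `schema:birthPlace` property is an assumption chosen for illustration; the service's own documentation should be consulted for the real endpoint address and vocabulary. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder endpoint; substitute the address of a real SPARQL service.
ENDPOINT = "https://example.org/sparql"

# Assumption: the schema.org property below is used for illustration only.
QUERY = """
PREFIX schema: <http://schema.org/>
SELECT ?person ?place WHERE {
  ?person schema:birthPlace ?place .
}
LIMIT 5
"""


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)

# To run the query against a live endpoint, something like the following
# could be used (left commented out so the sketch needs no network access):
# import json, urllib.request
# with urllib.request.urlopen(request) as response:
#     bindings = json.load(response)["results"]["bindings"]
```

Passing the query as a URL-encoded `query` parameter and requesting `application/sparql-results+json` follows the standard protocol, so the same helper should work with any compliant endpoint.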