Ontologies are an attempt to formalize a field of knowledge by defining the dependencies between objects, assigning objects to classes, specifying their properties, and so on.
In most ontologies, each concept has a unique identifier, which can be used to look up the concept's representations in a natural language, its meaning, and its relationships with other entities. Why introduce artificial identifiers at all? The answer is simple: representing knowledge this way lets an ontology separate the different meanings that a single natural-language word can have.
Ontologies are a kind of bridge between a person, in all the diversity of their speech and thought, and a computer, which can "understand" only clear commands. At the same time, a person can express part of their picture of the world within the framework of an ontology.
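As a minimal sketch of why identifiers help, the snippet below indexes two concepts that share a surface word. The identifiers `c1` and `c2`, the field names, and the helper `build_label_index` are invented for illustration and do not come from any particular ontology.

```python
from collections import defaultdict

# Hypothetical concepts: the IDs, labels, and glosses are illustrative only.
CONCEPTS = {
    "c1": {"labels": {"en": ["bank"]}, "gloss": "financial institution"},
    "c2": {"labels": {"en": ["bank", "riverbank"]}, "gloss": "sloping land beside a river"},
}


def build_label_index(concepts, lang="en"):
    """Map each lowercased label to the list of concept IDs that carry it."""
    index = defaultdict(list)
    for concept_id, concept in concepts.items():
        for label in concept["labels"].get(lang, []):
            index[label.lower()].append(concept_id)
    return index


index = build_label_index(CONCEPTS)

# One word, two distinct concepts: the IDs keep the meanings apart.
senses = index["bank"]
glosses = [CONCEPTS[concept_id]["gloss"] for concept_id in senses]
```

The word "bank" resolves to two identifiers, and every later statement can refer to exactly one of them instead of the ambiguous word.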
Many ontologies exist today: DBpedia, Freebase, Wikidata, WordNet, Cyc and many others. From them you can extract a large number of facts, relationships between concepts, class memberships and much more.
The problems of ontologies are the incomplete reliability of their data, their complexity and even some inconsistency (especially in ontologies created automatically). Manually created ontologies are smaller, but they are more reliable and often have a hierarchy of concepts.
However, they are not without problems either: some areas have an excessive number of connections between concepts, while others are barely covered. This is not surprising, because in practice it is impossible to foresee everything.
Ontologies are used wherever intelligent processing of information is needed to give machines an “understanding” of natural language texts. The better the ontology is built, the more deeply the machine can understand a person. But ontologies alone cannot provide full understanding, or even a good level of understanding.
Nevertheless, ontologies have a very important property: they allow data to be reused. They are based not on statistics, but on real knowledge that people possess.
Ontologies are a great help in creating artificial intelligence systems. The main questions are how best to use them, how to remove confusion in the data, and how to take the connections between concepts into account.
Google, for example, uses the Freebase ontology to respond to user queries. It helps to find brief and useful information about the object the user is looking for. But used this way, ontologies lose their original purpose, because many connections are not taken into account. They become just a convenient tool for storing data.
Large ontologies often reference each other. This makes it possible to integrate data on a large scale and to answer a wide range of user questions using a context-relevant network.
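The idea of integrating data through cross-references can be sketched roughly as follows. The two toy ontologies, the identifiers `a:Pushkin` and `b:42`, the `SAME_AS` link list, and the function `merged_facts` are all assumptions made for this example, not the format of any real ontology.

```python
# Toy data: identifiers, links, and property names are invented for illustration.
ONTOLOGY_A = {"a:Pushkin": {"born": "1799"}}
ONTOLOGY_B = {"b:42": {"occupation": "poet"}}

# Cross-references stating that two identifiers denote the same entity.
SAME_AS = [("a:Pushkin", "b:42")]


def merged_facts(entity_id, sources, same_as):
    """Collect facts about an entity from every source it is linked to."""
    linked = {entity_id}
    changed = True
    while changed:  # follow links until no new identifiers are found
        changed = False
        for left, right in same_as:
            if (left in linked) != (right in linked):
                linked |= {left, right}
                changed = True
    facts = {}
    for source in sources:
        for linked_id in linked:
            facts.update(source.get(linked_id, {}))
    return facts


facts = merged_facts("a:Pushkin", [ONTOLOGY_A, ONTOLOGY_B], SAME_AS)
```

In this sketch a later source silently overwrites an earlier one when both define the same property. A real integration would need an explicit policy for such conflicts.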
The table below presents the pros and cons of existing ontologies, using four large ontologies (DBpedia, Freebase, OpenCyc, Wikidata) as examples and comparing them with the Mind Simulation ontology (MSL in the table).
First, we list and briefly describe the criteria by which we will compare the ontologies:
Support for languages other than English.
Ranks of relations between entities: how strong the relationship between entities is and how often it is used.
Ranks and order of entities: how often the entity is used (in a corpus of texts).
The presence of a description of entities: a description of each entity in a natural language.
The time of occurrence of the fact: the time at which the fact appeared is stored for each entity (this helps to monitor the relevance of facts).
Is the ontology updated: does new knowledge appear in the ontology (an important aspect in the modern, rapidly changing world).
Presence of facts: storage of facts in the ontology (for example, Pushkin was born in 1799).
Storage of the truth of knowledge. The ability to store knowledge with negation: a truth coefficient is stored for each piece of knowledge, which makes it possible to store negative knowledge (for example, a penguin does not fly).
Inheritance and polymorphism of knowledge: knowledge from the upper levels of the ontology extends to the lower levels (for example, if an animal has the “breathe” action, that action applies to all representatives of this class in the ontology).
Ranking of knowledge by frequency of use: how often people use a given piece of knowledge in communication, correspondence, and on the Internet.
Criterion | DBpedia | Freebase | OpenCyc | Wikidata | MSL |
---|---|---|---|---|---|
Support for languages other than English | + | + | - | + | + |
Ranks of relations between entities | - | - | - | + | + |
Ranks and order of entities | - | + | - | - | + |
The presence of a description of entities | + | + | + | + | + |
The time of occurrence of the fact | - | - | - | + | + |
Is the ontology updated | - | - | - | + | + |
Presence of facts | - | - | + | - | + |
Storage of the truth of knowledge. The ability to store knowledge with negation | - | - | - | - | + |
Inheritance and polymorphism of knowledge | - | - | - | - | + |
Ranking of knowledge by frequency of use | - | - | - | - | + |
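
Two of the criteria, the truth coefficient and the inheritance of knowledge, can be illustrated with a small sketch. This is a toy model under stated assumptions, not the Mind Simulation implementation: the class names, the `PARENT` and `KNOWLEDGE` structures, the statement that a bird flies, and the scale from -1.0 (false) to +1.0 (true) are all hypothetical.

```python
# Hypothetical hierarchy: each concept points to its parent class.
PARENT = {"penguin": "bird", "bird": "animal", "animal": None}

# Hypothetical truth coefficients: +1.0 means true, -1.0 means false.
KNOWLEDGE = {
    ("animal", "breathe"): 1.0,
    ("bird", "fly"): 1.0,
    ("penguin", "fly"): -1.0,  # negative knowledge: a penguin does not fly
}


def truth(concept, action):
    """Return the coefficient of the nearest statement up the hierarchy, or None."""
    current = concept
    while current is not None:
        if (current, action) in KNOWLEDGE:
            return KNOWLEDGE[(current, action)]
        current = PARENT.get(current)
    return None  # nothing is known about this action


penguin_breathes = truth("penguin", "breathe")  # inherited from "animal"
penguin_flies = truth("penguin", "fly")         # overridden at "penguin"
```

The lookup stops at the most specific statement, so the negative fact stored for the penguin takes precedence over what it would otherwise inherit from the bird class, while "breathe" is still inherited from the animal class.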
Almost all companies want to make informed decisions while spending as little time and as few resources as possible. This is the main reason why a wide range of companies from various industries invest in artificial intelligence.
In our work, we wanted to use the knowledge already accumulated in ontologies. To do this, we downloaded several large ontologies, parsed them, and built an interface that takes their features into account. Using this interface, you can see the relationships between entities and their hierarchy. This let us notice a large number of errors in these ontologies. If you try to build an intelligent system on one of these ontologies, its answers will be illogical and unpredictable.
Thus, the ontologies listed above are much better suited to search problems and are practically inapplicable to the tasks that arise when building General Artificial Intelligence systems.
That is why our laboratory pays so much attention to the quality of basic knowledge and the technological properties of the abstract-ideal layer that contains them.