Author: Daniil Gorbenko
Every day the number of chatbots grows, people invent new voice assistants, and the existing ones keep improving. But no bot can perform its functions without understanding what is required of it. One way of communicating with a computer is through text.
However, it should be kept in mind that not every program truly understands text written in a natural language. The problem is that most bots contain nothing more than a plain enumeration of all the language constructions they support, for example: if {phrase 1} then {execute action 1} else if {phrase 2} then {execute action 2} else ... .
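As a minimal sketch of such a template bot (the phrases and replies below are hypothetical examples, not taken from any real assistant), the whole "understanding" reduces to a lookup table:

```python
# A minimal sketch of a template-based bot with hypothetical phrases and
# actions: every supported wording is hard-coded, so anything unseen falls
# through to the fallback reply.
HANDLERS = {
    "turn on the light": lambda: "Turning the light on.",
    "what time is it": lambda: "It is 10:15.",
}

def reply(phrase: str) -> str:
    handler = HANDLERS.get(phrase.strip().lower())
    return handler() if handler else "Sorry, I did not understand you."

print(reply("turn on the light"))    # known phrase: works
print(reply("switch the light on"))  # reworded: falls through
```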
The problem is visible to the naked eye: replace even one word in the input phrase and, at best, you will get a completely different result; at worst, the system simply will not understand you. However, this problem is typical of simple chatbots trained to handle only a narrow range of tasks. More complex and advanced assistants are trained to understand the meaning behind your words. But what does this "understanding" imply?
Text understanding (text written in a natural language is referred to simply as "text" from now on) is studied by the branch of science known as computational linguistics. There are various approaches to text processing, among them deep learning, corpus linguistics, and machine learning.
Let’s look at the pros and cons of each approach, starting with perhaps the most controversial one: deep learning. Behind this beautiful concept are neural networks doing most of the work. They can classify texts with the help of various stylistic devices, but they are still very far from actual text understanding. The reason is that neural networks can find hidden connections between words in texts but cannot explain those connections, and for a computer to understand you, it must be able to explain why it made this or that choice.
Neural networks also handle the homonymy problem in a simple but not entirely correct way: train a network purely on ornithology literature and then try to convince it that a crane (in the construction sense) is a machine, not a bird. Today's neural networks have serious trouble both with forgetting information that is no longer needed and with being retrained, which can lead to an incorrect understanding of many words and phrases.
The situation with the chatbot "Oleg" from the Tinkoff bank is a vivid example of incorrectly learned meaning. Oleg was trained on raw data, so the bot began to be rude to people and to give "bad" advice.
Corpus linguistics methods work similarly to neural networks: to train a text-understanding algorithm, you need a large corpus of texts. Algorithms that work with special dictionaries often show good results, since the data obtained from the corpus is processed statistically. However, such systems are highly dependent on the quality of the corpus and the amount of data in it.
Thus, methods based largely on processing large bodies of text are not sufficient for achieving a deep understanding of a text's meaning. That requires a much more formal description of knowledge about the world and of all the connections within it.
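To make the corpus dependence concrete, here is a minimal sketch of a frequency-based sense choice; the tiny hand-labelled "corpus" and the sense tags are invented for the example:

```python
from collections import Counter

# A minimal sketch of a corpus-driven sense choice. A corpus biased toward
# one domain (here, ornithology) biases every prediction the same way,
# which is the "crane" problem described above.
corpus = [
    ("crane", "bird"), ("crane", "bird"), ("crane", "bird"),
    ("crane", "machine"),
]

sense_counts = Counter(sense for word, sense in corpus if word == "crane")
print(sense_counts.most_common(1)[0][0])  # -> 'bird'
```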
The stages of processing and understanding the meaning of a text include morphological analysis, identification of the mentioned entities, syntactic analysis, and extraction of the relations between entities, which is closely related to semantic analysis.
Most modern algorithms perform these types of analysis step by step: first they identify all possible types of word formation, then obtain information about the mentioned entities, then determine the role of each entity in the sentence based on the data received, and only then try to determine what meaning a word has in the sentence.
However, this approach has a number of drawbacks: at each new stage, the data from the previous stages cannot be changed, and knowledge from the later stages cannot be used in the analysis of the earlier ones.
More universal algorithms run several stages at the same time, thereby solving the problems that arise at each stage. For example, knowing the semantics of a word, one can unambiguously determine its morphological features; or, knowing the relations between entities in the real world, one can unambiguously determine the role of a particular entity in a sentence.
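The contrast is easy to sketch on a single ambiguity. The readings and the semantic cue below are hypothetical toy data, not the system's actual representation:

```python
# A minimal sketch contrasting stage-by-stage and joint analysis.
READINGS = {"saw": [("verb", "to see, past tense"), ("noun", "cutting tool")]}

def pipeline(word):
    # Stage by stage: morphology must commit before semantics is available,
    # so it can only guess, and the error propagates to every later stage.
    return READINGS[word][0]

def joint(word, context_wants_tool):
    # Joint analysis: the semantic constraint selects among the
    # morphological readings directly.
    for pos, gloss in READINGS[word]:
        if context_wants_tool and pos == "noun":
            return (pos, gloss)
    return READINGS[word][0]

print(pipeline("saw"))                        # ('verb', ...): a blind guess
print(joint("saw", context_wants_tool=True))  # ('noun', 'cutting tool')
```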
Our system runs all of these stages at the same time. For example, once the basic morphology of a word has been selected, the search for mentioned entities can begin. Then, with information about all the entities, the system can discard excess morphology, reducing the amount of information processed and increasing the accuracy of the analysis. For example, if the entity "Union of Soviet Socialist Republics" occurs in the text, it is enough to analyze the morphology of the word "Union" alone, since the other words depend on it and cannot be processed independently within the sentence. The case where an entity may belong to several parts of speech and have different meanings depending on the morphology chosen is handled separately.
In such cases, four stages are run at the same time: all possible morphological readings of a word are singled out, all variants of entities are searched for, then, knowing the morphology of each significant word, partial syntactic analysis is performed for those variants where it is possible, and finally all remaining ambiguities are resolved by semantic analysis. In our case, semantic analysis is provided by the system's intellectual abilities.
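The morphology-pruning step from the "Union of Soviet Socialist Republics" example could look roughly like this; the entity table and its head-word index are hypothetical:

```python
# A minimal sketch of pruning morphology via a recognized multi-word entity.
ENTITIES = {
    ("union", "of", "soviet", "socialist", "republics"): 0,  # head at index 0
}

def words_to_analyze(tokens):
    lowered = [t.lower() for t in tokens]
    for span, head in ENTITIES.items():
        n = len(span)
        for i in range(len(lowered) - n + 1):
            if tuple(lowered[i:i + n]) == span:
                # Only the head word keeps independent morphology; the other
                # words of the entity depend on it.
                return [tokens[i + head]]
    return tokens  # no known entity: analyze every word

print(words_to_analyze("Union of Soviet Socialist Republics".split()))
# -> ['Union']
```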
The system can analyze the relationships between entities in the text and output the weight of the relation for any pair or triplet of entities in the sentence, which makes it possible to identify the most significant relationships between entities with great accuracy. For example, consider the sentence "Mama was washing a frame with soap". We build a network whose vertices are all the entities in the sentence, with all the possible entities for a single word placed on the same level of the network.
In this network there are several possible meanings for the word "soap" and for the word "frame". The system can analyze the pairs wash (action) + frame (part of a window), wash (action) + frame (programming), wash (noun) + frame (part of a window), wash (noun) + frame (programming) and decide which one is more logical to choose in this situation.
Thus, the system will keep the pair wash (action) + frame (part of a window). Unlike neural networks, we can explain exactly why the system made this choice. Obviously, the knowledge "wash -> window" carries more weight than "wash -> programming", and since a frame is part of a window, while neither "soap -> window" nor "soap -> programming" carries any semantic load, the system concludes that the sentence has the form: Mama was washing (action) a frame (part of a window) with soap.
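This pairwise weighing is easy to sketch in code. The candidate senses and the weights below are hypothetical stand-ins for the system's knowledge base, chosen only to reproduce the reasoning above:

```python
from itertools import product

# A minimal sketch of sense selection by pairwise relation weights.
CANDIDATES = {
    "wash": ["wash(action)", "wash(noun)"],
    "frame": ["frame(window)", "frame(programming)"],
    "soap": ["soap(detergent)"],
}
WEIGHTS = {
    ("wash(action)", "frame(window)"): 0.9,    # washing a window frame: common
    ("wash(action)", "soap(detergent)"): 0.5,  # washing with soap: common
    ("wash(action)", "frame(programming)"): 0.1,
    # pairs like soap -> programming are absent: no semantic load, weight 0
}

def score(assignment):
    # Sum the known relation weights over every pair in the assignment.
    return sum(WEIGHTS.get((a, b), 0) + WEIGHTS.get((b, a), 0)
               for i, a in enumerate(assignment) for b in assignment[i + 1:])

best = max(product(*CANDIDATES.values()), key=score)
print(best)  # -> ('wash(action)', 'frame(window)', 'soap(detergent)')
```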
Syntactic analysis is based on a system of rules. This approach avoids the demand for a large amount of data, on which the accuracy of sentence analysis would otherwise depend. Each rule represents a construction of the form: if {the construction in question has the form A} then {assign the roles of the entities in accordance with rule A’} else if ... .
Depending on the chosen language, the order in which the rules are applied and the choice of constructions differ, which avoids the problem of ambiguously defined entity roles. Such a system is flexible, since it immediately covers all the basic rules of a particular language, but the process of creating such a structure is rather laborious. It is also worth noting that the number of rules varies from language to language; in Russian, for example, it can reach 300.
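In code, such a rule system could be sketched as an ordered list of pattern-to-roles mappings; the patterns and role names here are hypothetical, and a real rule set would be far richer:

```python
# A minimal sketch of rule-based role assignment.
RULES = [
    (("NOUN", "VERB", "NOUN"), ("subject", "predicate", "object")),
    (("VERB", "NOUN"), ("predicate", "object")),
]

def assign_roles(pos_tags):
    for pattern, roles in RULES:  # rules are tried in order; first match wins
        if tuple(pos_tags) == pattern:
            return roles
    return None                   # no rule covers this construction

print(assign_roles(("NOUN", "VERB", "NOUN")))
# -> ('subject', 'predicate', 'object')
```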
Identifying the role of an entity in a sentence is the most important step in text processing, since the AI draws its conclusions from the roles of the entities. As with a person, an interlocutor who feeds the system disconnected speech risks receiving an indistinct answer, so it is important for the interlocutor to understand that the AI learns to understand thoughts the way a person does.
So what conclusion can be drawn? The combination of methods (parallel analysis of all the stages, a system of rules at some of the stages, and an intelligent system for determining meaning) gives strong results in understanding the meaning of a text.
We ran a small test. The objective was to retell a text from one language in another. We evaluated both the quality of the translation into the other language and how correctly the meaning of the text was conveyed:
During the 20th century not many people travelled abroad on their holidays. The majority of people spent their holidays at home then. Today the situation is different, the world feels much smaller. Nowadays people can travel by car, by train or by plane. Some prefer hitchhiking. It is possible to buy tickets online or via phone at home. They can book a room in a hotel on the other side of the world. People also can make new friends, try out new things, visit new places and take a lot of pictures.
Люди не много путешествовали в течение 20 века в зарубежье на праздниках. Большинство людей проводило праздники дома в 20 веке. Сегодняшняя ситуация является другой. Земля чувствуется много маленькой. Люди сегодня могут путешествовать автомобилем поездом самолётом. Некоторый предпочитает автостоп. Возможно покупать онлайн билеты с помощью телефона дома. Люди могут бронировать помещение в отеле на иной стороне Земли. Люди могут находить новых друзей. Люди могут испытывать новые вещи. Люди могут посещать новые места. Люди могут много делать снимков.

[Back-translation: People did not travel much abroad on holidays during the 20th century. Most people spent holidays at home in the 20th century. Today's situation is different. The Earth feels much smaller. People today can travel by car, by train, by plane. Some prefer hitchhiking. It is possible to buy tickets online with a phone at home. People can book a room in a hotel on the other side of the Earth. People can find new friends. People can try new things. People can visit new places. People can take many pictures.]
As the example shows, the system of rules coped well even with complex sentences, breaking them into simple ones, while the quality of the translation remained fairly high.
Thus, by combining several technologies for parsing text and understanding the relationships between the entities in it, we were able to achieve an understanding of simple texts.
Thinking does not depend on language. Language is just a symbolic system that makes it convenient to describe the surrounding world and transmit that knowledge. In our system, thinking occurs at a level independent of how the input data is presented, which makes it convenient to manipulate and process incoming information, given that the information can arrive in mixed form (for example, a picture together with its description).
All these advantages make it possible to mix languages when communicating. For example, you can ask a question built with the constructions of a language convenient for you while replacing some entities with their representations in another language.
Where can this help? Imagine you have come to another country and need to learn something about a subject you heard about in a conversation with a local. To understand the information more precisely, you can ask the question in a way convenient for you and get the answer in a language you understand. For example, imagine you are in London and during a conversation you are advised to visit the West End.
However, you have no idea what it is. You can then ask your voice assistant "Что такое West End" ("What is the West End?") and receive an answer in your own language: "This is the western part of central London". Or, even simpler, ask "Расскажи про West End" ("Tell me about the West End") and get a fuller answer: "This is the western part of central London. Here you can visit South Kensington, the Soho Theater ..." This is much more convenient than opening a translator (which may also translate some concepts poorly) and then digging through search engines.
Given the text-parsing structure described above, we can safely state that the system of rules and the intellectual core (independent of the representation of the input information) have successfully enabled a multilingual understanding of text.
However, it should be noted that when languages are mixed, the quality of analysis decreases, because there are no clear rules describing such constructions. We therefore isolated the most important rules of each language, compared them, and created our own universal system of rules, which makes it possible to understand the meaning of simple expressions with sufficient quality. For example:
Human: how many человек живет in Russia? (mixed: "how many people live in Russia?")
AI: 146 880 432 humans.
Human: What is similar между penguin and курицей (mixed: "What is similar between a penguin and a chicken")
AI: Are birds.
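The mechanism behind such mixed queries can be sketched as mapping tokens from either language onto shared, language-independent concept IDs before any reasoning happens. The lexicon and concept IDs below are hypothetical illustrations, not the system's actual vocabulary:

```python
# A minimal sketch of mixed-language understanding via a shared concept layer.
LEXICON = {
    "человек": "PERSON", "people": "PERSON",
    "живет": "LIVE_IN", "live": "LIVE_IN",
    "russia": "RUSSIA", "россии": "RUSSIA",
}

def to_concepts(query):
    tokens = query.lower().replace("?", " ").split()
    return [LEXICON[t] for t in tokens if t in LEXICON]

print(to_concepts("how many человек живет in Russia?"))
# -> ['PERSON', 'LIVE_IN', 'RUSSIA']
```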
Systems of text understanding are extensively discussed in many articles, but there is little information about how an AI should talk and construct sentences. This is probably because modern machine-learning systems make it possible to bypass the sentence-construction stage and output ready-made results in the form of a generated sentence, or because systems have "question-answer" templates for all the types of questions known in advance. However, the question of how to build a sentence remains unanswered when the system has produced an answer in the form of a structure abstracted from language concepts.
As mentioned earlier, our system thinks independently of language. Knowledge is stored within the knowledge system in a special form, so when the system tries to give an answer, it tries to link the data in a certain way; these links express the interconnections between entities. Depending on what the system has to give as an answer, it forms the relationships between entities in particular ways. Thanks to these relations, we know what should serve as a noun and what as a verb, and we can reliably identify time intervals and specific places.
Rule systems for constructing sentences also help with outputting information in a natural language. These systems differ for each language and consist of a large number of rules and exceptions. Thanks to special sentence-construction rules, a thought takes the form of a text that is easy to read. It is worth noting that the text can also be mixed with other forms of presenting information, which makes the system's responses more flexible and diverse.
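A minimal sketch of this last step, going from a language-independent answer structure to text through per-language templates; the structure, template, and word list are all hypothetical:

```python
# A minimal sketch of rule-based sentence construction from an abstract answer.
answer = {"subject": "PENGUIN_AND_CHICKEN", "relation": "IS_A", "object": "BIRD"}

TEMPLATES = {"en": {"IS_A": "{subject} are {object}."}}
WORDS = {"en": {"PENGUIN_AND_CHICKEN": "Penguins and chickens", "BIRD": "birds"}}

def realize(struct, lang):
    # Each language plugs in its own templates and word-order rules;
    # the answer structure itself never changes.
    template = TEMPLATES[lang][struct["relation"]]
    lexicon = WORDS[lang]
    return template.format(subject=lexicon[struct["subject"]],
                           object=lexicon[struct["object"]])

print(realize(answer, "en"))  # -> 'Penguins and chickens are birds.'
```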
In conclusion, it should be noted that thinking, understanding the input information, and producing the answer are closely related. It is impossible to separate one from another or to try to build them as independent systems.