AN ENGLISH TO ASSAMESE, BENGALI AND HINDI MULTILINGUAL E-DICTIONARY

Md. Saiful Islam
Department of Computer Science, Assam University, Silchar, Assam, India
E-mail: sislam.mca@gmail.com

INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR)
ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-3, ISSUE-9, 2016

Abstract
A dictionary is a much-demanded component of Natural Language Processing systems nowadays. It is one of the important tools that can be used for learning new languages. A word is basically an association of linguistic sound and meaning. The spelling does not always correlate easily with the sound of a word, and a dictionary helps us with both the spelling and the pronunciation of such words. Electronic dictionaries are very popular nowadays, and an online dictionary can be accessed by many users simultaneously. The main objective of this paper is to develop an English to Assamese, Bengali and Hindi (E-ABH) multilingual electronic dictionary that is user friendly, so that the user can easily look up the meaning of a word and other related information such as word Id, POS, synonyms and examples from English to the Assamese, Bengali and Hindi languages. This dictionary is intended to improve the knowledge of the Assamese, Bengali, English and Hindi languages, mainly for the people of North-East India.

Keywords: Electronic Dictionary, Languages, Natural Language Processing, Sequential Search Technique

I. INTRODUCTION

A. Electronic Dictionary

A dictionary is a book of words in one or more specific languages, in which the words are listed alphabetically with their meanings, synonyms, phonetics, POS, and examples [5][6]. It is one of the important tools that assist students in understanding a text as well as in improving their reading skills. There are two types of dictionary: the paper dictionary, also known as the hard or printed dictionary, and the electronic dictionary, also known as the digital or Internet dictionary.

An Electronic Dictionary (E-Dictionary) is a dictionary whose data exists in digital form and can be accessed through a number of different media. The E-Dictionary is a very important and powerful tool for any person who is learning a new language using a computer, both online and offline. It has the advantage of giving the user access to a much larger database than a single book. The most important advantage of an E-Dictionary is that it is very convenient to use. In their modern electronic form, dictionaries have tremendous potential.

According to the languages involved, dictionaries fall into three categories:

1. Monolingual Dictionary: Here, the user can search the meaning of a word and other related information of the word within the same language. English-English and Bengali-Bengali are examples of monolingual dictionaries.

2. Bilingual Dictionary: Here, the user can search the meaning of a word and other related information of the word from one language to another language. Assamese-English and English-Bengali are examples of bilingual dictionaries.

3. Multilingual Dictionary: Here, the user can search the meanings of words and other related information of the words from one language to several languages. English to Assamese, Bengali and Hindi is an example of a multilingual dictionary.

According to Al-Rabi'i, the E-Dictionary can be divided into two different types [5]:

1. Online E-Dictionary: This dictionary is used directly in digital form through the Internet, using a web browser, from anywhere in the world. It is also known as an Internet dictionary. Many users can access it simultaneously.

2. Offline E-Dictionary: This dictionary can be used on a digital computer, PDA (Personal Digital Assistant), or mobile phone. It is also known as a portable digital dictionary. We can carry and back up an Offline E-Dictionary using a CD, DVD, hard disk or pen drive. We can also download this type of dictionary from the Internet and install it on our own computer or other devices.

B. Natural Language Processing

Natural languages are the languages that humans most commonly use for communication. Natural Language Processing (NLP) is a field of computer science and linguistics concerned with the interactions between computers and natural languages [4]. NLP deals with computer programs that understand human languages in both written and oral form. The major goal of NLP is to design and build software that will analyze, understand, and generate the languages that humans use naturally. NLP is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things. Some of the most common research tasks in NLP are Machine Translation, Electronic Dictionary, Morphological Segmentation, Natural Language Generation, Optical Character Recognition, Part of Speech (POS) Tagging, Question Answering, Speech Recognition, Information Retrieval (IR), and Speech Segmentation [6].

C. Languages

In this section, we briefly discuss the Assamese, Bengali, English and Hindi languages.

1. Assamese Language: Assamese is an Eastern Indo-Aryan language used mainly in the state of Assam. It is the state language as well as the official language of Assam. The Assamese language is also known as Asamiya (Axomiya). It is the mother tongue of the Assamese people. Assamese is spoken mainly by the people of Assam and by some people of the other North-Eastern states. Nearly 15 to 20 million people speak the Assamese language. Assamese is one of the recognized languages of India [6][7]. It evolved in the 7th century AD, with its roots in the Sanskrit language. However, its vocabulary, phonology and grammar have been substantially influenced by the original inhabitants of Assam, such as the Boros and the Kacharis. The Assamese script is derived from the Brahmi script. The Assamese language is written using Assamese scripts that developed from the Gupta alphabets around 1200 AD and that closely resemble the Mithilakshar and Bengali alphabets.

2. Bengali Language: Bengali is an Indo-Aryan language spoken mostly in the eastern part of the Indian subcontinent. It is also known as the Bangla language. It evolved from the Magadhi Prakrit and Sanskrit languages. Bengali is one of the recognised languages of India. It is the official language of West Bengal and Tripura. It is also a major language in the Indian Union Territory of Andaman and Nicobar Islands. Bengali is mainly spoken by the people of Indian states like West Bengal, Tripura and Assam. It is the seventh most spoken language in the world and the second most spoken language in India.

The Bengali language is written using Bengali scripts, the 6th most widely used writing system in the world. The script, with minor variations, is shared by Assamese and is the basis for the scripts of other languages like Manipuri and Bishnupriya Manipuri [6].

3. English Language: English is a West Germanic language that was first spoken in early medieval England. English is spoken mainly by the people of Canada, Australia, the United Kingdom, the United States, Ireland, and New Zealand. It is an official language of almost sixty sovereign states. It is the third most common native language in the world. It has become the leading language of international discourse [6]. English was introduced in India in 1830 during the rule of the East India Company. At the time of the Independence of India in 1947, English was the only functional lingua franca in the country. The Constitution of India (1951) declared English as the associate official language of India. It has various dialects in India due to the influence of local languages.

4. Hindi Language: Hindi is the fourth most widely spoken language in the world. It is spoken widely by the people of Indian states and territories like Delhi, Madhya Pradesh, Bihar, Uttar Pradesh, Chhattisgarh, Haryana, Himachal Pradesh, Chandigarh, and Rajasthan. It is the primary spoken language of Madhya Pradesh and Uttar Pradesh [6]. In the 2001 census of India, 258 million people reported Hindi as their native language. Hindi is also spoken in neighbouring countries of India, such as Bangladesh, Bhutan and Nepal. Hindi derives its vocabulary from several major sources like Sanskrit, Persian and Arabic.

II. REVIEW OF RELATED LITERATURE

Many English paper dictionaries have been compiled by lexicographers at different times. The first English dictionary was compiled by Robert Cawdrey in 1604 [17]. It contains about 2,543 words. The first electronic version of the Oxford English Dictionary (OED) was made available in 1988 [14]. The digital OED was developed by Tony Smith and published by Oxford University Press in 1999. The online version of the OED has been available since 2000. Presently, there are many English-Assamese [1], English-Bengali [2], English-Hindi [3] and English-English paper dictionaries available in the market. There are also a few English-Assamese, English-Bengali [8][19], English-Hindi and English-English electronic dictionaries available both online and offline.

Some examples of English dictionaries, with the names of their lexicographers, are listed below:

a. A Dictionary of the English Language, compiled by Samuel Johnson in 1755 [14].
b. The Oxford English Dictionary, published by Oxford University Press in 1989.
c. The Compact Oxford English Dictionary, edited by J. A. Simpson and E. S. C. Weiner in 1991 [15].
d. The Oxford Dictionary of Current English, compiled by Catherine Soanes in 2006.
e. The Concise Oxford English Dictionary, edited by Angus Stevenson and Maurice Waite in 2011 [16].

III. DATAFLOW DIAGRAM OF E-ABH DICTIONARY

A Data Flow Diagram (DFD) is a pictorial representation of the information flows in a system. The DFD is often used as a preliminary step to create an overview of the system [12]. It is an attractive technique because it shows what users do rather than what computers do. The DFD technique is very popular because it is very simple to understand and use. We have used two types of DFD to implement the E-ABH dictionary, as described below.

A. Level 0 DFD

The Level 0 DFD is also known as the Context Diagram (CD). A CD is the most basic form of the DFD. It aims to show how the entire system works at a glance. The CD demonstrates the interactions between the process and the external entities. The CD of the E-ABH dictionary is shown in Fig. 1.

Fig. 1: Context Diagram of E-ABH dictionary

In the CD, the Administrator and the User are the two external entities. The Administrator can enter data into the database of the system, whereas the User can search data from the database of the system.

B. Level 1 DFD

The Level 1 DFD is the next level after the CD and shows an overview of the full system of the E-ABH dictionary. It describes in more detail how the data are processed and what type of data is needed in the system. The Level 1 DFD of the E-ABH dictionary is shown in Fig. 2.

Fig. 2: Level 1 DFD of E-ABH dictionary

In the Level 1 DFD, the Administrator and the End-user are the two external entities. The Administrator needs to log in first; if the login is successful, the Administrator can enter data into the E-ABH dictionary. The End-user can search the meaning of a word. In addition, the End-user can give feedback to the Administrator about the performance of the E-ABH dictionary.

IV. IMPLEMENTATION

The implementation of the E-ABH dictionary consists of three parts:

A. Necessary Software

We have used PHP, HTML, CSS and JavaScript as the Front-End and MySQL as the Back-End for the development of the E-ABH dictionary [10][11][20].

B. Data (or word) Entry

In the E-ABH dictionary, only the Administrator can enter data (or words). The Administrator needs to log in first with a valid username and password. If the login is successful, he/she can enter words into the dictionary based on the following word entry algorithm.

1. Enter word Id
   If (found)
   {
      Print- word Id already exists in the dictionary;
      Stop
   }
   Else
      Go to next step;
2. Search headword with its POS
   If (found)
   {
      Print- headword already exists in the dictionary;
      Stop
   }
   Else
      Go to next step;
3. Enter new word Id, headword and other related information of the headword (POS, synonyms and examples) of Assamese, Bengali, English and Hindi languages.
4. Submit.

According to this algorithm, suppose an Administrator wants to enter a word (headword) into this dictionary. The Administrator first needs to check the desired word Id for the headword. If the word Id is not yet present in the dictionary, the Administrator also needs to check the headword with its POS in the dictionary. If the headword and its corresponding POS are not present in the system, the Administrator can enter the desired word Id, the headword and other related information of the word, such as word meaning, POS, synonyms and examples, into the dictionary.

C. Word Search (or look up)

There are many word search techniques available for an E-Dictionary. We have used the Sequential Search Technique to look up (or search) the meaning of a word quickly and easily in the E-ABH dictionary.

The Sequential Search Technique (SST) is the simplest and most popular word search technique for electronic dictionaries. It is a very useful and efficient technique for looking up words easily and quickly. If we want to search for a particular word in a database table using SST, the SST checks each word one by one, in sequence, until the desired word is found in the table. It starts comparing from the beginning of the database table. In SST, the database table need not be sorted. The average number of comparisons in SST for a word that is present in the table is (N+1)/2, where N is the number of rows in the table. Its worst case cost is proportional to the number of rows in the table, so the searching time for SST is O(N) [9][13].