
A Sentence Reduction in Modern English


1. The Sentence

The notion of sentence has not so far received a satisfactory definition, which would enable us by applying it in every particular case to find out whether a certain linguistic unit was a sentence or not.

Thus, for example, the question remains undecided whether such shop notices as Book Shop and such book titles as English Grammar are sentences or not. In favour of the view that they are sentences, the following consideration can be brought forward. The notice Book Shop and the title English Grammar mean 'This is a book shop', 'This is an English Grammar'; the phrase is interpreted as the predicative of a sentence whose subject and link verb have been omitted, that is, it is apprehended as a unit of communication. According to the other possible view, such notices as Book Shop and such titles as English Grammar are not units of communication at all, but units of nomination, merely appended to the object they denote. Since there is as yet no definition of a sentence which would enable us to decide this question, the choice between the alternatives depends on one's subjective view. We prefer the view that such notices and book titles are not sentences but rather nomination units.

We also mention here a special case. Some novels have titles formulated as sentences, e. g. The Stars Look Down, by A. Cronin, or They Came to a City, by J.B. Priestley. These are certainly sentences, but they are used as nomination units, for instance, Have you read The Stars Look Down? Do you like They Came to a City?

With the rise of modern ideas of paradigmatic syntax, yet another problem concerning the definition of the sentence has to be considered.

In paradigmatic syntax, such units as He has arrived, He has not arrived, Has he arrived, He will arrive, He will not arrive, Will he arrive, etc., are treated as different forms of the same sentence, just as arrives, has arrived, will arrive, etc., are different forms of the same verb. We may call this view of the sentence the paradigmatic view.

Now from the point of view of communication, He has arrived and He has not arrived are different sentences since they convey different information (indeed, the meaning of the one flatly contradicts that of the other).

2. Structure of English Sentence

When studying the structure of a unit, we find out its components, mostly units of the next lower level, their arrangement and their functions as parts of the unit.

Many linguists think that the investigation of the components and their arrangement suffices. Thus Halliday writes: «Each unit is characterized by certain structures. The structure is a syntagmatic framework of interrelated elements, which are paradigmatically established in the systems of classes and stated as values in the structure…. if a unit 'word' is established there will be dimensions of word-classes the terms in which operate as values in clause structures: given a verb /noun/ adverb system of word classes, it might be that the structures ANV and NAV were admitted in the clause but NVA excluded».

Now 'a syntagmatic framework of interrelated elements' may describe the structure of a combination of units as well as that of a higher unit, a combination of words as well as a sentence or a clause. The important properties that unite the interrelated elements into a higher unit of which they become parts, and the function of each element as part of the whole, are not mentioned.

Similarly, Z. Harris thinks that the sentence The fear of war grew can be described as TN1PN2V, where T stands for article, N for noun, P for preposition and V for verb.

Such descriptions are feasible only if we proceed from the notion that the difference between the morpheme, the word and the sentence is not one of quality but rather of quantity and arrangement.

Z. Harris does not propose to describe the morpheme 'is' (as he calls it) as VC, where V stands for vowel and C for consonant. He does not do so because he regards a morpheme not as an arrangement of phonemes, but as a unit of a higher level possessing some quality (namely, meaning) not found in any phoneme or combination of phonemes outside the morpheme. Since we assume that not only the phoneme and the morpheme, but also the word and the sentence are units of different levels, we cannot agree to the view that a sentence is merely an arrangement of words.

In our opinion, The fear of war grew is a sentence not because it is TNPNV, but because it has properties not inherent in words. It is a unit of communication and as such it possesses predicativity and intonation. On the other hand, TNPNV stands also for the fear of war growing, the fear of war to grow, which are not sentences.
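The point about TNPNV can be checked mechanically. The following is a minimal sketch, assuming a hypothetical six-word lexicon (LEXICON) and a helper name (class_string) that are not part of the original analysis; it replaces each word by its class symbol and shows that a sentence and a non-sentence receive the same formula.

```python
# Hypothetical mini-lexicon: word -> class symbol used in the formula.
# T = article, N = noun, P = preposition, V = verb (finite or non-finite).
LEXICON = {
    "the": "T",
    "fear": "N",
    "of": "P",
    "war": "N",
    "grew": "V",
    "growing": "V",
}


def class_string(phrase: str) -> str:
    """Replace every word by its class symbol and join the symbols."""
    return "".join(LEXICON[word] for word in phrase.lower().split())


sentence = "The fear of war grew"         # a sentence
non_sentence = "the fear of war growing"  # not a sentence

print(class_string(sentence))
print(class_string(non_sentence))
# The class formula alone cannot tell the two apart.
print(class_string(sentence) == class_string(non_sentence))
```

Both strings come out as TNPNV, so whatever makes the first one a sentence (predicativity, intonation) is not recorded in the arrangement of word classes.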

As to the arrangement of words in the sentence above, it fully depends upon their combinability. We have TN and not NT because an article has only right-hand connections with nouns. A prepositional phrase, on the contrary, has left-hand connections with nouns; that is why we have TNPN, etc.

The development of transform grammar (Harris, Chomsky) and tagmemic grammar (Pike) is to a great extent due to the realization of the fact that «an attempt to describe grammatical structure in terms of morpheme classes alone - even successively inclusive classes of classes - is insufficient».

As defined by Harris, the approach of transformational grammar differs from the above-described practice of characterizing «each linguistic entity… as composed out of specified ordered entities at a lower level» in presenting «each sentence as derived in accordance with a set of transformational rules, from one or more (generally simpler) sentences, i.e. from other entities of the same level. A language is then described as consisting of specified sets of kernel sentences and a set of transformations».

For English Harris lists seven principal patterns of kernel sentences:

1. NvV (v stands for a tense morpheme or an auxiliary verb, i.e. for a (word-) morpheme containing the meanings of predicativity).

2. NvVPN

3. NvVN

4. N is N

5. N is A (A stands for adjective)

6. N is PN

7. N is D (D stands for adverb)

As one can easily see, the patterns above do not merely represent arrangements of words; they are arrangements which contain predicativity - the most essential component of a sentence. Given the proper intonation, and with the symbols replaced by words that conform to the rules of combinability, these patterns will become actual sentences. Viewed thus, the patterns may be regarded as language models of speech sentences.

One should notice, however, that the difference between the patterns above is not, in fact, a reflection of any sentence peculiarities. It rather reflects the difference in the combinability of various subclasses of verbs.

The difference between 'NvV' and 'NvVN', for instance, reflects the different combinability of a non-transitive and a transitive verb (He is sleeping. He is writing letters. Cf. to sleep, to write letters). The difference between those two patterns and 'N is A' reflects the difference in the combinability of notional verbs and link verbs, etc.

A similar list of patterns is recommended to language teachers under the heading These are the basic patterns for all English sentences:

1. Birds fly.

2. Birds eat worms.

3. Birds are happy.

4. Birds are animals.

5. Birds give me happiness.

6. They made me president.

7. They made me happy.

The heading is certainly rather pretentious. The list does not include sentences with zero predications or with partially implied predicativity while it displays the combinability of various verb classes.

S. Potter reduces the number of kernel sentences to three: «All simple sentences belong to one of three types:

A. The sun warms the earth;

B. The sun is a star; and

C. The sun is bright.»

And as a kind of argument he adds: «Word order is changeless in A and B, but not in C. Even in sober prose a man may say Bright is the sun.»

The foregoing analysis of kernel sentences, from which most English sentences can be obtained, shows that «every sentence can be analysed into a centre, plus zero or more constructions… The centre is thus an elementary sentence; adjoined constructions are in general modifiers». In other words, the essential structure constituting a sentence is the predication; all other words are added to it in accordance with their combinability. This is the case in an overwhelming majority of English sentences. Here are some figures based on the investigation of modern American non-fiction.



Sentence patterns and their frequency of occurrence (per cent), as sole pattern and in combination:

Subject + verb: Babies cry.

Subject + verb + object: Girls like clothes.

Subject + verb + predicative: Dictionaries are books. Dictionaries are useful.

Structural subject + verb + notional subject: There is evidence. It is easy to learn knitting.

Minor patterns: Are you sure? Whom did you invite? Brush your teeth. What a day!


Some analogy can be drawn between the structure of a word and the structure of a sentence. The morphemes of a word are formally united by stress. The words of a sentence are formally united by intonation.

The centre of a word is the root. The centre of a sentence is the predication.

Some words have no other morphemes but the root (ink, too, but). Some sentences have no other words but those of the predication (Birds fly. It rains. Begin.).

Words may have some morphemes besides the root (unbearable). Sentences may have some words besides the predication (Yesterday it rained heavily.).

Sometimes a word is made of a morpheme that is usually not a root (ism). Sometimes sentences are made of words that are usually not predications (Heavy rain).

Words may have two or more roots (blue-eyed, merry-go-round). Sentences may have two or more predications (He asked me if I knew where she lived.).

The roots may be co-ordinated or subordinated (Anglo-Saxon, blue-bell). The predications may be co-ordinated and subordinated (She spoke and he listened. He saw Sam did not believe.).

The roots may be connected directly (footpath) or indirectly, with the help of some morpheme (salesman). The predications may be connected directly (I think he knows) or indirectly, with the help of some word (The day passed as others had passed.).

The demarcation line between a word with more than one root and a combination of words is often very vague (cf. blackboard and black board, brother-in-law and brother in arms). The demarcation line between a sentence with more than one predication and a combination of sentences is often very vague.

Cf. She'd only to cross the pavement. But still she waited. (Mansfield).

As we know, a predication in English is usually a combination of two words (or word-morphemes) united by predicativity, or, in other words, a predicative combination of words. Apart from that, the words of a predication do not differ from other words in conforming to the general rules of combinability. The rules of grammatical combinability do not admit of *boys speaks or *he am. The combination *the fish barked is strange as far as lexical combinability is concerned, etc.

All the other words of a sentence are added to those of the predication in accordance with their combinability to make the communication as complete as the speaker wishes. The predication Boys play can make a sentence by itself. But the sentence can be extended by realizing the combinability of the noun boys and the verb play into The three noisy boys play boisterously upstairs. We can develop the sentence into a still more extended one. But however extended the sentence is, it does not lose its integrity. Every word in it is not just a word; it becomes part of the sentence and must be evaluated in its relation to other parts and to the whole sentence, much in the same way as a morpheme in a word is not just a morpheme, but the root of a word or a prefix, or a suffix, or an inflection.

Depending on their relation to the members of the predication the words of a sentence usually fall into two groups - the group of the subject and the group of the predicate.

Sometimes there is a third group, of parenthetical words, which mostly belongs to the sentence as a whole. In the sentence below the subject group is separated from the predicate group by the parenthetical group.

That last thing of yours, dear Flora, was really remarkable.

As already mentioned, the distribution and the function of a word-combination in a sentence are usually determined by its head-word: by the noun in noun word-combinations, by the verb in verb word-combinations, etc.

The adjuncts of word-combinations in the sentence are added to their head-words in accordance with their combinability, to develop the sentence, to form its secondary parts which may be classified with regard to their head-words.

All the adjuncts of noun word-combinations in the sentence can be united under one name, attributes. All the adjuncts of verb (finite or non-finite) word-combinations may be termed complements. In the sentence below, the attributes are spaced out and the complements are in heavy type.

He often took Irene to the theatre, instinctively choosing the modern Society plays with the modern Society conjugal problems. (Galsworthy).

The adjuncts of all other word-combinations in the sentence may be called extensions. In the sentences below the extensions are spaced out.

You will never be free from dozing and dreams. (Shaw).

She was ever silent, passive, gracefully averse. (Galsworthy).

The distribution of semi-notional words in the sentence is determined by their functions - to connect notional words or to specify them. Accordingly they will be called connectives or specifiers. Conjunctions and prepositions are typical connectives. Particles are typical specifiers.

3. Sentence Reduction

Sentence reduction is the removal of redundant words or phrases from an input sentence by creating a new sentence in which the gist of the original meaning of the sentence remains unchanged.

3.1 A Sentence Reduction Using Syntax Control

Methods of sentence reduction have been used in many applications. Grefenstette (G. Grefenstette, 1998) proposed removing phrases from sentences to produce a telegraphic text that can be used to provide audio scanning services for the blind. Dolan (S.H. Olivers and W.B. Dolan, 1999) proposed removing clauses from sentences before indexing documents for information retrieval. Those methods remove phrases based on their syntactic categories but do not rely on the context of the surrounding words, phrases and sentences. Ignoring that information can reduce the accuracy of sentence reduction. Mani and Maybury (Inderjeet Mani and Mark Maybury, 1999) also present a process of writing a reduced sentence by revising the original sentence with a set of revision rules to improve the performance of summarization. Jing and McKeown (H. Jing, 2000) studied a new method of removing extraneous phrases from sentences by using multiple sources of knowledge to decide which phrases in a sentence can be removed. The multiple sources include syntactic knowledge, context information and statistics computed from a corpus that consists of examples written by human professionals. Their method avoided removing phrases that were related to the surrounding context and produced grammatical sentences.

Recently, Knight and Marcu (K. Knight and D. Marcu, 2002) demonstrated two methods for the sentence compression problem, which is similar to the sentence reduction problem. They devised both a noisy-channel and a decision tree approach to the problem. The noisy-channel framework has been used in many applications, including speech recognition, machine translation, and information retrieval. The decision tree approach has been used in parsing sentences (D. Magerman, 1995) (Ulf Hermjakob and J. Mooney, 1997) and in defining the rhetorical structure of text documents (Daniel Marcu, 1999).

Most of the previous methods only produce a short sentence whose word order is the same as that of the original sentence, and in the same language, e.g., English. When non-native speakers reduce a long sentence in a foreign language, they usually try to link the meanings of words within the original sentence to meanings in their own language. In addition, in some cases the word order of the reduced sentence differs from that of the original sentence. Therefore, a non-native speaker produces two reduced sentences: one in the foreign language and another in their own language. Following the behavior of non-native speakers, two new requirements arise for the sentence reduction problem:

1) The word order of the reduced sentence may differ from that of the original sentence.

2) Two reduced sentences in two different languages can be generated.

With these two new requirements, the sentence reduction task is useful for many applications, such as information retrieval, query text summarization and especially cross-language information retrieval. To satisfy these requirements, we propose a new algorithm that uses semantic information to simulate the behavior of a non-native speaker. The semantic information obtained from the original sentence is integrated into the syntax tree through syntax control.


Let $E$ and $V$ be two different languages. Given a long sentence $e_1, e_2, \ldots, e_n$ in the language $E$, the task of sentence reduction into the two languages $E$ and $V$ is to remove or replace some redundant words in the sentence to generate two new sentences, $e'_1, e'_2, \ldots, e'_m$ in language $E$ and $v_1, v_2, \ldots, v_k$ in language $V$, so that their gist meanings are unchanged. In practice, we used English as the source language, and the target languages are English and Vietnamese. However, our method can be applied to any pair of languages. In the following part we present an algorithm of sentence reduction using syntax control with rich semantic information.

Sentence reduction algorithm

We present an algorithm based on semantic parsing that generates two short sentences in different languages. The reduction algorithm using syntax control has three steps. In the first step, the input sentence is parsed into a syntax tree $t$ by a syntax parser. In the second step, a semantic parser adds rich semantic information to the syntax tree, and each node of the syntax tree is associated with a specific syntax control. The final step generates two different sentences, one in language $E$ and one in language $V$, from the syntax tree $t$ that has been annotated with rich semantic information.

Syntax parsing

First, we parse a sentence into a syntax tree. Our syntax parser locates the subject, object, and head word within a sentence. It also recognizes phrasal verbs, cue phrases and expressions in English sentences. This information is useful for reducing the sentence. Figure 2 shows the correspondence between our grammar symbols and English grammar symbols. Figure 1 shows an example of our syntax parsing for the sentence "Like FaceLift, much of ATM's screen performance depends on the underlying application". To reduce ambiguity, we designed the syntactic parsing on the basis of grammar symbols that are classified in detail. The parts of speech of words were extended to cope with the ambiguity problem. For example, in Figure 2, "noun" was divided into "private noun" and "general noun".

The bilingual dictionary includes about 200,000 English words and their meanings in Vietnamese. Each English word entry includes several meanings in Vietnamese, and each meaning is associated with a symbol meaning. The set of symbol meanings in each word entry is defined by using the WordNet database (C. Fellbaum, 1998). The dictionary also contains several English phrases and expressions and their Vietnamese equivalents.

Figure 1: An example of the syntax tree of "Like FaceLift, much of ATM's screen performance depends on the underlying application"

Semantic parsing using syntax control

After producing a syntax tree with rich information, we apply semantic parsing to that syntax tree. Let $N$ be an internal node of the syntax tree, with children nodes $n_1, n_2, \ldots, n_k$. The node $N$ relies on the semantic information of its children nodes to decide which part should remain in the reduced sentence. When the syntax tree $t$ is parsed semantically, each node $N$ must use the information of its children nodes to define its own information. We call that information the semantic-information of the node and denote it by $N.sem$. In addition, each semantic-information of a given node $N$ is mapped to a meaning in the target language. For convenience, we define $SI$ as the set of semantic-information values and assume that the $i$-th semantic-information of the node $n_j$ is $n_j[i]$. To understand what the meaning of the node $N$ should be, we have to know the meaning of each child node and know how to combine them into meanings for the node $N$. Figure 3 shows two choices for the sequence of meanings of the node $N$ in a reduction process. It is easy for a human to understand exactly what the meaning of each $n_i$ should be and then to decode these meanings as objects to memorize.

Figure 2: Example of symbol equivalents

Figure 3: Syntax control

With this basic idea, we designed a control language to do this task. The children nodes $n_1, n_2, \ldots, n_k$ are associated with a set of syntax controls that conduct the sentence reduction process. The node $N$ and its children are associated with a set of rules. To present the set of rules, we used a simple control language whose syntax covers the following:

1) Syntax to present the order of children nodes and the nodes to be removed.

2) Syntax to constrain each meaning of a child node by the meanings of other children nodes.

3) Syntax to combine a sequence of meanings into one symbol meaning (this process is called an inherit process from the node $N$ to its children).

A syntax control rule is encoded as one generation rule and a set of condition rules that the generation rule has to satisfy. Given a specific condition rule, we can define its generation rule directly.

Condition rule

A condition rule is formulated as follows: if $n_{j_1}.sem = v_1 \wedge n_{j_2}.sem = v_2 \wedge \ldots \wedge n_{j_m}.sem = v_m$ then $N.sem = v$, with $v \in SI$ and $v_j \in SI$.
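A condition rule of this shape is a conjunction of equality tests on the children followed by an assignment to the parent. The sketch below is one possible reading, not the authors' implementation: the class ConditionRule, the function node_sem, the child label "Obj" and the example rule are all hypothetical, while "Subj", HUMAN, THINGS and ANIMAL are taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ConditionRule:
    # child label -> semantic value that child must carry (the n_j.sem = v_j parts)
    requirements: Dict[str, str]
    # value given to N.sem when every requirement holds
    result: str

    def holds(self, child_sem: Dict[str, str]) -> bool:
        """True when all conjuncts of the rule are satisfied."""
        return all(child_sem.get(label) == value
                   for label, value in self.requirements.items())


def node_sem(rule: ConditionRule, child_sem: Dict[str, str]) -> Optional[str]:
    """Return N.sem if the rule fires, otherwise None."""
    return rule.result if rule.holds(child_sem) else None


# Hypothetical rule: if Subj is HUMAN and Obj is THINGS, the parent node is HUMAN.
rule = ConditionRule({"Subj": "HUMAN", "Obj": "THINGS"}, result="HUMAN")

print(node_sem(rule, {"Subj": "HUMAN", "Obj": "THINGS"}))   # rule fires
print(node_sem(rule, {"Subj": "ANIMAL", "Obj": "THINGS"}))  # rule does not fire
```

A rule with an empty requirements mapping always holds, which is how a "default" condition could be represented in this sketch.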

Generation rule

A generation rule is a sequence of symbols used to transfer the internal node into the internal node of a reduced sentence. We used two generation rules, one for $E$ and the other for $V$. A generation rule is a sequence of symbols $g_1 g_2 \ldots g_m$, in which each $g_i$ is an integer or a string. The equation $g_i = j$ means that the child node $n_j$ is kept at position $i$ in the target node. If $g_i$ = "$v_1 v_2 \ldots v_l$", that string is placed in the child node $n_i$ of the target node. Figure 1 shows a syntax tree of the input sentence "Much of ATM's performance depends on the underlying application.". In this syntax tree, the syntax rule "S1=Bng-daucau Subj cdgt Bng-cuoicau" uses the syntax control below for reduction:

<Con> default </Con> <Gen> 1 2 </Gen>

The condition rule "default" means that the generation rule is applied under any condition. The generation rule "1 2" means that only the node (Subj) at index 1 and the node (cdgt) at index 2 of the rule "S1=Bng-daucau Subj cdgt Bng-cuoicau" remain in the reduced sentence. If the syntax control is changed to

<Con> Subj HUMAN </Con> <Gen> 1 2 </Gen>

the condition rule means that the generation rule "1 2" is applied in the reduction process only when the semantic information in the child node "Subj" is "HUMAN". Using the default condition rule, the reduced sentences are generated as follows.

Original sentence: Like FaceLift, much of ATM's screen performance depends on the underlying application.

Reduced sentence in English: Much of ATM's performance depends on the underlying application.

Reduced sentence in Vietnamese: Nhieu hieu suat cua ATM phu thuoc vao nhung ung dung tiem an.
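The effect of a syntax control on one node can be sketched as follows. Several things here are assumptions made for illustration only: the function names, the way the example sentence is split across the four children of the S1 rule (the figure is not available), the semantic value CONCEPT for Subj, and the choice to keep all children when the condition does not hold. Only integer generation symbols are handled, and the sketch covers the top-level rule only, so the removal of "screen" inside the subject is not modelled.

```python
import re
from typing import Dict, List, Tuple


def parse_control(text: str) -> Tuple[str, List[str]]:
    """Split '<Con> ... </Con> <Gen> ... </Gen>' into its two parts."""
    con = re.search(r"<Con>(.*?)</Con>", text).group(1).strip()
    gen = re.search(r"<Gen>(.*?)</Gen>", text).group(1).split()
    return con, gen


def condition_holds(con: str, sem: Dict[str, str]) -> bool:
    """'default' always holds; otherwise expect '<child label> <value>'."""
    if con == "default":
        return True
    label, value = con.split()
    return sem.get(label) == value


def reduce_node(children: List[Tuple[str, str]],
                sem: Dict[str, str],
                control: str) -> List[str]:
    """Keep only the children named by the generation rule (0-based indices)."""
    con, gen = parse_control(control)
    if not condition_holds(con, sem):
        # Assumption: a rule that does not apply leaves the node unchanged.
        return [text for _, text in children]
    return [children[int(g)][1] for g in gen]


# Children of the rule "S1=Bng-daucau Subj cdgt Bng-cuoicau" (assumed split).
children = [
    ("Bng-daucau", "Like FaceLift,"),
    ("Subj", "much of ATM's screen performance"),
    ("cdgt", "depends on the underlying application"),
    ("Bng-cuoicau", ""),
]
sem = {"Subj": "CONCEPT"}  # assumed semantic value

DEFAULT = "<Con> default </Con> <Gen> 1 2 </Gen>"
HUMAN_ONLY = "<Con> Subj HUMAN </Con> <Gen> 1 2 </Gen>"

print(" ".join(reduce_node(children, sem, DEFAULT)))
print(len(reduce_node(children, sem, HUMAN_ONLY)))  # condition fails, all kept
```

With the default control the sentence-initial "Like FaceLift," is dropped; with the HUMAN condition the rule does not fire for this subject, so nothing is removed at this node.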

To generate the reduced sentence in Vietnamese, condition rules and generation rules are designed as well. This process works in the same way as the transfer translation method. Because the gist meaning of the short sentence is unchanged in comparison with the original sentence, the gist meaning of a node after applying the syntax control is also unchanged. With this assumption, we can reuse the syntax control designed for translating the original sentence into another language (English into Vietnamese) to translate the reduced sentence. Therefore, our sentence reduction program can produce two reduced sentences in two different languages.

Our semantic parsing uses that set of rules to select suitable rules for the current context. The problem of selecting a set of suitable rules for the current context of the current node is to find the most likely condition rule among the set of syntax control rules associated with it. Thus, the problem of semantic parsing using syntax control can be described as follows:

Given a sequence of children nodes $n_1, n_2, \ldots, n_k$ of a node $N$, each node $n_i$ consists of a list of meanings, in which each meaning is associated with a symbol meaning. The syntax rule for the node $N$ is associated with a set of condition rules. In addition, each condition rule is mapped to a specific generation rule. The task is to find the most likely condition rules for that node sequence. This problem can be solved by using a variant of the Viterbi algorithm (A.J. Viterbi, 1967). Firstly, we associate each semantic-information of a child node with the indices of all condition rules. Secondly, we try to find all sequences that come from the same condition rule.
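The selection step can be illustrated with a much simpler stand-in for the Viterbi variant, which the text does not spell out. The sketch below is a plain scan, not a Viterbi search: each child carries a set of candidate symbol meanings, a condition rule is treated as satisfiable when every value it requires is among the candidates of the corresponding child, and "most likely" is approximated by "constrains the most children". The rule format, the function names and the example rules are hypothetical.

```python
from typing import Dict, List, Optional, Set

# A rule is a dict: {"when": {child label: required value}, "gen": [child indices]}.
# An empty "when" mapping plays the role of the default condition.


def satisfiable(when: Dict[str, str], candidates: Dict[str, Set[str]]) -> bool:
    """True when each required value is among that child's candidate meanings."""
    return all(value in candidates.get(label, set())
               for label, value in when.items())


def select_rule(rules: List[dict],
                candidates: Dict[str, Set[str]]) -> Optional[dict]:
    """Prefer the satisfiable rule with the most constraints, else the default."""
    specific = [r for r in rules
                if r["when"] and satisfiable(r["when"], candidates)]
    if specific:
        return max(specific, key=lambda r: len(r["when"]))
    for r in rules:
        if not r["when"]:
            return r
    return None


rules = [
    {"when": {}, "gen": [1, 2]},                    # default rule
    {"when": {"Subj": "HUMAN"}, "gen": [0, 1, 2]},  # hypothetical specific rule
]

ambiguous = {"Subj": {"HUMAN", "CONCEPT"}}  # the subject has two candidate meanings
concept_only = {"Subj": {"CONCEPT"}}

print(select_rule(rules, ambiguous)["gen"])
print(select_rule(rules, concept_only)["gen"])
```

Because the conjuncts of a condition rule constrain different children, checking each child's candidate set separately is enough here; a search over sequences only becomes necessary when the choice of meaning for one child restricts the choices for another.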

3.2 A New Sentence Reduction Method Based on a Decision Tree Model

This chapter presents a novel sentence reduction algorithm based on a decision tree model, in which semantic information is used to enhance the accuracy of sentence reduction. The proposed algorithm is able to deal with the changeable order problem in sentence reduction. Experiments show better results than the original methods.

Much research in automatic text summarization has focused on extraction, that is, on identifying the important clauses, sentences and paragraphs in texts. Humans, meanwhile, tend to produce summaries by creating new sentences that are grammatical, that cohere with one another, and that capture the most salient parts of the information in the original document. Sentence reduction is the problem of removing some redundant words or phrases from the original sentence by creating a new sentence in which the gist meaning of the original sentence remains unchanged. Methods of sentence reduction have been applied in many applications. Grefenstette (Grefenstette, G., 1998) proposed removing phrases from sentences to produce a telegraphic text that can be used to provide audio scanning services for the blind. Dolan (Dolan, W.B., 1999) proposed removing clauses from sentences before indexing documents for information retrieval. Those methods removed phrases based on their syntactic categories without relying on the context of the surrounding words, phrases and sentences. Therefore, those methods are unsuitable for the text summarization task. Sentence reduction for text summarization was pointed out by Mani and Maybury (Mani and Maybury, 1999). The authors present a process of writing reduced sentences by revising the original sentence with a set of revision rules. Jing (Jing, H., 2000) also studied a method of removing extraneous phrases from sentences by using multiple sources of knowledge to decide which phrases can be removed. The multiple sources include syntactic knowledge, context information and statistics computed from a corpus that consists of examples written by human professionals. Their method avoided removing phrases that were related to the surrounding context, produced grammatical sentences, and was applied to the cut-and-paste summarization strategy. Recently, Knight and Marcu (Knight, K. and Marcu, D., 2002) demonstrated two corpus-based methods for the sentence compression problem. They devised both a noisy-channel and a decision tree approach to the problem. The decision tree approach has been applied in parsing sentences and in defining the rhetorical structure of text documents, and it achieved good results in sentence compression.

In almost all previous methods, the word order of the reduced sentence is the same as that of the original sentence. Meanwhile, when summarizing a document, a human may change the order to ensure that the summary is smooth and coherent. This calls for a new sentence reduction method in which the word order of the reduced sentence may differ from the original. In addition, when sentence reduction is used for text summarization, syntactic information is not enough. The semantic information of the original sentence should be incorporated into the reduction process to enhance its accuracy. This is similar to the behavior of humans, who understand the meaning of the original sentence and so ensure that important words remain in the reduced sentence. To satisfy these requirements, we propose a new sentence reduction method based on a decision tree model in which semantic information is used to support the reduction process. The decision tree model is also extended to cope with the changeable order between original sentences and reduced sentences.

Decision tree model for sentence reduction

The following sections present a sentence reduction method based on a decision tree model using rich semantic information. Let $t$ and $s$ be the syntax trees of the original sentence and the reduced sentence, respectively. To perform the rewriting process, we use an Input list, two stacks and some rewriting operators, defined as follows.

The Input list consists of the sequence of words subsumed by the tree $t$, where each word in the input list is labeled with the names of all syntactic constituents in $t$ that start with it. CSTACK is a stack that holds the subtrees used to build the small tree. RSTACK is a stack that holds all nodes removed in the process of rewriting the large tree $t$ into the small tree $s$. Five operators are used to rewrite the larger tree $t$ into the smaller tree $s$:

* smFr-operator transfers a first word from the input list into CSTACK. It was written in

mathematic by the label sta-T.

* REDUCE-operators pop the k syntactic trees located at the top of CSTACK and combine them into a new tree. These operators are written as REDUCE(k, X), in which k is an integer and X is a grammar symbol.

* DROP-operators remove from the input list subsequences of words that correspond to syntactic constituents and move them to RSTACK. Both REDUCE-operators and DROP-operators are used to derive the structure of the syntactic tree of the short sentence. DROP-operators are written as DROP X, where X is a grammar symbol.

* ASSIGN TYPE-operators change the label of the tree at the top of CSTACK. The new POS tags may differ from the POS tags in the original sentence. These operators are written as ASSIGN TYPE(X), where X is a POS tag.

* RESTORE-operators take the k-th element in RSTACK and move that element back into the Input list. These operators are designed on the assumption that a subtree removed from the input list may still affect the current decision. They are written as RESTORE k, where k is an integer.

A DROP X operator deletes from the input list all words that are spanned by constituent X in t and stores them in RSTACK. The RESTORE operator brings some of the words in RSTACK back so that they can be used to generate the small tree s. With these operators, the order of words within the small tree s can differ from the word order of the large tree t.
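As a rough illustration, the two stacks, the input list and the five operators can be sketched in Python as follows. This is a minimal sketch and not the authors' implementation: the names Tree, Item and State, the starts field (constituent label mapped to the number of words it spans), the reading of RESTORE k as the k-th element counted from the top of RSTACK, and the choice to put restored words at the front of the input list are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tree:
    label: str                          # grammar symbol or POS tag
    children: list = field(default_factory=list)
    word: Optional[str] = None          # set only on leaves

    def leaves(self):
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.leaves()]


@dataclass
class Item:
    word: str
    pos: str
    # Hypothetical encoding: constituent label -> number of words it spans.
    starts: dict = field(default_factory=dict)


@dataclass
class State:
    input_list: list
    cstack: list = field(default_factory=list)
    rstack: list = field(default_factory=list)


def shift(st):
    """SHIFT: move the first word of the input list onto CSTACK."""
    item = st.input_list.pop(0)
    st.cstack.append(Tree(item.pos, word=item.word))


def reduce_trees(st, k, x):
    """REDUCE(k, X): combine the top k trees of CSTACK under label X."""
    kids = st.cstack[-k:]
    del st.cstack[-k:]
    st.cstack.append(Tree(x, children=kids))


def drop(st, x):
    """DROP X: move the words spanned by constituent X to RSTACK."""
    n = st.input_list[0].starts[x]
    st.rstack.append(st.input_list[:n])
    del st.input_list[:n]


def assign_type(st, x):
    """ASSIGN TYPE(X): relabel the tree at the top of CSTACK."""
    st.cstack[-1].label = x


def restore(st, k):
    """RESTORE k: move the k-th element from the top of RSTACK (assumed
    reading) back to the front of the input list."""
    st.input_list[:0] = st.rstack.pop(-k)


def demo():
    st = State([Item("today", "RB", {"ADVP": 1}), Item("John", "NNP", {"NP": 1}),
                Item("quietly", "RB", {"ADVP": 1}), Item("left", "VBD", {"VP": 1})])
    drop(st, "ADVP")            # "today" goes to RSTACK
    shift(st)                   # John
    reduce_trees(st, 1, "NP")
    drop(st, "ADVP")            # "quietly" goes to RSTACK
    shift(st)                   # left
    restore(st, 2)              # bring "today" back, after the verb
    shift(st)                   # today
    reduce_trees(st, 2, "VP")
    reduce_trees(st, 2, "S")
    return st


print(" ".join(demo().cstack[0].leaves()))
```

In this toy run the adverb "quietly" stays in RSTACK and "today" is restored after the verb, so the word order of the result differs from that of the input.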


The features we used in this model consist of:

* Some features come from the input list.

* Some features come from the configuration of CSTACK.

* Some features come from the configuration of RSTACK.

The kinds of features are described as follows:

Operation features

These features reflect the number of trees in CSTACK, the number of elements in RSTACK and the types of the last five operators. We also use information about the two stacks, namely the syntactic categories of the root nodes of the partial trees built up to a given point.

Original tree specific features denote the syntactic constituents that start with the first unit in the input list.

Semantic features

The semantic features we use include: the semantic type of the current word in the input list, drawn from general semantic types such as HUMAN, THINGS, ANIMAL, CONCEPT, INSTRUCTOR, COMPUTER, etc.; whether or not the word in the input list is a head word; and a boolean value indicating whether or not the word is in the subcategorization table.
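To make the three feature groups concrete, the sketch below assembles them into one feature dictionary. It is only an illustration under stated assumptions: the function signature, the dictionary keys, the word record layout (word, pos, starts, sem, is_head) and the representation of the subcategorization table as a plain set of words are all hypothetical and do not come from the original work.

```python
def get_contextual_features(input_list, cstack_labels, rstack_size,
                            history, subcat_table):
    """Build one feature vector (as a dict) for the current configuration.

    input_list    : list of word records (dicts), first unit at index 0
    cstack_labels : root labels of the partial trees in CSTACK, bottom to top
    rstack_size   : number of elements currently in RSTACK
    history       : names of the operators applied so far, oldest first
    subcat_table  : set of words assumed to have a subcategorization entry
    """
    first = input_list[0] if input_list else None
    # Pad on the left so that exactly five operator slots are always present.
    last_ops = (["NONE"] * 5 + list(history))[-5:]
    return {
        # Operation features
        "n_cstack": len(cstack_labels),
        "n_rstack": rstack_size,
        "last_ops": tuple(last_ops),
        "cstack_roots": tuple(cstack_labels),
        # Original-tree-specific features
        "starts": tuple(first["starts"]) if first else (),
        # Semantic features
        "sem_type": first["sem"] if first else "NONE",
        "is_head": first["is_head"] if first else False,
        "in_subcat": first["word"] in subcat_table if first else False,
    }


example = get_contextual_features(
    input_list=[{"word": "teacher", "pos": "NN", "starts": ["NP", "S"],
                 "sem": "HUMAN", "is_head": True}],
    cstack_labels=["DT"],
    rstack_size=1,
    history=["DROP", "SHIFT"],
    subcat_table={"teacher"},
)
print(example)
```

A dictionary keeps the example readable; a real learner would normally need these values encoded as a fixed-length vector.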

Process of sentence reduction

After decision tree learning has generated a set of rules, each configuration of the two stacks and the input list corresponds to a decision action. A given input sentence is parsed, and each element of the input list corresponds to a word in the sentence together with the sequence of syntactic constituents that begin at that word. We then simulate the rewriting process: an operator is applied to the current configuration of the two stacks and the input list, which changes it to a new state, and so on. The process repeats until the Input list is empty and there is only one subtree in CSTACK whose root node is one of the terminal symbols (the symbol that identifies it as a root symbol), or RSTACK is empty. An in-order traversal of the leaves of this tree produces the reduced version of the sentence that was given as input.
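The mapping from a configuration to a decision action can be pictured as a walk down a decision tree. The sketch below is an assumption-laden toy: the nested-dictionary encoding, the classify function and above all the rules in TOY_TREE are invented for illustration and were not learned from any corpus.

```python
def classify(node, features):
    """Walk from the root to a leaf.

    Internal nodes are dicts that test one feature; leaves are
    (operator, parameter) tuples.
    """
    while isinstance(node, dict):
        value = features.get(node["feature"])
        node = node["branches"].get(value, node["default"])
    return node


# Hypothetical rules, hand-written only to show the data structure.
TOY_TREE = {
    "feature": "sem_type",
    "branches": {
        "HUMAN": ("SHIFT", None),
        "CONCEPT": {
            "feature": "is_head",
            "branches": {True: ("SHIFT", None), False: ("DROP", "ADJP")},
            "default": ("SHIFT", None),
        },
    },
    "default": ("SHIFT", None),
}

print(classify(TOY_TREE, {"sem_type": "CONCEPT", "is_head": False}))
```

Each leaf pairs an operator with its parameter, so a single lookup yields both pieces of information needed to apply the next rewriting step.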

Reduction Procedure

Input : an input sentence

Output: a reduced sentence

Step 1. The input sentence is parsed into a syntax tree.

Step 2. The syntax tree is enriched with semantic information.

Step 3. Create an input list and set CSTACK and RSTACK to empty.

Step 4. Call the traversal procedure to obtain a reduced syntax tree.

Step 5. Generate a reduced sentence from the reduced syntax tree.

Traversal procedure

Input: Input list, CSTACK, RSTACK

Output: A reduced tree

while (not terminal condition) {
    feature = get_contextual_features();
    action = get_action(feature);
    parameter = get_parameter(action);
    switch (action) {
        case SHIFT: SHIFT(); break;
        case ASSIGN_TYPE: ASSIGN_TYPE(parameter); break;
        case REDUCE: REDUCE(parameter); break;
        case DROP: DROP(parameter); break;
        case RESTORE: RESTORE(parameter); break;
    }
}

The process of sentence reduction

The traversal procedure uses the following functions and sub-procedures: get_contextual_features, get_action and get_parameter. The function get_contextual_features extracts the vector of features. The functions get_action and get_parameter obtain the operator and its parameter, which are then used to perform the procedures SHIFT, DROP, RESTORE, ASSIGN TYPE and REDUCE.
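The traversal procedure can be tried out with a short runnable sketch. Several things here are assumptions rather than part of the original description: trees are plain (label, children) tuples, each input word is a (word, POS, starts) triple where starts maps a constituent label to its span length, the terminal condition is read as "input list empty and a single tree rooted in S on CSTACK", and a scripted policy that replays a fixed list of actions stands in for the learned decision rules.

```python
def leaves(tree):
    """Left-to-right words of a tree; a leaf is a (pos, word) pair."""
    label, kids = tree
    if isinstance(kids, str):
        return [kids]
    return [w for kid in kids for w in leaves(kid)]


def traverse(words, get_action, root="S", max_steps=100):
    """Apply operators until the (assumed) terminal condition holds."""
    input_list, cstack, rstack, history = list(words), [], [], []
    for _ in range(max_steps):
        if not input_list and len(cstack) == 1 and cstack[0][0] == root:
            return cstack[0]
        features = {
            "n_cstack": len(cstack),
            "n_rstack": len(rstack),
            "last_ops": tuple(history[-5:]),
            "next_word": input_list[0][0] if input_list else None,
        }
        action, param = get_action(features)
        if action == "SHIFT":
            word, pos, _ = input_list.pop(0)
            cstack.append((pos, word))
        elif action == "ASSIGN_TYPE":
            cstack[-1] = (param, cstack[-1][1])
        elif action == "REDUCE":
            k, x = param
            kids = cstack[-k:]
            del cstack[-k:]
            cstack.append((x, kids))
        elif action == "DROP":
            n = input_list[0][2][param]     # span length of the constituent
            rstack.append(input_list[:n])
            del input_list[:n]
        elif action == "RESTORE":
            input_list[:0] = rstack.pop(-param)
        else:
            raise ValueError(f"unknown action: {action}")
        history.append(action)
    raise RuntimeError("no terminal configuration reached")


def scripted_policy(script):
    """Stand-in for the learned rules: replay a fixed action sequence."""
    steps = iter(script)
    return lambda features: next(steps)


sentence = [("the", "DT", {"NP": 3, "S": 5}), ("big", "JJ", {"ADJP": 1}),
            ("dog", "NN", {}), ("barked", "VBD", {"VP": 2}),
            ("loudly", "RB", {"ADVP": 1})]
script = [("SHIFT", None), ("DROP", "ADJP"), ("SHIFT", None),
          ("REDUCE", (2, "NP")), ("SHIFT", None), ("DROP", "ADVP"),
          ("REDUCE", (1, "VP")), ("REDUCE", (2, "S"))]

reduced = traverse(sentence, scripted_policy(script))
print(" ".join(leaves(reduced)))
```

Replacing scripted_policy with a function that consults learned rules would leave the loop itself unchanged, which is the main point of separating get_action from the operators.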

We have presented new algorithms that rewrite a long sentence into a reduced sentence whose word order may differ from that of the original sentence. We claim that the semantic information of the original sentence is very useful for the sentence reduction problem. Experimental results showed that the proposed algorithm improved on the original algorithm. As future work, testing on a larger corpus and integration with a summarization system are currently underway.

Brief description of the document:

The relevance of this work is due to several important points. Sentence reduction appears to be one of the main trends in the development of Modern English, especially in its colloquial layer, which in turn is strongly supported by the development of modern information technologies and the simplification of live speech. So the significance of our work can be proved by the following reasons:

General information

Material number: 307545
