A
Sentence Reduction in Modern English
1. The Sentence
The
notion of sentence has not so far received a satisfactory definition, which
would enable us by applying it in every particular case to find out whether a
certain linguistic unit was a sentence or not.
Thus, for example, the question remains undecided whether such shop notices as Book
Shop and such book titles as English Grammar are sentences or not. In favour of the
view that they are sentences, the following consideration can be brought forward.
The notice Book Shop and the title English Grammar mean 'This is a book shop',
'This is an English Grammar'; the phrase is interpreted as the predicative of a
sentence whose subject and link verb have been omitted, that is, it is
apprehended as a unit of communication. According to the other possible view,
such notices as Book Shop and such titles as English Grammar are not units of
communication at all, but units of nomination, merely appended to the object
they denote. Since there is as yet no definition of a sentence which would
enable us to decide this question, it depends on everyone's subjective view
which alternative he prefers. We will prefer the view that such notices and
book titles are not sentences but rather nomination units.
We may also mention here a special case. Some novels have titles formulated as
sentences, e.g. The Stars Look Down, by A. Cronin, or They Came to a City, by
J.B. Priestley. These are certainly sentences, but they are used as nomination
units, for instance: Have you read The Stars Look Down? Do you like They Came to
a City?
With the rise of modern ideas of paradigmatic syntax, yet another problem concerning
the definition of the sentence has to be considered.
In
paradigmatic syntax, such units as He has arrived, He has not arrived,
Has he arrived, He will arrive, He will not arrive, Will he arrive, etc.,
are treated as different forms of the same sentence, just as arrives,
has arrived, will arrive etc., are
different forms of the same verb. We may call this view of the sentence the
paradigmatic view.
Now
from the point of view of communication, He has arrived and He has not arrived
are different sentences since they convey different information (indeed, the
meaning of the one flatly contradicts that of the other).
2. Structure of English Sentence
When
studying the structure of a unit, we find out its components, mostly units of
the next lower level, their arrangement and their functions as parts of the
unit.
Many linguists think that the investigation of the components and their arrangement
suffices. Thus Halliday writes: «Each unit is characterized by certain
structures. The structure is a syntagmatic framework of interrelated elements,
which are paradigmatically established in the systems of classes and stated as
values in the structure…. if a unit 'word' is established there will be
dimensions of word-classes the terms in which operate as values in clause
structures: given a verb /noun/ adverb system of word classes, it might be that
the structures ANV and NAV were admitted in the clause but NVA excluded».
Now 'a syntagmatic framework of interrelated elements' may describe the structure of a
combination of units as well as that of a higher unit, a combination of words
as well as a sentence or a clause. The important properties that unite the
interrelated elements into a higher unit of which they become parts, and the
function of each element as part of the whole, are not mentioned.
Similarly,
Z. Harris thinks that the sentence The fear of war grew can be described as TN1PN2V,
where T stands for article, N for noun, P for preposition and V for verb.
Such
descriptions are feasible only if we proceed from the notion that the
difference between the morpheme, the word and the sentence is not one of
quality but rather of quantity and arrangement.
Z. Harris does not propose to describe the morpheme is (as he calls it) as VC,
where V stands for vowel and C for consonant. He does not do so because he
regards a morpheme not as an arrangement of phonemes, but as a unit of a higher
level possessing some quality (namely, meaning) not found in any phoneme or
combination of phonemes outside the morpheme. Since we assume that not only the
phoneme and the morpheme, but also the word and the sentence are units of
different levels, we cannot agree to the view that a sentence is merely an
arrangement of words.
In our
opinion, The fear of war grew is a sentence not because it is
TNPNV, but because it has properties not inherent in words. It is a unit of
communication and as such it possesses predicativity and intonation. On the
other hand, TNPNV stands also for the fear of war growing, the
fear of war to grow, which are not sentences.
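The point that a class pattern alone does not identify a sentence can be illustrated with a minimal sketch. The lexicon WORD_CLASS and the function class_pattern are hypothetical names introduced only for this illustration, and skipping the infinitive marker "to" is an assumption that mirrors the way the text counts "to grow" as V.

```python
# Hypothetical word-class lexicon; the symbols follow the text:
# T = article, N = noun, P = preposition, V = verb (any verb form).
WORD_CLASS = {
    "the": "T",
    "fear": "N",
    "war": "N",
    "of": "P",
    "grew": "V",     # finite form
    "growing": "V",  # participle
    "grow": "V",     # infinitive
}

# Assumption of this sketch: the infinitive marker is not given a symbol.
SKIPPED = {"to"}


def class_pattern(phrase: str) -> str:
    """Return the string of word-class symbols for a phrase."""
    words = phrase.lower().split()
    return "".join(WORD_CLASS[w] for w in words if w not in SKIPPED)


for phrase in ("The fear of war grew",
               "the fear of war growing",
               "the fear of war to grow"):
    print(phrase, "->", class_pattern(phrase))
```

All three phrases receive the same pattern, although only the first is a sentence. The pattern records word classes and their order, but not predicativity or intonation.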
As to the arrangement of words in the sentence above, it fully depends upon their
combinability. We have TN and not NT because an article has only right-hand
connections with nouns. A prepositional phrase, on the contrary, has left-hand
connections with nouns; that is why we have TNPN, etc.
The
development of transform grammar (Harris, Chomsky) and tagmemic grammar (Pike)
is to a great extent due to the realization of the fact that «an attempt to
describe grammatical structure in terms of morpheme classes alone - even
successively inclusive classes of classes - is insufficient».
As
defined by Harris, the approach of transformational grammar differs from the
above-described practice of characterizing «each linguistic entity… as composed
out of specified ordered entities at a lower level» in presenting «each
sentence as derived in accordance with a set of transformational rules, from
one or more (generally simpler) sentences, i.e. from other entities of the same
level. A language is then described as consisting of specified sets of kernel
sentences and a set of transformations».
For English Harris lists seven principal patterns of kernel sentences:
1. NvV (v stands for a tense morpheme or an auxiliary verb, i.e. for a (word-)morpheme containing the meanings of predicativity)
2. NvVPN
3. NvVN
4. N is N
5. N is A (A stands for adjective)
6. N is PN
7. N is D (D stands for adverb)
As one can easily see, the patterns above do not merely represent arrangements of
words; they are arrangements which contain predicativity, the most
essential component of a sentence. Given the proper intonation and replaced by
words that conform to the rules of combinability, these patterns will become
actual sentences. Viewed thus, the patterns may be regarded as language models
of speech sentences.
One
should notice, however, that the difference between the patterns above is not,
in fact, a reflection of any sentence peculiarities. It rather reflects the
difference in the combinability of various subclasses of verbs.
The difference between 'NvV' and 'NvVN', for instance, reflects the different
combinability of an intransitive and a transitive verb (He is sleeping. He
is writing letters. Cf. to sleep, to write letters). The
difference between those two patterns and 'N is A' reflects the difference in
the combinability of notional verbs and link verbs, etc.
A similar list of patterns is recommended to language teachers under the heading
'These are the basic patterns for all English sentences':
1. Birds fly.
2. Birds eat worms.
3. Birds are happy.
4. Birds are animals.
5. Birds give me happiness.
6. They made me president.
7. They made me happy.
The
heading is certainly rather pretentious. The list does not include sentences
with zero predications or with partially implied predicativity while it
displays the combinability of various verb classes.
S. Potter reduces the number of kernel sentences to three: «All simple sentences
belong to one of three types:
A. The sun warms the earth;
B. The sun is a star; and
C. The sun is bright.»
And as a kind of argument he adds: «Word order is changeless in A and B, but not in C.
Even in sober prose a man may say Bright is the sun.»
The foregoing analysis of kernel sentences, from which most English sentences can
be obtained, shows that «every sentence can be analysed into a centre, plus
zero or more constructions… The centre is thus an elementary sentence; adjoined
constructions are in general modifiers». In other words, the essential
structure constituting a sentence is the predication; all other words are added
to it in accordance with their combinability. This is the case in an
overwhelming majority of English sentences. Here are some figures based on the
investigation of modern American non-fiction.
| No | Pattern | Frequency of occurrence (per cent), as sole pattern | Frequency of occurrence (per cent), in combination |
|---|---|---|---|
| 1. | Subject + verb: Babies cry. | 2.51 | 5.3 |
| 2. | Subject + verb + object: Girls like clothes. | 32.9 | 5.9 |
| 3. | Subject + verb + predicative: Dictionaries are books. Dictionaries are useful. | 20.8 | 6.4 |
| 4. | Structural subject + verb + notional subject: There is evidence. It is easy to learn knitting. | 4.3 | 0.9 |
| 5. | Minor patterns: Are you sure? Whom did you invite? Brush your teeth. What a day | 7.9 | |
Some
analogy can be drawn between the structure of a word and the structure of a
sentence. The morphemes of a word are formally united by stress. The words of a
sentence are formally united by intonation.
The
centre of a word is the root. The centre of a sentence is the predication.
Some
words have no other morphemes but the root (ink, too, but). Some sentences have
no other words but those of the predication (Birds fly. It rains. Begin.).
Words
may have some morphemes besides the root (unbearable). Sentences may have some
words besides the predication (Yesterday it rained heavily.).
Sometimes
a word is made of a morpheme that is usually not a root (ism). Sometimes
sentences are made of words that are usually not predications (Heavy rain).
Words
may have two or more roots (blue-eyed, merry-go-round). Sentences may have two
or more predications (He asked me if I knew where she lived.).
The
roots may be co-ordinated or subordinated (Anglo-Saxon, blue-bell). The
predications may be co-ordinated and subordinated (She spoke and he listened.
He saw Sam did not believe).
The roots may be connected directly (footpath) or indirectly, with the help of some
morpheme (salesman). The predications may be connected directly (I think he
knows) or indirectly, with the help of some word (The day passed as others
had passed.).
The
demarcation line between a word with more than one root and a combination of
words is often very vague (cf. blackboard and black board, brother-in-law and
brother in arms). The demarcation line between a sentence with more than one
predication and a combination of sentences is often very vague.
Cf.
She'd only to cross the pavement. But still she waited. (Mansfield).
As we know, a predication in English is usually a combination of two words (or
word-morphemes) united by predicativity, or, in other words, a predicative
combination of words. Apart from that, the words of a predication do not differ
from other words in conforming to the general rules of combinability. The
rules of grammatical combinability do not admit of *boys speaks or *he am. The
combination *the fish barked is strange as far as lexical combinability is
concerned, etc.
All
the other words of a sentence are added to those of the predication in
accordance with their combinability to make the communication as complete as
the speaker wishes. The predication Boys play can make a sentence by itself.
But the sentence can be extended by realizing the combinability of the noun
boys and the verb play into the three noisy boys play boisterously upstairs. We
can develop the sentence into a still more extended one. But however extended
the sentence is it does not lose its integrity. Every word in it is not just a
word, it becomes part of the sentence and must be evaluated in its relation to
other parts and to the whole sentence much in the same way as a morpheme in a
word is not just a morpheme, but the root of a word or a prefix, or a suffix,
or an inflection.
Depending
on their relation to the members of the predication the words of a sentence
usually fall into two groups - the group of the subject and the group of the
predicate.
Sometimes
there is a third group, of parenthetical words, which mostly belongs to the
sentence as a whole. In the sentence below the subject group is separated from
the predicate group by the parenthetical group.
That
last thing of yours, dear Flora, was really remarkable.
As
already mentioned, the distribution and the function of a word-combination in a
sentence are usually determined by its head-word: by the noun in noun
word-combinations, by the verb in verb word-combinations, etc.
The
adjuncts of word-combinations in the sentence are added to their head-words in
accordance with their combinability, to develop the sentence, to form its
secondary parts which may be classified with regard to their head-words.
All
the adjuncts of noun word-combinations in the sentence can be united under one
name, attributes. All the adjuncts of verb (finite or non-finite)
word-combinations may be termed complements. In the sentence below, the
attributes are spaced out and the complements are in heavy type.
He often took Irene to the theatre, instinctively choosing the modern Society
plays with the modern Society conjugal problems. (Galsworthy).
The
adjuncts of all other word-combinations in the sentence may be called
extensions. In the sentences below the extensions are spaced out.
You
will never be free from dozing and dreams. (Shaw).
She was ever silent, passive, gracefully averse. (Galsworthy).
The distribution of semi-notional words in the sentence is determined by their
functions: to connect notional words or to specify them. Accordingly they will
be called connectives or specifiers. Conjunctions and prepositions are typical
connectives. Particles are typical specifiers.
3. Sentence Reduction
Sentence
reduction is the removal of redundant words or phrases from an input sentence
by creating a new sentence in which the gist of the original meaning of the
sentence remains unchanged.
3.1 Sentence Reduction Using Syntax Control
Methods of sentence reduction have been used in many applications. Grefenstette (G. Grefenstette, 1998) proposed removing phrases from sentences to produce a telegraphic text that can be used to provide audio scanning services for the blind. Dolan (S.H. Olivers and W.B. Dolan, 1999) proposed removing clauses from sentences before indexing documents for information retrieval. Those methods remove phrases based on their syntactic categories but do not rely on the context of the surrounding words, phrases and sentences. Ignoring that information can reduce the accuracy of sentence reduction. Mani and Maybury (Inderjeet Mani and Mark Maybury, 1999) also present a process of writing a reduced sentence by revising the original sentence with a set of revision rules to improve the performance of summarization. Jing and McKeown (H. Jing, 2000) studied a new method to remove extraneous phrases from sentences by using multiple sources of knowledge to decide which phrases in the sentences can be removed. The multiple sources include syntactic knowledge, context information and statistics computed from a corpus that consists of examples written by human professionals.
Their method avoided removing phrases that were relevant to their surrounding context and produced grammatical sentences. Recently, Knight and Marcu (K. Knight and D. Marcu, 2002) demonstrated two methods for the sentence compression problem, which is similar to sentence reduction. They devised both a noisy-channel and a decision-tree approach to the problem. The noisy-channel framework has been used in many applications, including speech recognition, machine translation, and information retrieval. The decision-tree approach has been used in parsing sentences (D. Magerman, 1995) (Ulf Hermjakob and J. Mooney, 1997) and in identifying the rhetorical structure of text documents (Daniel Marcu, 1999). Most of the previous methods only produce a short sentence whose word order is the same as that of the original sentence, and in the same language, e.g. English. When non-native speakers reduce a long sentence in a foreign language, they usually try to link the meanings of words within the original sentence to meanings in their own language. In addition, in some cases the word order of the reduced sentence differs from that of the original sentence. Therefore, a non-native speaker produces two reduced sentences: one in the foreign language and another in their own language. Following the behavior of non-native speakers, two new requirements arise for the sentence reduction problem:
1) The word order of the reduced sentence may differ from that of the original sentence.
2) Two reduced sentences in two different languages can be generated.
With the two new requirements above, the sentence reduction task is useful for many applications, such as information retrieval, query text summarization and especially cross-language information retrieval. To satisfy these new requirements, we propose a new algorithm that uses semantic information to simulate the behavior of non-native speakers. The semantic information obtained from the original sentence is integrated into the syntax tree through syntax control.
Formulation
Let E and V be two different languages. Given a long sentence $e = e_1, e_2, \ldots, e_n$ in the language E, the task of sentence reduction into the two languages E and V is to remove or replace some redundant words in the sentence e to generate two new sentences $e'_1, e'_2, \ldots, e'_m$ and $v_1, v_2, \ldots, v_k$ in the languages E and V, respectively, so that their gist meanings are unchanged. In practice, we used English as the source language, and the target languages are English and Vietnamese. However, our method can be applied to any pair of languages. In the following part we present an algorithm of sentence reduction using syntax control with rich semantic information.
Sentence reduction algorithm
We present an algorithm based on semantic parsing that generates two short sentences in different languages. There are three steps in the reduction algorithm using syntax control. In the first step, the input sentence e is parsed into a syntax tree t by a syntax parser. In the second step, rich semantic information is added to the syntax tree by a semantic parser, in which each node of the syntax tree is associated with a specific syntax control. The final step is the process of generating two different sentences, one in language E and one in language V, from the syntax tree t that has been annotated with rich semantic information.
Syntax parsing
First, we parse a sentence into a syntax tree. Our syntax parser locates the subject, object, and head word within a sentence. It also recognizes phrasal verbs, cue phrases and expressions in English sentences. This information is useful for reducing a sentence. Figure 2 gives the correspondence between our grammar symbols and English grammar symbols. Figure 1 shows an example of our syntax parsing for the sentence "Like FaceLift, much of ATM's screen performance depends on the underlying application". To reduce ambiguity, we designed a syntactic parser based on grammar symbols that are classified in detail. The parts of speech of words were extended to cope with the ambiguity problem. For example, in Figure 2, "noun" was divided into "private noun" and "general noun".
The bilingual dictionary contains about 200,000 English words with their meanings in Vietnamese. Each English word entry includes several meanings in Vietnamese, and each meaning is associated with a symbol meaning. The set of symbol meanings in each word entry is defined using the WordNet database (C. Fellbaum, 1998). The dictionary also contains several English phrases and expressions with their Vietnamese equivalents.
Figure 1: An example of the syntax tree of "Like FaceLift, much of ATM's screen performance depends on the underlying application".
Semantic parsing using syntax control
After producing a syntax tree with rich information, we apply semantic parsing to that syntax tree. Let N be an internal node of the syntax tree t, and let N have k children nodes $n_1, n_2, \ldots, n_k$. The node N relies on semantic information from its k children nodes to decide which part should remain in the reduced sentence. When the syntax tree t is parsed semantically, each node N must use the information of its children nodes to define its own information. We call that information the semantic information of the node N and denote it by N.sem. In addition, each semantic information of a given node N is mapped to a meaning in the target language. For convenience, we define SI as the set of semantic information and write $n_j[i]$ for the i-th semantic information of the node $n_j$. To understand what the meaning of the node N should be, we have to know the meaning of each child node and how to combine them into meanings for the node N.
Figure 2: Example of symbol equivalents.
Figure 3: Syntax control.
Figure 3 shows two choices for the sequence of meanings of the node N in a reduction process. It is easy for a human to understand exactly what the meaning of each $n_i$ should be and then to decode them as objects to memorize. With this basic idea, we designed a control language to do this task. The k children nodes $n_1, n_2, \ldots, n_k$ are associated with a set of syntax controls that guide the sentence reduction process. The node N and its children are associated with a set of rules. To present the set of rules we used a simple syntax of a control language, as follows:
1) Syntax to present the order of children nodes and the nodes to be removed.
2) Syntax to constrain each meaning of a child node by the meanings of the other children nodes.
3) Syntax to combine a sequence of meanings into one symbol meaning (this process is called the inherit process from the node N to its children).
A syntax control rule is encoded as one generation rule and a set of condition rules that the generation rule has to satisfy. Given a specific condition rule, we can define its generation rule directly.
Condition rule
A condition rule is formulated as follows: if $n_{j_1}.sem = v_1 \wedge n_{j_2}.sem = v_2 \wedge \ldots \wedge n_{j_m}.sem = v_m$ then $N.sem = v$, with $v, v_j \in SI$.
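A condition rule of this form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function apply_condition_rules, the representation of a rule as a pair (conditions, result), and the labels ACTION, HUMAN_ACTION and GENERIC are hypothetical; only the node names Subj and cdgt and the type HUMAN come from the text. An empty set of conditions stands in for the "default" rule.

```python
from typing import Dict, List, Optional, Tuple

# A condition rule: (required semantic information per child, resulting N.sem).
# An empty condition dict plays the role of the "default" rule (assumption).
ConditionRule = Tuple[Dict[str, str], str]


def apply_condition_rules(children_sem: Dict[str, str],
                          rules: List[ConditionRule]) -> Optional[str]:
    """Return N.sem from the first rule whose conditions all hold, else None."""
    for conditions, result in rules:
        if all(children_sem.get(child) == value
               for child, value in conditions.items()):
            return result
    return None


# Hypothetical rule set for a node with the children Subj and cdgt.
rules: List[ConditionRule] = [
    ({"Subj": "HUMAN", "cdgt": "ACTION"}, "HUMAN_ACTION"),
    ({}, "GENERIC"),  # default: matches any configuration
]

print(apply_condition_rules({"Subj": "HUMAN", "cdgt": "ACTION"}, rules))
print(apply_condition_rules({"Subj": "THINGS", "cdgt": "ACTION"}, rules))
```

The sketch takes the first matching rule; the text instead looks for the most likely rule, so the ordering used here is a simplification.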
Generation rule
A generation rule is a sequence of symbols used to transfer the internal node N into the internal node of a reduced sentence. We used two generation rules, one for E and the other for V. A generation rule is a sequence of symbols $g = g_1 g_2 \ldots g_m$, in which each $g_i$ is an integer or a string. If $g_i = j$, the child node with index j is kept in the target node. If $g_i$ is a string "$v_1 v_2 \ldots v_l$", that string is placed at the i-th child position of the target node.
Figure 1 shows a syntax tree of the input sentence "Much of ATM's performance depends on the underlying application.". In this syntax tree, the syntax rule "S1=Bng-daucau Subj cdgt Bng-cuoicau" uses the syntax control below for reduction:
<Con> default </Con> <Gen> 1 2 </Gen>
The condition rule "default" means that the generation rule is applied under any condition. The generation rule "1 2" means that only the node (Subj) at index 1 and the node (cdgt) at index 2 of the rule "S1=Bng-daucau Subj cdgt Bng-cuoicau" remain in the reduced sentence. If the syntax control is changed to
<Con> Subj = HUMAN </Con> <Gen> 1 2 </Gen>
the condition rule means that the generation rule "1 2" is applied in the reduction process only when the semantic information of the child node "Subj" is "HUMAN". Using the default condition rule, the reduced sentences are generated as follows.
Original sentence: Like FaceLift, much of ATM's screen performance depends on the underlying application.
Reduced sentence in English: Much of ATM's performance depends on the underlying application.
Reduced sentence in Vietnamese: Nhieu hieu suat cua ATM phu thuoc vao nhung ung dung tiem an.
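The effect of a generation rule on the children of a node can be sketched as follows. The function apply_generation_rule is a hypothetical name, and the sketch assumes zero-based child positions, which is how the example reads (Subj at index 1 and cdgt at index 2 of the rule "S1=Bng-daucau Subj cdgt Bng-cuoicau"). The literal string in the last call is invented for illustration.

```python
from typing import List, Union


def apply_generation_rule(children: List[str],
                          rule: List[Union[int, str]]) -> List[str]:
    """Build the children of the target node from a generation rule.

    An integer j keeps the child at index j of the source node
    (zero-based, an assumption taken from the worked example);
    a string is inserted literally at that position.
    """
    target = []
    for g in rule:
        if isinstance(g, int):
            target.append(children[g])
        else:
            target.append(g)
    return target


# Right-hand side of the rule S1=Bng-daucau Subj cdgt Bng-cuoicau.
children = ["Bng-daucau", "Subj", "cdgt", "Bng-cuoicau"]

print(apply_generation_rule(children, [1, 2]))       # keep Subj and cdgt
print(apply_generation_rule(children, [2, 1]))       # same nodes, new order
print(apply_generation_rule(children, [1, "va", 2])) # hypothetical literal string
```

Because the rule lists positions explicitly, the same mechanism covers both removal of nodes and a change of word order, which is what the two-language setting requires.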
To generate the reduced sentence in Vietnamese, condition rules and generation rules are designed as well. This process works in the same way as the transfer translation method. Because the gist meaning of a short sentence is unchanged in comparison with the original sentence, the gist meaning of a node after applying the syntax control is also unchanged. With this assumption, the syntax control designed for translating the original sentence into another language (English into Vietnamese) can be reused for translating the reduced sentence. Therefore, our sentence reduction program can produce two reduced sentences in two different languages.
Our semantic parsing uses that set of rules to select suitable rules for the current context. The problem of selecting a set of suitable rules for the current context of the current node N is to find the most likely condition rule among the set of syntax control rules associated with it. Thus, the problem of semantic parsing using syntax control can be described as follows:
Given a sequence of children nodes $n_1, n_2, \ldots, n_k$ of a node N, each node $n_i$ consists of a list of meanings, in which each meaning is associated with a symbol meaning. The syntax rule for the node N is associated with a set of condition rules. In addition, each condition rule is mapped to a specific generation rule. Find the most likely condition rules for that node sequence. This problem can be solved by using a variant of the Viterbi algorithm (A.J. Viterbi, 1967). Firstly, we associate each semantic information of a child node with the indices of all condition rules. Secondly, we try to find all sequences that come from the same condition rule.
3.2 A New Sentence Reduction Algorithm Based on a Decision Tree Model
This section presents a novel sentence reduction algorithm based on a decision tree model, in which semantic information is used to enhance the accuracy of sentence reduction. The proposed algorithm is able to deal with the changeable word order problem in sentence reduction. Experiments show better results than the original methods.
Much research in automatic text summarization has focused on extraction, that is, on identifying the important clauses, sentences and paragraphs in texts. Humans, meanwhile, usually produce summaries by creating new sentences that are grammatical, that cohere with one another, and that capture the most salient parts of the information in the original document. Sentence reduction is the problem of removing some redundant words or phrases from the original sentence by creating a new sentence in which the gist meaning of the original sentence remains unchanged. Methods of sentence reduction have been applied in many applications. Grefenstette (Grefenstette, G., 1998) proposed removing phrases from sentences to produce a telegraphic text that can be used to provide audio scanning services for the blind. Dolan (Dolan, W.B., 1999) proposed removing clauses from sentences before indexing documents for information retrieval. Those methods removed phrases based on their syntactic categories without relying on the context of the surrounding words, phrases and sentences. Therefore, those methods are unsuitable for the text summarization task. Sentence reduction for text summarization was pointed out by Mani and Maybury (Mani and Maybury, 1999). The authors present a process of writing reduced sentences by revising the original sentence with a set of revision rules. Jing (Jing, H., 2000) also studied a method to remove extraneous phrases from sentences by using multiple sources of knowledge to decide which phrases can be removed. The multiple sources include syntactic knowledge, context information and statistics computed from a corpus that consists of examples written by human professionals. This method avoided removing phrases that were relevant to their surrounding context, produced grammatical sentences, and was applied to the cut-and-paste summarization strategy. Recently, Knight and Marcu (Knight and Marcu, D., 2002) demonstrated two corpus-based methods for the sentence compression problem. They devised both a noisy-channel and a decision-tree approach to the problem. The decision-tree approach has been applied in parsing sentences and in identifying the rhetorical structure of text documents, and it achieved good results in sentence compression.
In almost all previous methods, the word order of the reduced sentence is the same as that of the original sentence. Meanwhile, in summarizing a document, a human may change the order to ensure that the summary is smooth and coherent. This calls for a new sentence reduction method in which the word order of the reduced sentence may differ from the original. In addition, when sentence reduction is used for text summarization, syntactic information alone is not enough. The semantic information of the original sentences should be incorporated into the reduction process to enhance its accuracy. This is similar to the behavior of humans in reducing sentences: they understand the meaning of the original sentences and so ensure that important words remain in the reduced sentences. To satisfy the new requirements mentioned above, we propose a new sentence reduction method based on a decision tree model in which semantic information is used to support the reduction process. The decision tree model is also extended to cope with the changeable word order between original and reduced sentences.
Decision tree model for sentence reduction
The following sections present sentence reduction based on a decision tree model using rich semantic information. Let t and s be the syntax trees of the original sentence and of the reduced sentence, respectively. To perform the rewriting process we use an Input list, two stacks and some rewriting operators, defined as follows.
An Input list consists of the sequence of words subsumed by the tree t, where each word in the input list is labeled with the names of all syntactic constituents in t that start with it. CSTACK is a stack that holds the subtrees from which the small tree is built. RSTACK is a stack that holds all nodes removed in the process of rewriting the large tree t into the small tree s. Five operators are used to rewrite the larger tree t into the smaller tree s:
* SHIFT-operator transfers the first word from the input list into CSTACK. It is written formally with the label SHIFT.
* REDUCE-operators pop the k syntactic trees located at the top of CSTACK and combine them into a new tree. These operators are formulated as REDUCE(k, X), in which k is an integer and X is a grammar symbol.
* DROP-operators are used to move from the input list to RSTACK subsequences of words that correspond to syntactic constituents. Both REDUCE-operators and DROP-operators are used to derive the structure of the syntactic tree of the short sentence. They are written as DROP X, where X is a grammar symbol.
* ASSIGN TYPE-operators are used to change the labels of the trees at the top of CSTACK. These POS tags may be different from the POS tags in the original sentence. These operators are written as ASSIGN TYPE(X), where X is a POS tag.
* RESTORE-operators take the k-th element in RSTACK and move that element back into the Input list. These operators are designed on the assumption that a subtree removed from the input list still affects the current decision. They are formulated as RESTORE k, where k is an integer.
A DROP X operator deletes from the input list all words that are spanned by the constituent X in t and stores them in RSTACK. The RESTORE operator is designed to restore some words from RSTACK in order to generate the small tree s. With these operators, the order of words within the small tree s can differ from the word order of the large tree t.
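How DROP followed by RESTORE changes word order can be shown with a minimal sketch of the five operators. The classes Tree and State and all method names are hypothetical, and several simplifications are assumptions of the sketch: DROP takes the length of the constituent in words instead of a grammar symbol, RESTORE counts RSTACK elements from 1 in the order in which they were dropped, and the POS labels in the example are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tree:
    label: str
    children: List["Tree"] = field(default_factory=list)
    word: Optional[str] = None  # set only on leaves

    def leaves(self) -> List[str]:
        """Words under this tree, left to right."""
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.leaves()]


@dataclass
class State:
    input_list: List[Tree]
    cstack: List[Tree] = field(default_factory=list)
    rstack: List[List[Tree]] = field(default_factory=list)

    def shift(self) -> None:
        """SHIFT: move the first word of the input list onto CSTACK."""
        self.cstack.append(self.input_list.pop(0))

    def reduce(self, k: int, x: str) -> None:
        """REDUCE(k, X): combine the top k trees of CSTACK under label X."""
        combined = self.cstack[-k:]
        del self.cstack[-k:]
        self.cstack.append(Tree(x, combined))

    def drop(self, length: int) -> None:
        """DROP: move a constituent (here: its first `length` words) to RSTACK."""
        removed = self.input_list[:length]
        del self.input_list[:length]
        self.rstack.append(removed)

    def assign_type(self, x: str) -> None:
        """ASSIGN TYPE(X): relabel the tree on top of CSTACK."""
        self.cstack[-1].label = x

    def restore(self, k: int) -> None:
        """RESTORE k: move the k-th RSTACK element back to the front of the input."""
        self.input_list[:0] = self.rstack.pop(k - 1)


words = [Tree("ADV", word="yesterday"), Tree("PRON", word="it"),
         Tree("V", word="rained"), Tree("ADV", word="heavily")]
state = State(words)
state.drop(1)         # "yesterday" goes to RSTACK
state.shift()         # "it"
state.shift()         # "rained"
state.drop(1)         # "heavily" goes to RSTACK and stays there
state.restore(1)      # "yesterday" returns to the input list
state.shift()         # "yesterday" now follows "rained"
state.reduce(3, "S")
reduced = " ".join(state.cstack[0].leaves())
print(reduced)
```

The word dropped first re-enters the tree after the verb, while the second dropped word never returns, so one run shows both reordering and removal.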
Features
The features we used in this model consist of:
* Some features come from the input list.
* Some features come from the configuration of CSTACK.
* Some features come from the configuration of RSTACK.
The kinds of features are described below.
Operation features
These features reflect the number of trees in CSTACK, the number of elements in RSTACK and the type of the last five operators. We also consider the information of the two stacks, that is, the syntactic category of the root nodes of the partial trees built up to a certain time.
Original tree specific features denote the syntactic constituents that start with the first unit in the input list.
Semantic features
The semantic features we used include:
* The semantic information of the current words within the input list. The semantic types we used include some general semantic types such as HUMAN, THINGS, ANIMAL, CONCEPT, INSTRUCTOR, COMPUTER, etc.
* Some semantic information such as whether or not the word in the input list is a head word.
* A boolean value that indicates whether or not a word is in the subcategorization table.
Process of sentence reduction
After using decision tree learning to generate a set of rules, each configuration of the two stacks and the input list corresponds to a decision action. A given input sentence is parsed, and each word within the input list corresponds to a word in the sentence together with the sequence of syntactic constituents that begin at that word. We simulate the rewriting process, in which an operator is executed on each configuration of the two stacks and the input list, changing it to a new state, and so on. The process repeats until the Input list is empty and there is only one subtree in CSTACK whose root node is one of the terminal symbols (the symbol that identifies it as a root symbol), or RSTACK is empty. An in-order traversal of the leaves of this tree produces the reduced version of the sentence that was given as input.
Reduction Procedure
Input: an input sentence
Output: a reduced sentence
Step 1. The input sentence is parsed into a syntax tree.
Step 2. The syntax tree is enriched with semantic information.
Step 3. Create an input list and set CSTACK and RSTACK to empty.
Step 4. Call the traversal procedure to obtain a reduced syntax tree.
Step 5. Generate a reduced sentence from the reduced syntax tree.
Traversal procedure
Input: Input list, CSTACK, RSTACK
Output: A reduced tree
    while (not terminal_condition) {
        feature = get_contextual_feature();
        action = get_action(feature);
        parameter = get_parameter(action);
        switch (action) {
            case SHIFT: SHIFT(); break;
            case ASSIGN_TYPE: ASSIGN_TYPE(parameter); break;
            case REDUCE: REDUCE(parameter); break;
            case DROP: DROP(parameter); break;
            case RESTORE: RESTORE(parameter); break;
        }
    }
The process of sentence reduction
The traversal procedure uses the following functions and sub-procedures: get_contextual_feature, get_action and get_parameter. The function get_contextual_feature extracts the vector of features. The functions get_action and get_parameter return the operator and its parameter, which are used to perform the procedures SHIFT, DROP, RESTORE, ASSIGN TYPE and REDUCE.
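The traversal loop can be sketched in Python as below. This is a sketch under stated assumptions, not the authors' implementation: the learned decision tree is replaced by a hypothetical scripted_policy that replays a fixed list of actions and ignores the features, trees are plain (label, children) tuples, DROP takes a number of words, and the terminal condition is simplified to "input list empty and exactly one tree on CSTACK". The example sentence is the one used earlier in the text.

```python
from typing import Callable, Iterable, List, Tuple

Action = Tuple[str, object]  # (operator name, parameter)


def leaves(tree) -> List[str]:
    """Left-to-right words of a tree; a bare string is a leaf."""
    if isinstance(tree, str):
        return [tree]
    _label, children = tree
    return [w for child in children for w in leaves(child)]


def traverse(words: List[str],
             get_action: Callable[[list, list, list], Action]):
    """Apply operators until the input is empty and CSTACK holds one tree."""
    input_list, cstack, rstack = list(words), [], []
    while input_list or len(cstack) != 1:
        action, parameter = get_action(input_list, cstack, rstack)
        if action == "SHIFT":
            cstack.append(input_list.pop(0))
        elif action == "ASSIGN_TYPE":
            top = cstack[-1]
            kids = [top] if isinstance(top, str) else top[1]
            cstack[-1] = (parameter, kids)
        elif action == "REDUCE":
            k, label = parameter
            cstack[-k:] = [(label, cstack[-k:])]
        elif action == "DROP":
            rstack.append([input_list.pop(0) for _ in range(parameter)])
        elif action == "RESTORE":
            input_list[:0] = rstack.pop(parameter - 1)
        else:
            raise ValueError(f"unknown action: {action}")
    return cstack[0]


def scripted_policy(script: Iterable[Action]):
    """Stand-in for the learned decision tree: replays a fixed action list."""
    steps = iter(script)

    def get_action(input_list, cstack, rstack) -> Action:
        return next(steps)

    return get_action


sentence = ("Like FaceLift, much of ATM's screen performance "
            "depends on the underlying application").split()
script = ([("DROP", 2)] + [("SHIFT", None)] * 3 + [("DROP", 1)]
          + [("SHIFT", None)] * 6 + [("REDUCE", (9, "S"))])
tree = traverse(sentence, scripted_policy(script))
print(" ".join(leaves(tree)))
```

In the method described above, get_action would consult the rules learned by the decision tree from the contextual features; the scripted policy only makes the control flow of the loop visible.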
We have presented new algorithms that rewrite a long sentence into reduced sentences whose word order may differ from that of the original sentence. We claim that the semantic information of the original sentence is very useful for the sentence reduction problem. Experimental results showed that the proposed algorithm improves on the original algorithm. As future work, testing on a larger corpus and integration with a summarization system are currently underway.