Types of Scoring Techniques for Extractive Text Summarization

 Scoring Techniques:

  • Proper noun
  • Sentence length character
  • Sentence length words
  • Sentence position
  • Word frequency
  • Numerical value
  • Named entity
  • Iterative query score
  • Cue words 



Word frequency:

As the name suggests, the more frequent the words in a sentence are, the higher its score. In other words, a sentence containing the most frequent words of the document has a high chance of being selected for the final summary. This rests on the assumption that the more frequently a word occurs in the text, the more likely it is to be related to the subject of the document.


S(L) = N(w) / N(d)


Where,

N(w) =  Sum of the frequency of the words of the sentence

N(d) = Sum of the frequency of the words of the document.
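
The formula above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names, the regex tokenizer, and the reading of N(d) as the sum of N(w) over all sentences are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequency_scores(sentences):
    """Return S(L) = N(w) / N(d) for every sentence in the document."""
    tokenized = [tokenize(s) for s in sentences]
    # Document-wide frequency of every word.
    freq = Counter(word for words in tokenized for word in words)
    # N(w): sum of the document frequencies of the words in each sentence.
    n_w = [sum(freq[word] for word in words) for words in tokenized]
    # N(d): the same sum taken over the whole document (assumed reading).
    n_d = sum(n_w)
    if n_d == 0:
        return [0.0 for _ in sentences]
    return [n / n_d for n in n_w]


# Here the=2, cat=2, sat=1, dog=1, so N(w) is 5, 4 and 1, and N(d) is 10.
print(word_frequency_scores(["the cat sat", "the cat", "dog"]))
```

Under this reading the scores of all sentences add up to 1, which makes them easy to combine with the other features.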



Proper noun:

A proper noun refers to an individual, place or organisation. It is considered to carry more information than the rest of the words.

A sentence containing a higher number of proper nouns is more likely to be selected for the final summary. It is a specialisation of the upper case method.


The score is calculated as the ratio of the number of proper nouns in the sentence to the number of proper nouns in the document.


S(P) = N(P) / N(D)


Where, 

N(P) = Number of the proper nouns of the sentence

N(D) = Number of the proper nouns of the document.
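
A rough sketch of this score in Python is given below. Detecting proper nouns properly needs a part-of-speech tagger; the example swaps in a simple capitalisation heuristic instead (any capitalised word that is not the first word of the sentence), which is an assumption of this sketch and will miscount in some cases. The function names are hypothetical.

```python
import re


def count_proper_nouns(sentence):
    """Heuristic count: capitalised words, ignoring the first word."""
    tokens = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
    # The first word is capitalised regardless of its part of speech.
    return sum(1 for tok in tokens[1:] if tok[0].isupper())


def proper_noun_scores(sentences):
    """Return S(P) = N(P) / N(D) for every sentence."""
    counts = [count_proper_nouns(s) for s in sentences]
    total = sum(counts)  # N(D): proper nouns in the whole document
    if total == 0:
        return [0.0 for _ in sentences]
    return [c / total for c in counts]


doc = [
    "Yesterday Alice met Bob in Paris.",
    "They talked for hours.",
    "Later Alice left.",
]
print(proper_noun_scores(doc))
```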


Sentence length character:

This method scores a sentence by its length in characters. The average sentence length is calculated with the function below:


Avg(L) = (max(L) + min(L)) / 2


Where,

max(L) = length of the longest sentence in the document (character-wise)

min(L) = length of the shortest sentence in the document (character-wise)


Score = |Sl - Avg(L)| / Msl


Where,

Sl= Sentence length

Avg(L) = The average sentence length of the document

Msl = The maximum sentence length
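
The length score can be sketched in Python as below. The sketch reads the average as (max(L) + min(L)) / 2, i.e. the midpoint between the longest and shortest sentence, and the function name and the `unit` parameter are assumptions made for the example. Passing `unit="word"` counts whitespace-separated words instead of characters.

```python
def length_scores(sentences, unit="char"):
    """Score = |Sl - Avg(L)| / Msl, with lengths in characters or words."""
    if unit == "char":
        lengths = [len(s) for s in sentences]
    elif unit == "word":
        lengths = [len(s.split()) for s in sentences]
    else:
        raise ValueError("unit must be 'char' or 'word'")
    if not lengths or max(lengths) == 0:
        return [0.0 for _ in sentences]
    longest, shortest = max(lengths), min(lengths)
    # Avg(L): midpoint between the longest and the shortest sentence.
    avg = (longest + shortest) / 2
    # Msl is the maximum sentence length, i.e. `longest`.
    return [abs(length - avg) / longest for length in lengths]


print(length_scores(["aa", "aaaa", "aaaaaa"]))
print(length_scores(["a b", "a b c d", "a b c d e f"], unit="word"))
```

With this formula a sentence whose length equals Avg(L) scores 0, and the score grows as the length moves away from Avg(L) in either direction.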


Sentence length words:

This method scores a sentence by its length in words. The average sentence length is calculated with the function below:


Avg(L) = (max(L) + min(L)) / 2


Where,

max(L) = length of the longest sentence in the document (word-wise)

min(L) = length of the shortest sentence in the document (word-wise)


Score = |Sl - Avg(L)| / Msl


Where,

Sl= Sentence length

Avg(L) = The average sentence length of the document

Msl = The maximum sentence length


Sentence position:

Sentence position is assumed to be the most important feature for sentence scoring. Normally, a few sentences at the beginning and at the end of the document are considered more important than the others, so they are more likely to be selected for the final summary.
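
No formula is given for this feature, so the following Python sketch is only one possible reading: the first `k` and last `k` sentences receive a score of 1 and all others 0. The function name, the binary scoring, and the parameter `k` are assumptions made for the example.

```python
def position_scores(num_sentences, k=2):
    """Give 1.0 to the first k and last k sentences, 0.0 to the rest."""
    return [
        1.0 if i < k or i >= num_sentences - k else 0.0
        for i in range(num_sentences)
    ]


# Five sentences, favouring only the very first and the very last one.
print(position_scores(5, k=1))
```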


Numerical value:

Numerical values represent important figures. According to this method, sentences containing numerical values are considered more important than other sentences.


S(L) = Nnv / T


Where,

Nnv = Number of numerical values in the sentence

T = Total number of numerical values in the document.
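
A small Python sketch of this score follows. What counts as a "numerical value" is an assumption here: any run of digits, optionally with `.` or `,` separators inside it, matched by a regular expression. The names `NUMBER` and `numerical_value_scores` are hypothetical.

```python
import re

# Assumed definition of a numerical value: digits with optional . or , inside.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def numerical_value_scores(sentences):
    """Return S(L) = Nnv / T for every sentence."""
    counts = [len(NUMBER.findall(s)) for s in sentences]  # Nnv per sentence
    total = sum(counts)  # T: numerical values in the whole document
    if total == 0:
        return [0.0 for _ in sentences]
    return [c / total for c in counts]


doc = [
    "Sales rose 5% in 2020.",
    "No figures here.",
    "It cost 3.50 dollars.",
]
print(numerical_value_scores(doc))
```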


Named entity:

A sentence containing more named entities is given more weight than other sentences.


S(L)= Net / T


Where,

Net = Number of named entities in the sentence

T =  Number of named entities of the document.




Iterative query score:

In this technique, we first calculate the frequency of each word in the document and sort the words by frequency. The most frequent words are then taken from the top of the sorted list; this selection is known as the initial keyword set. Sentences containing a higher number of the initial keywords are considered more important than other sentences and have a high chance of being selected for the final summary.


Score= Nit / T


Where,

Nit = Number of the initial keywords of the sentence

T = Total number of initial keywords of the document.
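
The steps above can be sketched in Python as follows. Several details are assumptions made for the example: the size of the keyword set (`top_k`), the regex tokenizer, counting each keyword at most once per sentence, and taking T as the size of the initial keyword set. No stop-word filtering is applied in this sketch.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def initial_keywords(sentences, top_k=2):
    """Pick the top_k most frequent words of the document."""
    freq = Counter(word for s in sentences for word in tokenize(s))
    return {word for word, _ in freq.most_common(top_k)}


def keyword_scores(sentences, top_k=2):
    """Return Score = Nit / T for every sentence."""
    keywords = initial_keywords(sentences, top_k)
    if not keywords:
        return [0.0 for _ in sentences]
    # Nit: distinct initial keywords that occur in the sentence.
    return [
        len(keywords & set(tokenize(s))) / len(keywords)
        for s in sentences
    ]


doc = ["apple banana apple", "banana cherry", "apple banana date"]
print(sorted(initial_keywords(doc)))
print(keyword_scores(doc))
```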


Cue words:

In this technique, sentences are scored based on the number of cue words they contain. To use this technique, a set of cue words such as ‘In brief’, ‘Summing up’, ‘thus’ and ‘to conclude’ has to be prepared.


Score = Ncw / T


Where,

Ncw = Number of cue words in the sentence

T = Number of cue words of the document
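
A minimal Python sketch of the cue word score is shown below. The cue list contains only the four example phrases mentioned above; a real list would have to be prepared for the target domain. Case-insensitive whole-phrase matching and the names `CUE_WORDS`, `count_cue_words` and `cue_word_scores` are assumptions made for the example.

```python
import re

# Example cue phrases only; a full list has to be prepared by hand.
CUE_WORDS = ["in brief", "summing up", "thus", "to conclude"]


def count_cue_words(sentence, cue_words=CUE_WORDS):
    """Count whole-phrase, case-insensitive cue word occurrences."""
    text = sentence.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(cue) + r"\b", text))
        for cue in cue_words
    )


def cue_word_scores(sentences, cue_words=CUE_WORDS):
    """Return Score = Ncw / T for every sentence."""
    counts = [count_cue_words(s, cue_words) for s in sentences]
    total = sum(counts)  # T: cue words in the whole document
    if total == 0:
        return [0.0 for _ in sentences]
    return [c / total for c in counts]


doc = [
    "In brief, the method works.",
    "Results were mixed.",
    "Thus, to conclude, more data is needed.",
]
print(cue_word_scores(doc))
```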

