Commit 1efe428

add text summarizer
1 parent c061bf6 commit 1efe428

4 files changed: +41 -22 lines changed

README.md

Lines changed: 19 additions & 2 deletions
@@ -7,6 +7,19 @@ php-text-analysis
 [![Total Downloads](https://poser.pugx.org/yooper/php-text-analysis/downloads)](https://packagist.org/packages/yooper/php-text-analysis)
 
 PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language.
+There are tools in this library that can perform:
+
+* document classification
+* sentiment analysis
+* compare documents
+* frequency analysis
+* tokenization
+* stemming
+* collocations with Pointwise Mutual Information
+* lexical diversity
+* corpus analysis
+* text summarization
+
 All the documentation for this project can be found in the book and wiki.
 
 PHP Text Analysis Book & Wiki
@@ -90,13 +103,17 @@ $results = $rake->getKeywordScores();
 ```
 
 ### Sentiment Analysis with Vader
-Need Sentiment Analysis ? Use Vader, https://github.com/cjhutto/vaderSentiment .
-The PHP implementation can be invoked easily.
+Need Sentiment Analysis with PHP Use Vader, https://github.com/cjhutto/vaderSentiment .
+The PHP implementation can be invoked easily. Just normalize your data before hand.
 ```php
 $sentimentScores = vader($tokens);
 ```
 
 ### Document Classification with Naive Bayes
+Need to do some docucment classification with PHP, trying using the Naive Bayes
+implementation. An example of classifying movie reviews can be found in the unit
+tests
+
 ```php
 $nb = naive_bayes();
 $nb->train('mexican', tokenize('taco nacho enchilada burrito'));

src/Analysis/Summarize/Simple.php

Lines changed: 1 addition & 3 deletions
@@ -25,9 +25,7 @@ public function summarize(array $wordTokens, array $sentenceTokens) : array
         {
             $scoreKeepers[] = new ScoreKeeper($sentenceTokens[$index], $index);
         }
-
-        $sentenceCounter = array_fill_keys($sentenceTokens, 0);
-
+
         foreach($tokenCounts as $token => $freq)
         {
             foreach($scoreKeepers as $sentenceKeeper)

src/helpers/storage.php

Lines changed: 15 additions & 16 deletions
@@ -1,23 +1,22 @@
 <?php
 
-if (! function_exists('get_storage_path')) {
-    /**
-     * Base function for getting the storage path to the different directories.
-     * @return string
-     */
-    function get_storage_path( $subDirName = null ) {
-        $path = dirname( dirname( __DIR__ ) ) . DIRECTORY_SEPARATOR . 'storage' . DIRECTORY_SEPARATOR;
+/**
+ * Base function for getting the storage path to the different directories.
+ * @return string|false
+ */
+function get_storage_path( $subDirName = null )
+{
+    $path = dirname( dirname( __DIR__ ) ) . DIRECTORY_SEPARATOR . 'storage' . DIRECTORY_SEPARATOR;
 
-        if ( ! empty( $path ) ) {
-            $path .= $subDirName;
-        }
+    if ( ! empty( $path ) ) {
+        $path .= $subDirName;
+    }
 
-        if ( ! is_dir( $path ) ) {
-            throw new Exception( "Path {$path} does not exist" );
-        }
+    if ( ! is_dir( $path ) ) {
+        throw new Exception( "Path {$path} does not exist" );
+    }
 
-        return realpath( $path ) . DIRECTORY_SEPARATOR;
+    return realpath( $path ) . DIRECTORY_SEPARATOR;
+}
 
-    }
-}
 

tests/TextAnalysis/Classifiers/NaiveBayesTest.php

Lines changed: 6 additions & 1 deletion
@@ -21,9 +21,14 @@ public function testNaiveBayes()
 
     public function testMovieReviews()
     {
-        if( getenv('SKIP_TEST') || !is_dir(get_storage_path('corpora/movie_reviews'))) {
+        if( getenv('SKIP_TEST')) {
             return;
         }
+        try {
+            get_storage_path('corpora/movie_reviews');
+        } catch(\Exception $ex) {
+            return;
+        }
 
         $posFilePaths = scan_dir(get_storage_path('corpora/movie_reviews/pos'));
         $nb = naive_bayes();
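
Note on the test change: `get_storage_path()` throws when the directory is missing, so the old guard `!is_dir(get_storage_path(...))` could never evaluate to true for a missing corpus; the exception was raised before `is_dir()` ran. The try/catch in the new version handles that case. Below is a minimal sketch of the same pattern wrapped in a boolean helper. Both function names (`storage_path_or_fail`, `storage_dir_exists`) are hypothetical and not part of this commit, and the base directory is passed in as a parameter (an assumption) so the snippet runs outside the repository.

```php
<?php

// Hypothetical stand-in for get_storage_path(): same checks, but the base
// directory is an argument instead of being derived from __DIR__.
function storage_path_or_fail(string $baseDir, ?string $subDirName = null): string
{
    $path = rtrim($baseDir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . (string) $subDirName;

    // Mirrors the helper in this commit: a missing directory is an exception,
    // not a false return value.
    if (!is_dir($path)) {
        throw new Exception("Path {$path} does not exist");
    }

    return realpath($path) . DIRECTORY_SEPARATOR;
}

// Hypothetical helper: converts the exception into a boolean so callers can
// probe for an optional directory without their own try/catch.
function storage_dir_exists(string $baseDir, string $subDirName): bool
{
    try {
        storage_path_or_fail($baseDir, $subDirName);
        return true;
    } catch (Exception $ex) {
        return false;
    }
}

$base = sys_get_temp_dir();

// The temp directory itself exists.
var_dump(storage_dir_exists($base, ''));

// A randomly named corpus directory is assumed not to exist.
var_dump(storage_dir_exists($base, 'corpora/no_such_corpus_' . uniqid()));
```

A test could call such a helper in its guard clause and return early when the optional corpus is not installed, which keeps the skip logic on one line.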
