Commit

- Breaking change: Switch to returning Promise in place of callback for `process`

- Update: psm argument for tesseract 4+
- Fix (npm): `engine` -> `engines`
- Fix: Allow image files to contain spaces (desmondmorris#59)
- Enhancement: Add oem flag (https://github.com/desmondmorris/node-tesseract/pull/62/files)
- Enhancement: Add `tessdataDir` option (`--tessdata-dir`)
- Enhancement: Add HOCR, TSV support (https://github.com/desmondmorris/node-tesseract/pull/60/files)
- Enhancement: Add debug option and `.idea` to ignore (desmondmorris#43)
- Linting (ESLint): Apply ash-nazg
- Refactoring: Use asynchronous `unlink`; remove custom `merge` in favor of spread operator
- Testing: Switch to chai over should
- Testing: Add coverage
- Maintenance: Add `.editorconfig`
- npm: Add recommended properties; update deps and devDeps; add ignore file and `package-lock.json`
brettz9 committed Feb 2, 2020
1 parent 3024978 commit ee117e9
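
The option-related items in the commit message (`psm` for Tesseract 4+, `oem`, `tessdataDir`, and the spread operator replacing a custom `merge`) can be illustrated with a minimal sketch. This is not the code from this commit: `defaultOptions`, `buildArgs`, and the default values are hypothetical names chosen for illustration. Only the flags `-l`, `--psm`, `--oem`, and `--tessdata-dir` are taken from the Tesseract command line.

```js
'use strict';

// Hypothetical defaults; the module's real defaults are not shown on this page.
const defaultOptions = {
  l: 'eng',
  psm: 3
};

/**
 * Builds an argument array for the tesseract binary (illustrative helper,
 * not part of node-tesseract's API).
 * @param {string} image Path to the input image
 * @param {string} outputBase Output file name without extension
 * @param {object} [userOptions] Caller-supplied overrides
 * @returns {string[]}
 */
function buildArgs (image, outputBase, userOptions = {}) {
  // Object spread stands in for a hand-written `merge`: later keys win.
  const options = {...defaultOptions, ...userOptions};
  const args = [image, outputBase, '-l', options.l];
  if (options.psm !== undefined) {
    // Tesseract 4+ spells this flag `--psm`; 3.x used `-psm`.
    args.push('--psm', String(options.psm));
  }
  if (options.oem !== undefined) {
    args.push('--oem', String(options.oem));
  }
  if (options.tessdataDir) {
    args.push('--tessdata-dir', options.tessdataDir);
  }
  return args;
}

console.log(buildArgs('my scan.png', 'out', {l: 'deu', psm: 6, oem: 1}));
```

Because the result is an array, it could be handed to `child_process.execFile`, which does not spawn a shell by default, so an image path containing spaces would need no quoting. That is one possible approach to the spaces fix, not necessarily the one this commit takes.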
Showing 12 changed files with 4,724 additions and 166 deletions.
16 changes: 16 additions & 0 deletions .editorconfig
@@ -0,0 +1,16 @@
; EditorConfig file: https://EditorConfig.org
; Install the "EditorConfig" plugin into your editor to use

root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
indent_style = space
indent_size = 2
trim_trailing_whitespace = true

; [app/public/css/**.styl]
; indent_style = tab
; indent_size = 2
2 changes: 2 additions & 0 deletions .eslintignore
@@ -0,0 +1,2 @@
node_modules
coverage
39 changes: 39 additions & 0 deletions .eslintrc.js
@@ -0,0 +1,39 @@
module.exports = {
"env": {
"commonjs": true,
"es6": true,
"node": true
},
settings: {
polyfills: [
'Promise'
]
},
overrides: [
{
files: ['test/**'],
env: {
mocha: true
}
},
{
files: ['*.md'],
rules: {
strict: 0,
'no-console': 0,
'node/no-missing-require': ['error', {allowModules: ['node-tesseract']}]
}
}
],
"extends": ["ash-nazg/sauron", "plugin:node/recommended-script"],
"globals": {
"Atomics": "readonly",
"SharedArrayBuffer": "readonly"
},
"parserOptions": {
"ecmaVersion": 2018
},
"rules": {
"import/no-commonjs": 0
}
};
1 change: 1 addition & 0 deletions .gitignore
@@ -1 +1,2 @@
node_modules
.idea
1 change: 1 addition & 0 deletions .npmignore
@@ -0,0 +1 @@
test
65 changes: 39 additions & 26 deletions README.md
@@ -11,48 +11,61 @@ A simple wrapper for the Tesseract OCR package for node.js
## Installation
There is a hard dependency on the [Tesseract project](https://github.com/tesseract-ocr/tesseract). You can find installation instructions for various platforms on the project site. For Homebrew users, the installation is quick and easy.

brew install tesseract --with-all-languages
```sh
brew install tesseract --with-all-languages
```

The above installs all of the available language packages. If you don't need them all, you can remove the `--with-all-languages` flag and install languages manually by downloading them to your local machine and then pointing the `TESSDATA_PREFIX` environment variable at the download directory:

export TESSDATA_PREFIX=~/Downloads/
```sh
export TESSDATA_PREFIX=~/Downloads/
```

You can then go about installing the node-module to expose the JavaScript API:

npm install node-tesseract
```sh
npm install node-tesseract
```

## Usage

```JavaScript
var tesseract = require('node-tesseract');
```js
const {join} = require('path');
const tesseract = require('node-tesseract');

(async () => {
// Recognize text of any language in any format
tesseract.process(__dirname + '/path/to/image.jpg',function(err, text) {
if(err) {
console.error(err);
} else {
console.log(text);
}
});

// Recognize German text in a single uniform block of text and set the binary path

var options = {
l: 'deu',
psm: 6,
binary: '/usr/local/bin/tesseract'
let text;
try {
text = await tesseract.process(join(__dirname, '/path/to/image.jpg'));
} catch (err) {
console.error(err);
return;
}
console.log(text);

// Recognize German text in a single uniform block of text and
// set the binary path
const options = {
l: 'deu',
psm: 6,
binary: '/usr/local/bin/tesseract'
};

tesseract.process(__dirname + '/path/to/image.jpg', options, function(err, text) {
if(err) {
console.error(err);
} else {
console.log(text);
}
});
try {
text = await tesseract.process(
join(__dirname, '/path/to/image.jpg'), options
);
} catch (err) {
console.error(err);
return;
}
console.log(text);
})();
```
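
Callers that still rely on the old callback signature may be able to adapt with Node's built-in `util.callbackify`. The following is a minimal sketch under stated assumptions: `fakeProcess` is a hypothetical stand-in that only mimics the Promise-returning shape of `tesseract.process`, so the example runs without the Tesseract binary installed.

```js
'use strict';
const {callbackify} = require('util');

// Hypothetical stand-in for `tesseract.process`; it resolves with a string
// instead of running OCR, and rejects when no image path is given.
function fakeProcess (image, options = {}) {
  return image
    ? Promise.resolve(`text from ${image} (${options.l || 'eng'})`)
    : Promise.reject(new Error('No image given'));
}

// `callbackify` wraps a Promise-returning function so that it accepts a
// Node-style `(err, value)` callback as its last argument.
const processWithCallback = callbackify(fakeProcess);

processWithCallback('image.jpg', {l: 'deu'}, (err, text) => {
  if (err) {
    console.error(err);
    return;
  }
  console.log(text);
});
```

In real code, `fakeProcess` would presumably be replaced by `tesseract.process` itself; whether every option combination behaves identically through the wrapper is untested here.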

## Changelog

* **0.2.7**: Adds output file extension detection
* **0.2.6**: Catches exception when deleting tmp files that do not exist
* **0.2.5**: Preserves whitespace and replaces tmp module