Package org.opencv.text
Class OCRTesseract
java.lang.Object
  org.opencv.text.BaseOCR
    org.opencv.text.OCRTesseract
The OCRTesseract class provides an interface to the C++ tesseract-ocr API (v3.02.02).
Note that it is compiled only when tesseract-ocr is correctly installed.
Note:
- (C++) An example of OCRTesseract recognition combined with scene text detection can be found in the end_to_end_recognition demo: <https://github.com/opencv/opencv_contrib/blob/master/modules/text/samples/end_to_end_recognition.cpp>
- (C++) Another example of OCRTesseract recognition combined with scene text detection can be found in the webcam_demo: <https://github.com/opencv/opencv_contrib/blob/master/modules/text/samples/webcam_demo.cpp>
Constructor Summary
Modifier | Constructor
protected | OCRTesseract(long addr)
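Method Summary
Modifier and Type | Method | Description
static OCRTesseract | __fromPtr__(long addr) |
static OCRTesseract | create() | Creates an instance of the OCRTesseract class.
static OCRTesseract | create(String datapath) | Creates an instance of the OCRTesseract class.
static OCRTesseract | create(String datapath, String language) | Creates an instance of the OCRTesseract class.
static OCRTesseract | create(String datapath, String language, String char_whitelist) | Creates an instance of the OCRTesseract class.
static OCRTesseract | create(String datapath, String language, String char_whitelist, int oem) | Creates an instance of the OCRTesseract class.
static OCRTesseract | create(String datapath, String language, String char_whitelist, int oem, int psmode) | Creates an instance of the OCRTesseract class.
protected void | finalize() |
String | run(Mat image, int min_confidence) | Recognizes text using the tesseract-ocr API.
String | run(Mat image, int min_confidence, int component_level) | Recognizes text using the tesseract-ocr API.
String | run(Mat image, Mat mask, int min_confidence) |
String | run(Mat image, Mat mask, int min_confidence, int component_level) |
void | setWhiteList(String char_whitelist) |

Methods inherited from class org.opencv.text.BaseOCR
getNativeObjAddr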
Constructor Details
OCRTesseract
protected OCRTesseract(long addr)
Method Details
__fromPtr__
public static OCRTesseract __fromPtr__(long addr)
run
public String run(Mat image, int min_confidence, int component_level)
Recognizes text using the tesseract-ocr API. Takes an image as input and returns the recognized text. The C++ API also offers a run() variant that can output the Rects of the individual text elements found (e.g. words), together with their texts and confidence values; this Java method returns only the text.
Parameters:
image - input image, CV_8UC1 or CV_8UC3.
min_confidence - no description (automatically generated).
component_level - OCR_LEVEL_WORD (the default) or OCR_LEVEL_TEXTLINE.
Returns:
the recognized text.
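min_confidence has no description here. Our reading of the opencv_contrib C++ implementation, which should be treated as an assumption and checked against the version you build, is that run() keeps each recognized component (word or text line, per component_level) whose confidence is strictly greater than min_confidence and concatenates the kept texts in order with no separator. Tesseract typically reports confidences on a 0 to 100 scale. The hypothetical class below (not part of OpenCV; ConfidenceFilterSketch and joinConfident are illustrative names) reproduces that assumed rule in plain Java so its effect on the returned string can be reasoned about.

```java
import java.util.List;

/**
 * Hypothetical sketch, not part of OpenCV: models the ASSUMED way
 * run(image, min_confidence, ...) turns per-component OCR results into one string.
 */
final class ConfidenceFilterSketch {
    private ConfidenceFilterSketch() {}

    /**
     * Keeps each component whose confidence is strictly greater than minConfidence
     * and concatenates the kept texts in order, with no separator (assumed behavior).
     */
    static String joinConfident(List<String> texts, List<Float> confidences, int minConfidence) {
        if (texts.size() != confidences.size()) {
            throw new IllegalArgumentException("texts and confidences must have the same size");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < texts.size(); i++) {
            if (confidences.get(i) > minConfidence) { // strict comparison, as assumed
                out.append(texts.get(i));
            }
        }
        return out.toString();
    }
}
```

For example, joinConfident(List.of("OPEN", "CV", "x"), List.of(91.5f, 88.0f, 12.0f), 50) returns "OPENCV". If the assumed rule holds, word-level output carries no spaces between words, and a component whose confidence equals the threshold is dropped.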
run
public String run(Mat image, int min_confidence)
Recognizes text using the tesseract-ocr API. Takes an image as input and returns the recognized text, using the default component level, OCR_LEVEL_WORD.
Parameters:
image - input image, CV_8UC1 or CV_8UC3.
min_confidence - no description (automatically generated).
Returns:
the recognized text.
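You can check the documented input requirement (an 8-bit image with 1 or 3 channels) before calling run(). The hypothetical class below (not part of OpenCV) decodes an OpenCV type code, the value returned by Mat.type(), by hand so that it compiles without the OpenCV jar. With OpenCV on the classpath, CvType.depth(int) and CvType.channels(int) perform the same decoding. A 4-channel image could be converted first with Imgproc.cvtColor(src, dst, Imgproc.COLOR_BGRA2BGR).

```java
/**
 * Hypothetical pre-check, not part of OpenCV: tests an OpenCV type code
 * (the value of Mat.type()) against the documented input requirement of
 * run(): 8-bit depth with 1 channel (CV_8UC1) or 3 channels (CV_8UC3).
 */
final class OcrInputCheckSketch {
    // OpenCV type codes store the depth in the low 3 bits and (channels - 1) above them.
    private static final int CV_CN_SHIFT = 3;
    private static final int CV_DEPTH_MASK = (1 << CV_CN_SHIFT) - 1; // 7
    private static final int CV_8U = 0;

    private OcrInputCheckSketch() {}

    static int depthOf(int type) {
        return type & CV_DEPTH_MASK;
    }

    static int channelsOf(int type) {
        return ((type >> CV_CN_SHIFT) & 511) + 1; // 511 = CV_CN_MAX - 1
    }

    /** True for CV_8UC1 (type 0) and CV_8UC3 (type 16). */
    static boolean isSupportedType(int type) {
        int cn = channelsOf(type);
        return depthOf(type) == CV_8U && (cn == 1 || cn == 3);
    }
}
```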
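run
public String run(Mat image, Mat mask, int min_confidence, int component_level)

run
public String run(Mat image, Mat mask, int min_confidence)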
setWhiteList
public void setWhiteList(String char_whitelist)
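Both setWhiteList and the char_whitelist argument of create() take an explicit list of characters. The hypothetical helper below (not part of OpenCV; WhitelistSketch is an illustrative name) expands short range specs such as "0-9" into that explicit list. It assumes setWhiteList accepts the same format as char_whitelist, which is documented for create() but not for this method.

```java
/**
 * Hypothetical helper, not part of OpenCV: expands range specs such as "0-9"
 * into the explicit character list passed as a whitelist.
 */
final class WhitelistSketch {
    private WhitelistSketch() {}

    /** Each spec is either a range "x-y" (inclusive, x <= y) or a literal string of characters. */
    static String build(String... specs) {
        StringBuilder sb = new StringBuilder();
        for (String spec : specs) {
            boolean isRange = spec.length() == 3 && spec.charAt(1) == '-'
                    && spec.charAt(0) <= spec.charAt(2);
            if (isRange) {
                // int loop avoids char overflow at the top of the range
                for (int c = spec.charAt(0); c <= spec.charAt(2); c++) {
                    sb.append((char) c);
                }
            } else {
                sb.append(spec); // taken literally, e.g. ".,"
            }
        }
        return sb.toString();
    }
}
```

For example, ocr.setWhiteList(WhitelistSketch.build("0-9")) would restrict recognition to digits. WhitelistSketch.build("0-9", "a-z", "A-Z") reproduces the documented default whitelist.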
create
public static OCRTesseract create(String datapath, String language, String char_whitelist, int oem, int psmode)
Creates an instance of the OCRTesseract class and initializes Tesseract.
Parameters:
datapath - the name of the parent directory of tessdata, ending with "/", or null to use the system's default directory.
language - an ISO 639-3 code; null defaults to "eng".
char_whitelist - the list of characters used for recognition; null defaults to "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".
oem - the OCR Engine Mode (OEM). tesseract-ocr offers several; by default tesseract::OEM_DEFAULT is used. See the tesseract-ocr API documentation for other possible values.
psmode - the Page Segmentation Mode (PSM). tesseract-ocr offers several; by default tesseract::PSM_AUTO (fully automatic layout analysis) is used. See the tesseract-ocr API documentation for other possible values.
Returns:
a new OCRTesseract instance.
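Because datapath must end with "/", a path assembled at runtime (for example, read from configuration) is easy to get wrong. The hypothetical helper below (not part of OpenCV) appends the slash when it is missing and passes null through unchanged, so the system default directory is still selected. A call could then look like OCRTesseract.create(TessDatapathSketch.normalize(dir), "eng", null, Text.OEM_DEFAULT, Text.PSM_AUTO). Here dir is a placeholder, and the call assumes your OpenCV build exposes the engine and segmentation constants as Text.OEM_DEFAULT and Text.PSM_AUTO.

```java
/**
 * Hypothetical helper, not part of OpenCV: prepares the datapath argument of
 * OCRTesseract.create(...), which must name the parent directory of tessdata
 * and end with "/", or be null to use the system's default directory.
 */
final class TessDatapathSketch {
    private TessDatapathSketch() {}

    static String normalize(String parentOfTessdata) {
        if (parentOfTessdata == null) {
            return null; // null selects the system's default directory
        }
        return parentOfTessdata.endsWith("/") ? parentOfTessdata : parentOfTessdata + "/";
    }
}
```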
create
public static OCRTesseract create(String datapath, String language, String char_whitelist, int oem)
Creates an instance of the OCRTesseract class and initializes Tesseract. psmode is not specified, so tesseract::PSM_AUTO (fully automatic layout analysis) is used.
Parameters:
datapath - the name of the parent directory of tessdata, ending with "/", or null to use the system's default directory.
language - an ISO 639-3 code; null defaults to "eng".
char_whitelist - the list of characters used for recognition; null defaults to "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".
oem - the OCR Engine Mode (OEM). tesseract-ocr offers several; by default tesseract::OEM_DEFAULT is used. See the tesseract-ocr API documentation for other possible values.
Returns:
a new OCRTesseract instance.
create
public static OCRTesseract create(String datapath, String language, String char_whitelist)
Creates an instance of the OCRTesseract class and initializes Tesseract. oem and psmode are not specified, so tesseract::OEM_DEFAULT and tesseract::PSM_AUTO (fully automatic layout analysis) are used. See the tesseract-ocr API documentation for other possible values.
Parameters:
datapath - the name of the parent directory of tessdata, ending with "/", or null to use the system's default directory.
language - an ISO 639-3 code; null defaults to "eng".
char_whitelist - the list of characters used for recognition; null defaults to "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".
Returns:
a new OCRTesseract instance.
create
public static OCRTesseract create(String datapath, String language)
Creates an instance of the OCRTesseract class and initializes Tesseract. The remaining settings take their defaults: the character whitelist "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", tesseract::OEM_DEFAULT, and tesseract::PSM_AUTO (fully automatic layout analysis). See the tesseract-ocr API documentation for other possible values.
Parameters:
datapath - the name of the parent directory of tessdata, ending with "/", or null to use the system's default directory.
language - an ISO 639-3 code; null defaults to "eng".
Returns:
a new OCRTesseract instance.
create
public static OCRTesseract create(String datapath)
Creates an instance of the OCRTesseract class and initializes Tesseract. The remaining settings take their defaults: language "eng", the character whitelist "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", tesseract::OEM_DEFAULT, and tesseract::PSM_AUTO (fully automatic layout analysis). See the tesseract-ocr API documentation for other possible values.
Parameters:
datapath - the name of the parent directory of tessdata, ending with "/", or null to use the system's default directory.
Returns:
a new OCRTesseract instance.
create
public static OCRTesseract create()
Creates an instance of the OCRTesseract class and initializes Tesseract with all defaults: the system's default tessdata directory, language "eng", the character whitelist "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", tesseract::OEM_DEFAULT, and tesseract::PSM_AUTO (fully automatic layout analysis). See the tesseract-ocr API documentation for other possible values.
Returns:
a new OCRTesseract instance.
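The six create overloads differ only in which trailing arguments are omitted, and omitted arguments (like null String arguments) fall back to the documented defaults. The hypothetical model below (plain Java, not OpenCV code; CreateDefaultsSketch, Settings and resolve are illustrative names) makes that relationship explicit by resolving each overload to its effective settings. The value 3 used for both defaults is the numeric value of tesseract::OEM_DEFAULT and tesseract::PSM_AUTO. A Java 16+ compiler is assumed for the record.

```java
/**
 * Hypothetical model, not part of OpenCV: shows how the shorter create(...)
 * overloads map onto the five-argument one, with documented defaults filled in.
 */
final class CreateDefaultsSketch {
    static final String DEFAULT_LANGUAGE = "eng";
    static final String DEFAULT_WHITELIST =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static final int OEM_DEFAULT = 3; // tesseract::OEM_DEFAULT
    static final int PSM_AUTO = 3;    // tesseract::PSM_AUTO

    private CreateDefaultsSketch() {}

    /** Effective settings; a null datapath means "system default directory". */
    record Settings(String datapath, String language, String whitelist, int oem, int psmode) {}

    static Settings resolve(String datapath, String language, String whitelist, int oem, int psmode) {
        return new Settings(
                datapath,
                language != null ? language : DEFAULT_LANGUAGE,
                whitelist != null ? whitelist : DEFAULT_WHITELIST,
                oem,
                psmode);
    }

    // Each shorter overload forwards with the next default, like the create(...) chain.
    static Settings resolve(String datapath, String language, String whitelist, int oem) {
        return resolve(datapath, language, whitelist, oem, PSM_AUTO);
    }

    static Settings resolve(String datapath, String language, String whitelist) {
        return resolve(datapath, language, whitelist, OEM_DEFAULT);
    }

    static Settings resolve(String datapath, String language) {
        return resolve(datapath, language, null);
    }

    static Settings resolve(String datapath) {
        return resolve(datapath, null);
    }

    static Settings resolve() {
        return resolve(null);
    }
}
```

Under this model, create() and create(null, null, null, OEM_DEFAULT, PSM_AUTO) describe the same configuration.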
finalize
protected void finalize() throws Throwable