OCR

Implementing simple OCR with tess-two and cv4j

OCR

Optical Character Recognition (OCR) is the process of analyzing image files of text documents to extract the text they contain, along with its layout information.

Tesseract

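Tesseract is an OCR engine developed by Ray Smith at HP's Bristol laboratory between 1985 and 1995. It ranked near the top in the 1995 UNLV accuracy test, but development essentially stopped after 1996. In 2006 Google brought Smith on board and revived the project. Tesseract is licensed under Apache 2.0 and runs on mainstream platforms such as Windows, Linux, and macOS. As an engine, however, it ships only a command-line tool and a programming library, with no graphical interface. At the time of writing it is maintained by Google; it is one of the best open-source OCR engines and supports Chinese.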

tess-two is a port of Tesseract to the Android platform.

Add tess-two as a Gradle dependency:

implementation 'com.rmtheis:tess-two:8.0.0'

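Then put the pre-trained eng.traineddata file into the assets/tessdata folder of the Android project. The app copies it to device storage at startup, after which it can recognize English text.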
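Note that TessBaseAPI.init() expects the directory that contains the tessdata folder, not the tessdata folder itself, so the data must end up at <DATA_PATH>/tessdata/eng.traineddata. A small plain-Java check can confirm the file is in place before init() is called. This is a sketch; TessdataLayout and its methods are hypothetical helpers, not part of tess-two.

```java
import java.io.File;

// Hypothetical helper (not part of tess-two): checks the layout that
// TessBaseAPI.init(dataPath, lang) expects: <dataPath>/tessdata/<lang>.traineddata
final class TessdataLayout {
    private TessdataLayout() {}

    /** Returns the file tess-two looks for when initialized with dataPath and lang. */
    static File trainedDataFile(String dataPath, String lang) {
        return new File(new File(dataPath, "tessdata"), lang + ".traineddata");
    }

    /** True if the traineddata file exists and is not empty. */
    static boolean isReady(String dataPath, String lang) {
        File f = trainedDataFile(dataPath, lang);
        return f.isFile() && f.length() > 0;
    }
}
```

For instance, TessdataLayout.isReady(DATA_PATH, "eng") should return true once copyTessDataFiles() has run, and false if only the tessdata folder itself was passed as the path.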

1. Recognizing plain English text

First, prepare the tessdata directory and copy the trained data into it so that tess-two can load it:

    // Assumes Activity fields: TAG, DATA_PATH (a writable directory ending in "/"),
    // and TESSDATA = "tessdata"
    private void prepareTesseract() {
        prepareDirectory(DATA_PATH + TESSDATA);
        copyTessDataFiles(TESSDATA);
    }


    /**
     * Prepare directory on external storage
     *
     * @param path directory to create if it does not exist yet
     */
    private void prepareDirectory(String path) {
        File dir = new File(path);
        if (!dir.exists()) {
            if (dir.mkdirs()) {
                Log.i(TAG, "Created directory " + path);
            } else {
                Log.e(TAG, "ERROR: Creation of directory " + path + " failed, check that the Android manifest grants permission to write to external storage.");
            }
        }
    }


    /**
     * Copy tessdata files (located in assets/tessdata) to the destination directory
     *
     * @param path - name of directory with .traineddata files
     */
    private void copyTessDataFiles(String path) {
        try {
            String[] fileList = getAssets().list(path);
            for (String fileName : fileList) {
                // Copy each file from the assets folder to external storage,
                // unless it is already there
                String pathToDataFile = DATA_PATH + path + "/" + fileName;
                if (new File(pathToDataFile).exists()) {
                    continue;
                }
                // try-with-resources closes both streams even if the copy fails
                try (InputStream in = getAssets().open(path + "/" + fileName);
                     OutputStream out = new FileOutputStream(pathToDataFile)) {
                    byte[] buf = new byte[1024];
                    int len;
                    while ((len = in.read(buf)) != -1) {
                        out.write(buf, 0, len);
                    }
                }
                Log.d(TAG, "Copied " + fileName + " to tessdata");
            }
        } catch (IOException e) {
            Log.e(TAG, "Unable to copy files to tessdata", e);
        }
    }
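The copy loop has one weak spot: if the app is killed mid-copy, a truncated .traineddata file is left behind, and the exists() check then skips it on every later launch. The sketch below factors the loop into a hypothetical AssetCopy helper in plain Java (so it accepts any InputStream instead of calling getAssets()), writes to a temporary .part file, and renames it only after the copy has completed.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: the copy loop from copyTessDataFiles(), made crash-safe.
// On Android, `in` would come from getAssets().open(...).
final class AssetCopy {
    private AssetCopy() {}

    /**
     * Copies `in` to `dest` unless dest already exists, always closing `in`.
     * Returns the number of bytes copied (0 if the copy was skipped).
     */
    static long copyIfMissing(InputStream in, File dest) throws IOException {
        try (InputStream src = in) {
            if (dest.exists()) {
                return 0;
            }
            // Write to a temporary file so an interrupted copy never looks complete
            File tmp = new File(dest.getPath() + ".part");
            long total = 0;
            try (OutputStream out = new FileOutputStream(tmp)) {
                byte[] buf = new byte[8192];
                int len;
                while ((len = src.read(buf)) != -1) {
                    out.write(buf, 0, len);
                    total += len;
                }
            }
            // Only a fully written file gets the final name
            if (!tmp.renameTo(dest)) {
                tmp.delete();
                throw new IOException("Could not rename " + tmp + " to " + dest);
            }
            return total;
        }
    }
}
```

Inside copyTessDataFiles() the call would look like AssetCopy.copyIfMissing(getAssets().open(path + "/" + fileName), new File(pathToDataFile)).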

After a photo has been taken, call startOCR() with the image's Uri.

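    private void startOCR(Uri imgUri) {
        try {
            BitmapFactory.Options options = new BitmapFactory.Options();
            // Decode at 1/4 of the width and height (1/16 of the pixels) to save heap memory;
            // smaller values keep more detail but need more memory
            options.inSampleSize = 4;
            // getPath() only yields a usable file path for file:// Uris
            Bitmap bitmap = BitmapFactory.decodeFile(imgUri.getPath(), options);
            if (bitmap == null) {
                Log.e(TAG, "Could not decode image " + imgUri);
                return;
            }
            String result = extractText(bitmap);
            resultView.setText(result);
        } catch (Exception e) {
            // Pass the exception itself so the stack trace is logged (getMessage() can be null)
            Log.e(TAG, "OCR failed", e);
        }
    }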
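A fixed inSampleSize = 4 ignores the actual photo resolution. A common alternative, adapted from the calculateInSampleSize() pattern in Android's guide on loading large bitmaps, picks the largest power of two that keeps both dimensions at or above a target size. This is a plain-Java sketch; the SampleSize class is hypothetical and the target size is an assumption you would tune, since shrinking too far can make small characters unreadable for Tesseract.

```java
// Hypothetical helper: choose inSampleSize from the real image size instead of hard-coding 4.
final class SampleSize {
    private SampleSize() {}

    /** Largest power of two that keeps width >= reqWidth and height >= reqHeight. */
    static int calculateInSampleSize(int width, int height, int reqWidth, int reqHeight) {
        int inSampleSize = 1;
        if (height > reqHeight || width > reqWidth) {
            int halfHeight = height / 2;
            int halfWidth = width / 2;
            // Keep doubling while the next halving would still meet the target size
            while ((halfHeight / inSampleSize) >= reqHeight
                    && (halfWidth / inSampleSize) >= reqWidth) {
                inSampleSize *= 2;
            }
        }
        return inSampleSize;
    }
}
```

In the app, decode once with options.inJustDecodeBounds = true, pass options.outWidth and options.outHeight to this method, set options.inSampleSize to the result, and decode again with inJustDecodeBounds = false. For a 4000x3000 photo and a 1000x750 target it returns 4; for an 800x600 photo it returns 1.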

extractText() calls the tess-two API to perform the actual text recognition.

    // Assumes fields DATA_PATH (parent directory of tessdata/) and lang (e.g. "eng")
    private String extractText(Bitmap bitmap) {
        TessBaseAPI tessBaseApi = new TessBaseAPI();
        String extractedText = "empty result";
        try {
            // init() takes the directory that CONTAINS tessdata/ and returns false on failure
            if (!tessBaseApi.init(DATA_PATH, lang)) {
                Log.e(TAG, "TessBaseAPI init failed, check that " + lang + ".traineddata is in place.");
                return extractedText;
            }
            tessBaseApi.setImage(bitmap);
            extractedText = tessBaseApi.getUTF8Text();
        } catch (Exception e) {
            Log.e(TAG, "Error in recognizing text.", e);
        } finally {
            // Always release the native Tesseract resources
            tessBaseApi.end();
        }
        return extractedText;
    }

Finally, display the recognition result. At this stage the result is reasonably good.

2. Recognizing code

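Next, let's try the same program on a snippet of source code.

This time the result is a complete mess. So we refactor startOCR() to add local binarization of the image before running OCR.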

    private void startOCR(Uri imgUri) {
        try {
            BitmapFactory.Options options = new BitmapFactory.Options();
            // Decode at 1/4 of the width and height (1/16 of the pixels) to save heap memory
            options.inSampleSize = 4;
            Bitmap bitmap = BitmapFactory.decodeFile(imgUri.getPath(), options);
            if (bitmap == null) {
                Log.e(TAG, "Could not decode image " + imgUri);
                return;
            }

            // Binarize with cv4j: grayscale conversion followed by adaptive thresholding
            CV4JImage cv4JImage = new CV4JImage(bitmap);
            Threshold threshold = new Threshold();
            threshold.adaptiveThresh((ByteProcessor) (cv4JImage.convert2Gray().getProcessor()), Threshold.ADAPTIVE_C_MEANS_THRESH, 12, 30, Threshold.METHOD_THRESH_BINARY);
            Bitmap newBitmap = cv4JImage.getProcessor().getImage().toBitmap(Bitmap.Config.ARGB_8888);

            // Show the binarized image, then run OCR on it
            ivImage2.setImageBitmap(newBitmap);
            String result = extractText(newBitmap);
            resultView.setText(result);
        } catch (Exception e) {
            Log.e(TAG, "OCR failed", e);
        }
    }

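Here, cv4j binarizes the image. Instead of applying one global threshold to the whole picture, adaptive thresholding compares each pixel with the mean of its local neighborhood, which copes much better with the uneven lighting typical of a photographed screen or page: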

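    // Wrap the Android Bitmap in a cv4j image
    CV4JImage cv4JImage = new CV4JImage(bitmap);
    Threshold threshold = new Threshold();
    // Convert to grayscale, then apply mean-based adaptive thresholding in place;
    // 12 is the neighborhood block size and 30 the constant subtracted from the local mean
    threshold.adaptiveThresh((ByteProcessor) (cv4JImage.convert2Gray().getProcessor()), Threshold.ADAPTIVE_C_MEANS_THRESH, 12, 30, Threshold.METHOD_THRESH_BINARY);
    // Render the thresholded pixels back into a Bitmap for display and OCR
    Bitmap newBitmap = cv4JImage.getProcessor().getImage().toBitmap(Bitmap.Config.ARGB_8888);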
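To make the ADAPTIVE_C_MEANS_THRESH step concrete, here is a plain-Java sketch of mean-based adaptive thresholding on a grayscale array. It is not cv4j's implementation: the class and method names are hypothetical, and it follows the OpenCV-style rule (a pixel becomes white if it is brighter than its local mean minus c, otherwise black), while cv4j's edge handling and rounding may differ.

```java
// Illustration only: mean-based adaptive thresholding using an integral image.
final class AdaptiveMeanThreshold {
    private AdaptiveMeanThreshold() {}

    /**
     * @param gray      grayscale pixels (0..255), row-major, length width * height
     * @param blockSize side length of the square neighborhood used for the local mean
     * @param c         constant subtracted from the local mean
     * @return binary pixels: 255 (background) or 0 (dark text)
     */
    static int[] threshold(int[] gray, int width, int height, int blockSize, int c) {
        // Integral image with an extra zero row/column: any rectangle sum in O(1)
        int stride = width + 1;
        long[] integral = new long[stride * (height + 1)];
        for (int y = 0; y < height; y++) {
            long rowSum = 0;
            for (int x = 0; x < width; x++) {
                rowSum += gray[y * width + x];
                integral[(y + 1) * stride + (x + 1)] = integral[y * stride + (x + 1)] + rowSum;
            }
        }
        int half = blockSize / 2;
        int[] out = new int[width * height];
        for (int y = 0; y < height; y++) {
            // Clamp the window to the image so border pixels use a smaller neighborhood
            int y0 = Math.max(0, y - half);
            int y1 = Math.min(height - 1, y + half);
            for (int x = 0; x < width; x++) {
                int x0 = Math.max(0, x - half);
                int x1 = Math.min(width - 1, x + half);
                long sum = integral[(y1 + 1) * stride + (x1 + 1)]
                        - integral[y0 * stride + (x1 + 1)]
                        - integral[(y1 + 1) * stride + x0]
                        + integral[y0 * stride + x0];
                int count = (x1 - x0 + 1) * (y1 - y0 + 1);
                double mean = (double) sum / count;
                out[y * width + x] = gray[y * width + x] > mean - c ? 255 : 0;
            }
        }
        return out;
    }
}
```

For example, threshold(new int[]{80, 80, 20, 80, 80}, 5, 1, 3, 10) and the same call on the brighter row {220, 220, 150, 220, 220} both return {255, 255, 0, 255, 255}, whereas a single global threshold of 128 would turn the entire dim row black. The integral image keeps the cost per pixel constant regardless of blockSize.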