A Complete OCR Practice Log

1. Task Overview

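The task was to recognize a specific serial number in just over two hundred images. Because the images are confidential, I can't post them here; below is a similar image of the kind that needs to be recognized.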

Suppose the data source looks like the image above: my job is then to recognize the digits marked in red.

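The algorithm I used is GitHub - YCG09/chinese_ocr ("CTPN + DenseNet + CTC based end-to-end Chinese OCR implemented using tensorflow and keras"). It is built on TensorFlow and Keras and recognizes the characters in specified regions of an image with a CTPN + DenseNet + CTC pipeline: CTPN detects the text lines, and a DenseNet trained with CTC loss transcribes each detected line.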
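The recognition half of this pipeline outputs one class distribution per horizontal time step, and CTC turns that sequence into text by merging consecutive repeats and then dropping a special "blank" class. The minimal greedy (best-path) decoder below illustrates that step in pure Python. It is a sketch, not code taken from chinese_ocr: the function name `ctc_greedy_decode`, the default of using the last class as the blank (the usual Keras/TensorFlow convention), and the digit alphabet in the example are assumptions.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    probs: Sequence[Sequence[float]],
    alphabet: str,
    blank: Optional[int] = None,
) -> str:
    """Best-path CTC decoding of a single sequence.

    probs:    T x C scores, one row per time step (C = len(alphabet) + 1).
    alphabet: characters for the non-blank classes, in class order.
    blank:    index of the blank class; defaults to the last class
              (assumption: Keras/TensorFlow convention).
    """
    num_classes = len(probs[0])
    if blank is None:
        blank = num_classes - 1
    # Step 1: take the highest-scoring class at every time step.
    best_path = [max(range(num_classes), key=row.__getitem__) for row in probs]
    out: List[str] = []
    prev = None
    for k in best_path:
        # Steps 2 and 3: merge consecutive repeats, then drop blanks.
        if k != prev and k != blank:
            # Classes after the blank index are shifted down by one in the alphabet.
            out.append(alphabet[k - 1] if k > blank else alphabet[k])
        prev = k
    return "".join(out)


def one_hot(path: Sequence[int], num_classes: int) -> List[List[float]]:
    """Build fake network output that puts all probability on the given path."""
    return [[1.0 if c == k else 0.0 for c in range(num_classes)] for k in path]


# 10 digit classes plus blank (index 10); the blank separates the two 1s.
probs = one_hot([1, 1, 10, 1, 2, 2, 10, 10, 3], 11)
print(ctc_greedy_decode(probs, "0123456789"))  # prints 1123
```

The blank is what lets CTC emit the same character twice in a row: without the blank between them, the two runs of class 1 would collapse into a single "1".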

2. Image Annotation

OCR requires the existing data to be annotated first. The tool I used is labelImg, which anyone working in deep learning is probably already familiar with.

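I won't describe the annotation workflow itself; there is plenty of material on it online. One point does need special mention: for training CTPN, every box gets the label name text, whereas for training DenseNet, the label is the actual content inside the box.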
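By default labelImg saves each image's boxes as a Pascal VOC XML file. The sketch below reads one such file and derives both kinds of labels from a single annotation pass, assuming every box's name field holds the transcribed content. That single-pass workflow, the helper names `parse_labelimg_voc` and `split_labels`, and the sample file name and content are hypothetical; converting the result into the input format the chinese_ocr training scripts expect is not shown.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax)


def parse_labelimg_voc(xml_text: str) -> Tuple[str, List[Tuple[Box, str]]]:
    """Return (filename, [(box, name), ...]) from one labelImg Pascal VOC file."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename", default="")
    objects = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        # int(float(...)) also tolerates coordinates written as decimals.
        box = tuple(int(float(bb.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((box, obj.findtext("name", default="")))
    return filename, objects


def split_labels(objects):
    """Derive CTPN targets (every box labelled 'text') and recognition
    targets (box plus its transcribed content)."""
    ctpn = [(box, "text") for box, _ in objects]
    recog = [(box, name) for box, name in objects if name != "text"]
    return ctpn, recog


# Hypothetical annotation: one box whose name is the content to recognize.
SAMPLE = """<annotation>
  <filename>sample_001.jpg</filename>
  <object><name>A12345</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>50</ymax></bndbox>
  </object>
</annotation>"""

name, objs = parse_labelimg_voc(SAMPLE)
ctpn, recog = split_labels(objs)
print(name, ctpn, recog)
```

Annotating once with the content and renaming to text for CTPN avoids drawing the same boxes twice; if the two sets were annotated separately, only `parse_labelimg_voc` is needed.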

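Next comes data augmentation, where two points are worth recording. First, the original dataset is small, so augmentation is a must; without it the results will be poor. Second, how to augment: the data here is simple and the content to be recognized follows a regular pattern, so there is no need for elaborate augmentation. I only applied random cropping and random tilting. The crop size and tilt angle must be kept within sensible limits, though, or image quality suffers.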

import os
import random

import cv2
from PIL import Image

# Data augmentation code

img_path = r"*****************"   # directory of source images
save_path = r"****************"   # directory for augmented images


# Randomly tilt images
def rotate_ima(img_path, save_path):
    for file in os.listdir(img_path):
        # Flag 0 reads the image as grayscale, so img.shape is (rows, cols)
        img = cv2.imread(os.path.join(img_path, file), 0)
        if img is None:  # skip files OpenCV cannot decode
            continue
        rows, cols = img.shape

        # 4 random angles per image, each applied in both directions (8 outputs)
        for i in range(4):
            a = random.randint(2, 6)  # tilt angle in degrees
            # j = 0 uses -a, j = 1 uses +a: one tilt each way
            for j in range(2):
                a = -a
                # Rotate about the image center; cols-1 and rows-1 are the coordinate limits
                M = cv2.getRotationMatrix2D(((cols - 1) / 2.0, (rows - 1) / 2.0), a, 1)
                dst = cv2.warpAffine(img, M, (cols, rows))
                cv2.imwrite(os.path.join(save_path, 'rot_' + str(i) + '_' + str(j) + file), dst)


# Randomly crop images
def cut_img(img_path, save_path):
    all_file = os.listdir(img_path)
    # Randomly pick 2 source images
    file1 = random.sample(all_file, 2)
    for x in file1:
        im = Image.open(os.path.join(img_path, x))
        for c in range(5):  # generate 5 random crops per picked image
            # Fresh random margins (left, top, right, bottom) in pixels for every crop,
            # so the 5 crops differ from one another
            crop_all = [random.randint(100, 400) for _ in range(4)]
            region = im.crop((crop_all[0], crop_all[1],
                              im.size[0] - crop_all[2], im.size[1] - crop_all[3]))
            region.save(os.path.join(save_path, 'cut_' + str(c) + '_' + x))


# rotate_ima(img_path, save_path)
cut_img(img_path, save_path)
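The crop step in cut_img uses fixed margins of 100 to 400 px on each side, so an image that is 800 px or smaller in either dimension can produce an empty or inverted crop box. One way to keep the crop under control is to scale the margins to the image size, as in the sketch below. The name `random_crop_box` and the default `max_margin_frac=0.15` are hypothetical choices, not values from the original experiment, and the sketch does not check that the target digits stay inside the crop.

```python
import random
from typing import Optional, Tuple


def random_crop_box(
    width: int,
    height: int,
    max_margin_frac: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[int, int, int, int]:
    """Return a PIL-style crop box (left, upper, right, lower).

    Each of the four margins is drawn independently from
    [0, int(max_margin_frac * side)], so the crop always keeps at least
    (1 - 2 * max_margin_frac) of each dimension and is never empty.
    """
    if not 0 <= max_margin_frac < 0.5:
        raise ValueError("max_margin_frac must be in [0, 0.5)")
    if rng is None:
        rng = random.Random()
    max_dx = int(width * max_margin_frac)
    max_dy = int(height * max_margin_frac)
    left = rng.randint(0, max_dx)
    upper = rng.randint(0, max_dy)
    right = width - rng.randint(0, max_dx)
    lower = height - rng.randint(0, max_dy)
    return left, upper, right, lower


print(random_crop_box(1200, 900, rng=random.Random(0)))
```

With PIL, `im.crop(random_crop_box(*im.size))` could replace the manual box computation, since `Image.size` is `(width, height)`. Passing a seeded `random.Random` makes a run reproducible.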

In total I generated about 3,000 images this way and then started annotating; labeling all of them took roughly six to seven hours.

With the annotated data in hand, training can begin.