Leading
Recently I wrote a project in Swift (2.0) using the third-party library Tesseract-OCR-iOS (4.0.0). I ran into a lot of problems along the way, and I want to share my experience here.
Notes:
The project was built on OS X 10.11.3 with Xcode 7.2, using the iOS 9.0 SDK.
Getting Started
I found a tutorial on raywenderlich and tried it at once. I downloaded the source and dragged TesseractOCR.framework into my project, but Xcode reported that it could not find TesseractOCR.framework. What? I looked into the framework's structure, and it was not laid out like the frameworks that ship with Xcode.
The problem was not easy to work out, so I had to find another way to solve it.
An Attempt on Original Sources
Because I had previously built an Objective-C project with Tesseract-OCR-iOS (3.0.4) successfully, I copied that library into the new project. Although I added the import to the bridging header, Xcode still complained that it could not find Tesseract.h. Finally I found these words in Using Swift with Cocoa and Objective-C:
You cannot import C++ code directly into Swift. Instead, create an Objective-C or C wrapper for C++ code.
(page 7)
So, using ldiqual's code, I wrote Objective-C++ files to wrap the original sources and added the wrapper to the bridging header. Xcode no longer reported the error. I then added the code below to my view controller.
func performImageRecognition(image: UIImage) {
    let tess = Tesseract(dataPath: "tessdata", language: "eng")
    tess.setImage(image)
    if tess.recognize() {
        txtResult.text = tess.recognizedText()
    } else {
        txtResult.text = "Can not recognize!"
    }
}
Now I was so happy and ran it on my iPhone. But...
I could see the tessdata folder in the app's documents (through iTunes), so why couldn't the iPhone find it? Going through the code, I found that the tessdata folder came from Tesseract (4.0.0), while the libraries (libtesseract_all.a and liblept.a) came from the earlier version (3.0.4). I then replaced the tessdata folder with the one from Tesseract (3.0.4) to match the libraries, and the app finally ran successfully.
Notes:
As soon as I use the libraries from Tesseract (4.0.0), the error about not being able to open the tessdata folder comes back. Can anyone tell me the reason? My guess is that the Tesseract (4.0.0) library cannot read that tessdata. So strange!
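When tracking down an error like this, it can help to rule out a missing file first, before suspecting a version mismatch. The following is only a minimal sketch and not part of Tesseract-OCR-iOS: the helper name trainedLanguages(inDirectory:) is hypothetical, it is written in current Swift syntax rather than Swift 2.0, and it assumes the language files follow the <language>.traineddata naming used by Tesseract.

```swift
import Foundation

/// Hypothetical helper: lists the languages for which a
/// `<language>.traineddata` file exists directly inside `directory`.
/// Returns an empty array if the directory is missing or unreadable.
func trainedLanguages(inDirectory directory: URL) -> [String] {
    let contents = (try? FileManager.default.contentsOfDirectory(
        at: directory,
        includingPropertiesForKeys: nil)) ?? []
    return contents
        .filter { $0.pathExtension == "traineddata" }          // keep only language data files
        .map { $0.deletingPathExtension().lastPathComponent }  // "eng.traineddata" -> "eng"
        .sorted()
}
```

For example, assuming the data was added to the app as a folder reference named tessdata, calling it with Bundle.main.bundleURL.appendingPathComponent("tessdata") should include eng in the result. If the file is listed but Tesseract still cannot open it, a mismatch between the data and the library version is the more likely cause.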
Focus Back on Framework
Following the raywenderlich tutorial, I downloaded the source of TesseractOCR.framework from G8Tesseract. The Tesseract-OCR-iOS wiki recommends installing the framework with CocoaPods, but I preferred to build the framework myself and add it to my project. You can do this with the steps below:
- Open project Tesseract OCR iOS.xcodeproj
- Targets -> Build Settings -> Development -> Targeted Device Family (1,2)
- Targets -> Build Settings -> Development -> Skip Install (Yes)
- Targets -> Build Settings -> Linking -> Mach-O Type (Static Library)
- Targets -> Build Settings -> Linking -> Other Linker Flags (-ObjC -lstdc++ -all_load)
- Targets -> Build Phases -> Headers (Add all headers here)
- Set the active scheme to Generic iOS Device
- Cmd + R to build it
- Go to Window -> Projects and get the framework
- Add TesseractOCR.framework and CoreImage.framework to your project
The framework just generated is written in Objective-C, so you must import it in the bridging header. I then added the code below to my view controller.
func performImageRecognition(image: UIImage) {
    let tess = G8Tesseract(language: "eng")
    tess.image = image
    if tess.recognize() {
        txtResult.text = tess.recognizedText
    } else {
        txtResult.text = "Can not recognize!"
    }
}
Now I was so happy again and ran it on my iPhone. And...
Ouch! I looked up the method - (void)setImage:(UIImage *)image in G8Tesseract.mm but couldn't find anything wrong with image = [image fixOrientation]; in it. So I moved the fixOrientation method into G8Tesseract.mm itself and called it this way: image = [self fixOrientation: image]. Then I deleted UIImage+G8FixOrientation.h from Targets -> Build Phases -> Headers and finally rebuilt the framework.
After all that, the app ran on the iPhone successfully!
Notes:
You may need to delete UIImage+G8Filters.h from the framework because of the same error. If you need it, you can copy it into your own project and import it in your bridging header.
In Conclusion
Implementing this feature took me almost a week (off and on), but I learnt a lot of new things about iOS development. Cheers!
Some extra points you may want to note:
- Use lipo -info liblept.a to see whether the library includes the architectures you want. The architectures should correspond to the settings in your project.
- Before building the framework, check which target you are building for: device or simulator. If you build only for the device, the framework can only be used on the device. Another solution is to build both variants and combine them.
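The architecture check from the first point can also be automated. Below is a small sketch in current Swift that parses the text printed by lipo -info. The function names are hypothetical, and it assumes the output lists the architectures as space-separated words after the last colon, as in "Architectures in the fat file: liblept.a are: armv7 arm64".

```swift
import Foundation

/// Hypothetical helper: extracts the architecture names from one line of
/// `lipo -info` output. Assumes they are the space-separated words that
/// follow the last colon in the line.
func architectures(fromLipoInfo output: String) -> [String] {
    guard let colon = output.range(of: ":", options: .backwards) else { return [] }
    return output[colon.upperBound...]
        .split(whereSeparator: { $0 == " " || $0 == "\n" })
        .map { String($0) }
}

/// Returns the required architectures that the library does not contain.
func missingArchitectures(required: [String], lipoInfo: String) -> [String] {
    let present = Set(architectures(fromLipoInfo: lipoInfo))
    return required.filter { !present.contains($0) }
}

// Sample text in the assumed format; paste the real output of
// `lipo -info liblept.a` here instead.
let sample = "Architectures in the fat file: liblept.a are: armv7 arm64"
print(missingArchitectures(required: ["arm64", "x86_64"], lipoInfo: sample))
// prints ["x86_64"]
```

In this sample the library was built for the device only, so the simulator architecture x86_64 is reported as missing, which matches the second point above.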
Reference
Tesseract OCR Tutorial
Ângelo Suzuki's blog post
ldiqual/tesseract-ios-lib
gali8/Tesseract-OCR-iOS
How to export “fat” Cocoa Touch Framework (for Simulator and Device)?