一時停止してるし、レシート写真から情報を読みとるやつ自分で途中まで作ってみた

Last updated at 2018-06-15Posted at 2018-06-27

はじめに

レシートを1枚10円で買い取ってくれる「ONE」が流行ってますね。

ただ、流行りすぎて一時停止しているようなので、
この際ちょっと中身どうやって動いるのか勉強してみようと思い、似たようなアプリ開発を試みました。

ちなみに、iOS開発経験はなく、Swiftも2,3年前にドットインストールの無料講座を何個か見てたような見てなかったような経験しかありません（なんかおみくじ的なの作った気がするなあ。。全部忘れてる）。

そこで、わからないことはMitsuhiro Hashimotoさんに毎回聞いてかなり協力してもらいました。改めてありがとうございました。

ご存知の方もいる通り、この手の写真からの分析はOCRを使うのですが、
OCR（光学式文字認識）を取り込んで、写真から情報を認識させるまでの記事は何個か既にあるのにも関わらず、テキスト情報だけ抽出したり、画面に結果を表示させるまでを一連的に説明している記事はなかったので今回記事を作成しようと思いました。

やったこと

Google Cloud Vision のAPIを取得

Google?俺は自分で精度の高いOCRを作りたいんじゃああという方は是非頑張ってください。
（自分も最初はそれを目指し、詳しそうな企業に連絡をとったりなんかもしたが挫折）

とりあえず、Google　Cloud Visionで問題はないのでおすすめです。
Google Cloud Visionには、写真の特徴を説明したり（何が写っているかなど）、文字を認識したりしてくれる機能がついています。
自分の顔を読み込ませて特徴を吐かせたりなど色々面白いことができるので遊んでみると良いと思います。
APIの取得方法はこちらが参考になります。

サンプルコードのダウンロード

とりあえず、一旦サンプルコードをベースにしていくので、サンプルコードを自分の使用したい言語を選びながらダウンロードしてください。
そのやり方もここに書いてありましたね。
アプリ画面↓

コードの編集

API入力

とりあえず、ViewControllerの中の、"YOUR API"のところに自分の取得したAPIを入力します。
GoogleのAPIは確か、たくさん使用すると有料になるので自分のAPIは間違っても公開しないようにしましょう。

ViewController.swift

var googleAPIKey = "YOUR_API_KEY"

この時点で、「あれ？なんかもう動きそうじゃね？？」と感じるので適当なレシート画面を読み込ませてみます。
レシート↓

↓結果

このように、現時点のままだと画像の特徴情報が取り出されてしまいます。

テキストだけ抽出

そこで、とりあえず"feature"の部分を"TEXT_DETECTION"に置き換えて、再びレシート画面を読み込ませてみます。

VewController.swift

// Build our API request
        let jsonRequest = [
            "requests": [
                "image": [
                    "content": imageBase64
                ],
                "features": [
                    [
                        "type": "TEXT_DETECTION",
                        "maxResults": 10
                    ],
                    [
                        //"type": "FACE_DETECTION",
                        //"maxResults": 10
                    ]
                ]
            ]
        ]

結果↓

なああにいいい！何も表示されねえ！！

って思っていたら、実は、ViewController画面下に結果が表示されていました。

結果を画面に表示

あとは、こいつを上手く画面に表示させれば良いので、頑張ります。
（ここら辺がSwift経験のない自分には全く分からなかったため、めちゃめちゃ教えてもらいました。というかコード書いてもらったもののいまだにどうしてこうすると動くのか理解できていない）

画面表示の大きさは、Main.storyboardからいらなさそうなものを消したり、"Label Results"の大きさを変えるなりして変更できます。

ViewController.swift


// Copyright 2016 Google Inc. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
import UIKit
import SwiftyJSON


class ViewController: UIViewController, UIImagePickerControllerDelegate, UINavigationControllerDelegate {
    let imagePicker = UIImagePickerController()
    let session = URLSession.shared
    
    @IBOutlet weak var imageView: UIImageView!
    @IBOutlet weak var spinner: UIActivityIndicatorView!
    @IBOutlet weak var labelResults: UITextView!
    @IBOutlet weak var faceResults: UITextView!
    
    var googleAPIKey = "YOUR_API_KEY"
    var googleURL: URL {
        return URL(string: "https://vision.googleapis.com/v1/images:annotate?key=\(googleAPIKey)")!
    }
    
    @IBAction func loadImageButtonTapped(_ sender: UIButton) {
        imagePicker.allowsEditing = false
        imagePicker.sourceType = .photoLibrary
        
        present(imagePicker, animated: true, completion: nil)
    }
    
    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view, typically from a nib.
        imagePicker.delegate = self
        labelResults.isHidden = true
        faceResults.isHidden = true
        spinner.hidesWhenStopped = true
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }
}


/// Image processing
extension ViewController {
    
    func analyzeResults(_ dataToParse: Data) {
        
        // Update UI on the main thread
        DispatchQueue.main.async(execute: {
            
            
            // Use SwiftyJSON to parse results
            let json = JSON(data: dataToParse)
            let errorObj: JSON = json["error"]
            
            self.spinner.stopAnimating()
            self.imageView.isHidden = true
            self.labelResults.isHidden = false
            self.faceResults.isHidden = false
            self.faceResults.text = ""
            
            // Check for errors
            if (errorObj.dictionaryValue != [:]) {
                self.labelResults.text = "Error code \(errorObj["code"]): \(errorObj["message"])"
            } else {
                // Parse the response
                print(json)
                let responses: JSON = json["responses"][0]
                
                /*
                // Get face annotations
                let faceAnnotations: JSON = responses["faceAnnotations"]
                if faceAnnotations != nil {
                    let emotions: Array<String> = ["joy", "sorrow", "surprise", "anger"]
                    
                    let numPeopleDetected:Int = faceAnnotations.count
                    
                    self.faceResults.text = "People detected: \(numPeopleDetected)\n\nEmotions detected:\n"
                    
                    var emotionTotals: [String: Double] = ["sorrow": 0, "joy": 0, "surprise": 0, "anger": 0]
                    var emotionLikelihoods: [String: Double] = ["VERY_LIKELY": 0.9, "LIKELY": 0.75, "POSSIBLE": 0.5, "UNLIKELY":0.25, "VERY_UNLIKELY": 0.0]
                    
                    for index in 0..<numPeopleDetected {
                        let personData:JSON = faceAnnotations[index]
                        
                        // Sum all the detected emotions
                        for emotion in emotions {
                            let lookup = emotion + "Likelihood"
                            let result:String = personData[lookup].stringValue
                            emotionTotals[emotion]! += emotionLikelihoods[result]!
                        }
                    }
                    // Get emotion likelihood as a % and display in UI
                    for (emotion, total) in emotionTotals {
                        let likelihood:Double = total / Double(numPeopleDetected)
                        let percent: Int = Int(round(likelihood * 100))
                        self.faceResults.text! += "\(emotion): \(percent)%\n"
                    }
                } else {
                    self.faceResults.text = "No faces found"
                }*/
                
                // Get label annotations
                let labelAnnotations: JSON = responses["textAnnotations"]
                let numLabels: Int = labelAnnotations.count
                var labels: Array<String> = []
                if numLabels > 0 {
                
                    var labelResultsText:String = "Labels found: "
                    for index in 0..<1 {
                        let label = labelAnnotations[index]["description"].stringValue
                        labels.append(label)
                    }
                    for label in labels {
                        // if it's not the last item add a comma
                        if labels[labels.count - 1] != label {
                            labelResultsText += "\(label), "
                        } else {
                            labelResultsText += "\(label)"
                        }
                    }
                    self.labelResults.text = labelResultsText
                }
                //} else {
                //    self.labelResults.text = labelResultsText //"No labels found"
                //}
            }
        })
        
    }
    
    func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [String : Any]) {
        if let pickedImage = info[UIImagePickerControllerOriginalImage] as? UIImage {
            imageView.contentMode = .scaleAspectFit
            imageView.isHidden = true // You could optionally display the image here by setting imageView.image = pickedImage
            spinner.startAnimating()
            faceResults.isHidden = true
            labelResults.isHidden = true
            
            // Base64 encode the image and create the request
            let binaryImageData = base64EncodeImage(pickedImage)
            createRequest(with: binaryImageData)
        }
        
        dismiss(animated: true, completion: nil)
    }
    
    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        dismiss(animated: true, completion: nil)
    }
    
    func resizeImage(_ imageSize: CGSize, image: UIImage) -> Data {
        UIGraphicsBeginImageContext(imageSize)
        image.draw(in: CGRect(x: 0, y: 0, width: imageSize.width, height: imageSize.height))
        let newImage = UIGraphicsGetImageFromCurrentImageContext()
        let resizedImage = UIImagePNGRepresentation(newImage!)
        UIGraphicsEndImageContext()
        return resizedImage!
    }
}


/// Networking
extension ViewController {
    func base64EncodeImage(_ image: UIImage) -> String {
        var imagedata = UIImagePNGRepresentation(image)
        
        // Resize the image if it exceeds the 2MB API limit
        if (imagedata?.count > 2097152) {
            let oldSize: CGSize = image.size
            let newSize: CGSize = CGSize(width: 800, height: oldSize.height / oldSize.width * 800)
            imagedata = resizeImage(newSize, image: image)
        }
        
        return imagedata!.base64EncodedString(options: .endLineWithCarriageReturn)
    }
    
    func createRequest(with imageBase64: String) {
        // Create our request URL
        
        var request = URLRequest(url: googleURL)
        request.httpMethod = "POST"
        request.addValue("application/json", forHTTPHeaderField: "Content-Type")
        request.addValue(Bundle.main.bundleIdentifier ?? "", forHTTPHeaderField: "X-Ios-Bundle-Identifier")
        
        // Build our API request
        let jsonRequest = [
            "requests": [
                "image": [
                    "content": imageBase64
                ],
                "features": [
                    [
                        "type": "TEXT_DETECTION",
                        "maxResults": 1
                    ]/*,
                    [
                        "type": "FACE_DETECTION",
                        "maxResults": 10
                    ]*/
                ]
            ]
        ]
        let jsonObject = JSON(jsonDictionary: jsonRequest)
        
        // Serialize the JSON
        guard let data = try? jsonObject.rawData() else {
            return
        }
        
        request.httpBody = data
        
        // Run the request on a background thread
        DispatchQueue.global().async { self.runRequestOnBackgroundThread(request) }
    }
    
    func runRequestOnBackgroundThread(_ request: URLRequest) {
        // run the request
        
        let task: URLSessionDataTask = session.dataTask(with: request) { (data, response, error) in
            guard let data = data, error == nil else {
                print(error?.localizedDescription ?? "")
                return
            }
            
            self.analyzeResults(data)
        }
        
        task.resume()
    }
}


// FIXME: comparison operators with optionals were removed from the Swift Standard Libary.
// Consider refactoring the code to use the non-optional operators.
fileprivate func < <T : Comparable>(lhs: T?, rhs: T?) -> Bool {
    switch (lhs, rhs) {
    case let (l?, r?):
        return l < r
    case (nil, _?):
        return true
    default:
        return false
    }
}

// FIXME: comparison operators with optionals were removed from the Swift Standard Libary.
// Consider refactoring the code to use the non-optional operators.
fileprivate func > <T : Comparable>(lhs: T?, rhs: T?) -> Bool {
    switch (lhs, rhs) {
    case let (l?, r?):
        return l > r
    default:
        return rhs < lhs
    }
}
© 2018 GitHub, Inc.
Terms
Privacy
Security
Status
Help
Contact GitHub
API
Training
Shop
Blog
About
Press h to open a hovercard with more details.

↓結果

おおおおおお！

見事に、結果が画面の"Label"部分の出力されるようになりました。
なんか画面のバランスが色々おかしい感じがしますが、とりあえずできました！

今後

今後は、抽出された情報から必要箇所を取り出しデータベースに入力していくことが必要になるかと思います。
また勉強して続きをやっていきたいなと思います！

最後に

Twitterでは、エンジニア向けの面白そうな情報の共有や、「こんなアイデアあるんだけど作って欲しい！」といった依頼に無料で答えていますのでフォローの方してもらえると嬉しいです！

それでは！

216

192

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up