
googletrans with google colab

Posted at 2021-01-28

I used the following article as a reference. The code here is almost the same as the code in that article.

Googletrans: Free and Unlimited Google translate API for Python

This article installs the Python package googletrans on Google Colab and turns an English text file into a text file that contains both the English and its Japanese translation (other target languages work too).

About googletrans

When you upload [subtitle files (.srt and .sbv)](https://support.google.com/youtube/answer/2734698?hl=ja#zippy=%2C%E5%9F%BA%E6%9C%AC%E7%9A%84%E3%81%AA%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB%E5%BD%A2%E5%BC%8F%2C%E9%AB%98%E5%BA%A6%E3%81%AA%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB%E5%BD%A2%E5%BC%8F) and similar files to Google Translate and look at the result, the time codes and counter indexes are arbitrarily changed to kanji, and the colon (:) becomes full-width.
Such parts have to be kept out of the translation and put back into the text after the rest has been translated. In French, for example, a space is placed before the number.

If the translation function can be called through an API that sends the result back, this processing can be programmed on our side.
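As a minimal sketch of that idea, the code below keeps time code lines away from the translator and passes only the caption text through it. The regular expression assumes the `HH:MM:SS.mmm,HH:MM:SS.mmm` layout of the subtitle sample used later in this article, and `translate_fn` is a hypothetical stand-in for whatever translation call you use (for example a small wrapper around googletrans). Neither is part of googletrans itself.

```python
import re

# Assumed time code layout, e.g. "0:00:00.320,0:00:06.320"
TIMECODE = re.compile(
    r'^\d{1,2}:\d{2}:\d{2}\.\d{3},\s*\d{1,2}:\d{2}:\d{2}\.\d{3}$')

def translate_captions(lines, translate_fn):
    """Translate caption text; copy time codes and blank lines unchanged."""
    result = []
    for line in lines:
        text = line.strip()
        if text == '' or TIMECODE.match(text):
            result.append(text)  # protected: never sent to the translator
        else:
            result.append(translate_fn(text))
    return result

# Stand-in "translator" for demonstration only: it upper-cases the text
sample = ['00:00:00.320,00:00:06.320', 'welcome all', '', 'hello']
print(translate_captions(sample, str.upper))
```

Because the time codes never reach the translator, they cannot be rewritten into kanji or full-width characters.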

The idea is that you run googletrans on Google Colab, upload a text file, and download the translated version.

Because it runs on a Google cloud machine, you don't need a Python runtime environment on your own computer.

Usage: https://youtu.be/tEJDsapYFr8

If you want to use a local Python installation instead of Google Colab, please refer to the page linked at the bottom of this article.[1]

About [google colab](https://research.google.com/colaboratory/faq.html)

The package to install, 4.0.0-rc1, seems to have the TKK fix patch applied (I have not investigated this in detail), and it did not produce an error.
Version 3.0.0, which a plain pip install googletrans installs, fails with the following error (as of 2021-01-27):

code = unicode(self.RE_TKK.search(r.text).group(1)).replace('var ', '')
AttributeError: 'NoneType' object has no attribute 'group'

The same kind of problem often shows up in Emacs's google-translate package as well. Fixes have to chase every change to the token specification of the Google Translate service, so the method described here is only a temporary workaround and things will keep changing. Please check the issue linked at the end of this article. It should have updates on the problem and tips such as workarounds contributed by volunteers.

If you specify the version when installing on Google Colab, you get the patched googletrans, and that works. If you already have the package installed without a pinned version, uninstall it first, then install googletrans on Google Colab:

google-colab_googletrans.ipynb
pip install googletrans==4.0.0-rc1
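To see which version is currently installed before deciding whether to uninstall, something like the following sketch may help. It relies on the standard library module `importlib.metadata` (Python 3.8 or later); the helper name `installed_version` is my own and not part of googletrans.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

version = installed_version('googletrans')
if version is None:
    print('googletrans is not installed')
elif version.startswith('3.'):
    # 3.0.0 is the version that raised the AttributeError in this article
    print('found googletrans', version, 'which may fail; reinstall 4.0.0-rc1')
else:
    print('found googletrans', version)
```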

The code is below.

ipynb (IPython) <Note: this program does not work well!>

google-colab_googletrans.ipynb
from google.colab import files
from googletrans import Translator
import sys

uploaded = files.upload()

filename = ''
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
  filename = fn

#args= sys.argv
args= [('translate.py'),filename,'>','translated-jp.txt']
if len(args) < 2:
    print('python3 translate.py textfile.txt > output_textfile.txt')
else:
    print('open '+args[1])
    f = open(args[1])
    lines = f.readlines()
    f.close()

    translator = Translator()
    for line in lines:
        translated = translator.translate(line, dest="ja");
        print(line) # Original
        print(translated.text) # translated
        print()
    print('EOF')
    files.download(filename)

class
googletrans.models.Translated(src, dest, origin, text, pronunciation, extra_data=None, **kwargs)
Translate result object
Parameters:
src – source language (default: auto)
dest – destination language (default: en)
origin – original text
text – translated text
pronunciation – pronunciation

However, there was a problem when I tried it. After using it for several hours and checking, I found that a blank line in the text to be translated raises IndexError: list index out of range.

For example, suppose the text to be translated is:

00:00:00.320,00:00:06.320
welcome all you super amazing hardware addicts
i am so excited to share this project with you

00:00:06.880,00:00:11.920
after we got that letter in from the listener
talking about how they put lineage os

00:00:11.920,00:00:17.840
on their fire hd tablet i just had to do
it and the kids have loved this change

With input like this, the error occurs when the loop reaches the blank line (the 4th line).

0:00:00.320,0:00:06.320

00000.320,00006.320

welcome all you super amazing hardware addicts 

超素晴らしいハードウェア中毒者を歓迎します

i am so excited to share this project with you

このプロジェクトをあなたと共有できることをとてもうれしく思います

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-23-e8018cddf127> in <module>()
     23     translator = Translator()
     24     for line in lines:
---> 25         translated = translator.translate(line, dest="ja");
     26         print(line) # Original
     27         print(translated.text) # Japanese

1 frames
/usr/local/lib/python3.6/dist-packages/googletrans/client.py in <lambda>(part)
    220         # not sure
    221         should_spacing = parsed[1][0][0][3]
--> 222         translated_parts = list(map(lambda part: TranslatedPart(part[0], part[1] if len(part) >= 2 else []), parsed[1][0][0][5]))
    223         translated = (' ' if should_spacing else '').join(map(lambda part: part.text, translated_parts))
    224 

IndexError: list index out of range

But if you remove the blank lines before uploading, like this:

00:00:00.320,00:00:06.320
welcome all you super amazing hardware addicts
i am so excited to share this project with you
00:00:06.880,00:00:11.920
after we got that letter in from the listener
talking about how they put lineage os
00:00:11.920,00:00:17.840
on their fire hd tablet i just had to do
it and the kids have loved this change

then no error occurs. It's a simple problem, so I think it will be fixed soon.

(Addendum) Improvement (2021-01-28)

Removing line breaks ('\n') and whitespace (' ') is not googletrans's job at all, so I improved the script so that the list passed to googletrans contains no line breaks and no whitespace-only entries.
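The cleanup step on its own can be sketched as a small function. The name `clean_lines` is only for illustration; the point is that `str.strip()` already removes '\n', '\r' and surrounding spaces in one call, and lines that end up empty are dropped.

```python
def clean_lines(lines):
    """Strip surrounding whitespace and drop lines that end up empty."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line]

raw = ['00:00:00.320,00:00:06.320\n', 'welcome all\r\n', '\n', '   \n', 'hello\n']
print(clean_lines(raw))
```

Only the three non-empty entries remain, so nothing blank is ever handed to the translator.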

ipynb (ipython)

google-colab_googletrans.ipynb
pip install googletrans==4.0.0-rc1

translate.ipynb

google-colab_googletrans.ipynb

from google.colab import files
from googletrans import Translator

uploaded = files.upload()

filename = ''
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
  filename = fn

# On Colab there is no command line, so build the argument list by hand
args = ['translate.py', filename]

if len(args) < 2:
    print('python3 translate.py textfile.txt dist_textfile.txt.')
else:
    print('open ' + args[1])
    with open(args[1]) as f:
        # strip() removes '\n', '\r' and surrounding spaces in one call;
        # lines that are empty after stripping are dropped
        line = [l.strip() for l in f if l.strip() != '']

    translator = Translator()
    num = 20  # create a new Translator object after every 20 lines
    filename = 'translated.txt'

    with open(filename, 'w') as f:
        for count, l in enumerate(line, start=1):
            translated = translator.translate(l, dest='ja')
            print(count, ' ', l, file=f)  # original text
            print(count, ' ', translated.text, file=f)  # translated text
            if count % num == 0:
                translator = Translator()

    files.download(filename)


##    translator = Translator()
##    f = open(filename, 'w')
##    for l in line:
##      translated = translator.translate(l, dest="japanese");
##      print(l) # Original
##      f.writelines(l)
##      f.write('\n')
##      print(translated.text) # dest lang
##      f.writelines(translated.text)
##      f.write('\n')
##      print()
##      f.write('\n')
##    print('EOF')
##    f.close()
##
##    files.download(filename)
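The loop above creates a fresh Translator after every 20 lines. That pattern can be sketched independently of googletrans. In the example below, `make_client` and `FakeClient` are hypothetical stand-ins so that the code runs without network access; with googletrans you would pass a factory that returns a `Translator` whose `translate` call includes `dest`.

```python
def translate_all(lines, make_client, renew_every=20):
    """Translate lines, replacing the client after every `renew_every` lines."""
    client = make_client()
    results = []
    for count, line in enumerate(lines, start=1):
        results.append(client.translate(line))
        if count % renew_every == 0:
            client = make_client()  # continue with a fresh client
    return results

class FakeClient:
    """Hypothetical stand-in for a translator; counts how many were created."""
    created = 0

    def __init__(self):
        FakeClient.created += 1

    def translate(self, text):
        return text.upper()

out = translate_all(['line %d' % i for i in range(45)], FakeClient)
print(len(out), FakeClient.created)
```

With 45 input lines, one client is created at the start and two more after lines 20 and 40.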

[github gist google-colab_googletrans.py](https://gist.githubusercontent.com/dauuricus/1c5b0df0f03437b837dd6d15f48a798e/raw/e0f5e59f0aafece9fd864182fe809fd8c0899807/googletrans%2520_with_google_colab.py)

Cf. [how to remove newline character from a list in python](https://www.kite.com/python/answers/how-to-remove-newline-character-from-a-list-in-python)

Cf. list comprehension:
https://realpython.com/lessons/writing-your-first-list-comprehension/

English to French
https://youtu.be/WlZbQnKOCMk

List of all supported languages

import googletrans

# Iterate over the dictionary without modifying it.
# (Calling popitem() here would empty googletrans.LANGUAGES,
# which the library itself uses to look up language codes.)
for num, language in enumerate(googletrans.LANGUAGES.items()):
    print(num, language)
list
LANGUAGES = {
    'af': 'afrikaans',
    'sq': 'albanian',
    'am': 'amharic',
    'ar': 'arabic',
    'hy': 'armenian',
    'az': 'azerbaijani',
    'eu': 'basque',
    'be': 'belarusian',
    'bn': 'bengali',
    'bs': 'bosnian',
    'bg': 'bulgarian',
    'ca': 'catalan',
    'ceb': 'cebuano',
    'ny': 'chichewa',
    'zh-cn': 'chinese (simplified)',
    'zh-tw': 'chinese (traditional)',
    'co': 'corsican',
    'hr': 'croatian',
    'cs': 'czech',
    'da': 'danish',
    'nl': 'dutch',
    'en': 'english',
    'eo': 'esperanto',
    'et': 'estonian',
    'tl': 'filipino',
    'fi': 'finnish',
    'fr': 'french',
    'fy': 'frisian',
    'gl': 'galician',
    'ka': 'georgian',
    'de': 'german',
    'el': 'greek',
    'gu': 'gujarati',
    'ht': 'haitian creole',
    'ha': 'hausa',
    'haw': 'hawaiian',
    'iw': 'hebrew',
    'he': 'hebrew',
    'hi': 'hindi',
    'hmn': 'hmong',
    'hu': 'hungarian',
    'is': 'icelandic',
    'ig': 'igbo',
    'id': 'indonesian',
    'ga': 'irish',
    'it': 'italian',
    'ja': 'japanese',
    'jw': 'javanese',
    'kn': 'kannada',
    'kk': 'kazakh',
    'km': 'khmer',
    'ko': 'korean',
    'ku': 'kurdish (kurmanji)',
    'ky': 'kyrgyz',
    'lo': 'lao',
    'la': 'latin',
    'lv': 'latvian',
    'lt': 'lithuanian',
    'lb': 'luxembourgish',
    'mk': 'macedonian',
    'mg': 'malagasy',
    'ms': 'malay',
    'ml': 'malayalam',
    'mt': 'maltese',
    'mi': 'maori',
    'mr': 'marathi',
    'mn': 'mongolian',
    'my': 'myanmar (burmese)',
    'ne': 'nepali',
    'no': 'norwegian',
    'or': 'odia',
    'ps': 'pashto',
    'fa': 'persian',
    'pl': 'polish',
    'pt': 'portuguese',
    'pa': 'punjabi',
    'ro': 'romanian',
    'ru': 'russian',
    'sm': 'samoan',
    'gd': 'scots gaelic',
    'sr': 'serbian',
    'st': 'sesotho',
    'sn': 'shona',
    'sd': 'sindhi',
    'si': 'sinhala',
    'sk': 'slovak',
    'sl': 'slovenian',
    'so': 'somali',
    'es': 'spanish',
    'su': 'sundanese',
    'sw': 'swahili',
    'sv': 'swedish',
    'tg': 'tajik',
    'ta': 'tamil',
    'te': 'telugu',
    'th': 'thai',
    'tr': 'turkish',
    'uk': 'ukrainian',
    'ur': 'urdu',
    'ug': 'uyghur',
    'uz': 'uzbek',
    'vi': 'vietnamese',
    'cy': 'welsh',
    'xh': 'xhosa',
    'yi': 'yiddish',
    'yo': 'yoruba',
    'zu': 'zulu',
}
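To find the code to pass as dest for a given language name, a table like this can be searched by value. The following is a small sketch that uses an excerpt copied from the list above so that it runs on its own; the helper name `codes_for` is hypothetical. With googletrans installed, `googletrans.LANGUAGES` could be passed instead of the excerpt.

```python
# Small excerpt of the language table, copied so the example is self-contained
LANGUAGES_EXCERPT = {
    'fr': 'french',
    'iw': 'hebrew',
    'he': 'hebrew',
    'ja': 'japanese',
}

def codes_for(name, languages):
    """Return every language code whose name matches, ignoring case."""
    wanted = name.strip().lower()
    return [code for code, lang in languages.items() if lang == wanted]

print(codes_for('Japanese', LANGUAGES_EXCERPT))
print(codes_for('hebrew', LANGUAGES_EXCERPT))
```

The function returns a list because a name can map to more than one code, as 'iw' and 'he' do for Hebrew.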

Reference for tkk error:
[stackoverflow.com "googletrans stopped working with error nonetype object has no attribute group"](https://stackoverflow.com/questions/52455774/googletrans-stopped-working-with-error-nonetype-object-has-no-attribute-group)

[py-googletrans/issues/234](https://github.com/ssut/py-googletrans/issues/234#issuecomment-742460612)


googletrans , YouTube subtitle , Google Colab
https://qiita.com/dauuricus/items/863dd4d087b3aff6455d

  1. googletrans with local python qiita page