
Reading pyc file (Python 3.5.2)

Out of respect for the following posts, I am posting this article in English too.

I tried reading the pyc content in __pycache__ using the code from the posts mentioned above, to understand what the pyc structure looks like these days.

However, it failed completely with a mysterious error that I did not understand.

$ python --version
Python 3.5.2
$ python show_pyc.py __pycache__/hello.cpython-35.pyc
magic b'160d0d0a'
moddate b'6a393e58' (Wed Nov 30 11:28:58 2016)
source_size: 227
Traceback (most recent call last):
  File "show_pyc.py", line 74, in <module>
  File "show_pyc.py", line 70, in show_file
ValueError: bad marshal data (unknown type code)

Actually, Ian had already pointed out the cause in a comment on the second article:

The file format has changed slightly as of Python 3.3+, so the recipe above no longer works.
In addition to the two original four-byte fields there is a new four-byte field that encodes the size of the source file as a long.

OK, pyc header after "3.3+" now contains another 4 bytes!
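As a minimal sketch of that layout, the three header fields can be unpacked with `struct`. This assumes the 12-byte header used by Python 3.3 through 3.6 and little-endian integers, which is what the values printed in the failed run above imply. The name `parse_header` is purely illustrative and is not part of any library.

```python
import struct

# Assumed layout (Python 3.3 to 3.6): 4 magic bytes, then two
# little-endian unsigned 32-bit integers (source mtime, source size).
HEADER = struct.Struct('<4sII')

def parse_header(data):
    """Return (magic, mtime, source_size) from the first 12 bytes."""
    if len(data) < HEADER.size:
        raise ValueError('need at least %d bytes' % HEADER.size)
    return HEADER.unpack_from(data)

# Bytes rebuilt from the hex values printed in the session above:
# magic 160d0d0a, moddate 6a393e58, source size 227 (0xe3).
sample = bytes.fromhex('160d0d0a' '6a393e58' 'e3000000')
magic, mtime, source_size = parse_header(sample)
print(magic, mtime, source_size)
```

The marshaled code object then starts at offset 12 rather than offset 8, which is why a reader that only skips 8 bytes hands garbage to `marshal`.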

Because of this slight modification, any document written before Python 3.3 may describe the pyc format inaccurately. PEP 3147 is another example:

Byte code files contain two 32-bit big-endian numbers followed by the marshaled [2] code object.
The 32-bit numbers represent a magic number and a timestamp.

This is no longer the case. To be fair, the PEP was written for Python 3.2, and there was never any guarantee that the pyc format would stay the same over time.
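Since the format can change between releases, one cheap safeguard is to check the magic number before trusting the rest of the header. The sketch below assumes Python 3.4 or later, where `importlib.util.MAGIC_NUMBER` holds the magic bytes of the running interpreter. `has_current_magic` is a hypothetical helper name, and the throwaway module is only there to have something to check.

```python
import importlib.util
import os
import py_compile
import tempfile

def has_current_magic(pyc_path):
    """True if the pyc starts with the running interpreter's magic bytes."""
    with open(pyc_path, 'rb') as f:
        return f.read(4) == importlib.util.MAGIC_NUMBER

# Compile a throwaway module with the running interpreter and check it.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'hello.py')
    with open(src, 'w') as f:
        f.write('print("Hello World")\n')
    pyc = py_compile.compile(
        src, cfile=os.path.join(tmp, 'hello.pyc'), doraise=True)
    ok = has_current_magic(pyc)
print(ok)
```

A matching magic number only says the file was written by a compatible interpreter. It does not describe the header layout, so a reader still has to know which layout belongs to which version.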

Here's my modified version of the pyc reader.

import binascii
import dis
import marshal
import sys
import time
import types

def get_long(s):
    return s[0] + (s[1] << 8) + (s[2] << 16) + (s[3] << 24)

def show_hex(label, h, indent):
    h = binascii.hexlify(h).decode('ascii')
    if len(h) < 60:
        print('%s%s %s' % (indent, label, h))
    else:
        # Long values are wrapped at 60 hex digits per line.
        print('%s%s' % (indent, label))
        for i in range(0, len(h), 60):
            print('%s   %s' % (indent, h[i:i+60]))

def show_code(code, indent=''):
    print('%scode' % indent)
    indent += '   '
    print('%sargcount %d' % (indent, code.co_argcount))
    print('%snlocals %d' % (indent, code.co_nlocals))
    print('%sstacksize %d' % (indent, code.co_stacksize))
    print('%sflags %04x' % (indent, code.co_flags))
    show_hex('code', code.co_code, indent=indent)
    dis.disassemble(code)
    print('%sconsts' % indent)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # Nested code objects (functions, classes) are shown recursively.
            show_code(const, indent+'   ')
        else:
            print('   %s%r' % (indent, const))
    print('%snames %r' % (indent, code.co_names))
    print('%svarnames %r' % (indent, code.co_varnames))
    print('%sfreevars %r' % (indent, code.co_freevars))
    print('%scellvars %r' % (indent, code.co_cellvars))
    print('%sfilename %r' % (indent, code.co_filename))
    print('%sname %r' % (indent, code.co_name))
    print('%sfirstlineno %d' % (indent, code.co_firstlineno))
    show_hex('lnotab', code.co_lnotab, indent=indent)

def show_file(fname: str) -> None:
    with open(fname, 'rb') as f:
        magic_str = f.read(4)
        mtime_str = f.read(4)
        mtime = get_long(mtime_str)
        modtime = time.asctime(time.localtime(mtime))
        print('magic %s' % binascii.hexlify(magic_str))
        print('moddate %s (%s)' % (binascii.hexlify(mtime_str), modtime))
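        if sys.version_info < (3, 3):
            print('source_size: (unknown)')
        else:
            # Python 3.3+ stores the source size in a third 4-byte field.
            source_size = get_long(f.read(4))
            print('source_size: %s' % source_size)
        # The rest of the file is the marshaled code object.
        show_code(marshal.load(f))

if __name__ == '__main__':
    show_file(sys.argv[1])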

Let's run the new reader on the following code.

a, b = 1, 0
if a or b:
    print("Hello World")
$ python --version
Python 3.5.2
$ ls -l hello.py
-rwxr-xr-x 1 dmiyakawa 48 Nov 30 12:41 hello.py
$ python -m py_compile hello.py
$ python show_pyc.py __pycache__/hello.cpython-35.pyc
magic b'160d0d0a'
moddate b'574a3e58' (Wed Nov 30 12:41:11 2016)
source_size: 48
   argcount 0
   nlocals 0
   stacksize 2
   flags 0040
  1           0 LOAD_CONST               4 ((1, 0))
              3 UNPACK_SEQUENCE          2
              6 STORE_NAME               0 (a)
              9 STORE_NAME               1 (b)

  2          12 LOAD_NAME                0 (a)
             15 POP_JUMP_IF_TRUE        24
             18 LOAD_NAME                1 (b)
             21 POP_JUMP_IF_FALSE       34

  3     >>   24 LOAD_NAME                2 (print)
             27 LOAD_CONST               2 ('Hello World')
             30 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             33 POP_TOP
        >>   34 LOAD_CONST               3 (None)
             37 RETURN_VALUE
      'Hello World'
      (1, 0)
   names ('a', 'b', 'print')
   varnames ()
   freevars ()
   cellvars ()
   filename 'hello.py'
   name '<module>'
   firstlineno 1
   lnotab 0c010c01

Note that the size of the source file (48 bytes) is correctly embedded in the pyc file too. This is the new field introduced in "Python 3.3+" (sorry, I don't know exactly what the "+" means here).

This seemed to work fine with Python 3.5.2, 3.4.3, 3.3.6, 3.2.6, and 3.6.0b3 in my environment (macOS Sierra + pyenv). With 3.2.6 it obviously does not report the source size, because that field is not embedded in the pyc file.

For readers from the future: do not rely on the assumption that "the pyc format won't change", as I did.

Note (2018-02-01)

Python 3.7 or later may have a different pyc format, which will be more "deterministic". See the following PEP.
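For reference, here is a rough sketch of how a reader might handle the newer header. It rests on my understanding of PEP 552 ("Deterministic pycs"): from Python 3.7 the header is 16 bytes, with a 4-byte flags field right after the magic number, followed by either the mtime and source size (flags 0) or an 8-byte hash of the source (lowest flag bit set). `read_pyc_37` is a hypothetical name, and the sample source simply mirrors hello.py from above.

```python
import marshal
import os
import py_compile
import struct
import sys
import tempfile

def read_pyc_37(path):
    """Return (header fields, code object); assumes the 16-byte 3.7+ header."""
    with open(path, 'rb') as f:
        magic, flags = struct.unpack('<4sI', f.read(8))
        fields = {'magic': magic, 'flags': flags}
        if flags & 0x01:
            # Hash-based pyc: the next 8 bytes are a hash of the source.
            fields['source_hash'] = f.read(8)
        else:
            # Timestamp-based pyc: mtime and source size, as in 3.3 to 3.6.
            mtime, size = struct.unpack('<II', f.read(8))
            fields['mtime'] = mtime
            fields['source_size'] = size
        return fields, marshal.load(f)

if sys.version_info >= (3, 7):
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'hello.py')
        with open(src, 'w') as f:
            f.write('a, b = 1, 0\nif a or b:\n    print("Hello World")\n')
        src_size = os.path.getsize(src)
        # Ask explicitly for a timestamp-based pyc so the result does not
        # depend on environment settings such as SOURCE_DATE_EPOCH.
        pyc = py_compile.compile(
            src, cfile=os.path.join(tmp, 'hello.pyc'), doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)
        fields, code = read_pyc_37(pyc)
    print(fields, src_size, code.co_names)
```

The version guard matters: run against a 3.3 to 3.6 pyc, this function would misread the mtime as flags, which is exactly the kind of mistake this article started with.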
