Reverse engineering Mischief file format Part 3
In this part, we’re going to parse unpacked .art
data.
We have full control over the file contents so it should be possible to explore file format by making changes in Mischief, and exploring the unpacked dump.
Let’s look at empty file dump:
$ python3 artparser.py empty.art | xxd > empty.01.xxd
$ cat empty.01.xxd
0000000: 0200 0000 0000 0000 0000 0000 dcdc dc00 ................
0000010: 0080 3f01 0000 0001 0000 0000 0000 0000 ..?.............
0000020: 0000 0000 0000 0000 0000 0000 0000 7ac2 ..............z.
0000030: 0641 1a2e f83e 0000 803f 0000 0000 0000 .A...>...?......
0000040: 0000 0400 0000 0000 803f 0000 803f 0000 .........?...?..
0000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000060: 803f 0000 0000 0000 0000 0000 0000 0000 .?..............
0000070: 0000 0000 803f 0000 0000 0000 0000 0000 .....?..........
0000080: 0000 0000 0000 0000 803f 0000 803f 0100 .........?...?..
0000090: 0000 0000 0000 0100 0000 0100 0000 0000 ................
00000a0: 803f 4c61 7965 7220 3100 0000 0000 0000 .?Layer 1.......
00000b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
... snip ...
00001a0: 0000 0000 0000 0000 803f 0000 0000 0000 .........?......
00001b0: 0000 0000 0000 0000 0000 0000 803f 0000 .............?..
00001c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00001d0: 803f 0000 0000 0000 0000 0000 0000 0000 .?..............
00001e0: 0000 0000 803f 0000 803f 0000 0000 0100 .....?...?......
00001f0: 0000 0000 0000 0800 0000 0000 0000 0000 ................
0000200: 0000 ..
Now, I happen know that #dcdcdc
is background color, and there it is,
on the first line. It is possible to see the layer name at offset 0xa2
.
Let’s change something, for example — set pen color to #69face
.
$ python3 artparser.py empty.art | xxd > empty.02.xxd
$ diff empty.01.xxd empty.02.xxd
3c3
< 0000020: 0000 0000 0000 0000 0000 0000 0000 7ac2 ..............z.
---
> 0000020: 0000 0000 0000 0069 face 0000 0000 7ac2 .......i......z.
We can clearly see the pen color at offset 0x27
.
What about pen type, size and opacity?
$ python3 artparser.py empty.art | xxd > empty.03.xxd
$ diff empty.02.xxd empty.03.xxd
3,4c3,4
< 0000020: 0000 0000 0000 0069 face 0000 0000 7ac2 .......i......z.
< 0000030: 0641 1a2e f83e 0000 803f 0000 0000 0000 .A...>...?......
---
> 0000020: 0000 0001 0000 0069 face cdcc 4c3e db14 .......i....L>..
> 0000030: 3942 ba01 5c3f cdcc 4c3f 0000 803f 0000 9B..\?..L?...?..
Let’s look at those modified bytes as 32-bit floats.
$ python3
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from binascii import hexlify, unhexlify
>>> from struct import pack, unpack
>>> from pprint import pprint
>>> unpack('5f', unhexlify('cdcc4c3edb143942ba015c3fcdcc4c3f0000803f'))
(0.20000000298023224, 46.27036666870117, 0.8594013452529907, 0.800000011920929, 1.0)
46.27
looks like pen size, 0.8
is 80% pen opacity, after playing with pens some more,
I’ve figured out that 0.2
is noise level, 0.8594
is max pen size/min pen size and
the last one is max opacity/min opacity.
What if we pan and rotate the canvas?
$ python3 artparser.py empty.art | xxd > empty.04.xxd
$ diff empty.03.xxd empty.04.xxd
5,9c5,9
< 0000040: 0000 0400 0000 0000 803f 0000 803f 0000 .........?...?..
< 0000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
< 0000060: 803f 0000 0000 0000 0000 0000 0000 0000 .?..............
< 0000070: 0000 0000 803f 0000 0000 0000 0000 0000 .....?..........
< 0000080: 0000 0000 0000 0000 803f 0000 803f 0100 .........?...?..
---
> 0000040: 0000 0400 0000 0000 803f 91e0 613f e0f5 .........?..a?..
> 0000050: f03e 0000 0000 0000 0000 e0f5 f0be 91e0 .>..............
> 0000060: 613f 0000 0000 0000 0000 0000 0000 0000 a?..............
> 0000070: 0000 0000 803f 0000 0000 9464 9f43 3a21 .....?.....d.C:!
> 0000080: dcc2 0000 0000 0000 803f 0000 803f 0100 .........?...?..
27c27
< 00001a0: 0000 0000 0000 0000 803f 0000 0000 0000 .........?......
---
> 00001a0: 0000 8801 0000 0000 803f 0000 0000 0000 .........?......
... snip ...
>>> a=unpack('16f', unhexlify('91e0613fe0f5f03e0000000000000000\
... e0f5f0be91e0613f000000000000000000000000000000000000803f000\
... 0000094649f433a21dcc2000000000000803f'))
>>> a
(0.8823328614234924, 0.4706258773803711, 0.0, 0.0,
-0.4706258773803711, 0.8823328614234924, 0.0, 0.0, 0.0, 0.0,
1.0, 0.0, 318.7857666015625, -110.06489562988281, 0.0, 1.0)
>>> pprint([list(a[i:i+4]) for i in range(0, 16, 4)]) # as matrix
[[0.8823328614234924, 0.4706258773803711, 0.0, 0.0],
[-0.4706258773803711, 0.8823328614234924, 0.0, 0.0],
[0.0, 0.0, 1.0, 0.0],
[318.7857666015625, -110.06489562988281, 0.0, 1.0]]
This looks like 4x4 transformation matrix. These are often used in 3D graphics.
In a similar way, it is possible to see how any change affects the file.
There was an interesting detail: the creators of Mischief decided to “bit-pack” the representation of pen points, even though the file is going to be compressed later. Mischief packs pen movement including pressure to a 5-byte structure: 14 bit dx fixed point float, 1 bit dx sign, 14 bit dy fixed point float, 1 bit dy sign and 10 bit fixed point pressure.
I think it might’ve been better to use just 16-bit fixed point floats for these 3 values, it would only cost 1 byte per point and it would probably compress better.
After making a lot of changes to the file and seeing the effect, I’ve described a file structure in a following file:
0000000: 0200 0000 [alyridx] 0000 0000 [bgcol][b ................
0000010:g alpha]01 0000 0001 0000 0000 0000 0000 ..?.............
0000020: 0000 00[pen type][fgcolr][pennoise][pen ............L>%.
0000030:size] [min/size][penopcty][min/opct][is !B...?...?...?..
0000040:erzr] ???? ???? [float 1] [ .........?...?..
0000050: 4x4 float view matrix ................
0000060: .?..............
0000070: .....?..........
0000080: ] [viezoom] [Lyr .........?...?..
0000090:cont]([lyrordr])[lyrcont]([lyrvsble][Lyr ................
00000a0:opcty][Layer name .?Layer 1.......
... snip ...
00001a0: ][lyractns] [ .........?......
00001b0: 4x4 float layer matrix .............?..
00001c0: ................
00001d0: .?..............
00001e0: ] [lyrzoom])[img cnt] [act .....?...?......
00001f0:ncnt](*action*)*0000 0000
action: [layerno][actionid]*arguments*
actionids:
01: stroke
arguments: [strokecnt][originx][originy][opct]([5bytes])*(strokecnt-1)
5 bytes are:
dx sign, 14 dx bits,
dy sign, 14 dy bits,
and 10 bit pressure
decoding:
dx = bytes[0] | ((bytes[1] & 0x3f) << 8)
if bytes[1] & 0x40: dx = -dx
dy = (bytes[1] >> 7) | (bytes[2] << 1) | ((bytes[3] & 0x1f) << 9)
if bytes[3] & 0x20: dy = -dy
pressure = (bytes[3] >> 6) | (bytes[4] << 2)
dx = i & 0x3fff
if i & (1<<14): dx = -dx
dy = (i >> 15) & 0x3fff
if i & (1<<29): dy = -dy
pressure = (i >> 30) | (bytes[4] << 2)
08: undo position? [undopos]
33: pen matrix, [4x4 float matrix][zoom]
34: pen properties, [pentype][noizelvl][pensize][min/size][penopct][min/penopct]
35: pen color [rgb][erzer(byte)]
36: ??? argument: int
Then parsing was implemented in python in the following commit.
It is now possible to dump all information in the .art
file in a human-readable way.
$ python3 artparser.py small.art
pen info:
{'color': (0, 0, 0),
'is_eraser': 0,
'noise': 0.0,
'opacity': 1.0,
'opacity_min': 0.0,
'size': 8.422479629516602,
'size_min': 0.4847267270088196,
'type': 0}
view matrix:
[[2.2500007152557373, 0.0, 0.0, 0.0],
[0.0, 2.2500007152557373, 0.0, 0.0],
[0.0, 0.0, 1.0, 0.0],
[-758.07958984375, -543.32177734375, 0.0, 1.0]]
layer order:
[0]
layer info:
[{'action_count': 3196,
'matrix': [[1.0, 0.0, 0.0, 0.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 0.0, 1.0, 0.0],
[0.0, 0.0, 0.0, 1.0]],
'name': 'Layer 1',
'opacity': 1.0,
'visible': 1,
'zoom': 1.0}]
actions:
[{'action_id': 8, 'action_name': 'unknown_08', 'argument': 0, 'layer': 0},
{'action_id': 51,
'action_name': 'pen_matrix',
'layer': 0,
'matrix': [[0.44444429874420166, -0.0, 0.0, -0.0],
[-0.0, 0.44444429874420166, -0.0, 0.0],
[0.0, -0.0, 1.0, -0.0],
[336.92413330078125, 241.47625732421875, 0.0, 1.0]],
'zoom': 0.44444429874420166},
{'action_id': 52,
'action_name': 'pen_properties',
'layer': 0,
'noise': 0.0,
'opacity': 1.0,
'opacity_min': 0.0,
'size': 8.422479629516602,
'size_min': 0.4847267270088196,
'type': 0},
{'action_id': 53, 'action_name': 'pen_color', 'color': (0, 0, 0), 'layer': 0},
{'action_id': 54, 'action_name': 'is_eraser', 'is_eraser': 0, 'layer': 0},
{'action_id': 1,
'action_name': 'stroke',
'layer': 0,
'points': [{'p': 1.0, 'x': 306.0, 'y': 485.0},
{'p': 1.0, 'x': 306.0, 'y': 485.0}]},
{'action_id': 1,
'action_name': 'stroke',
'layer': 0,
'points': [{'p': 1.0, 'x': 368.0, 'y': 412.0},
{'p': 1.0, 'x': 368.0, 'y': 412.0}]},
{'action_id': 1,
'action_name': 'stroke',
'layer': 0,
'points': [{'p': 1.0, 'x': 260.0, 'y': 400.0},
{'p': 1.0, 'x': 260.0, 'y': 400.0}]}]
The resulting parser is available at github.com/m1el/mischief-re.
Converting .art
files
Converting .art
files to other vector formats, such as SVG is possible but with several caveats:
- “Pencil” pen Mischief is drawn as a bunch of noizy circlies, there is no efficient represenation in SVG for that. It is possible to use circles/lines with gradient fill, but it is goging to look different.
- “Eraser” pen is decreasing value in alpha channel, without modifying previous strokes. I don’t know if there is efficient representation for that. It probably requires creating a layer mask for each eraser sequence.