Reverse engineering Mischief file format Part 1

There is a great drawing program called Mischief.

It is beautiful, minimalistic, vector-based and very responsive. What’s there not to like?

I wanted to read their file format, but it turned out to be harder than expected.
So I decided to write about the reverse-engineering process.

next part →

Raw input

First thing I do when trying to understand a file format is look at raw file and hex dump.

# `du -b` prints file size in bytes
# `xxd` prints hex dump of a file to standard output
m1el@wbox $ du -b

m1el@wbox $ xxd
0000000: c5b3 8be9 8100 0000 2000 0200 0000 0000  ........ .......
0000010: 0300 0100 0000 0000 0000 803f 0000 0000  ...........?....
0000020: 0200 0000 5100 0000 0202 0000 0001 003a  ....Q..........:
0000030: fb68 83ae d8cf 313f 626a 4555 9a47 4f6b  .h....1?bjEU.GOk
0000040: 5184 3a04 fcf6 335c 6ca1 2597 c2fb b0c5  Q.:...3\l.%.....
0000050: 8d41 d4d9 0409 1d2b cf3a ee1e 140f 092f  .A.....+.:...../
0000060: 2951 ce95 1d5b 4010 9728 0924 69ce 7afb  )Q...[@..(.$i.z.
0000070: 2848 cd5f 007c 3108 80                   (H._.|1..

This does not look good, already a big “random” byte sequence for an empty file. By “random” I mean that byte frequencies look like to be evenly-distributed (although this is a relatively small sample to say something about frequency distribution).

Let’s look at some bigger files.

m1el@wbox $ du -b

m1el@wbox $ xxd | head -16
0000000: c5b3 8be9 8100 0000 0000 0200 0000 0000  ................
0000010: 0300 0100 0000 0000 0000 803f 0000 0000  ...........?....
0000020: 1200 0000 f3df 0300 8385 0600 0001 003a  ...............:
0000030: fb68 83ae d8cf 313f 626a 4555 9a43 13a5  .h....1?bjEU.C..
0000040: 5880 f14d 9ed6 1f78 a0ca e1bb 6770 8e93
0000050: 96a0 e0f7 3d74 d382 361b 33d2 c0f4 8353  ....=t..6.3....S
0000060: 439b b9dd 3cc6 7fd3 dcba e272 b35a 46a1  C...<......r.ZF.
0000070: 31c1 8422 9843 609c d1dc 519b d004 cbd6  1..".C`...Q.....
0000080: 9c65 ef3a 3e65 7e3e a031 7fd1 6809 dab9  .e.:>e~>.1..h...
0000090: 81e9 a222 f3f0 bf45 7c42 2003 bcfb 6e49  ..."...E|B ...nI
00000a0: c33a ee6c bda4 8c8d 8fd8 9972 1905 a73e  .:.l.......r...>
00000b0: 6dc9 752d fe14 2bd7 fb1b 0e82 98b7 32e4  m.u-..+.......2.
00000c0: 212d 7550 5ec3 a4f8 777b a9c2 e122 e8a1  !-uP^...w{..."..
00000d0: 2cfb 2e71 9475 bc5e 429c eebb a933 df7e  ,..q.u.^B....3.~
00000e0: d4dd 5731 f50e 46a1 cbac 4e96 da4e 5876  ..W1..F...N..NXv
00000f0: 00f8 d5b7 5b24 75cc 3443 9324 9c70 f64b  ....[$u.4C.$.p.K

m1el@wbox $ xxd | head -16
0000000: c5b3 8be9 8100 0000 0000 0200 0000 0000  ................
0000010: 0300 0100 0000 0000 0000 803f 0000 0000  ...........?....
0000020: 1600 0000 dfc3 1200 a7b3 1c00 0001 003a  ...............:
0000030: fcfc 0a25 a80a a7ac 953d 7cde fa28 b32d  ...%.....=|..(.-
0000040: da83 ce7e 8f34 c197 25f2 9005 b076 207e  ...~.4..%....v ~
0000050: 024b c3a0 bb62 0580 b1e6 f6e4 c5ee 0c19  .K...b..........
0000060: 968b 4335 9433 f6ca 5d88 31c4 78dd e93b  ..C5.3..].1.x..;
0000070: 32f7 ba82 3e6c 2171 353b b0b1 b780 5237  2...>l!q5;....R7
0000080: 74de 2141 a875 573d 86f2 ed2d 91c3 64fb  t.!A.uW=...-..d.
0000090: 929e 610b ff98 fe03 2979 f59f b960 d24c  ..a.....)y...`.L
00000a0: f043 4e4c 3641 0096 1ff5 ec84 f29f ab1c  .CNL6A..........
00000b0: 2a46 47f8 7877 85ff 3e0e d61c 1971 e23c  *FG.xw..>....q.<
00000c0: bfc8 1d4a 0985 fb5b f66a bc32 b2be 14c0  ...J...[.j.2....
00000d0: b463 754e 1c4f 386a 46e9 9361 d2a0 ab79  .cuN.O8jF..a...y
00000e0: 60cd fcfc f421 9b1a 871f b381 06bf 5fdd  `....!........_.
00000f0: 71aa 15ce 4854 745d 57db b3de c24e bc5f  q...HTt]W....N._

Yep, starting from byte 0x30 it is “random bytes”. It looks like Mischief is using some sort of packing or encryption, which we’ll have to look into.

But first, let’s look at first 0x30 bytes. First 0x20 bytes look the same in all three files, except for one byte in empty file.

I assume first four bytes c5b38be9 are format “magic” number.

Next, let’s look at four bytes on position 0x24..0x27 and 0x28..0x2b.

file                size    0x24..0x27  0x28..0x2b           121     51000000    02020000
SAMPLE_Figures1     253979  f3df0300    83850600
SAMPLE_Sleepy_Story 1229831 dfc31200    a7b31c00

m1el@wbox $ python3 -c 'print("%d" % (121 - 0x51))'
m1el@wbox $ python3 -c 'print("%d" % (253979 - 0x3dff3))'
m1el@wbox $ python3 -c 'print("%d" % (1229831 - 0x12c3df))'

It looks like bytes 0x24..0x27 store data size. But it is weird: at offset 40 (=0x28) we already have a possible integer stored.

Mischief insides 1

It is time to look how Mischief itself does the parsing.

Unpacking mischief.exe with upx: upx -d mischief.exe, an easy step.

Loading Mischief into Ollydbg, adding a breakpoint to ReadFile WinApi function. To add a breakpoint on function call, press Ctrl-F1, then type bpx FunctionCallToBreakOn.

Open and… nothing, it did not hit a breakpoint. Added breakpoints on fread, maybe Mischief is calling ReadFile through msvcrt/libc.

That’s much better: Mischief hit a breakpoint in this function on opening the file: open-art-asm.txt

Ugh. It looks like there are several possible file formats or possible parsing branches; we’re going to look at one branch only.

In a nutshell, this function does the following:

  • Read 32-bit magic, fail if it is not c5b38be9 or c5b38be7.
  • Read a 32-bit int. In case first int is 0x81, read three more.
  • Read four more ints if the weather is appealing.
    At this point, we’re at offset 0x24.
  • Read 32-bit size, allocate size + 4 bytes.
  • Read size bytes.
  • “Unpack” these bytes.
  • Parse unpacked bytes.

So far, the few things we’re interested in are magic, data size and data. Looks like data always starts at offset 0x28.

Our own parser is going to do the following:

  • Read and check magic.
  • Read data size.
  • Read data bytes of given size.
  • Unpack data. (We don’t know how to do that yet)
  • Parse unpacked.

First parsing steps

Let’s write a simple file parser in python. All it’s going to do so far is read packed data size and data from file.

In the next part, we’ll dive into the unpacking function.

next part →