testing document conversion

2010-01-06

Being able to read many different file formats properly is important for KOffice's success. By 'read', I mean 'convert to ODF', because conversion and reading are strictly separated in KOffice. KWord will convert a .doc file to a .odt file before loading it into its internal rendering and editing structure. There is even a nice separate program called 'koconverter' that can convert files on the command line.

Until now, there were no decent tests to guard against regressions in our filters. I have written a small framework (well, a shell script, but framework sounds better) that makes it simple to write tests. There are a number of tests there now for converting ppt files, but it would be great to have them for other input formats too. And here is where I hope you will help. All you need is a small input file that highlights a feature or problem and a small XSL file. The XSL file contains the test.

Look at this small example. Suppose you have a file in .doc, .docx or another office format. The file contains only one image and you want an automated test to verify that the ODF that is created also has exactly one image. The following XSL file tests this:

<?xml version="1.0" encoding="UTF-8"?>
<x:stylesheet
   xmlns:d="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
   xmlns:x="http://www.w3.org/1999/XSL/Transform" version="1.0"
>
 <x:template match="/">
  <x:if test="count(//d:image) != 1">
   <x:message terminate="yes">
    Error: there should be exactly one image.
   </x:message>
  </x:if>
 </x:template>
</x:stylesheet>

If the number of image elements is not exactly one, the XSL transformation will abort with an error message.
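The same pattern adapts to other features by swapping the namespace and the element name. As a minimal sketch, assuming an input file that is known to contain exactly one table, the check could look like this (the prefix t: and the error text are my own choices for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<x:stylesheet
   xmlns:t="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
   xmlns:x="http://www.w3.org/1999/XSL/Transform" version="1.0"
>
 <x:template match="/">
  <!-- Count table:table elements anywhere in content.xml.
       Note that nested tables are counted as well. -->
  <x:if test="count(//t:table) != 1">
   <x:message terminate="yes">
    Error: there should be exactly one table.
   </x:message>
  </x:if>
 </x:template>
</x:stylesheet>
```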

As you can see, the framework makes writing tests easy and fast. When reporting a bug in KOffice or koconverter you can help a lot by writing an XSL file for our automated tests. This will speed up fixing the bug and it will help avoid regressions.

This way of testing is a bit unconventional: these are not unit tests but end-to-end tests. Files are converted to ODF and the output file is checked. The tests do not exercise one small part of the code but the complete conversion. A benefit is that the tests are independent of the programs doing the conversion. We just check the result. So the same method could be used on any program that writes ODF files.

Here is how our tests in KOffice work. First we convert the input file to ODF with koconverter. An ODF file is a zip archive containing several files, and we usually want to check the content of the XML files inside it. So after conversion with koconverter, the ODF file is uncompressed. Then an XSL transformation is run on the file content.xml.

In XSL one can report errors like this:

<x:if test="string($style/s:graphic-properties/@d:fill-color) != '#bbe0e3'">
  <x:message terminate="yes">
    Error: draw:fill-color of the second frame should be '#bbe0e3'.
  </x:message>
</x:if>

(You see that XML does not have to be too verbose.) The prefixes x: and s: in this snippet stand for http://www.w3.org/1999/XSL/Transform and urn:oasis:names:tc:opendocument:xmlns:style:1.0 respectively; d: is the drawing namespace, as in the first example. The test checks whether the fill-color for a particular part of the output document has the correct value. If not, an error message is printed and the transformation is stopped.
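The snippet only works if its prefixes are bound on the stylesheet element, and it relies on a variable $style that is defined elsewhere. As a self-contained sketch of the same idea, the stylesheet below declares the three prefixes and uses a looser check of my own: it only requires that some style in content.xml carries the expected fill color, without tying it to a particular frame.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<x:stylesheet version="1.0"
   xmlns:x="http://www.w3.org/1999/XSL/Transform"
   xmlns:s="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
   xmlns:d="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
>
 <x:template match="/">
  <!-- Illustrative check, not taken from the KOffice tests:
       fail unless at least one style has the expected fill color. -->
  <x:if test="not(//s:style/s:graphic-properties[@d:fill-color='#bbe0e3'])">
   <x:message terminate="yes">
    Error: no style with draw:fill-color '#bbe0e3' was found.
   </x:message>
  </x:if>
 </x:template>
</x:stylesheet>
```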

You can replay this example by checking out the tests:

svn checkout svn://anonsvn.kde.org/home/kde/trunk/tests/kofficetests/
cd kofficetests/import/powerpoint
make test

That was the overview of how the tests work. Now let us look at a more complicated test. It has two files: background.ppt and background.xsl. background.ppt is the input file and background.xsl is the transformation that verifies the output of the conversion.

The file background.ppt has two frames, one of which must have a light blue (#bbe0e3) background. At the moment the frame gets a background color, but it is wrong. So when fixing this bug we first formulate what we want the result to be by writing an XSL file.

One XSL file can contain multiple tests. This test is called testSolidBackground:

<x:template name="testSolidBackground">

We assign the second frame in content.xml to a variable:

<x:variable name="frame"
  select="o:body/o:presentation/d:page/d:frame[position()=2]"/>

Now we find the name of the style for this frame:

  <x:variable name="stylename" select="$frame/@p:style-name"/>

And find the style with that name:

<x:variable name="style"
  select="o:automatic-styles/s:style[@s:name=$stylename]"/>

Now we do a sanity check: do we even have a second frame?

<x:if test="count($frame) != 1">
  <x:message terminate="yes">
    Error: there is no second frame on the first slide.
  </x:message>
</x:if>

And do we even have a style?

<x:if test="count($style) != 1">
  <x:message terminate="yes">
    Error: there is no style for the second frame.
  </x:message>
</x:if>

Now we test if the background is 'solid':

<x:if test="string($style/s:graphic-properties/@d:fill) != 'solid'">
  <x:message terminate="yes">
    Error: draw:fill of the second frame should be solid.
  </x:message>
</x:if>

And we check the color:

<x:if test="string($style/s:graphic-properties/@d:fill-color) != '#bbe0e3'">
  <x:message terminate="yes">
    Error: draw:fill-color of the second frame should be '#bbe0e3'.
  </x:message>
</x:if>
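Put together, the pieces above could form a stylesheet like the following sketch. Several parts are assumptions on my side rather than the contents of the real background.xsl: the bindings for the prefixes o: and p: (taken here to be the ODF office and presentation namespaces), and the entry template that matches the root element of content.xml so that the relative paths in the variables resolve.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<x:stylesheet version="1.0"
   xmlns:x="http://www.w3.org/1999/XSL/Transform"
   xmlns:o="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
   xmlns:s="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
   xmlns:d="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
   xmlns:p="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0"
>
 <!-- Assumed entry point: call each named test with the root element
      of content.xml as context node. More tests can be added here. -->
 <x:template match="/o:document-content">
  <x:call-template name="testSolidBackground"/>
 </x:template>

 <x:template name="testSolidBackground">
  <!-- Second frame, its style name, and the style with that name. -->
  <x:variable name="frame"
    select="o:body/o:presentation/d:page/d:frame[position()=2]"/>
  <x:variable name="stylename" select="$frame/@p:style-name"/>
  <x:variable name="style"
    select="o:automatic-styles/s:style[@s:name=$stylename]"/>

  <!-- Sanity checks: the frame and its style must both exist. -->
  <x:if test="count($frame) != 1">
   <x:message terminate="yes">
    Error: there is no second frame on the first slide.
   </x:message>
  </x:if>
  <x:if test="count($style) != 1">
   <x:message terminate="yes">
    Error: there is no style for the second frame.
   </x:message>
  </x:if>

  <!-- The actual test: a solid fill with the expected color. -->
  <x:if test="string($style/s:graphic-properties/@d:fill) != 'solid'">
   <x:message terminate="yes">
    Error: draw:fill of the second frame should be solid.
   </x:message>
  </x:if>
  <x:if test="string($style/s:graphic-properties/@d:fill-color) != '#bbe0e3'">
   <x:message terminate="yes">
    Error: draw:fill-color of the second frame should be '#bbe0e3'.
   </x:message>
  </x:if>
 </x:template>
</x:stylesheet>
```

Note that with this entry template a file whose root element is not office:document-content would pass silently, because no test template is called. Also, the frame path selects the second frame of every slide, so the first check also fires for an input with more than one slide.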

That is all there is to it! Learning XSL, if you do not know it yet, takes some effort, but it is effort that will pay off. Once you have the XSL file you can run 'make test' while fixing the bug. This runs the test for you and, as a side effect, runs the conversion and unpacks the ODF file.

I hope you will all start using this method for reporting and fixing filter bugs. To get you started, I will close with some links on XSL and XPath.
