Tuesday, November 19, 2013

LibreOffice Import filters - what is stewing in the sauce-pan

Long time no see, dear friends. But that does not mean that there is nothing to talk about. Hence a new blog post for those who were wondering what has been happening in the reverse-straight engineering partnership.

After August and September, when I transitioned from working on LibreOffice to working on SUSE Linux Enterprise, and after a short breather to render unto Caesar (also known as family) what is Caesar's, the activity on LibreOffice-related stuff restarted in October. Only this time it happens during nights, weekends and other free time.

Sample Keynote presentation in LibreOffice 4.2

It is with huge pleasure that I realized that we are starting to have a vibrant developer community around the libwpd/libwpg family, as well as around Valek's reverse-engineering framework. SUSE Hackweek 10 helped me to produce an initial importer for the Freehand file format. Around the same time, David Tardon of Red Hat fame added a library to parse Keynote files and a library to convert different e-book file formats. Laurent Alonso is busy as a bee importing Microsoft Works spreadsheets (*.wks). Many exciting things are in the pipeline, as you can see.

Wireframe of shapes from a sample Freehand drawing in LibreOffice 4.2

With the extension to presentations and spreadsheets, we decided that the time has come to simply break the super-stable libwpd/libwpg API and take the opportunity to make it even more future-proof. By the same token, we want to solve some of the API issues that were preventing us from correctly importing several features, most notably the Visio connectors.

librevenge

We decided to drastically reduce the duplication of code, so we extracted the API classes, along with the types they use, from libwpd, libwpg and libetonyek. We created a new library, librevenge, where we also added as sub-libraries the (structured) stream implementations that used to be in libwpd-stream, as well as several classes that the libraries used to copy and paste between them. The structured stream implementations now support both OLE2 and Zip containers, and the relevant libraries assume this. That means that we will eventually have to extend the WPXSvStream implementation in LibreOffice's "writerperfect" module to cater for Zip too.

A new sub-library, librevenge-generators, has the simple implementations of the interface classes that we use to convert documents into HTML or text, or to see the raw API calls for the purpose of regression testing. The exception is the RVNGSVGDrawingGenerator class. In the current stable branches, all of the libraries that convert graphics file formats contain an SVG generator, and they rely on its presence in several cases for things like fills with vector graphics. This class is thus not part of the librevenge-generators library, but of the base librevenge, which is a hard dependency of all of the converter libraries.
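To illustrate why a generator that merely prints the raw API calls is handy for regression testing, here is a minimal sketch. All names in it (DrawingInterface, RawGenerator, parseSample) are made up for this example and are not the real librevenge classes; the only point is that the dump of calls is plain text that can be compared between two versions of a converter.

```cpp
#include <string>

// Hypothetical callback interface; the real librevenge interfaces differ.
class DrawingInterface
{
public:
    virtual ~DrawingInterface() {}
    virtual void startPage() = 0;
    virtual void drawRectangle(int width, int height) = 0;
    virtual void endPage() = 0;
};

// "Raw" generator: records every call as one line of text, so the output
// of two converter versions can be compared with a plain diff.
class RawGenerator : public DrawingInterface
{
public:
    void startPage() override { m_log += "startPage()\n"; }
    void drawRectangle(int width, int height) override
    {
        m_log += "drawRectangle(" + std::to_string(width) + ", "
                 + std::to_string(height) + ")\n";
    }
    void endPage() override { m_log += "endPage()\n"; }
    const std::string &log() const { return m_log; }

private:
    std::string m_log;
};

// Stand-in for a converter library: it knows only the interface,
// never the concrete generator behind it.
void parseSample(DrawingInterface &painter)
{
    painter.startPage();
    painter.drawRectangle(3, 2);
    painter.endPage();
}
```

Passing a RawGenerator to parseSample and then reading log() gives the sequence of calls the "converter" made; an HTML or text generator would implement the same interface and produce a document instead.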

RVNGPropertyList

The base type for passing information using the API callbacks is RVNGPropertyList, which was born from libwpd's WPXPropertyList. We modified the design of this class so that each attribute can have as its value either a simple property or an array of RVNGPropertyList elements. This allows us to do more or less all that JSON is able to do. The API classes are thus even more flexible and future-proof, since extending the information passed in the different callbacks will not modify function signatures.
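The idea can be modelled in a few lines. The sketch below is not the RVNGPropertyList API: the class PropList, its methods and the attribute names ("fill", "gradient-stops", "offset", "color") are all assumptions invented for the illustration. It only shows how an attribute that is either a simple value or an array of nested lists gives a JSON-like tree, and how a callback can receive more information without its signature changing.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model, NOT the librevenge API: every attribute holds either
// a simple string value or an array of nested property lists.
class PropList
{
public:
    void insert(const std::string &name, const std::string &value)
    {
        m_children.erase(name); // an attribute has exactly one kind of value
        m_simple[name] = value;
    }
    void insertArray(const std::string &name, const std::vector<PropList> &children)
    {
        m_simple.erase(name);
        m_children[name] = children;
    }
    bool isArray(const std::string &name) const { return m_children.count(name) != 0; }
    const std::string &simple(const std::string &name) const
    {
        assert(m_simple.count(name) != 0);
        return m_simple.at(name);
    }
    const std::vector<PropList> &children(const std::string &name) const
    {
        assert(isArray(name));
        return m_children.at(name);
    }

private:
    std::map<std::string, std::string> m_simple;
    std::map<std::string, std::vector<PropList>> m_children;
};

// Builds a style whose "gradient-stops" attribute is an array of lists.
PropList makeGradientStyle()
{
    PropList first, second;
    first.insert("offset", "0%");
    first.insert("color", "#ff0000");
    second.insert("offset", "100%");
    second.insert("color", "#0000ff");

    PropList style;
    style.insert("fill", "gradient");
    style.insertArray("gradient-stops", {first, second});
    return style;
}

// A consumer keeps this signature however much data is added to the list later.
std::size_t countStops(const PropList &style)
{
    return style.isArray("gradient-stops") ? style.children("gradient-stops").size() : 0;
}
```

A consumer that does not know about "gradient-stops" simply ignores the attribute, which is what makes this kind of extension harmless for existing code.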

Quality improvement

Although the relevant libraries were quite extensively regression-tested in the past, the new librevenge extends the coverage of unit tests. We hope that this helps us to keep the basic functionality under control without having to run the heavy regression tests on each commit.

Another effort is to avoid copying huge data structures in the API calls. This should bring some performance improvements, especially if a document contains a lot of shapes that are filled with different bitmap fills.
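For readers wondering where such copies come from, here is a generic C++ sketch; it is not taken from the librevenge sources, and the Bitmap type and both functions are hypothetical. It counts copy constructions to show that handing a large object to a function by value duplicates it, while handing it over by const reference does not.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a large payload such as the data of a bitmap fill.
struct Bitmap
{
    static int copies; // number of copy constructions so far
    std::vector<unsigned char> bytes;

    explicit Bitmap(std::size_t size) : bytes(size, 0) {}
    Bitmap(const Bitmap &other) : bytes(other.bytes) { ++copies; }
};
int Bitmap::copies = 0;

// Taking the argument by value duplicates the whole buffer on every call.
std::size_t fillByValue(Bitmap bitmap) { return bitmap.bytes.size(); }

// Taking it by const reference only passes an address around.
std::size_t fillByReference(const Bitmap &bitmap) { return bitmap.bytes.size(); }
```

With thousands of shapes, each carrying its own bitmap, the difference between the two signatures is one full buffer copy per call.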

When will it be ready?

When it is ready! But seriously, we are trying to take our time and get the APIs right. This way we intend to prevent gratuitous breakages of binary compatibility in the future. So it will certainly not be in LibreOffice 4.2.

If this is interesting for you, please drop by the #libreoffice-dev channel on irc.freenode.net to meet us. We cannot promise you that you will become rich, but we can guarantee you fame and eternal gratitude.