Using Xerces-C++ Progressive Scan with Qt

Parsing XML is fun again with Xerces progressive scan.

Do you have large XML files that you need to parse in your GUI application? Are you sick of your XML parser blocking the event loop while it laboriously parses all that XML? Don’t waste another day! Xerces-C++ progressive scan was made for you, my friend.

A typical XML parser call looks like this:

parser->setFile( "foo.xml" );
parser->parse();

The parse() function will typically block until the XML file has been totally parsed. Both DOM and SAX approaches behave this way, generally.

This is a huge limitation for GUI developers, because this will cause your GUI to become unresponsive (i.e., white out) for the duration of the parse() call. Many GUI developers choose to spawn a thread to parse the XML outside of the event loop, but this introduces its own set of problems. A great way to keep your GUI responsive and avoid threads is to use Xerces Progressive Scan. This allows you to read a small chunk of the XML file at a time, and return to the event loop between each chunk. By repeating this until the file is completely read, you can avoid blocking the event loop. In Qt, the easiest way to do this is with a QTimer.
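Before bringing Xerces into it, here is a minimal, toolkit-free sketch of the chunk-and-return idea. ToyEventLoop, ChunkedJob, and runDemo are hypothetical names made up for illustration. The toy queue only stands in for Qt's event loop, and the "work" is a plain counter rather than real XML parsing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a GUI event loop: a FIFO queue of callbacks.
class ToyEventLoop {
public:
    void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    // Run queued tasks in order until the queue is empty.
    void run() {
        while (!tasks_.empty()) {
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
        }
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Handles `total` work items, at most `chunk` per turn, and re-posts
// itself to the loop until everything has been processed.
class ChunkedJob {
public:
    ChunkedJob(ToyEventLoop &loop, std::size_t total, std::size_t chunk,
               std::vector<std::string> &log)
        : loop_(loop), total_(total), chunk_(chunk), done_(0), log_(log)
    {
        assert(chunk_ > 0);  // a zero-sized chunk would never make progress
    }

    void step() {
        done_ = std::min(total_, done_ + chunk_);
        log_.push_back("chunk");
        if (done_ < total_) {
            loop_.post([this] { step(); });  // yield, then continue later
        } else {
            log_.push_back("finished");
        }
    }

private:
    ToyEventLoop &loop_;
    std::size_t total_;
    std::size_t chunk_;
    std::size_t done_;
    std::vector<std::string> &log_;
};

// Queue a 250-item job (100 items per turn) followed by a pretend paint event.
std::vector<std::string> runDemo() {
    ToyEventLoop loop;
    std::vector<std::string> log;
    ChunkedJob job(loop, 250, 100, log);
    loop.post([&job] { job.step(); });
    loop.post([&log] { log.push_back("paint"); });
    loop.run();
    return log;
}
```

Calling runDemo() yields the log chunk, paint, chunk, chunk, finished. The pretend paint event runs right after the first chunk instead of waiting for the whole job, which is the behavior we want from the real parser.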

Let’s create an example using a Xerces SAX parser. To do this, create a class inherited from QObject that will be responsible for XML parsing. Give it a slot called readSomeMore(), maybe something like this (note that I’ve omitted the #include and namespace lines for brevity):

class MyXmlParser : public QObject, public DefaultHandler
{
    Q_OBJECT  // required for the signals and slots declared below

    public:
        MyXmlParser( const QString &fileName, QObject *parent );
        void parse();
        void startElement (const XMLCh *const uri, const XMLCh *const localname,
                                  const XMLCh *const qname, const Attributes &attrs);

    signals:
        void finished();

    private slots:
        void readSomeMore();

    private:
        XMLPScanToken token;
        QTimer *timer;
        const char *fileName;
        SAX2XMLReader* parser;
};

Now let’s look at the two most important functions, parse() and readSomeMore(). The rest of the class is boilerplate that you should already be familiar with if you are using Xerces (or any SAX parser for that matter). The parse() function will be called by users to start the parsing process, but since the whole parsing process happens asynchronously, it will return immediately. This is good, since it matches how QSocket’s connectToHost() and a whole slew of other Qt classes work.

The parse() function will be responsible for creating the Xerces XML parser, and initializing it. Then, it will advance to the first element in the XML file and save off the XMLPScanToken. Lastly, it will start the timer, which will continue the parsing until it is finished.

Here’s the parse() function:

void MyXmlParser::parse()
{
    // Create the Xerces parser and set ourselves up as the handler:
    parser = XMLReaderFactory::createXMLReader();
    parser->setContentHandler(this);

    // Parse the first XML element, and if successful, start the parsing timer
    // to work in the background:
    if( parser->parseFirst( fileName, token ) )
    {
        // Kick off the timer, and tell it to start reading.
        timer = new QTimer( this );
        connect( timer, SIGNAL(timeout()), this, SLOT(readSomeMore()) );

        // Start it with a 0 ms interval, to make it go FAST:
        timer->start( 0 );
    }
    else
    {
        // An error occurred, so tell everyone that we are finished:
        emit finished();
    }
}

Side note: Notice that this implementation of parse() ignores error conditions. A good way to handle errors here is to make the finished() signal emit a bool to indicate whether an error occurred. Then add an errorString() function that returns a human-readable error message (suitable for a QMessageBox or equivalent) so callers can figure out what went wrong.
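One way to picture the finished(bool) plus errorString() contract from the side note is the toolkit-free sketch below. ParseStatus and its callback are hypothetical stand-ins: in the real class, the callback would be the finished(bool) signal and errorString() would be a member of MyXmlParser. The message passed to fail() is whatever your own error handling collects. Nothing here is taken from the Xerces API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper that models "finished(bool) + errorString()".
class ParseStatus {
public:
    typedef std::function<void(bool ok)> FinishedCallback;

    explicit ParseStatus(FinishedCallback onFinished)
        : onFinished_(std::move(onFinished)), finished_(false) {}

    // Report a failure. The message is stored before the callback runs,
    // so a listener can already read errorString() inside the callback.
    void fail(const std::string &message) {
        assert(!message.empty());  // a failure should always carry a message
        if (finished_) return;
        errorString_ = message;
        finish(false);
    }

    // Report success.
    void succeed() {
        if (finished_) return;
        finish(true);
    }

    // Human-readable description of the failure, empty if none occurred.
    const std::string &errorString() const { return errorString_; }

    bool isFinished() const { return finished_; }

private:
    // Notify the listener at most once.
    void finish(bool ok) {
        finished_ = true;
        if (onFinished_) onFinished_(ok);
    }

    FinishedCallback onFinished_;
    bool finished_;
    std::string errorString_;
};
```

The design choice worth noting is the ordering: store the message first, then notify. That way a slot connected to finished(bool) can pop up a message box straight away without racing against the error text.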

So the parse() function just started a timer that is going to call readSomeMore(), which is responsible for reading a little bit of XML over and over, and then killing the timer when it’s done. It should look something like this:

void MyXmlParser::readSomeMore()
{
    // Read 100 elements:
    int count = 0;
    while( count++ < 100 )
    {
        // When parseNext() returns false, the XML file is completely read,
        // so kill the timer, emit our finished signal, and stop reading.
        if( ! parser->parseNext( token ) )
        {
            timer->stop();
            emit finished();
            return;
        }
    }
}

And that’s it. You can experiment with the number of elements you parse in each invocation of readSomeMore() to see where your best performance lies. I have found that between 100 and 500 gets pretty good performance.
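If you would rather not hand-tune the element count, one alternative (my own variation, not something Xerces provides) is to give each invocation a time budget instead. The sketch below is plain C++ with no Qt or Xerces dependency. runForBudget is a hypothetical helper, and parseStep stands in for a call such as parser->parseNext( token ).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Calls parseStep repeatedly until it returns false (nothing left to read)
// or the time budget is used up. Always performs at least one step, so
// every invocation makes progress. Returns true if more work remains.
bool runForBudget(const std::function<bool()> &parseStep,
                  std::chrono::milliseconds budget)
{
    assert(budget.count() >= 0);

    typedef std::chrono::steady_clock Clock;
    const Clock::time_point deadline = Clock::now() + budget;

    do {
        if (!parseStep()) {
            return false;  // input exhausted: the caller can stop its timer
        }
    } while (Clock::now() < deadline);

    return true;  // budget spent: return to the event loop and resume later
}
```

Inside readSomeMore(), the idea would be to call this with a budget of a few milliseconds, then stop the timer and emit finished() when it returns false. The right budget depends on how responsive your GUI needs to feel, so treat any particular value as something to measure rather than a recommendation.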

I’ve also found that I can do this with about 100 different XML files simultaneously, each with their own instance of MyXmlParser, and the GUI stays nice and responsive.

I am a huge fan of this approach for XML parsing with Qt. Let me know if you know of a better way, by leaving a comment below.

4 comments to “Using Xerces-C++ Progressive Scan with Qt”

  1. Byron Clark says:

    Note that the 0 ms timeout on the QTimer really means run whenever idle. I always wondered how to do that in QT.

  2. It should probably be noted that “idle” in Qt has a different meaning than in, say, wxWidgets. If you specify a 0 ms interval on a QTimer, the timer will fire over and over, even if the GUI is not idle. Qt will squeeze timer events in whenever it can, even between paint events and user input events. In wxWidgets, the idle event only fires once the application becomes idle, and does not fire again until the app goes from non-idle to idle.

  3. Byron Clark says:

    I’m not sure about that definition, unless it’s a QT4 thing. The QT3 docs for QTimer say:

    “As a special case, a QTimer with timeout 0 times out as soon as all the events in the window system’s event queue have been processed.”

    That seems pretty consistent with what other toolkits count as idle time.

  4. Sajjad says:

    Quite a good suggestion

    I was wondering if Xerces could be used to generate xml content from the GUI component…like the SceneGraph Structure in the GUI representation