{"id":26,"date":"2006-11-12T18:18:49","date_gmt":"2006-11-12T22:18:49","guid":{"rendered":"http:\/\/thesmithfam.org\/blog\/2006\/11\/12\/using-xerces-c-progressive-scan-with-qt\/"},"modified":"2019-08-12T07:16:33","modified_gmt":"2019-08-12T13:16:33","slug":"using-xerces-c-progressive-scan-with-qt","status":"publish","type":"post","link":"https:\/\/thesmithfam.org\/blog\/2006\/11\/12\/using-xerces-c-progressive-scan-with-qt\/","title":{"rendered":"Using Xerces-C++ Progressive Scan with Qt"},"content":{"rendered":"<p>Parsing XML is fun again with Xerces progressive scan.<\/p>\n<p>Do you have large XML files that you need to parse in your GUI application? Are you sick of your XML parser blocking the event loop while it laboriously parses all that XML? Don&#8217;t waste another day! Xerces-C++ progressive scan was made for you, my friend.<br \/>\n<!--more--><br \/>\nA typical XML parser call looks like this:<\/p>\n<style> tt { font-size: 12pt; color: #005; } <\/style>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nparser-&gt;setFile( &quot;foo.xml&quot; );\r\nparser-&gt;parse();\r\n<\/pre>\n<p>The <tt>parse()<\/tt> function will typically block until the XML file has been totally parsed. Both DOM and SAX approaches behave this way, generally.<\/p>\n<p>This is a huge limitation for GUI developers, because this will cause your GUI to become unresponsive (i.e., white out) for the duration of the <tt>parse()<\/tt> call. Many GUI developers choose to spawn a thread to parse the XML outside of the event loop, but this introduces its own set of problems. A great way to keep your GUI responsive and avoid threads is to use Xerces Progressive Scan. This allows you to read a small chunk of the XML file at a time, and return to the event loop between each chunk. By repeating this until the file is completely read, you can avoid blocking the event loop. 
In Qt, the easiest way to do this is with a <tt>QTimer<\/tt>.<\/p>\n<p>Let&#8217;s create an example using a Xerces SAX parser. To do this, create a class inherited from <tt>QObject<\/tt> that will be responsible for XML parsing. Give it a slot called <tt>readSomeMore()<\/tt>, maybe something like this (note that I&#8217;ve omitted the <tt>#include<\/tt> and <tt>namespace<\/tt> lines for brevity):<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nclass MyXmlParser : public QObject, public DefaultHandler\r\n{\r\n    Q_OBJECT  \/\/ required for the signals and slots below\r\n\r\n    public:\r\n        MyXmlParser( const QString &amp;fileName, QObject *parent );\r\n        void parse();\r\n        void startElement (const XMLCh *const uri, const XMLCh *const localname,\r\n                                  const XMLCh *const qname, const Attributes &amp;attrs);\r\n\r\n    signals:\r\n        void finished();\r\n\r\n    private slots:\r\n        void readSomeMore();\r\n\r\n    private:\r\n        XMLPScanToken token;\r\n        QTimer *timer;\r\n        const char *fileName;\r\n        SAX2XMLReader* parser;\r\n};\r\n<\/pre>\n<p>Now let&#8217;s look at the two most important functions, <tt>parse()<\/tt> and <tt>readSomeMore()<\/tt>. The rest of the class is boilerplate that you should already be familiar with if you are using Xerces (or any SAX parser, for that matter). The <tt>parse()<\/tt> function will be called by users to start the parsing process, but since the whole parsing process happens asynchronously, it will return immediately. This is good, since it is the same way <tt>QSocket<\/tt>&#8217;s <tt>connectToHost()<\/tt> and a whole slew of other Qt classes work.<\/p>\n<p>The <tt>parse()<\/tt> function will be responsible for creating the Xerces XML parser and initializing it. Then, it will advance to the first element in the XML file and save off the <tt>XMLPScanToken<\/tt>. 
Lastly, it will start the timer, which will continue the parsing until it is finished.<\/p>\n<p>Here&#8217;s the <tt>parse()<\/tt> function:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nvoid MyXmlParser::parse()\r\n{\r\n    \/\/ Create the Xerces parser and set ourselves up as the handler:\r\n    parser = XMLReaderFactory::createXMLReader();\r\n    parser-&gt;setContentHandler(this);\r\n\r\n    \/\/ Parse the first XML element, and if successful, start the parsing timer\r\n    \/\/ to work in the background:\r\n    if( parser-&gt;parseFirst( fileName, token ) )\r\n    {\r\n        \/\/ Kick off the timer, and tell it to start reading.\r\n        timer = new QTimer( this );\r\n        connect( timer, SIGNAL(timeout()), this, SLOT(readSomeMore()) );\r\n\r\n        \/\/ Start it with a 0 ms interval, to make it go FAST:\r\n        timer-&gt;start( 0 );\r\n    }\r\n    else\r\n    {\r\n        \/\/ An error occurred, so tell everyone that we are finished:\r\n        emit finished();\r\n    }\r\n}\r\n<\/pre>\n<p>Side note: Notice that this implementation of <tt>parse()<\/tt> does not report error conditions to the caller. A good way to handle errors here is to make the <tt>finished()<\/tt> signal emit a <tt>bool<\/tt> to indicate an error. Then add an <tt>errorString()<\/tt> function that returns a human-readable error message (suitable for a <tt>QMessageBox<\/tt> or equivalent) so callers can figure out what went wrong.<\/p>\n<p>So the <tt>parse()<\/tt> function just started a timer that is going to call <tt>readSomeMore()<\/tt>, which is responsible for reading a little bit of XML over and over, and then killing the timer when it&#8217;s done. 
It should look something like this:<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nvoid MyXmlParser::readSomeMore()\r\n{\r\n    \/\/ Read up to 100 elements per timer tick:\r\n    int count = 0;\r\n    while( count++ &lt; 100 )\r\n    {\r\n        \/\/ When parseNext() returns false, the XML file is completely read,\r\n        \/\/ so kill the timer, emit our finished signal, and stop looping.\r\n        if( ! parser-&gt;parseNext( token ) )\r\n        {\r\n            timer-&gt;stop();\r\n            emit finished();\r\n            return;\r\n        }\r\n    }\r\n}\r\n<\/pre>\n<p>And that&#8217;s it. You can experiment with the number of elements you parse in each invocation of <tt>readSomeMore()<\/tt> to see where your best performance lies. I have found that between 100 and 500 gets pretty good performance.<\/p>\n<p>I&#8217;ve also found that I can do this with about 100 different XML files simultaneously, each with its own instance of <tt>MyXmlParser<\/tt>, and the GUI stays nice and responsive.<\/p>\n<p>I am a huge fan of this approach for XML parsing with Qt. Let me know if you know of a better way, by leaving a comment below.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Parsing XML is fun again with Xerces progressive scan. Do you have large XML files that you need to parse in your GUI application? Are you sick of your XML parser blocking the event loop while it laboriously parses all that XML? Don&#8217;t waste another day! 
Xerces-C++ progressive scan was made for you, my friend.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-26","post","type-post","status-publish","format-standard","hentry","category-code-and-cruft"],"_links":{"self":[{"href":"https:\/\/thesmithfam.org\/blog\/wp-json\/wp\/v2\/posts\/26","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/thesmithfam.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/thesmithfam.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/thesmithfam.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/thesmithfam.org\/blog\/wp-json\/wp\/v2\/comments?post=26"}],"version-history":[{"count":1,"href":"https:\/\/thesmithfam.org\/blog\/wp-json\/wp\/v2\/posts\/26\/revisions"}],"predecessor-version":[{"id":1605,"href":"https:\/\/thesmithfam.org\/blog\/wp-json\/wp\/v2\/posts\/26\/revisions\/1605"}],"wp:attachment":[{"href":"https:\/\/thesmithfam.org\/blog\/wp-json\/wp\/v2\/media?parent=26"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/thesmithfam.org\/blog\/wp-json\/wp\/v2\/categories?post=26"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/thesmithfam.org\/blog\/wp-json\/wp\/v2\/tags?post=26"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}