There are many examples of how to parse an XML document into an array with PHP. Mine differs from them in that it:
- is very memory-efficient, because it uses PHP references (similar to pointers in C)
- uses no recursion, so the nesting depth of the XML document is not limited by the PHP call stack
- is very strict and paranoid about correctness: mixed content, duplicate element names and parser errors all raise an Exception
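The reference trick behind the first two points can be shown in isolation. The sketch below is not the article's code: build_tree() and its ['open', name] / ['text', string] / ['close'] event format are hypothetical, made up only to show how a stack of references lets each level write straight into the nested result array, using a loop instead of recursion.

```php
<?php
// Hypothetical helper (illustration only): builds a nested array from a flat
// list of events without recursion, keeping a stack of references into the
// result. Events: ['open', name], ['text', string], ['close'].
function build_tree(array $events): array {
    $result = [];
    $stack = [];
    $stack[0] =& $result;   // level 0 points at the top-level array
    $level = 0;

    foreach ($events as $event) {
        switch ($event[0]) {
            case 'open':
                // create the child slot, then push a reference to it
                $stack[$level][$event[1]] = null;
                $stack[$level + 1] =& $stack[$level][$event[1]];
                ++$level;
                break;
            case 'text':
                $stack[$level] = $event[1];   // writes through the reference
                break;
            case 'close':
                unset($stack[$level]);        // drops the reference, not the data
                --$level;
                break;
        }
    }
    return $result;
}
```

For example, the events open a, open b, text x, close, open c, text y, close, close produce ['a' => ['b' => 'x', 'c' => 'y']].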
The parsing itself is done by PHP's XML Parser extension (xml_parser_create() and the related xml_* functions).
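XML Parser is event-driven: it calls one handler when an element opens, one when it closes, and one for the character data in between. A small sketch to watch these events, assuming PHP 7 or later; record_events() is a hypothetical helper, not part of the extension:

```php
<?php
// Hypothetical helper: records the callbacks XML Parser fires for a document.
function record_events(string $xml): array {
    $events = [];
    $parser = xml_parser_create();
    // element names are upper-cased by default; keep them as written
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$events) { $events[] = "start $name"; },
        function ($p, $name) use (&$events) { $events[] = "end $name"; }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$events) { $events[] = "data '$data'"; }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new Exception(xml_error_string(xml_get_error_code($parser)));
    }
    return $events;
}

$events = record_events('<a><b>x</b></a>');
```

The start and end events arrive strictly nested, which is what allows a simple stack (push on start, pop on end) to track the current position in the tree.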
Here is an example input XML document:
<?xml version="1.0" encoding="ISO-8859-1"?>
<root>
  <first_item>Test 1st item</first_item>
  <first_level_nested>
    <item idx="0">value #1</item>
    <item idx="1">value #2</item>
    <second_level_nested>
      <item idx="0">value #3</item>
      <item idx="1">value #4</item>
    </second_level_nested>
  </first_level_nested>
  <second_item>Test 2nd item</second_item>
</root>
There is one specific hack here. XML allows the same element name to appear several times on the same level of a subtree (see the <item> elements above), but it does not allow an element name that consists only of digits. Therefore we need the following exception for arrays which have numeric indexes:
- If an element is named <item> and has an attribute named “idx”, the value of this attribute is used as the element name, and thus as the array key.
This is handled in the startElement() method of the XmlCallback class (the check for $name == 'item' && isset($attrs['idx'])). The full source code is at the end of the article.
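The rule boils down to a small name mapping. In the sketch below, element_key() is a hypothetical stand-alone version of that check; it also shows why the result ends up with integer keys: PHP converts numeric string keys such as "0" into integers.

```php
<?php
// Hypothetical stand-alone version of the <item idx="..."> rule:
// on an <item>, the idx attribute replaces the element name as the array key.
function element_key(string $name, array $attrs): string {
    if ($name === 'item' && isset($attrs['idx'])) {
        return $attrs['idx'];
    }
    return $name;
}

$list = [];
$list[element_key('item', ['idx' => '0'])] = 'value #1';
$list[element_key('item', ['idx' => '1'])] = 'value #2';
// The string keys "0" and "1" are cast to the integers 0 and 1,
// so $list is an ordinary list: [0 => 'value #1', 1 => 'value #2']
```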
XML also allows an element to contain both character data and child elements (mixed content). This cannot be represented in a PHP array, so it results in an Exception.
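To make "mixed content" concrete, the sketch below only detects it instead of building an array. has_mixed_content() is hypothetical and independent of the article's code; like the article, it treats whitespace-only text as insignificant, so indentation between tags is not counted as data.

```php
<?php
// Hypothetical checker: true if any element holds both non-whitespace
// character data and child elements (mixed content).
function has_mixed_content(string $xml): bool {
    $stack = [];   // one entry per open element
    $mixed = false;
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$stack) {
            if ($stack) {
                $stack[count($stack) - 1]['child'] = true;   // parent gets a child
            }
            $stack[] = ['text' => false, 'child' => false];
        },
        function ($p, $name) use (&$stack, &$mixed) {
            $top = array_pop($stack);
            if ($top['text'] && $top['child']) {
                $mixed = true;
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$stack) {
            // whitespace between tags is ignored
            if ($stack && trim($data) !== '') {
                $stack[count($stack) - 1]['text'] = true;
            }
        }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new Exception(xml_error_string(xml_get_error_code($parser)));
    }
    return $mixed;
}
```

For instance, <a>text<b>x</b></a> is mixed content, while the same structure with only indentation around <b> is not.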
The parsed PHP array looks as follows:
Array
(
    [root] => Array
        (
            [first_item] => Test 1st item
            [first_level_nested] => Array
                (
                    [0] => value #1
                    [1] => value #2
                    [second_level_nested] => Array
                        (
                            [0] => value #3
                            [1] => value #4
                        )
                )
            [second_item] => Test 2nd item
        )
)
The full source code follows:
<?php
function xml_decode($output) {
    $xml_parser = xml_parser_create();
    $xml_callback = new XmlCallback();

    if (!xml_set_element_handler(
        $xml_parser,
        array($xml_callback, 'startElement'),
        array($xml_callback, 'endElement')
    )) {
        throw new Exception('xml_set_element_handler() failed');
    }
    if (!xml_set_character_data_handler($xml_parser, array($xml_callback, 'data'))) {
        throw new Exception('xml_set_character_data_handler() failed');
    }
    /* element names are upper-cased by default; keep them as written */
    if (!xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 0)) {
        throw new Exception('xml_parser_set_option() failed');
    }
    if (!xml_parse($xml_parser, $output, TRUE)) {
        $xml_error = sprintf(
            "%s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)
        );
        throw new Exception("XML error: $xml_error\nXML data: $output");
    }
    xml_parser_free($xml_parser);

    return $xml_callback->getResult();
}

class XmlCallback {
    private $ret = null;
    /* assign and use references directly to the array, or else you'll be in trouble */
    private $ptr_stack = array();
    private $level = 0;

    public function __construct() {
        $this->ptr_stack[$this->level] =& $this->ret;
    }

    public function startElement($parser, $name, $attrs) {
        if ($name == 'item' && isset($attrs['idx'])) {
            $name = $attrs['idx']; /* reconstruct arrays with numeric indexes */
        }

        if (!isset($this->ptr_stack[$this->level])) {
            $this->ptr_stack[$this->level] = array();
            $this->ptr_stack[$this->level][$name] = null;
        } else {
            if (!is_array($this->ptr_stack[$this->level])) {
                if (!strlen(trim($this->ptr_stack[$this->level]))) {
                    /* if until now we got only whitespace (thus scalar data), but now
                       we start a nested elements structure, discard this whitespace,
                       as it is most probably just space between the element tags */
                    $this->ptr_stack[$this->level] = array();
                } else {
                    throw new Exception('Mixed array and scalar data');
                }
            }
            /* array_key_exists(), not isset(): isset() is FALSE for a NULL value,
               so a repeated empty element like <a/><a/> would slip through */
            if (array_key_exists($name, $this->ptr_stack[$this->level])) {
                throw new Exception("Duplicate element name: $name");
            }
        }

        /* array_push() */
        ++$this->level;
        $this->ptr_stack[$this->level] =& $this->ptr_stack[$this->level - 1 /* MINUS ONE! */][$name];
    }

    public function endElement($parser, $name) {
        if (!array_key_exists($this->level, $this->ptr_stack)) {
            throw new Exception('XML non-existing reference');
        }
        /* array_pop() */
        unset($this->ptr_stack[$this->level]);
        --$this->level;
        if ($this->level < 0) {
            throw new Exception('XML stack underflow');
        }
    }

    public function data($parser, $data) {
        if (is_array($this->ptr_stack[$this->level])) {
            if (strlen(trim($data))) { # check if this is just whitespace
                throw new Exception('Mixed array and scalar data');
            } else {
                /* we tolerate AND skip whitespace, if we're already in a nested
                   elements structure, as this whitespace is most probably just
                   space between the element tags */
                return;
            }
        }
        if (is_null($this->ptr_stack[$this->level])) {
            $this->ptr_stack[$this->level] = ''; /* first data input */
        }
        $this->ptr_stack[$this->level] .= $data; /* we may be called several times, in chunks */
    }

    public function getResult() {
        return $this->ret;
    }
}
Update, 20/Jul/2011: The source code was modified to handle whitespace better, in order to fix the following tricky sample XML input: <item6> &amp; &lt; </item6>
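The tricky part of that sample is that the character data handler may be called several times for a single element, for example once per entity. The sketch below, with a hypothetical collect_chunks() helper, feeds the sample (written with the escaped entities &amp; and &lt;, as well-formed XML requires) through the parser and joins whatever chunks arrive.

```php
<?php
// Hypothetical helper: collects every character-data chunk the parser delivers.
function collect_chunks(string $xml): array {
    $chunks = [];
    $parser = xml_parser_create();
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$chunks) { $chunks[] = $data; }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new Exception(xml_error_string(xml_get_error_code($parser)));
    }
    return $chunks;
}

$chunks = collect_chunks('<item6> &amp; &lt; </item6>');
// The value may arrive in pieces, so it must be built by concatenation,
// and whitespace-only pieces are part of the value, not noise.
$value = implode('', $chunks);
```

Joined together, the chunks give the string " & < ", including the surrounding spaces.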
Update, 30/Jul/2011: Another bug fix, which handles empty responses such as: <response/>
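An empty element such as <response/> produces a start and an end event but no character data at all, so its array slot is never assigned a string. A quick way to check this, with a hypothetical count_callbacks() helper:

```php
<?php
// Hypothetical helper: counts how often each XML Parser handler fires.
function count_callbacks(string $xml): array {
    $counts = ['start' => 0, 'end' => 0, 'data' => 0];
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$counts) { $counts['start']++; },
        function ($p, $name) use (&$counts) { $counts['end']++; }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$counts) { $counts['data']++; }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new Exception(xml_error_string(xml_get_error_code($parser)));
    }
    return $counts;
}

$counts = count_callbacks('<response/>');
// start and end fire once each; the data handler never fires
```

With the XmlCallback class above, the slot therefore stays null and the result is array('response' => null).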
September 10, 2012 at 8:08 pm
Example of converting xml data into an array php http://alex-kurilov.blogspot.com/2012/09/php-xml-to-array-function-example.html#.UE4drewxTsM
September 11, 2012 at 8:20 am
Alex, your example returns a flat array structure, while we want a nested one, in order for the array to be as close as possible to the XML structure.
February 5, 2013 at 8:17 am
I would recommend fixing the duplicate name problem. That is the only thing I could find wrong with your script. I tried using it with the response sent back from the USPS API, and it broke on multiple SpecialServices elements. Since I can't control the response, and it contains duplicate names without an index, I couldn't use your (otherwise nice) script. Just a thought, but maybe try counting the number of duplicate names in a result set and assigning an index to them yourself?
February 7, 2013 at 10:57 am
I had a different idea when creating this PHP example. The purpose is to parse an XML document into a PHP array whose keys use the very same names as the XML elements. The exception for <item idx="..."> was introduced only because XML does not allow an element with a purely numeric name.
Therefore, if you want to parse an XML document with duplicate element names, my implementation won't work out of the box. I'm sorry that I don't have time to develop a version which suits your needs.
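For documents like the USPS response mentioned above, one possible way to follow the suggestion of indexing duplicates yourself is sketched below. It is not the article's code: xml_decode_grouped() is a hypothetical variant that collects each element's children as name/value pairs and, when the element closes, turns names that occur more than once into numeric lists. It assumes PHP 7.1 or later, skips the <item idx> rule, ignores text in elements that have children (no mixed-content check), and does not keep the relative order of differently named siblings.

```php
<?php
// Hypothetical variant (not the article's code): repeated sibling names become
// numeric lists instead of raising "Duplicate element name". Non-recursive.
function xml_decode_grouped(string $xml) {
    $stack = [['children' => [], 'text' => '']];   // level 0 = the document
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$stack) {
            $stack[] = ['name' => $name, 'children' => [], 'text' => ''];
        },
        function ($p, $name) use (&$stack) {
            $node = array_pop($stack);
            if ($node['children']) {
                // count each child name to find the repeated ones
                $counts = array_count_values(array_column($node['children'], 0));
                $value = [];
                foreach ($node['children'] as [$childName, $childValue]) {
                    if ($counts[$childName] > 1) {
                        $value[$childName][] = $childValue;   // repeated: append to list
                    } else {
                        $value[$childName] = $childValue;
                    }
                }
            } else {
                // leaf element: its text, or null for an empty element
                $value = $node['text'] === '' ? null : $node['text'];
            }
            $stack[count($stack) - 1]['children'][] = [$node['name'], $value];
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$stack) {
            $stack[count($stack) - 1]['text'] .= $data;   // chunks are concatenated
        }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new Exception(xml_error_string(xml_get_error_code($parser)));
    }
    [$rootName, $rootValue] = $stack[0]['children'][0];
    return [$rootName => $rootValue];
}
```

For example, <r><s>a</s><s>b</s><t>c</t></r> becomes ['r' => ['s' => ['a', 'b'], 't' => 'c']].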