Reading and Writing XML Files in Python (2024)

XML, or Extensible Markup Language, is a markup language that is commonly used to structure, store, and transfer data between systems. While not as common as it used to be, it is still used in services like RSS and SOAP, as well as for structuring files like Microsoft Office documents.

With Python being a popular language for the web and data analysis, it's likely you'll need to read or write XML data at some point, in which case you're in luck.

Throughout this article we'll primarily take a look at the ElementTree module for reading, writing, and modifying XML data. We'll also compare it with the older minidom module in the first few sections so you can get a good comparison of the two.

The XML Modules

The minidom module, or Minimal DOM Implementation, is a simplified implementation of the Document Object Model (DOM). The DOM is an application programming interface that treats XML as a tree structure, where each node in the tree is an object. Using this module therefore requires some familiarity with the DOM.

The ElementTree module provides a more "Pythonic" interface for handling XML and is a good option for those not familiar with the DOM. It is also likely a better candidate for novice programmers due to its simple interface, which you'll see throughout this article.

In this article, the ElementTree module will be used in all examples, whereas minidom will also be demonstrated, but only for counting and reading XML documents.

XML File Example

In the examples below, we will be using the following XML file, which we will save as "items.xml":

<data>
    <items>
        <item name="item1">item1abc</item>
        <item name="item2">item2abc</item>
    </items>
</data>

As you can see, it's a fairly simple XML example, only containing a few nested objects and one attribute. However, it should be enough to demonstrate all of the XML operations in this article.

Reading XML Documents

Using minidom

In order to parse an XML document using minidom, we must first import it from the xml.dom module. This module uses the parse function to create a DOM object from our XML file. The parse function has the following syntax:

xml.dom.minidom.parse(filename_or_file[, parser[, bufsize]])

Here the first argument can be a string containing the file path or a file-like object. The function returns a Document object that represents the whole XML file. We can then call its getElementsByTagName() method to find all elements with a specific tag.

Since each node can be treated as an object, we can access the attributes and text of an element using the properties of the object. In the example below, we have accessed the attributes and text of a specific node, and of all nodes together.

from xml.dom import minidom

# parse an xml file by name
mydoc = minidom.parse('items.xml')
items = mydoc.getElementsByTagName('item')

# one specific item attribute
print('Item #2 attribute:')
print(items[1].attributes['name'].value)

# all item attributes
print('\nAll attributes:')
for elem in items:
    print(elem.attributes['name'].value)

# one specific item's data
print('\nItem #2 data:')
print(items[1].firstChild.data)
print(items[1].childNodes[0].data)

# all items data
print('\nAll item data:')
for elem in items:
    print(elem.firstChild.data)

The result is as follows:

$ python minidomparser.py
Item #2 attribute:
item2

All attributes:
item1
item2

Item #2 data:
item2abc
item2abc

All item data:
item1abc
item2abc

Figure 1

If we wanted to use an already-opened file, we can just pass our file object to parse like so:

datasource = open('items.xml')

# parse an open file
mydoc = minidom.parse(datasource)

Also, if the XML data was already loaded as a string then we could have used the parseString() function instead.
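As a minimal sketch of the string-based variant, the snippet below feeds an in-memory copy of our example document to parseString(). The xml_text variable is only an illustrative stand-in for data you might have received from elsewhere.

```python
from xml.dom import minidom

# Illustrative in-memory copy of items.xml (an assumption for this sketch)
xml_text = (
    '<data><items>'
    '<item name="item1">item1abc</item>'
    '<item name="item2">item2abc</item>'
    '</items></data>'
)

# parseString() returns the same kind of Document object as parse()
mydoc = minidom.parseString(xml_text)
items = mydoc.getElementsByTagName('item')

names = [elem.attributes['name'].value for elem in items]
print(names)  # ['item1', 'item2']
```

From here on the document can be used exactly like one loaded from a file.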

Using ElementTree

ElementTree presents us with a very simple way to process XML files. As always, in order to use it we must first import the module. In our code we use the import statement with the as keyword, which allows us to use a simplified name (ET in this case) for the module in the code.

Following the import, we create a tree structure with the parse function, and we obtain its root element. Once we have access to the root node we can easily traverse around the tree, because a tree is a connected graph.

As in the previous example, we obtain the attributes and text of each node from the object that represents it.

The code is as follows:

import xml.etree.ElementTree as ET

tree = ET.parse('items.xml')
root = tree.getroot()

# one specific item attribute
print('Item #2 attribute:')
print(root[0][1].attrib)

# all item attributes
print('\nAll attributes:')
for elem in root:
    for subelem in elem:
        print(subelem.attrib)

# one specific item's data
print('\nItem #2 data:')
print(root[0][1].text)

# all items data
print('\nAll item data:')
for elem in root:
    for subelem in elem:
        print(subelem.text)

The result will be as follows:

$ python treeparser.py
Item #2 attribute:
{'name': 'item2'}

All attributes:
{'name': 'item1'}
{'name': 'item2'}

Item #2 data:
item2abc

All item data:
item1abc
item2abc

Figure 2

As you can see, this is very similar to the minidom example. One of the main differences is that the attrib object is simply a dictionary object, which makes it a bit more compatible with other Python code. We also don't need to use value to access the item's attribute value like we did before.

You may have noticed how accessing objects and attributes with ElementTree is a bit more Pythonic, as we mentioned before. This is because each element behaves like a list of its child elements and stores its attributes in a plain dictionary, unlike with minidom, where attributes and text are exposed as custom xml.dom.minidom.Attr objects and DOM Text nodes.
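The following sketch illustrates that point with ordinary dictionary and list operations. It parses an in-memory copy of the example document with ET.fromstring(), which returns the root element directly; the 'price' attribute is a made-up name used only to show the default value of get().

```python
import xml.etree.ElementTree as ET

# Illustrative in-memory copy of items.xml (an assumption for this sketch)
xml_text = (
    '<data><items>'
    '<item name="item1">item1abc</item>'
    '<item name="item2">item2abc</item>'
    '</items></data>'
)
root = ET.fromstring(xml_text)
second = root[0][1]

# attrib is a regular dict, so the usual dict operations apply
is_dict = isinstance(second.attrib, dict)
name = second.attrib['name']

# Element.get() works like dict.get(); 'price' is a hypothetical attribute
price = second.get('price', 'n/a')

# an element can be iterated like a list of its children
texts = [child.text for child in root[0]]

print(is_dict, name, price, texts)
```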

Counting the Elements of an XML Document

Using minidom

As in the previous case, minidom must be imported from the xml.dom module. It provides the getElementsByTagName method, which we'll use to find the item tags. Once we have them, we use the built-in len() function to obtain the number of matching elements. The result obtained from the code below is shown in Figure 3.

from xml.dom import minidom

# parse an xml file by name
mydoc = minidom.parse('items.xml')
items = mydoc.getElementsByTagName('item')

# total amount of items
print(len(items))

$ python counterxmldom.py
2

Figure 3

Keep in mind that getElementsByTagName searches all descendants of the node you call it on, which in this case is the document, so len() here counts every item element in the file. If you want to count the elements of a much larger tree regardless of their tag name, you'd need to traverse all elements and count each of their children.
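One way to sketch such a full count with minidom is to rely on the DOM convention that the special tag name "*" matches every element. The example works on an in-memory copy of our document, which is an assumption made to keep the snippet self-contained.

```python
from xml.dom import minidom

# Illustrative in-memory copy of items.xml (an assumption for this sketch)
xml_text = (
    '<data><items>'
    '<item name="item1">item1abc</item>'
    '<item name="item2">item2abc</item>'
    '</items></data>'
)
mydoc = minidom.parseString(xml_text)

# "*" matches elements with any tag name, at any depth
all_elements = mydoc.getElementsByTagName('*')
item_elements = mydoc.getElementsByTagName('item')

print(len(all_elements))   # 4: data, items, item, item
print(len(item_elements))  # 2
```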

Using ElementTree

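Similarly, the ElementTree module allows us to count the nodes directly under a given node. Calling len() on an element returns the number of its direct children, so len(root[0]) gives the number of item elements under items.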

Example code:

import xml.etree.ElementTree as ET

tree = ET.parse('items.xml')
root = tree.getroot()

# total amount of items
print(len(root[0]))

The result is as follows:

$ python counterxml.py
2

Figure 4
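If you need more than the direct children, a possible approach is to walk the whole tree with iter(), which yields the element itself followed by all of its descendants. The sketch below, run against an in-memory copy of our document, counts every element and also tallies them per tag with collections.Counter.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative in-memory copy of items.xml (an assumption for this sketch)
xml_text = (
    '<data><items>'
    '<item name="item1">item1abc</item>'
    '<item name="item2">item2abc</item>'
    '</items></data>'
)
root = ET.fromstring(xml_text)

# iter() yields root and every descendant in document order
total = sum(1 for _ in root.iter())

# count the elements per tag name
per_tag = Counter(elem.tag for elem in root.iter())

print(total)          # 4
print(dict(per_tag))  # {'data': 1, 'items': 1, 'item': 2}
```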

Writing XML Documents

Using ElementTree

ElementTree is also great for writing data to XML files. The code below shows how to create an XML file with the same structure as the file we used in the previous examples.

The steps are:

  1. Create an element, which will act as our root element. In our case the tag for this element is "data".
  2. Once we have our root element, we can create sub-elements by using the SubElement function. This function has the syntax:

     SubElement(parent, tag, attrib={}, **extra)

     Here parent is the parent node to connect to, tag is the name of the new element, attrib is a dictionary containing the element attributes, and extra are additional keyword arguments that are also stored as attributes. This function returns an element, which can in turn be used as the parent of other sub-elements, as we do in the following lines by passing items to SubElement.
  3. Although we can add our attributes with the SubElement function, we can also use the set() method, as we do in the following code. The element text is set through the text property of the Element object.
  4. In the last few lines of the code below we create a string out of the XML tree, and we write that data to a file we open.

Example code:

import xml.etree.ElementTree as ET

# create the file structure
data = ET.Element('data')
items = ET.SubElement(data, 'items')
item1 = ET.SubElement(items, 'item')
item2 = ET.SubElement(items, 'item')
item1.set('name', 'item1')
item2.set('name', 'item2')
item1.text = 'item1abc'
item2.text = 'item2abc'

# create a new XML file with the results
# encoding='unicode' makes tostring() return a str instead of bytes
mydata = ET.tostring(data, encoding='unicode')
with open('items2.xml', 'w') as myfile:
    myfile.write(mydata)

Executing this code will result in a new file, "items2.xml", which should be equivalent to the original "items.xml" file, at least in terms of the XML data structure. You'll probably notice that the resulting string is only one line and contains no indentation, however.
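If the missing indentation matters, one option is the ET.indent() function, which is available from Python 3.9 onwards and inserts whitespace into the tree in place. The sketch below assumes such a Python version; it also passes the attributes directly to SubElement, once as a dictionary and once as a keyword argument, instead of calling set().

```python
import xml.etree.ElementTree as ET

# build the same structure as before
data = ET.Element('data')
items = ET.SubElement(data, 'items')
ET.SubElement(items, 'item', {'name': 'item1'}).text = 'item1abc'
ET.SubElement(items, 'item', name='item2').text = 'item2abc'

# ET.indent() requires Python 3.9+; it modifies the tree in place
ET.indent(data, space='  ')

pretty = ET.tostring(data, encoding='unicode')
print(pretty)
```

The printed document then has one element per line, indented by two spaces per nesting level.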

Finding XML Elements

Using ElementTree

The ElementTree module offers the findall() method, which helps us find specific items in the tree. It returns all sub-elements that match the specified condition. In addition, there is the find() method, which returns only the first sub-element that matches the specified criteria. The syntax for both of these methods is as follows:

findall(match, namespaces=None)
find(match, namespaces=None)

For both of these methods the match parameter can be an XML tag name or a path. findall() returns a list of elements, and find() returns a single object of type Element, or None if nothing matches.

In addition, there is another helper function that returns the text of the first node that matches the given criterion:

findtext(match, default=None, namespaces=None)

Here is some example code to show you exactly how these functions operate:

import xml.etree.ElementTree as ET

tree = ET.parse('items.xml')
root = tree.getroot()

# find the first 'item' object
for elem in root:
    print(elem.find('item').get('name'))

# find all "item" objects and print their "name" attribute
for elem in root:
    for subelem in elem.findall('item'):
        # if we don't need to know the name of the attribute(s), get the dict
        print(subelem.attrib)
        # if we know the name of the attribute, access it directly
        print(subelem.get('name'))

And here is the result of running this code:

$ python findtree.py
item1
{'name': 'item1'}
item1
{'name': 'item2'}
item2

Figure 5
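The example above only passes plain tag names. As a rough sketch of the path form and of findtext(), the snippet below uses a few expressions from the limited XPath subset that ElementTree supports. It runs on an in-memory copy of the document, and the 'missing' tag is a made-up name used to show what happens when nothing matches.

```python
import xml.etree.ElementTree as ET

# Illustrative in-memory copy of items.xml (an assumption for this sketch)
xml_text = (
    '<data><items>'
    '<item name="item1">item1abc</item>'
    '<item name="item2">item2abc</item>'
    '</items></data>'
)
root = ET.fromstring(xml_text)

# a path relative to root: text of the first matching element
first_text = root.findtext('items/item')

# 'missing' is a hypothetical tag; the default is returned instead
fallback = root.findtext('items/missing', default='n/a')

# select by attribute value with a predicate
second = root.find("items/item[@name='item2']")

# './/item' matches item elements at any depth below root
all_names = [elem.get('name') for elem in root.findall('.//item')]

# find() returns None when nothing matches
not_found = root.find('items/missing')

print(first_text, fallback, second.text, all_names, not_found)
```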

Modifying XML Elements

Using ElementTree

The ElementTree module presents several tools for modifying existing XML documents. The example below shows how to change the text of a node, modify the value of an attribute, and add an extra attribute to an element.

The text of a node can be changed by assigning the new value to the text property of the node object. The value of an attribute can be changed with the set(name, value) method. The set method doesn't only work on existing attributes; it can also be used to define a new attribute.

The code below shows how to perform these operations:

import xml.etree.ElementTree as ET

tree = ET.parse('items.xml')
root = tree.getroot()

# changing a field text
for elem in root.iter('item'):
    elem.text = 'new text'

# modifying an attribute
for elem in root.iter('item'):
    elem.set('name', 'newitem')

# adding an attribute
for elem in root.iter('item'):
    elem.set('name2', 'newitem2')

tree.write('newitems.xml')

After running the code, the resulting XML file "newitems.xml" will have an XML tree with the following data:

<data>
    <items>
        <item name="newitem" name2="newitem2">new text</item>
        <item name="newitem" name2="newitem2">new text</item>
    </items>
</data>

As we can see when comparing with the original XML file, the name attribute of each item element has changed to "newitem", the text to "new text", and the attribute "name2" has been added to both nodes.

You may also notice that this file contains newlines and indentation, unlike the one we created from scratch. This is not formatting added by tree.write(); ElementTree keeps the whitespace of the file we parsed and simply writes it back out.
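To see where the newlines and indentation in such a file come from, the sketch below compares a tree parsed from an indented string with one built in code. The whitespace of the parsed document is stored in the text and tail properties of its elements and is written back unchanged, whereas a tree built from scratch has none. The indented_source string is just an illustrative stand-in for a file on disk.

```python
import xml.etree.ElementTree as ET

# Illustrative indented document (an assumption for this sketch)
indented_source = (
    '<data>\n'
    '  <items>\n'
    '    <item name="item1">item1abc</item>\n'
    '  </items>\n'
    '</data>'
)
parsed = ET.fromstring(indented_source)

# the whitespace between tags is kept in .text and .tail
print(repr(parsed.text))
print(repr(parsed[0][0].tail))

# serializing the parsed tree reproduces the original layout
roundtrip = ET.tostring(parsed, encoding='unicode')
print(roundtrip == indented_source)

# a tree built in code carries no whitespace, so it ends up on one line
built = ET.Element('data')
built_items = ET.SubElement(built, 'items')
ET.SubElement(built_items, 'item', name='item1').text = 'item1abc'
flat = ET.tostring(built, encoding='unicode')
print(flat)
```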

Creating XML Sub-Elements

Using ElementTree

The ElementTree module has more than one way to add a new element. The first way we'll look at is by using the makeelement() method, which takes the node name and a dictionary with its attributes as parameters. The new element is not attached to the tree until we append it.

The second way is through the SubElement() function, which takes the parent element, the name of the new node, and a dictionary of attributes as inputs, and attaches the new element to the parent right away.

In our example below we show both methods. In the first case the node has no attributes, so we create an empty dictionary (attrib = {}). In the second case, we use a populated dictionary to create the attributes.

import xml.etree.ElementTree as ET

tree = ET.parse('items.xml')
root = tree.getroot()

# adding an element to the root node
attrib = {}
element = root.makeelement('seconditems', attrib)
root.append(element)

# adding an element to the seconditems node
attrib = {'name2': 'secondname2'}
ET.SubElement(root[1], 'seconditem', attrib)
root[1][0].text = 'seconditemabc'

# create a new XML file with the new element
tree.write('newitems2.xml')

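After running this code the resulting XML file will have the following structure (indentation has been added here for readability, since the new elements are written without any extra whitespace):

<data>
    <items>
        <item name="item1">item1abc</item>
        <item name="item2">item2abc</item>
    </items>
    <seconditems>
        <seconditem name2="secondname2">seconditemabc</seconditem>
    </seconditems>
</data>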

As we can see when comparing with the original file, the "seconditems" element and its sub-element "seconditem" have been added. In addition, the "seconditem" node has "name2" as an attribute, and its text is "seconditemabc", as expected.

Deleting XML Elements

Using ElementTree

As you'd probably expect, the ElementTree module has the necessary functionality to delete a node's attributes and sub-elements.

Deleting an attribute

The code below shows how to remove a node's attribute by using the pop() method of the element's attrib dictionary. The first argument is the name of the attribute to remove. The second argument, None, is the default value that pop() returns if the attribute does not exist, which avoids a KeyError.

import xml.etree.ElementTree as ET

tree = ET.parse('items.xml')
root = tree.getroot()

# removing an attribute
root[0][0].attrib.pop('name', None)

# create a new XML file with the results
tree.write('newitems3.xml')

The result will be the following XML file:

<data>
    <items>
        <item>item1abc</item>
        <item name="item2">item2abc</item>
    </items>
</data>

As we can see in the XML code above, the first item has no attribute "name".
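Because attrib is an ordinary dictionary, pop() behaves exactly as it does on any other dict. The short sketch below, using an in-memory copy of the document, shows the value that pop() returns and what happens on a second call once the attribute is gone.

```python
import xml.etree.ElementTree as ET

# Illustrative in-memory copy of items.xml (an assumption for this sketch)
xml_text = (
    '<data><items>'
    '<item name="item1">item1abc</item>'
    '<item name="item2">item2abc</item>'
    '</items></data>'
)
root = ET.fromstring(xml_text)
first = root[0][0]

# pop() removes the attribute and returns its old value
removed = first.attrib.pop('name', None)

# the attribute is gone now, so the default (None) is returned
# instead of a KeyError being raised
removed_again = first.attrib.pop('name', None)

print(removed, removed_again, first.attrib)
```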

Deleting one sub-element

One specific sub-element can be deleted using the remove() method of its parent element. The method takes the node that we want to remove as its argument.

The following example shows us how to use it:

import xml.etree.ElementTree as ET

tree = ET.parse('items.xml')
root = tree.getroot()

# removing one sub-element
root[0].remove(root[0][0])

# create a new XML file with the results
tree.write('newitems4.xml')

The result will be the following XML file:

<data>
    <items>
        <item name="item2">item2abc</item>
    </items>
</data>

As we can see from the XML code above, there is now only one "item" node. The first one has been removed from the original tree.
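When several sub-elements should be removed based on a condition, it is safer to loop over a copy of the children, because removing from an element while iterating over it directly can skip nodes. The sketch below shows this on an in-memory copy of the document, and also illustrates that remove() only considers direct children of the element it is called on.

```python
import xml.etree.ElementTree as ET

# Illustrative in-memory copy of items.xml (an assumption for this sketch)
xml_text = (
    '<data><items>'
    '<item name="item1">item1abc</item>'
    '<item name="item2">item2abc</item>'
    '</items></data>'
)
root = ET.fromstring(xml_text)
items = root[0]

# iterate over a copy (list(items)) while removing from the original
for item in list(items):
    if item.get('name') == 'item1':
        items.remove(item)

remaining = [item.get('name') for item in items]
print(remaining)  # ['item2']

# remove() only works on direct children: asking the root to remove
# a grandchild raises ValueError
try:
    root.remove(items[0])
    grandchild_removed = True
except ValueError:
    grandchild_removed = False
print(grandchild_removed)  # False
```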

Deleting all sub-elements

The ElementTree module presents us with the clear() method, which can be used to remove all sub-elements of a given element. Note that it also removes the element's own attributes and text.

The example below shows us how to use clear():

import xml.etree.ElementTree as ET

tree = ET.parse('items.xml')
root = tree.getroot()

# removing all sub-elements of an element
root[0].clear()

# create a new XML file with the results
tree.write('newitems5.xml')

The result will be the following XML file:

<data>
    <items /></data>

As we can see in the XML code above, all sub-elements of the "items" element have been removed from the tree.
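One thing to be aware of is that clear() resets the whole element, including its attributes, not just its children. The sketch below contrasts it with removing the children one by one, which leaves the attributes in place. The id="main" attribute is a hypothetical addition to our example document, included only to make the difference visible.

```python
import xml.etree.ElementTree as ET

# Example document with a hypothetical id attribute on <items>
xml_text = (
    '<data><items id="main">'
    '<item name="item1">item1abc</item>'
    '<item name="item2">item2abc</item>'
    '</items></data>'
)

# clear() drops the children and the attributes
cleared = ET.fromstring(xml_text)[0]
cleared.clear()
print(len(cleared), cleared.attrib)  # 0 {}

# removing the children individually keeps the attributes
kept = ET.fromstring(xml_text)[0]
for child in list(kept):
    kept.remove(child)
print(len(kept), kept.attrib)  # 0 {'id': 'main'}
```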

Wrapping Up

Python offers several options to handle XML files. In this article we have reviewed the ElementTree module, and used it to parse, create, modify and delete XML data. We have also used the minidom module to parse XML files. Personally, I'd recommend using the ElementTree module as it is much easier to work with and is the more modern module of the two.

Author: Kelle Weber