Read and Write XML Files with Python (2024)

Read and Write XML Files with Python (1) Raymond

Read and Write XML Files with Python (2)

Raymond

articleArticles 578

imageDiagrams 58

commentComments 303

loyaltyKontext Points 6373

| Python Programming

XML is a commonly used data exchange format in many applications and systems though JSON became more popular nowadays. Compared with JSON, XML supports schema (XSD) validation and can be easily transformed other formats using XSLT. XHTML is also a strict version of HTML that is XML based and used in many websites.

This article provides some examples of using Python to read and write XML files.

Example XML file

Create a sample XML file named test.xml with the following content:

<?xml version="1.0"?>
<data>
<record id="1">
<rid>1</rid>
<name>Record 1</name>
</record>
<record id="2">
<rid>2</rid>
<name>Record 2</name>
</record>
<record id="3">
<rid>3</rid>
<name>Record 3</name>
</record>
</data>

Use XML DOM model

As many other programming languages, XML DOM is commonly used to parse and to manipulate XML files.

More than decades ago when I started coding with C#, XmlDocument was a commonly used class to manipulate XML data. In Python, there are also DOM model implementations like package minidom even though it is a minimal implementation of the Document Object Model interface.

Read XML document

The following code snippet reads the attributes from the document. It first creates a DOM object and then finds all the record elements from document root element. For each element, is parsed as a dictionary record. When parsing record element, several objects are used: Attr, Text and Element. All these elements are inherited from Node class.

from xml.dom.minidom import parseimport osimport pandas as pddir_path = os.path.dirname(os.path.realpath(__file__))data_records = []with parse(f'{dir_path}/test.xml') as xml_doc: root = xml_doc.documentElement records = root.getElementsByTagName('record') for r in records: data = {} id_node = r.getAttributeNode('id') data['id'] = id_node.value el_rid = r.getElementsByTagName('rid')[0] data['rid'] = el_rid.firstChild.data el_name = r.getElementsByTagName('name')[0] data['name'] = el_name.firstChild.data data_records.append(data)print(data_records)df = pd.DataFrame(data_records)print(df)

Output:

[{'id': '1', 'rid': '1', 'name': 'Record 1'}, {'id': '2', 'rid': '2', 'name': 'Record 2'}, {'id': '3', 'rid': '3', 'name': 'Record 3'}] id rid name0 1 1 Record 11 2 2 Record 22 3 3 Record 3

Write XML document

We can also use DOM object to write XML data in memory to files.

The following code snippet adds a new attribute for each record element in the previous example and then save the new XML document to a new file.

with open(f"{dir_path}/test_new.xml", "w") as new_xml_handle: with parse(f'{dir_path}/test.xml') as xml_doc: root = xml_doc.documentElement records = root.getElementsByTagName('record') for r in records: r.setAttribute('testAttr', 'New Attribute') xml_doc.writexml(new_xml_handle)

The new file text_new.xml has the following content after the script is run:

<?xml version="1.0" ?><data> <record id="1" testAttr="New Attribute"> <rid>1</rid> <name>Record 1</name> </record> <record id="2" testAttr="New Attribute"> <rid>2</rid> <name>Record 2</name> </record> <record id="3" testAttr="New Attribute"> <rid>3</rid> <name>Record 3</name> </record></data>

As you can see, attribute node testAttr is added for each element.

Use xml.etree.ElementTree

Another approach is to use xml.etree.ElementTree to read and write XML files.

Read XML file using ElementTree

import xml.etree.ElementTree as ETimport osimport pandas as pddir_path = os.path.dirname(os.path.realpath(__file__))data_records = []tree = ET.parse(f'{dir_path}/test.xml')root = tree.getroot()for r in root.getchildren(): data = {} data['id'] = r.attrib['id'] el_rid = r.find('rid') data['rid'] = el_rid.text el_name = r.find('name') data['name'] = el_name.text data_records.append(data)print(data_records)df = pd.DataFrame(data_records)print(df)

The above scripts first create ElementTree object and then find all 'record' elements through the root element. For each 'record' element, it parses the attributes and child elements. The APIs are very similar to the minidom one but is easier to use.

The output looks like the following:

[{'id': '1', 'rid': '1', 'name': 'Record 1'}, {'id': '2', 'rid': '2', 'name': 'Record 2'}, {'id': '3', 'rid': '3', 'name': 'Record 3'}] id rid name0 1 1 Record 11 2 2 Record 22 3 3 Record 3

Write XML file using ElementTree

To write XML file we can just call the write function.

Example code:

for r in root.getchildren(): r.set('testAttr', 'New Attribute2')tree.write(f'{dir_path}/test_new_2.xml', xml_declaration=True)

Again, the API is simpler compared with minidom. The content of the newly generated file text_new_2.xml is like the following:

<?xml version='1.0' encoding='us-ascii'?><data> <record id="1" testAttr="New Attribute2"> <rid>1</rid> <name>Record 1</name> </record> <record id="2" testAttr="New Attribute2"> <rid>2</rid> <name>Record 2</name> </record> <record id="3" testAttr="New Attribute2"> <rid>3</rid> <name>Record 3</name> </record></data>

For ElementTree.write function, you can specify many optional arguments, for example, encoding, XML declaration, etc.

warningWarning: The xml.etree.ElementTree module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data see XML vulnerabilities.

Summary

There are many other libraries available in Python to allow you to parse and write XML files. For many of these packages, they are not as fluent or complete as Java or .NET equivalent libraries. ElementTree is the closest one I found so far.

References

Read XML Files as Pandas DataFrame

Read and Write XML Files with Python (2024)

FAQs

Read and Write XML Files with Python? ›

Using ElementTree Library to parse XML files. ElementTree is a widely used built-in python XML parser that provides many functions to read, manipulate and modify XML files. This parser creates a tree-like structure to store the data in a hierarchical format.

Can Python read XML files? ›

Using ElementTree Library to parse XML files. ElementTree is a widely used built-in python XML parser that provides many functions to read, manipulate and modify XML files. This parser creates a tree-like structure to store the data in a hierarchical format.

How to write an XML file using Python? ›

Creating XML Document using Python. First, we import minidom for using xml. dom . Then we create the root element and append it to the XML.

How do I read and write an XML file? ›

Below are the steps to implement:
  1. Specify the XML file path as String and give the file location of XML file.
  2. Create the DocumentBuilder and that can be used to parse the XML file.
  3. After that accesses the elements using TagName.
  4. Once the accesses the element using for loop then print the elements into the XML file.
Feb 2, 2024

How to read and write a data set in XML? ›

To write the schema information from the DataSet (as XML Schema) to a string, use GetXmlSchema. To write a DataSet to a file, stream, or XmlWriter, use the WriteXml method. The first parameter you pass to WriteXml is the destination of the XML output. For example, pass a string containing a file name, a System.

Does Python have a built-in XML parser? ›

The Expat parser is included with Python, so the xml. parsers. expat module will always be available.

What is the best XML parser for Python? ›

The Best Way To Parse XML in Python
  • Simple and built-in: xml. etree. ElementTree.
  • Small to medium documents with DOM support: minidom.
  • Large documents with stream processing: sax.
  • High performance with advanced features: lxml.
  • Easy handling of irregular XML/HTML: BeautifulSoup.
Nov 8, 2023

Does Python support XML? ›

As you can see, even though the default XML parser in Python can't validate documents, it still lets you inspect . doctype , the DTD, if it's present. Note that the XML declaration and DTD are optional. If the XML declaration or a given XML attribute is missing, then the corresponding Python attributes will be None .

How to extract XML data using Python? ›

Load our XML document into memory, and construct an XML ElementTree object. We then use the find method, passing in an XPath selector, which allows us to specify what element we're trying to extract. If the element can't be found, None is returned. If the element can be found, then we'll use the .

How do I read and edit an XML file? ›

If you want to open an XML file and edit it, you can use a text editor. You can use default text editors, which come with your computer, like Notepad on Windows or TextEdit on Mac. All you have to do is locate the XML file, right-click the XML file, and select the "Open With" option.

How to read XML easily? ›

Opening XMLs Fast
  1. Right click the XML file you want to open and hover the cursor over “open with.” Select Notepad to open the text on your computer.
  2. You can open XML files using a browser if your notepad isn't legible.
  3. If the contents of the XML file don't make sense, download an XML reader to interpret the file.

How to read XML file in Python using BeautifulSoup? ›

Below are the steps in which we will see how to extract tables with beautiful soup in Python:
  1. Step 1: Import the Library and Define Target URL.
  2. Step 2: Create Object for Parsing.
  3. Step 3: Locating and Extracting Table Data.
  4. Step 4: Extracting Text from Table Cell.
  5. Complete Code.
  6. Step 1: Creating XML File.
Jan 12, 2024

How to write an XML file in Python? ›

In the code below we have to create a new XML file from the scratch. Now at first, using ET. Element('chess'), we will make a parent tag (root) which will be under the chess. Now when the root is defined, the other subtype elements are made under that root tag.

How to read XML file using Python? ›

To read an XML file using ElementTree, firstly, we import the ElementTree class found inside xml library, under the name ET (common convension). Then passed the filename of the xml file to the ElementTree. parse() method, to enable parsing of our xml file. Then got the root (parent tag) of our xml file using getroot().

How to deal with XML in Python? ›

To read XML files in Python, you can use the xml. etree. ElementTree module, which provides a simple and efficient API for parsing and creating XML data.

How to read XML file to CSV in Python? ›

Approach
  1. Import module.
  2. Declare rows and columns for the data to arranged in csv file.
  3. Load xml file.
  4. Parse xml file.
  5. Write each row to csv file one by one.
  6. Save csv file.
Mar 21, 2024

Top Articles
Latest Posts
Article information

Author: Horacio Brakus JD

Last Updated:

Views: 5527

Rating: 4 / 5 (71 voted)

Reviews: 86% of readers found this page helpful

Author information

Name: Horacio Brakus JD

Birthday: 1999-08-21

Address: Apt. 524 43384 Minnie Prairie, South Edda, MA 62804

Phone: +5931039998219

Job: Sales Strategist

Hobby: Sculling, Kitesurfing, Orienteering, Painting, Computer programming, Creative writing, Scuba diving

Introduction: My name is Horacio Brakus JD, I am a lively, splendid, jolly, vivacious, vast, cheerful, agreeable person who loves writing and wants to share my knowledge and understanding with you.