How to extract text from DOCX or ODT files using PHP

Are you looking for a way to extract text from DOCX or ODT files using PHP? In this article I will show you how. The technique can be used to build a web crawler that indexes document files by their content, for example to create a searchable document repository. It does not involve any third-party plugins or software. It works in PHP 5.2+, and the only requirement is the ZIP extension: php_zip.dll on Windows, or PHP built with the --enable-zip option on Linux. DOCX and ODT files are actually ZIP archives that use the .docx or .odt extension instead of .zip, so we need a ZIP library for PHP to read the data inside them.
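Before going further, it can help to confirm that the ZIP extension is actually loaded. The snippet below is a minimal sketch; the function name zipSupportAvailable is made up for this illustration and is not part of PHP.

```php
<?php
// Hypothetical helper (name chosen for this example): reports whether
// the ZIP extension and its ZipArchive class are available.
function zipSupportAvailable() {
    return extension_loaded('zip') && class_exists('ZipArchive');
}

if (!zipSupportAvailable()) {
    echo "ZipArchive is not available; enable the zip extension first.\n";
}
```

If the message is printed, enable the extension as described above before running the rest of the code.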

You can verify this yourself: open any .docx or .odt file with a ZIP utility and look at the files inside it.

[Screenshot: DOCX file structure]
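The same check can be done from PHP instead of a ZIP utility. The following is a rough sketch that lists the entries of an archive with ZipArchive; the helper name listArchiveEntries is an assumption made for this example, and the file name is only a placeholder.

```php
<?php
// Hypothetical helper: returns the names of all entries inside a
// ZIP-based document, or an empty array if it cannot be opened.
function listArchiveEntries($path) {
    $zip = new ZipArchive;
    if ($zip->open($path) !== true) {
        return array(); // Not readable as a ZIP archive
    }
    $names = array();
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $names[] = $zip->getNameIndex($i);
    }
    $zip->close();
    return $names;
}

// Placeholder file name; nothing is printed if the file cannot be opened
foreach (listArchiveEntries('attractive_prices.docx') as $name) {
    echo $name, "\n";
}
```

For a DOCX file the list should include word/document.xml, and for an ODT file it should include content.xml.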

The text data is stored in word/document.xml for a DOCX file and in content.xml for an ODT file. To extract the text, all we need to do is read the contents of word/document.xml (for a DOCX file) or content.xml (for an ODT file) and display them after filtering out the XML tags.
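One thing to keep in mind when filtering out the tags: stripping them directly joins the last word of one paragraph to the first word of the next. The sketch below shows one possible way to keep paragraph breaks. It assumes paragraphs end with </w:p> (DOCX) or </text:p> (ODT), and the function name xmlToPlainText is made up for this example.

```php
<?php
// Hypothetical helper: converts document XML to plain text while
// keeping one line per paragraph.
function xmlToPlainText($xml) {
    // Add a newline after each closing paragraph tag so adjacent
    // paragraphs do not run together once the tags are removed
    $xml = str_replace(
        array('</w:p>', '</text:p>'),
        array("</w:p>\n", "</text:p>\n"),
        $xml
    );
    // Remove the remaining tags, then decode entities such as &amp;
    $text = html_entity_decode(strip_tags($xml), ENT_QUOTES, 'UTF-8');
    return trim($text);
}

// Small hand-written sample, not taken from a real document
$sample = '<w:document><w:body>'
        . '<w:p><w:r><w:t>Hello</w:t></w:r></w:p>'
        . '<w:p><w:r><w:t>World &amp; all</w:t></w:r></w:p>'
        . '</w:body></w:document>';
echo xmlToPlainText($sample);
```

With the sample above, the two paragraphs come out on separate lines instead of being glued together.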

Create a new PHP file, name it extract.php, and add the following code to it:


<?php
/* Name of the document file */
$document = 'attractive_prices.docx';

/** Function to extract text */
function extracttext($filename) {
    // Check the extension (case-insensitive)
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    // If it's a docx file
    if ($ext == 'docx') {
        $dataFile = "word/document.xml";
    } else {
        // Otherwise it must be an odt file
        $dataFile = "content.xml";
    }

    // Create a new ZIP archive object
    $zip = new ZipArchive;

    // Open the archive file
    if (true === $zip->open($filename)) {
        // If successful, search for the data file in the archive
        if (($index = $zip->locateName($dataFile)) !== false) {
            // Index found! Now read it to a string
            $text = $zip->getFromIndex($index);
            // Close the archive file
            $zip->close();
            // Load XML from a string
            // Ignore errors and warnings
            $xml = new DOMDocument();
            $xml->loadXML($text, LIBXML_NOERROR | LIBXML_NOWARNING);
            // Remove XML formatting tags and return the text
            return strip_tags($xml->saveXML());
        }
        // Data file not found: close the archive file
        $zip->close();
    }

    // In case of failure return a message
    return "File not found";
}

echo extracttext($document);

The comments in the code snippet should help you follow each step.


