W2XML v2.5 Help Documentation

Software Name: W2XML
Software Description: Converts *.doc, *.rtf, *.htm, *.asp, *.jsp to XML.
Software Version Number: 2.5
Version Release Date: June 1, 2005
Original Release Date: January 17, 2003
Software Author: DocSoft, Inc.
Author Website: http://www.docsoft.com/
Author Phone: 1.877.430.3502 or 1.405.236.2466
Documentation Version Number: 2.5.0
Last Updated: Wednesday, June 08, 2005 5:20:36 PM
Support Site: http://www.docsoft.com/productq.aspx
Source Filename: w2xHelp.xml

Introduction

W2XML converts *.doc, *.rtf, *.asp, *.htm, *.html and *.jsp files to well-formed XML. The software can also apply eXtensible Stylesheet Language Transformations (XSLTs) to the standard output so that it matches virtually any Schema's tag set.

Customers who use W2XML to convert Word to XML must understand the relatively complex concepts behind converting an unstructured format to a structured one. For this reason, DocSoft recommends that customers who are new to structured content, or who lack the expertise to develop sometimes complex XSLTs, contract with DocSoft's consulting services to develop XSLTs that convert the standard output to a Schema or tag set for their specific needs.

Features Available in Version 2.5
Features Available in Version 2.4
New parameters available to XSLT processor in Version 2.4
Features Available in Version 2.3
Parameters available to XSLT processor in Version 2.3
Features Added in Version 2.2
Features Added in Version 2.1

System Requirements

Please review the following requirements to ensure optimum application performance and operability:








NOTE:

Some of the information in this help file is dynamically included from the source UAC documentation XML using Eclipse Autopublisher. As a result, some graphics may differ slightly in the files and plug-ins shown. This does not affect any procedural information.



UAC Product Description

The Graphical User Interface (GUI) is simple and easy to learn. There are four main areas in the interface (see accompanying Figure 1):

  1. Menu Bar
  2. Toolbar
  3. Workspace List Window
  4. Options Window



Figure 1 - Interface Description

UAC Menu Bar Description

When the GUI is first launched, the Menu Bar contains only three top-level menu options (as shown in Figure 2):

  1. File
  2. Workspace
  3. Help

As soon as you load an application plug-in, an application-specific menu, usually named after the loaded application, is added as the third top-level menu item. The UAC comes packaged with a search-and-replace application called "Replace". Once it is loaded, a menu with items specific to the Replace application appears, as shown in Figure 3. Notice that the Options Window now shows application-specific information as well.




Figure 2 - Standard Menu Bar List Items




Figure 3 - Dynamic Menu Bar Loaded As Third Menu Item

File Menu

The File Menu is used to create, open, and save workspaces (Save and Save As). You may also exit the UAC by choosing the "Exit" menu item.




Figure 4 - File Menu

Workspace Menu

The Workspace Menu is used for actions specifically oriented around Workspace information, such as adding or removing files from a workspace, and/or changing the application plug-in for use within the Workspace.




Figure 5 - Workspace Menu

The Workspace Menu is also used to load an installed application plug-in, using the Change Plug-in... menu item. This launches the Select Plug-in dialog (Figure 7).




Figure 6 - UAC Drop Pad

One of the features of the UAC is the Drop Pad (Figure 6), a small window that always stays on top. By default it appears in the lower right-hand corner of the screen, but you can move it anywhere on the screen. The Drop Pad provides a quick way to add files to the UAC: simply drag the files you want to add to the current workspace and drop them onto the Drop Pad.




Figure 7 - Select Plug-in Dialog

Dynamic Application Menu

The Dynamic Application Menu is a menu item that is dynamically generated each time an application plug-in is loaded. It loads application-specific menu items depending on what application plug-in is currently loaded. The example in Figure 8 shows that the standard "Replace" application plug-in has been loaded and thereby shows menu items specific to the "Replace" application.

The one standard menu item that always appears here is the Options... item. Choosing it, whichever application plug-in is loaded, launches the "Options" dialog box, in which you can modify the options associated with the current workspace.




Figure 8 - Dynamic Application Menu - Replace Application Example

Help Menu

The Help Menu contains items relating to application help, both for the UAC and for the installed application plug-ins. The Plug-in Info... item opens the Plug-in Info dialog (Figure 10), which displays information specific to the currently loaded plug-in, such as Plug-in Name, Version and Valid Files (the files the plug-in can affect or modify).




Figure 9 - Help Menu




Figure 10 - Plug-in Info Dialog

UAC Toolbar Description

The Toolbar provides a quick, graphical interface to most of the functionality provided by the UAC. The following information describes each button's purpose.

New Workspace Button



Figure 11 - New Workspace Button

The New Workspace button is used to create a new workspace.

Open Workspace Button



Figure 12 - Open Workspace Button

The Open Workspace button is used to open a previously-created workspace.

Save Workspace Button



Figure 13 - Save Workspace Button

The Save Workspace button is used to save a new or modified workspace.

Add Files Button



Figure 14 - Add Files Button

The Add Files button launches the Open dialog, which is used to browse and load files for use within the UAC.

Remove Files Button



Figure 15 - Remove Files Button

The Remove Files button removes selected files from the Workspace List Window.

Process All Files Button



Figure 16 - Process All Files Button

The Process All Files button runs the loaded application plug-in on the files in the Workspace List Window, using the parameters defined in the plug-in's options.

UAC Workspace List Window Description

The Workspace List Window displays the files the UAC will currently affect. It shows the File name, file Size, Location of each file, and the date/time each file was last Modified.

You can add files to this window by choosing Workspace > Add Files... to launch the Open dialog, by copying (or cutting) and pasting files into the window, or by dragging and dropping files into the window.

If files dragged into the window do not match the file types supported by the loaded plug-in, the non-compatible files are listed in red text (see Figure 18). Keeping non-compatible files in a workspace is useful when you want to use multiple plug-ins on a single group of files: you can create one workspace whose files are processed by several plug-ins.
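The compatible/non-compatible split described above can be sketched in Python as follows. This is a hypothetical illustration, not the UAC's code. It assumes compatibility is decided by file extension alone and uses the extensions W2XML lists in the Introduction. `partition_files` is an invented name.

```python
from pathlib import PureWindowsPath
from typing import Iterable

# Extensions W2XML lists as convertible in the Introduction. Deciding
# compatibility by extension alone is an assumption made for this sketch.
W2XML_EXTENSIONS = frozenset({".doc", ".rtf", ".htm", ".html", ".asp", ".jsp"})


def partition_files(paths: Iterable[str],
                    valid: frozenset = W2XML_EXTENSIONS) -> tuple[list[str], list[str]]:
    """Split workspace paths into (compatible, non_compatible), keeping their order."""
    compatible: list[str] = []
    non_compatible: list[str] = []
    for path in paths:
        # PureWindowsPath parses "C:\\docs\\report.DOC" correctly on any OS.
        suffix = PureWindowsPath(path).suffix.lower()
        (compatible if suffix in valid else non_compatible).append(path)
    return compatible, non_compatible
```

For example, `partition_files([r"C:\docs\spec.doc", r"C:\docs\notes.txt"])` places spec.doc in the first list and notes.txt in the second. notes.txt corresponds to a file the UAC would show in red.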




Figure 17 - Workspace List Window




Figure 18 - Workspace List Window with Non-Compatible Files

Installation

The UAC can be installed from CD or via an Electronic Software Distribution (ESD) file downloaded from the DocSoft website. You will need a valid key code to install the full version. After purchase, you will have seven (7) days to activate the software electronically. If you do not activate the software within seven (7) days, the software will not operate until it is properly activated.

Installation Procedure

The Universal Application Console is easy to install and requires just a few clicks. Use the following procedure to install the UAC.



NOTE:

The installer file requires Windows Installer version 2 or greater. A compatible Windows Installer version is installed as part of the .NET Framework installation. If you received the executable via CD-ROM, a compatible Windows Installer version is located on the CD-ROM in the "Support Installs" directory.

If you continue to have problems launching the UAC installer, visit http://www.microsoft.com/ to download the latest version of Windows Installer.



  1. Double-click the UACSetup.msi installer file. This launches the installer wizard (Figure 19). Click "Next >" to continue.

     Figure 19 - Universal Application Console Installer Wizard

  2. Read the license agreement. If you agree to it, click "I agree", then press the "Next >" button to continue.

     Figure 20 - UAC License Agreement

  3. Select the folder in which to install the UAC files. The default is "C:\Program Files\DocSoft Universal Application Console\". You may press the "Browse..." button to select a different folder. Press the "Disk Cost..." button to see how much disk space the installation requires. Select the "Everyone" radio button to install the UAC for everyone who uses the computer, or the "Just me" radio button to allow only the currently logged-in user to use the software. Press "Next >" to continue.

     Figure 21 - Select Installation Folder

  4. This screen lets you go back and change the installation configuration before installing. If the configuration is correct, press "Next >" to continue. If changes need to be made, press "< Back" to modify them before continuing.

     Figure 22 - Confirm Installation

  5. Once you have confirmed the installation settings, the Installing Universal Application Console screen appears. A progress bar shows the progress of the installation.

     Figure 23 - Installing Universal Application Console

  6. Once installation is successful, the "Installation Complete" screen appears. Press "Close" to complete the installation.

     Figure 24 - Installation Complete

Creating a New Workspace

The UAC uses workspaces to save which files to process and which plug-in(s) and associated options to use. You may create a workspace that contains no specific files, only a plug-in and its options. This lets you add a different set of files each time you need to process them.

The UAC also lets you use command line instructions to schedule processing automatically or to run a series of workspaces in sequence. Please see Using Command Line to Run UAC for detailed information.

To create a new workspace, choose File > New Workspace from the Menu Bar (as shown in Figure 25), or press Ctrl+N.




Figure 25 - Creating New Workspace from File Menu

Adding Files to a Workspace

The UAC provides four ways to add files to a workspace. The following list describes each method.

  1. Workspace Menu - Go to Workspace Menu > Add Files..., which will launch the Open dialog in which you can browse and select files to add to the workspace.
  2. Drop Pad - Go to Workspace Menu > Drop Pad or use the Ctrl+D shortcut to toggle. The Drop Pad will be located at the bottom-right corner of the screen and always stay on top. You can drag-and-drop files to the Drop Pad to add to the current workspace.
  3. Drag-and-Drop - You may drag-and-drop files directly to the workspace window.
  4. Copy-and-Paste - You may copy-and-paste files to the workspace window. You may paste using Ctrl+V shortcut or paste via the Workspace Menu (Workspace Menu > Paste From Clipboard).

Saving a Workspace

After you have created and/or modified a workspace, you will probably want to save the workspace for later use. When you save a workspace, the UAC saves the workspace as a *.uac file (as shown in Figure 26).

To save a newly created or modified workspace, simply press Ctrl+S or choose File > Save Workspace or File > Save Workspace As... menu items.




Figure 26 - Saving a Workspace

Opening a Workspace

To open a previously-created workspace, use the File > Open Workspace menu item (see Figure 27), or Ctrl+O shortcut.




Figure 27 - Accessing Replace Application Plug-in Help

Removing Files from a Workspace



NOTE:

It is not necessary to remove files from a workspace when a new application plug-in is loaded to a current workspace. The application plug-in will only affect files compatible with the currently-loaded plug-in. Non-compatible files will not be modified.



To remove files from a workspace, highlight the file(s) you want to remove, and press the Remove Files button (see Figure 15), or go to Workspace Menu > Remove Files on the Menu Bar. You may also press the Del key to remove the selected files from the workspace.

To remove ALL files from the workspace, go to Workspace Menu > Remove All Files.

Executing Workspaces From Context Menu

You may execute a saved workspace directly without opening the UAC by right-clicking on a workspace file (*.uac) and choosing "Execute" from the context menu (as shown in Figure 28). This will run the workspace in the background without opening the UAC interface. This is especially useful when you want to run specific workspaces from a scheduler, such as Microsoft's Scheduled Tasks.




Figure 28 - Executing a Workspace From the Context Menu

After a workspace has been executed, a log file will be created in the same directory in which the *.uac workspace file resides. You can view this log file to see the results of the executed workspace (see Figure 29).




Figure 29 - Workspace Log Created After Executing a Workspace File

Using Command Line to Run UAC

The UAC lets you use command line parameters to run workspaces. This is especially helpful if you need to schedule workspaces to run at a particular time of day or week using a scheduler such as Scheduled Tasks.

Use the following command line parameters to run the UAC via the Run command line:

UniversalConsole.exe [workspace_file] [options]

Parameter                   Description
-r                          Run application without UI
-l [log_file]               Create log file
-p plug-in_id               Select Plug-in
-a [file [filename ...]]    Add files
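When a workspace is launched from a script, building the argument list programmatically avoids quoting problems with paths that contain spaces. The Python sketch below uses only the parameters in the table above. Two parts are assumptions: that `-a` takes its file names as separate arguments after the flag (inferred from the syntax shown, not confirmed), and the helper names `build_uac_command` and `run_uac`. The exit-code meaning is not documented here.

```python
import subprocess
from typing import Optional, Sequence


def build_uac_command(workspace: str,
                      no_ui: bool = True,
                      log_file: Optional[str] = None,
                      plugin_id: Optional[str] = None,
                      add_files: Sequence[str] = ()) -> list[str]:
    """Assemble a UniversalConsole.exe argument list from the documented parameters."""
    cmd = ["UniversalConsole.exe", workspace]
    if no_ui:
        cmd.append("-r")              # -r: run application without UI
    if log_file is not None:
        cmd += ["-l", log_file]       # -l: create log file
    if plugin_id is not None:
        cmd += ["-p", plugin_id]      # -p: select plug-in
    if add_files:
        cmd += ["-a", *add_files]     # -a: add files (assumed: one argument per file)
    return cmd


def run_uac(workspace: str, **options) -> int:
    """Launch the console (Windows only) and return the process exit code."""
    completed = subprocess.run(build_uac_command(workspace, **options), check=False)
    return completed.returncode
```

A scheduled job might call `run_uac(r"C:\Workspaces\nightly.uac", log_file=r"C:\Workspaces\nightly.log")`. Both paths are hypothetical. Passing a list to `subprocess.run` keeps each path a single argument without manual quoting.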

Loading the W2XML Application Plug-in



NOTE:

Operation of the Universal Application Console is covered under the UAC documentation. Please refer to the UAC documentation for UAC-specific help.



Loading the W2XML application plug-in is simple. Perform these three easy steps to properly load the application.

  1. Go to Workspace > Change Plug-in... (Figure 30) to launch the Select Plug-in dialog (Figure 31).

     Figure 30 - Change Plug-in

  2. A list of all of your installed application plug-ins is shown. Highlight the W2XML v2.5 application plug-in, then press OK.

     Figure 31 - Select Plug-in Dialog

  3. The W2XML v2.5 application plug-in is now loaded. Notice that a W2XML menu now appears on the UAC menu bar, the plug-in's name appears in the UAC title bar, and the Options Window reflects the current W2XML options configuration (Figure 32).

     Figure 32 - W2XML Application Plug-in Loaded

How W2XML Works

W2XML uses Microsoft's .NET® framework and Tidy Open Source technology, along with some of DocSoft's own conversion technology to output pure, structured XML from MSWord®, RTF, ASP*, and JSP* files.

W2XML first opens each *.doc and *.rtf document in Word and saves it as HTML, then uses Tidy to convert the HTML to raw XHTML. Finally, W2XML cleans up the raw exported data and applies your configuration options and any custom XSLT. The result is XML your organization can use to give new life to its legacy information.
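The following Python sketch illustrates the clean-up stage. It approximates three options that appear in the WordXMLOptionsData block of the example XSLT later in this document: ClassAttributeLowerCase, ChangeNBSPToSpace and RemoveSPAN. It shows what those option names suggest and is not DocSoft's implementation. `clean_xhtml` and `unwrap` are invented names.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SPAN = f"{{{XHTML_NS}}}span"


def unwrap(parent: ET.Element, child: ET.Element) -> None:
    """Replace child with its own text and children, preserving document order."""
    idx = list(parent).index(child)
    grandchildren = list(child)
    text = child.text or ""
    tail = child.tail or ""
    if grandchildren:
        # The child's tail now follows its last grandchild.
        last = grandchildren[-1]
        last.tail = (last.tail or "") + tail
    else:
        text += tail
    # The child's leading text attaches to whatever precedes it.
    if idx == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or "") + text
    parent.remove(child)
    for offset, grandchild in enumerate(grandchildren):
        parent.insert(idx + offset, grandchild)


def clean_xhtml(root: ET.Element) -> ET.Element:
    """Apply sketch versions of three clean-up options to an XHTML tree in place."""
    for el in root.iter():
        # ClassAttributeLowerCase: normalize class values to lower case.
        if "class" in el.attrib:
            el.set("class", el.get("class").lower())
        # ChangeNBSPToSpace: replace U+00A0 with an ordinary space.
        if el.text:
            el.text = el.text.replace("\u00a0", " ")
        if el.tail:
            el.tail = el.tail.replace("\u00a0", " ")
    # RemoveSPAN: unwrap spans one at a time until none remain below the root.
    while True:
        parent_of = {c: p for p in root.iter() for c in p}
        span = next((s for s in root.iter(SPAN) if s in parent_of), None)
        if span is None:
            return root
        unwrap(parent_of[span], span)
```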

This version of W2XML also exports Word's named styles as attribute values on the exported document's elements. As a result, you can author all of your documentation in Word by applying custom named styles from a custom template. You can then use a custom XSLT to transform the exported XML into a complete, custom XML document that conforms to any Schema or DTD you require. See Using Named Styles in Word for Custom XML Attributes for complete details.






W2XML Options

The W2XML Options Window contains four links that are used to modify the output configuration:

  1. Mode Link
  2. Output Folder Link
  3. Open Output Folder
  4. Options Link

Explanations for each of the links are included below:

Mode Link

The Mode Link opens a menu from which you can choose one of the preset output configurations or a Custom mode (see Figure 33). The menu consists of four preset configurations and the Custom option (the default). When one of the four preset configurations is chosen, the link changes to show which configuration is current (see Figure 34).




Figure 33 - Preset Configuration Menu from Options Window




Figure 34 - Mode Link Shows Current Configuration Mode

When the Custom mode is chosen from the menu dialog (Figure 35), the Options dialog is launched. The Options dialog's interface provides a means to fully customize the configuration. (The Options dialog will be covered in depth in "Configuring W2XML Options.")




Figure 35 - Choosing Custom Mode from Menu Dialog




Figure 36 - Options Dialog

The preset configuration output options are defined in the following list. The fewer tags you select, the smaller the XML. In some cases you will want many tags for maximum information; in others, you will need less information and, consequently, fewer tags. Test the output to determine which preset configuration best suits your needs, or create a custom configuration.

Output Folder Link

The Output Folder Link launches the Browse For Folder dialog, which you can use to navigate to a folder or create a new folder for output. (See Figure 37).




Figure 37 - Browse For Folder Dialog

Open Output Folder

The Open Output Folder link opens the folder you specified in the Browse For Folder dialog, which you can launch using the Output Folder Link. (See Figure 38).




Figure 38 - Launching the W2XML Output folder

Options Link

Pressing the Options Link launches the Options dialog (see Figure 39). You can also access this dialog by choosing W2XML > Options... from the menu bar (see Figure 40).




Figure 39 - Launching the W2XML Options Dialog from the Options Window Link




Figure 40 - Launching the W2XML Options Dialog from the W2XML Menu

Configuring W2XML Options

The W2XML Options Dialog (see accompanying Figure 41) provides access to all of the options needed to perform XML conversion. It consists of three main areas used for configuring options:

  1. Output Folder
  2. Export Settings
  3. Apply Custom XSLT



Figure 41 - W2XML Options Dialog

The following sections describe each of the above option items.

Output Folder

The Output Folder option can be used to choose a specific folder for output or to create a new folder for output. The Output Folder text field shows the currently-selected path in which the XML files will be exported. To change the path, click the Browse... button to launch the Browse For Folder dialog (as shown in Figure 37).

Export Settings

The Export Settings option contains four preset configurations and a custom option. To save custom settings for future use, save the workspace while they are selected. Choosing one of the preset configurations selects its predetermined settings in the configuration list.

Each configuration item checkbox is listed and explained below:

Hidden Options
Apply Custom XSLT


NOTE:

The Docbook XSLT that is provided with the installation program gives the end-user a starting point in exporting Docbook-compliant XML. Since there is no way to determine the exact layout of everyone's Word documents, you may need to tweak this XSLT to meet your specific Docbook requirements.








When the Apply Custom XSLT checkbox is checked, W2XML applies a custom XSLT to further customize the exported output. You may press the Browse... button to choose a custom XSLT that you have created. Alternatively, press the Set Docbook XSLT button to automatically select the XSLT included in the installation package, which transforms the standard XHTML output to Docbook-compliant XML.

The Check XSLT button checks the selected XSLT to ensure that it is structurally valid and free of errors.
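A rough pre-check can also be scripted before an XSLT is distributed. The Python sketch below rests on an assumption about what "structurally valid" might cover: well-formed XML, an xsl:stylesheet or xsl:transform root element, and a version attribute. It is not the Check XSLT implementation and does not compile the stylesheet. It would also flag XSLT 1.0 "simplified stylesheets" that use a literal result element as their root. `check_xslt` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
XSL_ROOTS = {f"{{{XSL_NS}}}stylesheet", f"{{{XSL_NS}}}transform"}


def check_xslt(text: str) -> list[str]:
    """Return a list of problems; an empty list means these basic checks passed."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if root.tag not in XSL_ROOTS:
        problems.append(f"root element is {root.tag!r}, expected xsl:stylesheet or xsl:transform")
    if root.get("version") is None:
        problems.append("missing version attribute on the root element")
    return problems
```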

Remember, since W2XML exports named styles from Word as attributes, you can use them to fully customize the exported XML. For more information on using named styles as attributes, see Using Named Styles in Word for Custom XML.

Creating XSLTs for Custom XML Output






To take full advantage of W2XML's capabilities, you need a good understanding of XML and XSL. If you lack that understanding and no one in your organization has it, consider contacting DocSoft or another consulting firm that can build the XSLT you need. The following information gives a high-level overview of the standard output so that you can build a custom XSLT that produces completely custom output.

W2XML's standard output is XHTML (well-formed XML that uses HTML tags). If you retain the raw XML file (an option in the options list), you will see that a great deal of information is exported from the Word document. In fact, there is enough to write an XSLT that converts it back into a properly formatted Word document. The first step in developing an XSLT is to decide which options produce the output you want to start from. Familiarizing yourself with the standard output and the options is one of the most important things you can do.

W2XML also wraps div tags around each section (such as <div class="section1">). These let you identify sections throughout your XML document for proper nesting and hierarchy.
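One quick way to inspect those section wrappers in an output file is the following Python sketch. It lists each div whose class begins with "section", along with the number of p elements it contains. It assumes the output uses the XHTML namespace, as the h: prefix in the example XSLT later in this document does. It also assumes section classes follow the section1, section2 pattern. `list_sections` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def list_sections(xhtml: str) -> list[tuple[str, int]]:
    """Return (class value, number of p descendants) for each section div, in document order."""
    root = ET.fromstring(xhtml)
    sections = []
    for div in root.iter(XHTML + "div"):
        cls = div.get("class", "")
        if cls.startswith("section"):
            sections.append((cls, len(div.findall(f".//{XHTML}p"))))
    return sections
```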

You also need to determine whether your legacy documents contain style names that you can key on to produce specific XML elements. This gives you leverage in creating your custom XSLT, because each element's style name is exported as the value of its class attribute, as in the following example:

<p class="sectionTitle">Sample Document</p>
<p class="para">Paragraph text</p>
<p class="para">More paragraph text</p>
<p class="subsectionTitle">Introduction</p>
<p class="para">Paragraph text</p>
<p class="para">More paragraph text</p>

In this example, every element is a p tag, but each carries the style name used in the source Word document. The source document uses a style named "sectionTitle" for the section title, "para" for normal paragraphs, and "subsectionTitle" for each subsection title. These class attributes are valuable when creating an XSLT that produces custom XML. Even though every element in the fragment above is a "p" element, an XSLT can easily transform each one into a custom element name based on its class. Sample XSLTs, which you can modify or use as learning tools, are available from the W2XML Updates page. You may also modify or learn from the docbook.xslt file included in the installation.
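In XSLT, this renaming is done with templates that match on the class attribute; the example XSLT later in this document uses tests such as @class = 'h1'. The Python sketch below applies the same mapping logic to the fragment above, purely to show the idea. The target element names (title, subtitle, para) and the `rename_by_class` helper are hypothetical. Only each paragraph's text is copied, not any inline markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word style names (class values) to custom element names.
STYLE_TO_ELEMENT = {
    "sectionTitle": "title",
    "subsectionTitle": "subtitle",
    "para": "para",
}


def rename_by_class(fragment: str, mapping: dict[str, str] = STYLE_TO_ELEMENT) -> str:
    """Return a <doc> element whose children are renamed according to their class."""
    root = ET.fromstring(f"<doc>{fragment}</doc>")  # wrap: the fragment has no single root
    out = ET.Element("doc")
    for p in root.iter("p"):
        # Styles without a mapping fall back to a plain p element.
        name = mapping.get(p.get("class", ""), "p")
        ET.SubElement(out, name).text = p.text
    return ET.tostring(out, encoding="unicode")
```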

Setting W2XML Options Through Custom XSLTs

W2XML v2.5 lets you include Workspace information in an XSLT, so that specific options are set whenever that XSLT is applied through the Apply Custom XSLT checkbox. This makes it easier to distribute the application with specific option settings throughout your organization. It is especially helpful for organizations using W2XML as an enterprise-wide solution.

You can include option settings in your XSLT in two ways: (1) create a workspace with the options saved in it and copy that information into your XSLT, or (2) write the workspace information by hand in a text editor such as Notepad, following the Workspace Schema.

To add workspace information to your XSLT from a previously created workspace, open the *.uac workspace file in a text editor and copy the entire WordXMLOptionsData element, including its start and end tags. Paste it near the top of your XSLT, directly after your xsl:output declaration and before the <xsl:template match="/"> template. An example declaration is <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="no" />; if your XSLT has none, paste directly before the root template.
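The copy-and-paste step can be scripted. The Python sketch below copies the element's text verbatim, as the manual procedure does. It inserts the text after the xsl:output declaration or, if there is none, before the root template. It assumes the element appears in the .uac file under the unprefixed tag name WordXMLOptionsData. `embed_options` is a hypothetical helper. XSLT 1.0 accepts a foreign top-level element only if it has a non-null namespace. This is why the example later in this document declares xmlns="urn:schemas-docsoft-com:word-to-xml:extensions" on it.

```python
import re

# Plain text search is used deliberately: round-tripping the XSLT through an
# XML library can drop or rename namespace declarations that are referenced
# only inside XPath expressions (such as match="h:p").
OPTIONS_RE = re.compile(r"<WordXMLOptionsData\b.*?</WordXMLOptionsData>", re.DOTALL)
OUTPUT_RE = re.compile(r"<xsl:output\b[^>]*/>")
ROOT_TEMPLATE_RE = re.compile(r'<xsl:template\s+match="/"')


def embed_options(uac_text: str, xslt_text: str) -> str:
    """Copy the WordXMLOptionsData element from a workspace file into an XSLT."""
    found = OPTIONS_RE.search(uac_text)
    if found is None:
        raise ValueError("no WordXMLOptionsData element in the workspace file")
    options = found.group(0)
    # Preferred position: directly after the xsl:output declaration...
    anchor = OUTPUT_RE.search(xslt_text)
    if anchor is not None:
        pos = anchor.end()
    else:
        # ...otherwise directly before the root template.
        anchor = ROOT_TEMPLATE_RE.search(xslt_text)
        if anchor is None:
            raise ValueError("no xsl:output declaration or root template found")
        pos = anchor.start()
    return f"{xslt_text[:pos]}\n{options}\n{xslt_text[pos:]}"
```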

Once the workspace information is properly placed within your XSLT, apply the XSLT from the W2XML Options interface. When the XSLT is loaded, a dialog box appears (as shown in Figure 42) asking whether you would like to import its options. Selecting "Yes" changes the corresponding options in the W2XML Options Dialog. You are now ready to process files.
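The import step amounts to reading that element back out of the stylesheet. The Python sketch below does so using the namespace from the example XSLT later in this document (urn:schemas-docsoft-com:word-to-xml:extensions). Converting "true"/"false" strings to booleans is a convenience of the sketch. `read_embedded_options` is a hypothetical name, not part of W2XML.

```python
import xml.etree.ElementTree as ET

W2X_NS = "urn:schemas-docsoft-com:word-to-xml:extensions"


def read_embedded_options(xslt_text: str) -> dict[str, object]:
    """Return the option settings embedded in an XSLT, or {} if there are none."""
    root = ET.fromstring(xslt_text)
    block = root.find(f"{{{W2X_NS}}}WordXMLOptionsData")  # top-level child only
    if block is None:
        return {}
    options: dict[str, object] = {}
    for child in block:
        name = child.tag.rsplit("}", 1)[-1]  # strip the namespace URI
        value = (child.text or "").strip()
        # Boolean options are stored as the strings "true"/"false".
        options[name] = value == "true" if value in ("true", "false") else value
    return options
```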




Figure 42 - Importing Options From XSLT Dialog

Working with the EXSLT Library

Introduced in W2XML v2.5, EXSLT.NET, a .NET implementation of EXSLT, brings EXSLT functionality to your custom XSLT development.

What Is EXSLT.NET?

The EXSLT.NET library is a community-developed implementation of the EXSLT extensions to XSLT for the .NET platform.

EXSLT.NET fully implements the following EXSLT modules:

In addition, the EXSLT.NET library provides its own set of useful extension functions.



NOTE:

Currently, the only extension element W2XML supports is exsl:document. See the "Multiple Output" section for more information.



Multiple Output

EXSLT.NET partially supports the exsl:document extension element. Not all exsl:document attributes and values are supported in this version of W2XML. The supported subset of attributes and values is as follows:

<exsl:document
     href = { uri-reference }
     method = { "xml" | "text" }
     encoding = { string }
     standalone = { "yes" | "no" }
     doctype-public = { string }
     doctype-system = { string }
     indent = { "yes" | "no" } >
          <!-- Content: template -->
</exsl:document>

The exsl:document extension element assumes that the transformation always produces XML, so EXSLT.NET produces XHTML result documents. More specifically, the main result document is always XML. Subsidiary result documents may be written as either XML or text, depending on the method attribute of the corresponding <exsl:document> element.

Moreover, the <xsl:output> element affects only the main result document and is ignored when subsidiary result documents are created. The output of each subsidiary result document is controlled entirely by its own <exsl:document> element. Its encoding and indent attributes give you some control over how that document is written.

For more information about how Multiple Output is implemented, see the article "Producing Multiple Outputs from an XSL Transformation".


Examples

As its name implies, the Multiple Output feature lets a custom XSLT create multiple output files. The following example XSLT uses this feature.

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exsl="http://exslt.org/common"
xmlns:h="http://www.w3.org/1999/xhtml" xmlns:w2x="urn:schemas-docsoft-com:word-to-xml:extensions"
xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:st1="urn:schemas-microsoft-com:office:smarttags"
exclude-result-prefixes="h w2x o st1 exsl">
  <xsl:param name="w2x-source" />
  <xsl:param name="w2x-uniquenumber" />
  <xsl:param name="w2x-user" />
  <WordXMLOptionsData xmlns="urn:schemas-docsoft-com:word-to-xml:extensions">
    <OutputPath>C:\temp</OutputPath>
    <SaveRawXML>false</SaveRawXML>
    <SaveStyle>false</SaveStyle>
    <RemoveGeneratedHTMLFile>true</RemoveGeneratedHTMLFile>
    <RemoveStyle>true</RemoveStyle>
    <RemoveScript>true</RemoveScript>
    <RemoveOTags>true</RemoveOTags>
    <RemoveVTags>true</RemoveVTags>
    <RemoveXML>true</RemoveXML>
    <RemoveSPAN>true</RemoveSPAN>
    <ExpandShowHideTags>true</ExpandShowHideTags>
    <CleanHeader>true</CleanHeader>
    <ClassAttributeLowerCase>true</ClassAttributeLowerCase>
    <ChangeNBSPToSpace>true</ChangeNBSPToSpace>
    <AddHeaderDIV>true</AddHeaderDIV>
    <AddPosibleListAttributes>true</AddPosibleListAttributes>
    <RemoveAutonumeration>true</RemoveAutonumeration>
    <ApplyXSL>true</ApplyXSL>
    <XSLFileName>c:\program files\docsoft universal application console\word2xml\xslts\example.xslt</XSLFileName>
  </WordXMLOptionsData>
  <xsl:output method="html" encoding="utf-8" indent="yes" omit-xml-declaration="yes" />
  <xsl:template match="h:style|h:script|h:xml|w2x:hide|w2x:script" />
  <xsl:template match="/">
    <xsl:variable name="tar" select="substring-before($w2x-source,'.')" />
    <exsl:document href="{$tar}/toc.cfm" method="html" encoding="utf-8" indent="yes" omit-xml-declaration="yes">
      <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
          <title>Table of Contents</title>
        </head>
        <body>
          <table width="100%" border="0" cellpadding="0" cellspacing="1" bgcolor="#003366">
            <tr>
              <td>
          <table align="center" width="100%" border="0" cellpadding="3" cellspacing="0">
            <tr>
              <td colspan="2" valign="top" class="bodytext-sm-reversed">Table of Contents</td>
            </tr>
            <xsl:apply-templates select="//h:p" mode="toc" />
          </table>
              </td>
            </tr>
          </table>
        </body>
      </html>
    </exsl:document>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>
        <xsl:for-each select="h:html/h:head/h:title">
          <xsl:value-of select="." />
        </xsl:for-each>
        </title>
        <xsl:apply-templates select="*/*/o:documentproperties" />
      </head>
      <body>
        <xsl:apply-templates select="h:html/h:body" />
      </body>
    </html>
  </xsl:template>
<!-- Templates -->
  <xsl:template match="o:documentproperties">
    <xsl:if test="o:author">
      <meta name="author" content="{o:author}" />
    </xsl:if>
  </xsl:template>
  <xsl:template match="h:body">
    <xsl:apply-templates />
  </xsl:template>
  <xsl:template match="h:div">
    <div>
      <xsl:apply-templates />
    </div>
  </xsl:template>
  <xsl:template match="h:p" mode="toc">
    <tr>
      <td width="99%" valign="top">
        <a href="#{generate-id()}">
          <xsl:value-of select="." />
        </a>
      </td>
    </tr>
  </xsl:template>
  <xsl:template match="h:p">
    <xsl:choose>
      <xsl:when test=". = ' '"></xsl:when>
      <xsl:when test="@class = 'h1'">
        <h1 id="{generate-id()}">
          <xsl:apply-templates />
        </h1>
      </xsl:when>
      <xsl:when test="@class = 'h2'">
        <h2 id="{generate-id()}">
          <xsl:apply-templates />
        </h2>
      </xsl:when>
      <xsl:otherwise>
        <p>
          <xsl:apply-templates />
        </p>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

The example XSLT above takes the W2XML output and produces two files: the main output and a table of contents (toc.cfm) for that main output.

Besides creating external files such as a table of contents, this new "Multiple Output" capability can also be used to paginate your Word document, by dividing the document into sections and writing each section to its own file.
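A minimal sketch of section-based pagination follows. It is an illustration, not a DocSoft-supplied stylesheet, and it assumes that: each Word section appears in the W2XML output as a body-level div whose class starts with "section" (as in the div class="section1" element in the page-marker example below); the EXSLT and multiple-output options (UseEXSLT and UseMulti in the W2XML Extensions Schema) are enabled; and W2XML supplies $w2x-source, as in the example above. The page file names (page1.html, page2.html, ...) are hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch: write each Word section to its own file. File names are hypothetical. -->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:h="http://www.w3.org/1999/xhtml"
  xmlns:exsl="http://exslt.org/common"
  extension-element-prefixes="exsl"
  exclude-result-prefixes="h">
  <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes" />

  <!-- Assumed to be supplied by W2XML; the default is only a placeholder -->
  <xsl:param name="w2x-source" select="'document.doc'" />
  <xsl:variable name="tar" select="substring-before($w2x-source, '.')" />

  <xsl:template match="/">
    <!-- Assumption: sections are body-level divs with class "section1", "section2", ... -->
    <xsl:variable name="sections" select="h:html/h:body/h:div[starts-with(@class, 'section')]" />
    <xsl:for-each select="$sections">
      <!-- One extra output file per section: page1.html, page2.html, ... -->
      <exsl:document href="{$tar}/page{position()}.html" method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes">
        <html>
          <head>
            <title>Page <xsl:value-of select="position()" /></title>
          </head>
          <body>
            <!-- Copy the section's content unchanged -->
            <xsl:copy-of select="node()" />
          </body>
        </html>
      </exsl:document>
    </xsl:for-each>
    <!-- The main output remains a copy of the whole document -->
    <xsl:copy-of select="h:html" />
  </xsl:template>
</xsl:stylesheet>
```

Leaving the main output as a full copy means the paginated files are produced alongside the normal result rather than replacing it.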

Working with Equation Objects

Introduced in W2XML v2.5, the Equation Objects to MathML option converts Word Equation Objects to XML, using MathML as the target vocabulary. Because the equations become XML, you can apply a custom XSLT to them to customize W2XML's output even further.



NOTE:
Selecting the 'Remove XML data islands' or 'Remove IE hide/show tags' option may prevent the 'Convert Equation objects to MathML' option from working properly. Make sure both of these options are cleared when you use MathML conversion.


What is MathML?

MathML is a low-level specification for describing mathematics as a basis for machine to machine communication.

References

  1. W3C Math Home - http://www.w3.org/Math/
  2. Latest MathML Recommendation - http://www.w3.org/TR/MathML/
  3. Zvon MathML Reference - http://www.zvon.org/xxl/MathML/Output/index.html

Examples

The Equation Objects to MathML option allows you to create MathML instances of your Equation Objects. Below is an example:


Equation Object

<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mfrac>
      <mrow>
        <mi>x</mi>
      </mrow>
      <mrow>
        <mfrac bevelled="true">
          <mrow>
            <mn>1</mn>
          </mrow>
          <mrow>
            <mn>2</mn>
          </mrow>
        </mfrac>
        <mi>L</mi>
      </mrow>
    </mfrac>
    <mo>=</mo>
    <mfrac>
      <mrow>
        <mi>L</mi>
        <mo>−</mo>
        <mi>x</mi>
      </mrow>
      <mrow>
        <mi>L</mi>
      </mrow>
    </mfrac>
  </mrow>
</math>

MathML
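XSLT's built-in template rules output only the text of elements that no template matches, so a stylesheet that rebuilds the document element by element would reduce an equation like the one above to its bare characters. The following sketch copies each equation through unchanged. It assumes that converted equations appear in the W2XML output as math elements in the MathML namespace shown above; the simplified paragraph handling is only for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:h="http://www.w3.org/1999/xhtml"
  xmlns:m="http://www.w3.org/1998/Math/MathML"
  exclude-result-prefixes="h m">
  <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes" />

  <!-- Keep only the body; head content such as style and script is dropped -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="h:html/h:body" />
      </body>
    </html>
  </xsl:template>

  <!-- Rebuild paragraphs without Word's class and style attributes -->
  <xsl:template match="h:p">
    <p><xsl:apply-templates /></p>
  </xsl:template>

  <!-- Without this template, the built-in rules would output only the equation's text -->
  <xsl:template match="m:math">
    <xsl:copy-of select="." />
  </xsl:template>
</xsl:stylesheet>
```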

Preserving Page Numbering

With the "Insert Page Markers" option, you can preserve paging information from your Word documents. Preserving page numbering allows you to recreate page formatting outside of Word.

Examples

The Insert Page Markers option inserts an anchor element (<a name="_PgM1" id="_PgM1"/>, <a name="_PgM2" id="_PgM2"/>, and so on) at the beginning of each section that starts a new page in your Word document, as in the following output:

<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns="http://www.w3.org/1999/xhtml" w2x:generator-version="1.2" xmlns:w2x="urn:schemas-docsoft-com:word-to-xml:extensions">
  <head>
    <title>Page Breaks</title>
  </head>
  <body>
    <div class="section1">
      <p class="msonormal">
        <a name="_PgM1" id="_PgM1"/>        Page 1
      </p>
      <br clear="all" />
      <p class="msonormal">
        <a name="_PgM2" id="_PgM2"/>        Page 2
      </p>
      <br clear="all" />
      <p class="msonormal">
        <a name="_PgM3" id="_PgM3"/>        Page 3
      </p>
    </div>
  </body>
</html>
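Because the markers are ordinary named anchors, a custom XSLT can link to them. The following sketch copies the document unchanged and adds a list of page links at the top of the body. It assumes the number after _PgM is the page number, as in the output above; the page-index class name is hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:h="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="h">
  <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes" />

  <!-- Identity template: copy everything not matched more specifically -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- At the top of the body, add one link per page marker (_PgM1, _PgM2, ...) -->
  <xsl:template match="h:body">
    <xsl:copy>
      <xsl:apply-templates select="@*" />
      <ul class="page-index">
        <xsl:for-each select=".//h:a[starts-with(@name, '_PgM')]">
          <li>
            <a href="#{@name}">Page <xsl:value-of select="substring-after(@name, '_PgM')" /></a>
          </li>
        </xsl:for-each>
      </ul>
      <xsl:apply-templates select="node()" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```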

Creating <![CDATA[ ]]> Sections

With the "Post process of 'cdata' attributes" option and a custom XSLT, you can include unparsed chunks of code from your Word documents by wrapping them in <![CDATA[ ]]> sections.



NOTE:

The parser treats everything inside a CDATA section as character data rather than markup.



Examples

The Post process of 'cdata' attributes option allows you to create CDATA sections that enclose code you have placed in your Word documents. The following example template wraps the content of each paragraph in a CDATA section:

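<xsl:template match="h:p">
  <!-- Emit the opening <![CDATA[ delimiter as raw markup -->
  <xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
  <!-- Process the paragraph content -->
  <xsl:apply-templates/>
  <!-- Emit the closing ]]> delimiter as raw markup -->
  <xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
</xsl:template>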
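The template above wraps every paragraph. In practice you may want to wrap only the paragraphs that contain code. The following sketch assumes you mark such paragraphs in Word with a named style called "code" (a hypothetical name; see Using Named Styles in Word for Custom XML), which W2XML exports as class="code". It uses the same disable-output-escaping technique as the template above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:h="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="h">
  <!-- indent="no" so no whitespace is added inside the CDATA sections -->
  <xsl:output method="xml" encoding="utf-8" indent="no" omit-xml-declaration="yes" />

  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="h:html/h:body" />
      </body>
    </html>
  </xsl:template>

  <!-- Paragraphs with the hypothetical "code" style: wrap their content in CDATA.
       This pattern has a higher default priority than plain h:p, so it wins. -->
  <xsl:template match="h:p[@class = 'code']">
    <pre>
      <xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
      <xsl:apply-templates />
      <xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
    </pre>
  </xsl:template>

  <!-- All other paragraphs are output as ordinary p elements -->
  <xsl:template match="h:p">
    <p><xsl:apply-templates /></p>
  </xsl:template>
</xsl:stylesheet>
```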

W2XML Extensions Schema

Use the following Schema to aid in developing workspace settings within your custom XSLT. You may also view the Schema documentation online.

<?xml version="1.0"?>
<xs:schema
targetNamespace="urn:schemas-docsoft-com:word-to-xml:extensions"
xmlns:mstns="urn:schemas-docsoft-com:word-to-xml:extensions"
xmlns="urn:schemas-docsoft-com:word-to-xml:extensions"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
attributeFormDefault="unqualified" elementFormDefault="qualified">
  <xs:element name="WordXMLOptionsData">
    <xs:complexType>
      <xs:all>
        <xs:element name="OutputPath" type="xs:string" minOccurs="0" />
        <xs:element name="RemoveStyle" type="xs:boolean" minOccurs="0" />
        <xs:element name="RemoveScript" type="xs:boolean" minOccurs="0" />
        <xs:element name="RemoveOTags" type="xs:boolean" minOccurs="0" />
        <xs:element name="RemoveVTags" type="xs:boolean" minOccurs="0" />
        <xs:element name="RemoveXML" type="xs:boolean" minOccurs="0" />
        <xs:element name="RemoveSPAN" type="xs:boolean" minOccurs="0" />
        <xs:element name="RemoveUselessAnchors" type="xs:boolean" minOccurs="0" />
        <xs:element name="ExpandShowHideTags" type="xs:boolean" minOccurs="0" />
        <xs:element name="CleanHeader" type="xs:boolean" minOccurs="0" />
        <xs:element name="ClassAttributeLowerCase" type="xs:boolean" minOccurs="0" />
        <xs:element name="ChangeNBSPToSpace" type="xs:boolean" minOccurs="0" />
        <xs:element name="RemoveSpaceBeforePunct" type="xs:boolean" minOccurs="0" />
        <xs:element name="AddHeaderDIV" type="xs:boolean" minOccurs="0" />
        <xs:element name="AddFieldGroups" type="xs:boolean" minOccurs="0" />
        <xs:element name="AddPosibleListAttributes" type="xs:boolean" minOccurs="0" />
        <xs:element name="RemoveAutonumeration" type="xs:boolean" minOccurs="0" />
        <xs:element name="SaveRawXML" type="xs:boolean" minOccurs="0" />
        <xs:element name="SaveStyle" type="xs:boolean" minOccurs="0" />
        <xs:element name="MoveImages" type="xs:boolean" minOccurs="0" />
        <xs:element name="RemoveGeneratedHTMLFile" type="xs:boolean" minOccurs="0" />
        <xs:element name="PostPublishing" type="xs:boolean" minOccurs="0" />
        <xs:element name="UseEXSLT" type="xs:boolean" minOccurs="0" />
        <xs:element name="UseMulti" type="xs:boolean" minOccurs="0" />
        <xs:element name="InsertPageMarkers" type="xs:boolean" minOccurs="0" />
        <xs:element name="CDATAAttrs" type="xs:boolean" minOccurs="0" />
        <xs:element name="MathML" type="xs:boolean" minOccurs="0" />
        <xs:element name="AcceptRevisions" type="xs:boolean" minOccurs="0" />
        <xs:element name="RemoveSmartTags" type="xs:boolean" minOccurs="0" />
        <xs:element name="ApplyXSL" type="xs:boolean" minOccurs="0" />
        <xs:element name="XSLFileName" type="xs:string" minOccurs="0" />
        <xs:element name="ImagesFolder" type="xs:string" minOccurs="0" />
        <xs:element name="UserMode" type="xs:boolean" minOccurs="0" />
      </xs:all>
    </xs:complexType>
  </xs:element>
  <xs:element name="show" type="freeContent" />
  <xs:element name="hide" type="freeContent" />
  <xs:element name="root" type="freeContent" />
  <xs:element name="script">
    <xs:complexType mixed="true">
      <xs:complexContent>
        <xs:extension base="freeContent">
          <xs:attribute name="output" default="no" type="yesno" />
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
  <xs:attribute name="class" type="xs:string" />
  <xs:attribute name="header-level" type="xs:int" />
  <xs:attribute name="list" type="yesno" />
  <xs:attribute name="list-type" type="xs:string" />
  <xs:attribute name="list-level" type="xs:int" />
  <xs:attribute name="list-class" type="xs:string" />
  <xs:attribute name="list-format" type="xs:string" />
  <xs:attribute name="list-item-level" type="xs:int" />
  <xs:attribute name="list-item-class" type="xs:string" />
  <xs:simpleType name="yesno">
    <xs:restriction base="xs:string">
      <xs:enumeration value="yes" />
      <xs:enumeration value="no" />
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="freeContent" mixed="true">
    <xs:sequence>
      <xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
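As an illustration, the following workspace settings fragment is valid against the schema above and turns on the v2.5 options discussed in this document. The comments pairing each element with an option name are our assumption, based on the element names; this documentation does not confirm the mapping.

```xml
<!-- Sketch of embedded workspace settings; option-name pairings are assumed -->
<WordXMLOptionsData xmlns="urn:schemas-docsoft-com:word-to-xml:extensions">
  <OutputPath>C:\temp</OutputPath>
  <!-- Assumed: "Insert Page Markers" -->
  <InsertPageMarkers>true</InsertPageMarkers>
  <!-- Assumed: "Post process of 'cdata' attributes" -->
  <CDATAAttrs>true</CDATAAttrs>
  <!-- Assumed: "Convert Equation objects to MathML" -->
  <MathML>true</MathML>
  <!-- Assumed: EXSLT support and multiple output (exsl:document) -->
  <UseEXSLT>true</UseEXSLT>
  <UseMulti>true</UseMulti>
  <ApplyXSL>true</ApplyXSL>
  <XSLFileName>c:\program files\docsoft universal application console\word2xml\xslts\example.xslt</XSLFileName>
</WordXMLOptionsData>
```

Because the schema uses xs:all, these elements may appear in any order, but each may appear at most once.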

Using Named Styles in Word for Custom XML

One powerful feature of W2XML is that it can retain the named styles used in Microsoft Word and carry them into the exported XML. Developers can create named styles and save them in Word templates; authors then apply those styles to their documents, and W2XML exports the style names so that a custom solution can be built around the individual needs of each organization.




NOTE:

The following procedure is provided for demonstration purposes. The images and specific steps required to perform this task may differ depending on which version of Word® you are using. Word 2002 was used for this procedure. Please read and follow the guidelines in the documentation for the version of Word you will use to create named styles and templates.



  1. Open Word, then go to Format > Styles and Formatting (Figure 43) to open the Styles and Formatting task pane (Figure 44).

    Figure 43 - Accessing the Styles and Formatting Task Pane

  2. The Styles and Formatting task pane opens, showing the styles available in the current template.

    Figure 44 - Styles and Formatting Task Pane Launched

  3. Figure 45 shows a sample Word document containing titles, paragraphs, and a bulleted list. We will create a named style for each of these elements so that W2XML exports the name of the style as an attribute.

    Figure 45 - Sample Word Document

  4. We will create a style named "title" for the titles (highlighted in red), a style named "para" for the paragraphs (highlighted in blue), and a style named "ulist" for the bulleted list items (see Figure 46).

    Figure 46 - Color-Coded Elements

  5. To create a new named style, click the New Style button (Figure 47). This opens the New Style dialog (Figure 48).

    Figure 47 - New Style Button

  6. Enter the name of the style you want to add. In this example, we are adding a style named "title", which we will use for the title of each paragraph. This name becomes the value of a class attribute in the exported XML. You can also set formatting for the style here. Be sure to select the Add to template check box to add the style to the template.

    Figure 48 - New Style Dialog

  7. After you create the style, it appears in the Styles and Formatting task pane, as shown in Figure 49.

    Figure 49 - Named Style Added to Styles and Formatting Task Pane

  8. Repeat the above process for each named style you want to add. For this example, we have added a named style for each of the three elements described earlier (title, para, and ulist), as shown in Figure 50.

    Figure 50 - Three Named Styles Added to Styles and Formatting Task Pane

  9. To apply a named style, highlight the text in the Word document to which you want to apply the style (Figure 51), then select the named style from the Styles and Formatting task pane. In this example, we highlighted a title and applied the "title" named style (Figure 52).

    Figure 51 - Highlighted Text in Word

    Figure 52 - Named Style Applied

  10. After the document is saved and W2XML has converted it to XML, the output shows how the named styles are applied (Figure 53 and Figure 54). Notice that <p class="title">Lorem Ipsum</p> has a class attribute of "title" and <p class="para"> has a class attribute with the value "para". Figure 54 shows that the list style we created contains list data (inside the DIV tag) and that the p tags have a class attribute of "ulist".

    With this capability, you can create and apply a custom XSLT that transforms the document using only the values of the class attributes as elements, so that authors can create XML documents simply and easily in Word.

    Figure 53 - Title and Para Named Styles Shown as Values in Class Attributes

    Figure 54 - Ulist Named Style Shown as Values in Class Attributes
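As a sketch of such an XSLT, the following stylesheet maps the three named styles from this example to elements. The target element names (article, title, para, list, item) are hypothetical; substitute your own Schema's tag set. It assumes the ulist paragraphs appear as consecutive siblings, as Figure 54 suggests, and it skips paragraphs with any other style.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:h="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="h">
  <xsl:output method="xml" encoding="utf-8" indent="yes" />

  <!-- Hypothetical target vocabulary; replace with your own tag set -->
  <xsl:template match="/">
    <article>
      <xsl:apply-templates select="h:html/h:body" />
    </article>
  </xsl:template>

  <xsl:template match="h:p[@class = 'title']">
    <title><xsl:value-of select="normalize-space(.)" /></title>
  </xsl:template>

  <xsl:template match="h:p[@class = 'para']">
    <para><xsl:value-of select="normalize-space(.)" /></para>
  </xsl:template>

  <!-- Start a list only at the first item of each run of "ulist" paragraphs -->
  <xsl:template match="h:p[@class = 'ulist']">
    <xsl:if test="not(preceding-sibling::*[1][self::h:p[@class = 'ulist']])">
      <list>
        <xsl:apply-templates select="." mode="item" />
      </list>
    </xsl:if>
  </xsl:template>

  <!-- Output one item, then continue with the next sibling if it is also a "ulist" paragraph -->
  <xsl:template match="h:p" mode="item">
    <item><xsl:value-of select="normalize-space(.)" /></item>
    <xsl:apply-templates select="following-sibling::*[1][self::h:p[@class = 'ulist']]" mode="item" />
  </xsl:template>

  <!-- Paragraphs with any other style are skipped in this sketch -->
  <xsl:template match="h:p" />
</xsl:stylesheet>
```

The sibling recursion groups list items without XSLT 2.0 grouping features, so it works with an XSLT 1.0 processor.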

Converting ASP and JSP to XML

W2XML's ability to export ASP and JSP to XML is more limited than its ability to export Word documents, because ASP and JSP pages do not carry the attributes and styles that W2XML keys off in Word. W2XML does, however, provide structure for any HTML code used in these two formats, and it wraps server-side script blocks in w2x:script tags. For example, <% if request("productID") <> "" then %> is converted to <w2x:script>if request("productID") <> "" then</w2x:script>. This capability is still useful to some developers.

DocSoft suggests that you test how W2XML exports your ASP and/or JSP pages, then develop and apply a custom XSLT that makes the most of your data.

When exporting ASP or JSP, the "Remove script elements and attributes" option of W2XML must be unchecked.
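If you need the original delimiters back, a custom XSLT can rebuild them from the w2x:script elements. The following is a minimal sketch that copies everything else unchanged. It relies on disable-output-escaping, which XSLT 1.0 processors are not required to support (the CDATA example above relies on the same feature). The result is an ASP page again, not well-formed XML.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:w2x="urn:schemas-docsoft-com:word-to-xml:extensions">
  <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes" />

  <!-- Identity template: copy the HTML structure unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Turn <w2x:script>...</w2x:script> back into <% ... %>.
       disable-output-escaping keeps characters such as < and > in the script intact. -->
  <xsl:template match="w2x:script">
    <xsl:text disable-output-escaping="yes">&lt;% </xsl:text>
    <xsl:value-of select="." disable-output-escaping="yes" />
    <xsl:text disable-output-escaping="yes"> %&gt;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The W2XML Extensions Schema also defines an output attribute (yes or no) on w2x:script. Its meaning is not described in this section, so the sketch ignores it.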

Running the W2XML Application

DocSoft's W2XML Application Plug-in processes only certain file types, as listed under the Help > Plug-in Info... menu item. Take care when selecting files for conversion: invalid files are highlighted in red in the workspace (as shown in Figure 55). If you try to process all files by clicking the Go button, an alert tells you that you cannot process all files (Figure 56). Clicking OK closes the alert, but no files are processed. You can either remove the invalid files from the workspace, or highlight the valid files and choose W2XML > Process Selected Files (as shown in Figure 57).

To process a single file from the workspace, right-click the file and choose Process from the context menu.




Figure 55 - Invalid File Types Highlighted in Red




Figure 56 - Error Alert Displayed When Attempting to Process Invalid File Types




Figure 57 - Processing Selected Files Only

After you have properly added files to the workspace and have properly configured the options, you are ready to process your files.







© 2000, 2002 DocSoft, Inc.
All Rights Reserved.