Text Formatting Settings

When you print to Miraplacid Text Driver, the Preview window pops up. Clicking the "Settings" button on the Preview window toolbar opens the Settings dialog. The Settings dialog has several tabs; this document describes the Text Formatting Settings tab.

Miraplacid Text Driver Text Formatting Settings

  1. Character set. New versions of Windows print text in Unicode. You can keep it in Unicode or convert it to good old 8-bit bytes. Select your character set from the drop-down list.
  2. Insert Unicode prefix. Some text editors add the codes 0xFF 0xFE (the UTF-16 little-endian byte order mark) to the beginning of a Unicode text file. We recommend adding the prefix unless you are sure your software does not support it.
  3. End of line style. You can choose between Windows (CR LF) and Unix (LF) style. We recommend using Windows style unless you use legacy Unix software.
  4. Insert page breaks. Adds a page break symbol between pages. Alternatively, you can add {{PAGE}} to the output file name to save each page to an individual file.
  5. Formatting style. Miraplacid Text Driver can format extracted text in different ways. Unfortunately, there is no way to make it look exactly like the original document: plain text files do not support different font types and sizes and cannot condense or expand characters. However, Miraplacid Text Driver text formatting plug-ins do a good job in most cases.
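To make items 1 through 4 more concrete, here is a minimal Python sketch of how those options could shape the bytes of an output file. It is not Miraplacid's code. It assumes that "Unicode" means UTF-16 little-endian (the encoding whose prefix is 0xFF 0xFE), that an 8-bit character set is given as a Python codec name such as cp1252, and that the page break symbol is the form feed character; the function `encode_output` and its parameters are hypothetical.

```python
import codecs


def encode_output(pages, charset="unicode", add_bom=True, eol="windows",
                  page_breaks=False):
    """Hypothetical sketch: turn extracted page texts into output-file bytes.

    Assumptions (not the driver's documented behavior):
    - charset "unicode" means UTF-16 little-endian, whose prefix is 0xFF 0xFE;
    - any other charset is a Python codec name such as "cp1252";
    - the page break symbol is the form feed character (\\f).
    """
    # Normalize all line endings to "\n" before applying the chosen style.
    normalized = [p.replace("\r\n", "\n").replace("\r", "\n") for p in pages]

    # Separate pages with a form feed (page breaks on) or a plain line break.
    text = ("\f" if page_breaks else "\n").join(normalized)

    # End of line style: Windows uses CR LF, Unix uses LF only.
    if eol == "windows":
        text = text.replace("\n", "\r\n")

    if charset == "unicode":
        data = text.encode("utf-16-le")  # this codec writes no prefix itself
        if add_bom:
            data = codecs.BOM_UTF16_LE + data  # the bytes 0xFF 0xFE
        return data

    # 8-bit output: characters missing from the code page become "?".
    return text.encode(charset, errors="replace")
```

For example, `encode_output(["a\nb"])` yields the prefix FF FE followed by "a\r\nb" in UTF-16LE, while `encode_output(["x\ny", "z"], charset="cp1252", eol="unix", page_breaks=True)` yields the single-byte text `x\ny\fz`. In this sketch the prefix is only written for Unicode output, and 8-bit output silently replaces unsupported characters with "?".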

Formatting Styles

If you need your text to look like the original document, select Formatted text.
If you need just the text without formatting, select Plain text.
If you are familiar with XML file processing, you can try XML output. It saves textboxes with their text, size, location and font information, as well as the page size and DPI settings.
The RSS-Atom formatting style lets you save information in the RSS and Atom web content syndication formats for further use in news exchange services.
Text with Layout is similar to Formatted text, but is based on the previous version of the text formatting algorithm.
We recommend using "Formatted text" unless "Text with Layout" works better for you.

Formatting Style Settings

  • Formatted text
    • Character size settings let you control how character size is calculated, which is important when rendering text placed at specific positions, tables, etc.
      Automatic, each page is the default option: statistics are calculated independently for each page.
      Automatic, first page calculates page statistics once per document, based on the first page. This option is useful when you print reports or similar documents where each piece of text must be rendered at exactly the same position on every page.
      Fixed size lets you set the character size (in printer pixels) manually to get the best results for a particular type of document at the appropriate printer resolution.
    • The Line height option controls vertical rendering. The Automatic value lets you specify the gap between lines, in tenths of a percent. If Text Driver inserts too many empty lines in the text, increase this value to make the text more condensed.
      The default value is 65 (6.5%).
      You can also specify a Fixed line height, in printer pixels. This helps achieve correct results in some machine-generated documents that use fixed fonts.
    • You can turn Print margins on or off. When this option is on, Miraplacid Text Driver adds blank borders to the formatted text, sized to match the print margin settings of the document you are extracting text from.
    • The Use Document spaces option lets the plug-in use whitespace characters from the document when rendering.
    • The Right alignment option may help render documents with right-aligned tables properly.
  • Text with Layout uses the Print margins, Line height and Use Document spaces options described above.
  • Plain text. This formatting style simply merges all pieces of text in each line. By default it puts a blank (space) character between them, but you can change this by editing the Delimiter value. This value may include the following escaped special characters: \s (whitespace), \t (tab), \r (carriage return), \n (line feed), \f (form feed), \\ (the backslash itself) and \xnnnn (where nnnn is the hexadecimal code of a Unicode character).
    The Line height option is also used.
  • XML. The Optimize output option merges individual textboxes into whole words when words have been split into several pieces. With this option turned on, the bounding coordinates of the merged textboxes are combined as well.
    Whitespace is removed from the output. You can fine-tune the merging process with the Whitespace size option. Sometimes textboxes cannot be separated automatically because the spaces between them are very small. By adjusting this parameter from 10% to 100% of the average space character width, you can bring order to documents with many "glued" words and phrases.
  • RSS-Atom. The Link, Author and Description settings correspond to the matching fields of the RSS channel or Atom feed. In addition, the Add <BR> to EOL option adds a line break tag at the end of each line so the text looks formatted (in Atom, this means the content type is HTML).
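The escape sequences accepted by the Plain text Delimiter value can be illustrated with a small Python parser. This is a sketch under assumptions, not the driver's own parser: it treats \s as a single space, expects exactly four hex digits after \x, and leaves any other backslash sequence unchanged. The names `unescape_delimiter` and `merge_line` are hypothetical.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

# Single-character escapes listed for the Delimiter value (\s assumed = space).
SIMPLE_ESCAPES = {"s": " ", "t": "\t", "r": "\r", "n": "\n", "f": "\f", "\\": "\\"}


def unescape_delimiter(value: str) -> str:
    """Hypothetical parser: expand escape sequences in a Delimiter value."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        nxt = value[i + 1] if i + 1 < len(value) else ""
        if ch == "\\" and nxt and nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif (ch == "\\" and nxt == "x" and len(value) >= i + 6
              and all(c in HEX_DIGITS for c in value[i + 2:i + 6])):
            # \xnnnn: four hex digits give a Unicode code point.
            out.append(chr(int(value[i + 2:i + 6], 16)))
            i += 6
        else:
            # Ordinary character or unrecognized escape: copy it unchanged.
            out.append(ch)
            i += 1
    return "".join(out)


def merge_line(pieces, delimiter=r"\s"):
    """Join the text pieces of one line, as the Plain text style does."""
    return unescape_delimiter(delimiter).join(pieces)
```

For example, `merge_line(["Name", "Total"], r"\s|\s")` produces "Name | Total", and the default delimiter reproduces the single blank described above.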
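The idea behind the XML Optimize output and Whitespace size options can be sketched in Python as follows. This illustrates the general approach under stated assumptions and is not Miraplacid's algorithm: the `TextBox` class and `merge_textboxes` function are hypothetical, the boxes are assumed to belong to one text line and to be sorted left to right, and two neighboring boxes are treated as parts of one word when the gap between them is smaller than the Whitespace size percentage of the average space width.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """Hypothetical textbox: text plus bounding coordinates in pixels."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def merge_textboxes(boxes, avg_space_width, whitespace_size_pct=50):
    """Sketch: glue split pieces of one text line into whole words.

    Assumes boxes are on a single line and sorted by their left edge.
    """
    threshold = avg_space_width * whitespace_size_pct / 100
    merged = []
    for box in boxes:
        if not box.text.strip():
            continue  # whitespace-only boxes are dropped from the output
        if merged and box.left - merged[-1].right < threshold:
            prev = merged[-1]
            # Same word: join the text and take the union of both bounds.
            merged[-1] = TextBox(prev.text + box.text,
                                 min(prev.left, box.left),
                                 min(prev.top, box.top),
                                 max(prev.right, box.right),
                                 max(prev.bottom, box.bottom))
        else:
            merged.append(box)
    return merged
```

With pieces "Hel" (right edge 30), "lo" (left edge 31) and "world" (left edge 60), an average space width of 10 and a Whitespace size of 50%, the sketch returns "Hello" and "world". In this sketch a lower Whitespace size splits more often, which is the direction to try when words come out glued together, while a higher value merges more aggressively.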

See also: