TextEbookify (c) Dan Klein (aka Schmads)

TextEbookify (download) removes extra carriage returns from TXT-format ebooks that were formatted for reading on a fixed-width display.  Files like this typically have an extraneous newline roughly every 80 characters.  Reading them on a portable reading device tends to be irritating, because the unneeded carriage returns break up your lines.

Usage

Please note that this is a command-line program.  You don't have to run it from the command line, though: you can add a shortcut to your SendTo folder or drag and drop the text file onto the executable.  No installer is included (or needed), but this is a C# program built on the .NET 2.0 framework, so you'll need to install that if you don't have it already.

After completion, you will find a new file next to the original one, with _TE appended to the original name.  If the program is unable to process the file, it will show an error.
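
The naming rule can be illustrated with a short Python sketch.  This is not the program's C# code, and the function name output_path is made up for the example.  It also assumes that _TE goes before the file extension (book.txt becomes book_TE.txt), which the description above does not spell out.

```python
from pathlib import Path


def output_path(source, suffix="_TE"):
    """Return the path of the processed copy, placed next to the original."""
    p = Path(source)
    # Assumption: the suffix is inserted before the extension,
    # so "book.txt" becomes "book_TE.txt" in the same folder.
    return p.with_name(p.stem + suffix + p.suffix)
```

Calling output_path on a file in some folder returns a path in that same folder, so the original file is never overwritten.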

Example

Approximate example simulated with Notepad using word wrap.
The text is from The First Christmas Tree, which is available from Project Gutenberg.

As you can see, there are still some improper line breaks on the right side.  I am going to keep working on the process and see if I can create a smarter version.  The difficulty is in telling the difference between an intentional line break at the end of a paragraph and a line break inserted by a formatter that was wrapping the text at 80 characters.

Version Notes

  • 1.10 (11/14/2007)
    - Rudimentary HTML tag removal added.  It removes everything bounded by "<" and ">" and replaces &quot; with ".  Please always keep your original files, and if you find one that causes problems, send it to me.
    - The parser assumes that a line ending with a punctuation character marks the end of a paragraph, so it always preserves the carriage return after such a line.  This isn't foolproof, but it lets me treat more lines as candidates for joining without also merging short quotes that should always stay on their own lines, for instance.
    - Extra carriage returns are removed from lines longer than the average line length minus 10 characters.

  • 1.00 (11/05/2007)
    - Extra carriage returns are removed from lines longer than the average line length.  Lines without spaces (i.e. lines that do not contain more than one word) are not included in the average.
    - Multiple empty lines in a row are collapsed into a single empty line.
    - Limited to processing files in the tens of megabytes size range, since I read the entire file into memory to do the parsing.  If you need support for larger files, I may add it at a later time.
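
The rules in the notes above can be sketched roughly as follows.  This is an illustrative Python sketch, not the program's actual C# code.  The function names, the exact set of punctuation characters, and the choice to join lines with a single space are all assumptions made for the example.

```python
import re

# Assumption: the notes do not list which characters count as punctuation.
PUNCTUATION = set(".!?:;\"'")


def strip_html(text):
    """Remove anything between "<" and ">" and turn &quot; into a quote mark."""
    text = re.sub(r"<[^>]*>", "", text)
    return text.replace("&quot;", '"')


def average_length(lines):
    """Average length of the lines that contain a space (more than one word)."""
    lengths = [len(line) for line in lines if " " in line.strip()]
    return sum(lengths) / len(lengths) if lengths else 0.0


def unwrap(text, slack=10):
    """Join wrapped lines, following the 1.00 and 1.10 rules described above."""
    # Like the original, this reads the whole text into memory.
    lines = [line.rstrip() for line in strip_html(text).splitlines()]
    threshold = average_length(lines) - slack
    out = []
    joining = False  # True when the break after the previous line was dropped
    for line in lines:
        if not line:
            # Collapse a run of empty lines into a single empty line.
            if not (out and out[-1] == ""):
                out.append("")
            joining = False
            continue
        if joining:
            out[-1] += " " + line.lstrip()
        else:
            out.append(line)
        # Drop the break only after a long line that does not end in punctuation.
        joining = len(line) > threshold and line[-1] not in PUNCTUATION
    return "\n".join(out)
```

For example, unwrap(open("book.txt").read()) would return the rejoined text as a single string.  A short final line of a paragraph that happens to end without punctuation is still kept separate only because of its length, which is the same weakness described in the Example section.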

Disclaimer

I am not responsible for anything bad that happens to anyone or anything as a result of your using this software.  I have used it and it works fine for me (that's why I wrote it!), but if it causes your MacBook Pro to burst into flames, it's not my problem.