Have a look at col(1), because col can filter out backspace sequences. Just in case you can't wait that long:
funnyprompt$ groff -t -e -mandoc -Tascii manpage.1 | col -bx > manpage.txt
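To see what col is actually removing, here is a minimal sketch with a made-up sample string. It assumes the usual nroff-style overstrike encoding (bold as "X, backspace, X", underline as "_, backspace, X"); the function name overstruck is only for illustration.

```shell
# Overstruck text as nroff-style output encodes it. The string is a
# made-up sample, not taken from a real man page.
overstruck() {
    printf 'N\bNA\bAM\bME\bE _\bf_\bo_\bo\n'
}

# cat -v makes the backspaces visible as ^H.
# prints: N^HNA^HAM^HME^HE _^Hf_^Ho_^Ho
overstruck | cat -v

# col -b keeps only the last character written to each column position;
# -x makes it emit spaces instead of tabs.
# prints: NAME foo
overstruck | col -bx
```

The same filtering is what turns the groff output above into plain text.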
In the groff command above, the -t and -e switches tell groff to preprocess the input with tbl and eqn. This is overkill for man pages that don't require preprocessing, but it does no harm apart from a few wasted CPU cycles. On the other hand, leaving out -t when it is actually required does harm: the table comes out terribly formatted. You can even find out (well, "guess" is a better word) what command is needed to format a certain groff document (not just man pages) by issuing
funnyprompt$ grog /usr/man/man7/signal.7
groff -t -man /usr/man/man7/signal.7
"Grog" stands for "GROff Guess", and it does what it says: it guesses. If it were perfect we wouldn't need options any more. I've seen it guess incorrectly on macro packages and on preprocessors.
Here is a little Perl script I wrote that deletes the page headers and footers, thereby saving you a few pages (and mother nature a tree) when printing long and elaborate man pages. Save it in a file named strip-headers and make it executable with chmod 755.
#!/usr/bin/perl -wn
# make it slurp the whole file at once (BEGIN runs before -n reads any input):
BEGIN { undef $/; }
# delete first header:
s/^\n*.*\n+//;
# delete last footer:
s/\n+.*\n+$/\n/g;
# delete page breaks:
s/\n\n+[^ \t].*\n\n+(\S+).*\1\n\n+/\n/g;
# collapse two or more blank lines into a single one:
s/\n{3,}/\n\n/g;
# see what's left...
print;
You have to use it as the first filter after the man command, as it relies on the number of newlines output by groff. For example:
funnyprompt$ man bash | strip-headers | col -bx > bash.txt