
rmnl — remove new line characters with tr, awk, perl, sed or c/c++

How to remove new lines from files or pipe streams under Linux? This post contains simple examples that show how to use common Linux shell tools such as tr, awk/gawk, perl, sed and many others to delete new line characters. C and C++ source code is also provided; it can be compiled into a binary tool that removes new lines. To get started, here is an example text file: days.txt.

Let's have a look at its content by running the following command from the shell.

cat days.txt

(©2007-2008 dsplabs.com.au)

The output is shown below.

Mon
Tue
Wed
Thu
Fri
Sat
Sun

If the new lines were removed from days.txt then its content would look like this:

MonTueWedThuFriSatSun

If, instead of simply being removed, the new line characters were replaced with spaces, the output would look as follows.

Mon Tue Wed Thu Fri Sat Sun

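Below are simple examples of two types: one to remove new lines and the other to replace them with spaces. Their outputs are as shown above, apart from trailing characters: for example, the tr and awk commands do not print a final new line, and the space-replacing variants leave a trailing space after Sun.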

tr — remove new line characters with tr

Either of the following commands can be used to delete new lines using the tr utility.

tr -d '\n' < days.txt
cat days.txt | tr -d '\n'

The new line characters can be replaced with spaces using tr as follows.

tr '\n' ' ' < days.txt
cat days.txt | tr '\n' ' '

awk — remove new line characters with awk or gawk

Either of the following commands can be used to delete new lines using awk or gawk.

awk '{ printf "%s", $0 }' days.txt
cat days.txt | awk '{ printf "%s", $0 }'

The new line characters can be replaced with spaces using either of the following commands.

awk '{ printf "%s ", $0 }' days.txt
cat days.txt | awk '{ printf "%s ", $0 }' 

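Please be careful when using printf: never pass unvalidated input to printf as the format string. Any % characters in the input would be interpreted as conversion specifications, which at best garbles the output and, in languages such as C, is a security risk. That is, use: { printf "%s ", $0 } but not: { printf $0 }.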

perl — remove new line characters with perl

Either of the following commands can be used to delete new lines using perl.

perl -e 'while (<>) { chomp; print; }; exit;' days.txt
cat days.txt | perl -e 'while (<>) { chomp; print; }; exit;' 

The new line characters can be replaced with spaces using either of the following commands.

perl -e 'while (<>) { chomp; print; print " "; }; exit;' days.txt
cat days.txt | perl -e 'while (<>) { chomp; print; print " "; }; exit;' 

Here is a shorter perl alternative; it replaces the whitespace at the end of each line, including the new line, with a single space:

perl -p -e 's/\s+$/ /g' days.txt
cat days.txt | perl -p -e 's/\s+$/ /g'

The above regex approach is much simpler and neater! To delete the new lines instead of replacing them, use an empty replacement: perl -p -e 's/\s+$//' days.txt

sed — remove new line characters with sed

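You could also use sed to remove new lines. The sed solution is not very readable, but it works: sed normally processes one line at a time, so the :a;N;$!ba loop keeps appending the next line (N) and branching back to label a until the last line ($!) has been read, and s/\n//g then deletes all the embedded new lines at once. Use either of the following commands.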

sed ':a;N;$!ba;s/\n//g' days.txt
cat days.txt | sed ':a;N;$!ba;s/\n//g' 

Or, to replace the new line characters with spaces, use either of the following commands.

sed ':a;N;$!ba;s/\n/ /g' days.txt
cat days.txt | sed ':a;N;$!ba;s/\n/ /g' 

James of crypto.dsplabs.com.au suggested this easier-to-read sed solution on the Linux Blog Forums:

sed '{:q;N;s/\n//g;t q}' days.txt

C or C++ — remove new line characters with a C or C++ based binary

If none of the above approaches appeals to you, create your own binary tool for this task, using C or C++ for example. Here is an example C source file, rmnl.c, also courtesy of James.

#include <stdio.h>

int main(void) {
    int k; /* int rather than char, since getchar() returns EOF as an int */
    while ((k = getchar()) != EOF)
        if (k != '\r' && k != '\n') putchar(k); /* drop CR (0x0D) and LF (0x0A) */
        /* else putchar(' '); */                 /* replace newlines with spaces */
    putchar('\n'); /* end the output with a single new line */
    return 0;
}

My own attempt at writing a new line removal tool in C++, rmnl.cpp, is listed below.

#include <iostream>
#include <string>
using namespace std;

int main() {
    string line;
    // getline() discards the '\n' delimiter, so printing each line joins them
    while( getline(cin, line) ) cout << line; // delete newlines
//  while( getline(cin, line) ) cout << line << ' '; // replace newlines with spaces
    cout << endl;
    return 0;
}

Another way in C++ is to use stream iterators like so:

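#include <algorithm>
#include <iostream>
#include <iterator>

int main () {
    // istreambuf_iterator reads raw characters; istream_iterator<char> would
    // skip all whitespace, spaces included, so it is avoided here
    std::remove_copy( std::istreambuf_iterator<char>( std::cin ),
                      std::istreambuf_iterator<char>(),
                      std::ostreambuf_iterator<char>( std::cout ),
                      '\n' );
}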

To compile the above sources with the GNU gcc or g++ compiler, use the following commands, respectively.

gcc rmnl.c -o rmnl
g++ rmnl.cpp -o rmnl

Either of the following commands can be used to run the newly created rmnl binary, assuming that rmnl is in a directory listed in your PATH.

cat days.txt | rmnl
rmnl < days.txt

If the rmnl binary is not in your PATH but is in your current directory, replace rmnl above with ./rmnl like so:

cat days.txt | ./rmnl
./rmnl < days.txt

Here is some information about the compiled rmnl binary; the libstdc++ dependency shows that it was built from the C++ source:

$>file rmnl
        rmnl: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
              dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped

$>ldd rmnl
        linux-gate.so.1 =>  (0x0041d000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00b21000)
        libm.so.6 => /lib/libm.so.6 (0x0086c000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x009ec000)
        libc.so.6 => /lib/libc.so.6 (0x0072a000)
        /lib/ld-linux.so.2 (0x0070d000)

Other approaches for removal of new line characters

Replacing new lines with spaces using xargs:

xargs echo < days.txt
cat days.txt | xargs echo
cat days.txt | xargs echo -n

Removing new lines using Haskell, for example with the expression evaluation mode (ghc -e) of the Glasgow Haskell Compiler:

cat days.txt | ghc -e "interact (concat . lines)"

Removing new lines or substituting them with spaces using the sam text editor:

printf ", x/.*/ p\nq" | sam -d days.txt 2>/dev/null
(echo ', x/\n/ c/ /'; echo ', p'; echo q) | sam -d days.txt 2>/dev/null

Or its stream version, ssam:

ssam -n ', x/.*/ p' < days.txt 2>/dev/null
ssam ', x/\n/ c/ /' < days.txt 2>/dev/null

Another approach is to use GNU Bash like so:

while read -r; do echo -n "$REPLY "; done < days.txt

Or even simpler:

echo -n `cat days.txt`

Or use Python like this:

python -c 'import sys; print(sys.stdin.read().replace("\n", " "))' < days.txt

Andreea