Friday, April 29, 2011

How to read and process binary (base-2) logical representations from file

I have a file containing 800 lines like:

id       binary-coded-info
---------------------------
4657     001001101
4789     110111111
etc.

where each 0 or 1 stands for the presence of some feature. I want to read this file and do several bitwise logical operations on the binary-coded-info (the operations depend on user input and on info from a second file with 3000 lines). Then, these re-calculated binary-coded-info should be written to file (with trailing zeros, e.g.

4657     000110011
4789     110110000
etc.

How should I do this without writing my own base conversion routine? I am open for anything, also languages I do not know, like python, perl, etc. And it should work without compiling.

So far, I tried to script, awk and sed my way. This would mean (I think): batch read as base-2, convert to base-10, do bitwise operations depending on user input and second file, convert to base-2, add leading zeros and print. The usual console hints to use bc do not seem elegant because I have many lines in a file. The same holds for dc.sed. And awk doesnt seem to have an equivalent to flagging input as binary ( as in "echo $((2#101010))" ), and also, the printf trick doesn't work for binary. So, how would I do this most elegantly (or, at all, for that matter) ?

From stackoverflow
  • In C, you can use "strtol(str, NULL, 2)" to do the conversion, if you're already doing this in C.

    Something like the following would work:

    FILE* f = fopen("myfile.txt", "r");
    char line[1024];
    while ((line = fgets(line, sizeof(line), f))
    {
      char* p;
      long column1 = strtol(line, &p, 10);
      long column2 = strtol(p, &p, 2);
      ...
    }
    

    You'll need to add error-handling, etc.

  • Why convert them and use bit operations?

    In Python, you can do all of this as a string.

    for line in myFile:
        key, value = line.split()
        bits = list(value)
        # bits will be a list of 1-char strings ['1','0','1',...]
        # ... do stuff to bits ...
        print key, "".join( value )
    
    FogleBird : You can also convert the items in bits to bools with: bits = [n!='0' for n in bits] This way you can just say if bits[2]:...
  • In python, you can convert to binary using int, specifying base 2. ie:

    >>> int('110111111',2)
    447
    

    To convert back, there is a bin function in python2.6 or 3, but not in python2.5, so you'd need to implement it yourself (or use something like the below):

    def bin(x, width):
        return ''.join(str((x>>i)&1) for i in xrange(width))[::-1]
    
    >>> bin(447, 9)
    110111111
    

    (The width is the number of digits to pad to - your examples seem to be using 9-bit numbers.)

  • "Simple" Perl one liner (replace foo bar baz quux with your flags

    perl -le '@f=qw/foo bar baz quux/;$_&&print($f[$i]),$i++for split//, shift' 1011
    

    Here is a readable Perl version:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    #flags that can be turned on and off, the first 
    #flag is turned on/off by the left-most bit
    my @flags = (
        "flag one",
        "flag two",
        "flag three",
        "flag four",
        "flag five",
        "flag six",
        "flag seven",
        "flag eight",
    );
    
    #turn the command line argument into individual
    #ones and zeros
    my @bits = split //, shift; 
    
    #loop through the bits printing the flag that
    #goes with the bit if it is 1
    my $i = 0;
    for my $bit (@bits) {
        if ($bit) {
         print "$flags[$i]\n";
        }
        $i++;
    }
    
  • Expanding on Brian's answer:

    # Get rid of the '----' line for simplicity
    data_file = '''id       binary-coded-info
    4657     001001101
    4789     110111111
    '''
    import cStringIO
    import csv, sys
    data = [] # A list for the row dictionaries (we could just as easily have used the regular reader method)
    # Read in the file using the csv module
    # Each row will be a dictionary with keys corresponding to the first row
    reader = csv.DictReader(cStringIO.StringIO(data_file), delimiter=' ', skipinitialspace = True)
    try:
        for row in reader:
            data.append(row) # Add the row dictionary to the data list
    except csv.Error, e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
    # Do something with the bits
    first = int(data[0]['binary-coded-info'],2) # First bit string
    assert(first & int('00001101',2) == int('1101',2)) # Bitwise AND
    assert(first | int('00001101',2) == int('1001101',2)) # Bitwise OR
    assert(first ^ int('00001101',2) == int('1000000',2)) # Bitwise XOR
    assert(~first == int('110110010',2)) # Binary Ones Complement
    assert(first << 2 == int('100110100',2)) # Binary Left Shift
    assert(first >> 2 == int('000010011',2)) # Binary Right Shift
    

    See the python docs on expressions for more information and csv module for more information on the csv module.

  • Hi, thanks for all of your answers! I started learning python and will hopefully look into perl alongside. This was a great starting point! - Matt

0 comments:

Post a Comment