2010/05/12

Refactor Question relating to an ancient protocol for transferring files

I work in a lab, and we have lots of instruments that do highly complex technical things. Sometimes they spit out gobs and gobs of data, and sometimes they don't. We have lots of computers connected to those instruments, which we call instrument machines, and when we have an instrument that spits out lots of data, we have an FTP server that lets the main server come in at scheduled times and check for new files.

We had a program that would go and get all the data off said machine in a serial manner. It looked pretty much like this:
use Net::FTP ;

my %downloaded ;    # files we have already pulled (persistence not shown)

my $ftp = Net::FTP->new(login stuff) ;
$ftp->binary() ;
my @list = $ftp->ls() ;
my @files ;

# grab everything we haven't seen before
for ( @list ) {
    if ( ! $downloaded{$_} ) {
        $ftp->get($_) ;
        push @files , $_ ;
        }
    }

# then parse each new file and push the interesting columns into the database
for my $file ( @files ) {
    open my $fh , '<' , $file or die "Cannot open $file: $!" ;
    while (<$fh>) {
        chomp ;
        my @line = split m{\t} , $_ ;
        dump_into_database( $line[0] , $line[3] , $line[27] , $line[491] ) ;
        }
    $downloaded{ $file }++ ;
    }
Yes, it has been substantially simplified.
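
One thing the simplification hides is where %downloaded lives between runs. A minimal sketch of one way to keep it around, assuming Storable and a bookkeeping file of my own invention (downloaded.store is not part of the real program):

use Storable qw(nstore retrieve) ;

# Hypothetical bookkeeping file -- not in the original code.
my $store = 'downloaded.store' ;

# Load the hash of already-fetched files, or start empty on the first run.
my %downloaded = -e $store ? %{ retrieve($store) } : () ;

# ... download and parse as above ...

# Save the hash so the next scheduled run skips these files.
nstore( \%downloaded , $store ) ;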

There are two problems with this program (the original, not the simplified version shown here). First, there's no clear separation between sections, which makes figuring out what each specific part is doing awfully confusing. Second, it was modified and hacked upon several times by a clueless idiot.

(waves hands.)

My thought is to have everything in subroutines, each small enough to fit in my head and in a screen-height window. And that is what I am doing now.

sub handle_ftp {
    my $ftp = Net::FTP->new(login stuff) ;
    $ftp->binary() ;
    my @list = $ftp->ls() ;
    for ( @list ) {
        download( $ftp , $_ ) ;
        }
    }

sub download {
    my ( $ftp , $file ) = @_ ;
    if ( ! $downloaded{ $file } ) {
        $ftp->get( $file ) ;
        split_file( $file ) ;
        $downloaded{ $file } ++ ;
        }
    }

# renamed from split() so it doesn't collide with Perl's builtin
sub split_file {
    my ( $file ) = @_ ;
    open my $fh , '<' , $file or die "Cannot open $file: $!" ;
    while (<$fh>) {
        chomp ;
        my @line = split m{\t} , $_ ;
        dump_into_database( $line[0] , $line[3] , $line[27] , $line[491] ) ;
        }
    }
My concern is that, in the original, we go through the FTP session as fast as possible, while in the second version the FTP session stays open until the last file is split. Here, my car is parked further away from the instrument machine than the server room is, and it's all reasonably small text files, so run time shouldn't be much of an issue. There are actually three variations on dump_into_database, including one that generates an image in R.
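
If the open FTP session ever does become a worry, the fix that comes to mind is to have handle_ftp collect the new filenames, quit the session, and only then do the parsing. A rough sketch along those lines, keeping the same subroutine names; the quit() call, the @fetched list, and the $host/$user/$pass stand-ins for the elided login stuff are my additions, not the original:

sub handle_ftp {
    # same connection as before, with placeholders for the login details
    my $ftp = Net::FTP->new( $host ) or die "Cannot connect: $@" ;
    $ftp->login( $user , $pass ) or die 'Cannot log in: ' . $ftp->message ;
    $ftp->binary() ;

    # pull everything new first, remembering what we got ...
    my @fetched ;
    for my $file ( $ftp->ls() ) {
        next if $downloaded{ $file } ;
        $ftp->get( $file ) ;
        push @fetched , $file ;
        }

    # ... then close the session before any parsing starts
    $ftp->quit ;

    for my $file ( @fetched ) {
        split_file( $file ) ;
        $downloaded{ $file } ++ ;
        }
    }

That keeps the session roughly as short as the original serial version while still leaving each piece small enough to fit on a screen.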

I think I like the second better because it seems more recursive, even though it really isn't.

Without dumping the whole thing here, does anyone see a problem with my approach?
