Cookie Notice

As far as I know, and as far as I remember, nothing on this page does anything with cookies.

2016/07/28

Log your machines and Check your logs

"Logs" by Aapo Haapanen is licensed under CC BY 2.0

Our VMs were having problems last fall. Their connections to the file system would falter, leaving a large number of processes sitting around waiting to write. The symptom we noticed was that load averages would then rise incredibly high.

Like, four digits high.

So, I wrote something that would log the load average once an hour. It was a convergence of lab need and an excuse to learn Log::Log4perl. I also used Pushover to tell me when the load average was greater than 20, as if I could do anything about it.
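The check itself boils down to two steps: pull the three load averages out of uptime's output, then compare the first (one-minute) figure to 20. Here is a minimal sketch of those two steps as pure functions, so they can be tried without shelling out. The names parse_load and is_high are hypothetical, and the regex assumes either the Linux style ("load average: a, b, c") or the macOS style ("load averages: a b c"). Matching on the label is a little sturdier than taking the last three whitespace-separated fields.

```perl
#!/usr/bin/env perl
use feature qw{ say } ;
use strict ;
use warnings ;

# Hypothetical helper: extract the 1-, 5- and 15-minute load averages
# from uptime output. Handles "load average: a, b, c" (Linux) and
# "load averages: a b c" (macOS); returns an empty list if nothing matches.
sub parse_load {
    my ($text) = @_ ;
    my @load = $text =~ m{load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)} ;
    return @load ;
    }

# Hypothetical helper: true when the 1-minute average exceeds $threshold.
sub is_high {
    my ( $threshold, @load ) = @_ ;
    return @load && $load[0] > $threshold ;
    }

# A made-up uptime line in the shape of a bad day
my $sample = ' 14:00:01 up 12 days,  3:04,  2 users,  load average: 1234.56, 980.10, 512.00' ;
my @load = parse_load($sample) ;
say join ' ', @load ;                          # prints "1234.56 980.10 512.00"
say is_high( 20, @load ) ? 'alert' : 'ok' ;    # prints "alert"
```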

Below, Mail is a wrapper around Email::Sender::Simple, and Pushover is a wrapper around LWP::UserAgent; both handle the formatting and authentication. Neither is necessary for the logging.

#!/usr/bin/env perl

# checks for high load average on a host using uptime and
# reports using Pushover
# logs uptime, high or low, via Log4perl

# Can also email the output of ps, to start to indicate what's
# actually doing something (the send_table call is commented out)

# https://metacpan.org/pod/Log::Log4perl

use feature qw{ say } ;
use strict ;
use warnings ;
use utf8 ;

use Data::Dumper ;
use DateTime ;
use IO::Interactive qw{ interactive } ;
use Log::Log4perl ;

use lib '/home/djacoby/lib' ;
use Mail ;
use Pushover ;

# my $host = $ENV{HOSTNAME} ;
my $host = `/bin/hostname -s ` ;
chomp $host ;

Log::Log4perl::init( '/home/djacoby/.log4perl.conf') ;
my $logger = Log::Log4perl::get_logger( 'varlogrant.uptime' );
my @uptime = check_uptime() ;
$logger->trace( qq{$host : $uptime[0] $uptime[1] $uptime[2]});

if ( $uptime[0] > 20 ) {
    my $ps = process_table() ;
    my $message ;
    $message->{ message } = "High Load Average on $host: " . join ' ' , @uptime ;
    my $out = pushover( $message ) ;
    #send_table( join "\n\n" , ( join ' ' , @uptime ) , $ps ) ;
    }

exit ;

sub check_uptime {
    my $program = '/usr/bin/uptime' ;
    my $uptime = qx{$program} ;
    my @num = map {s/,//;$_ } ( split /\s+/ , $uptime )[-3,-2,-1] ;
    return @num ;
    }

sub process_table {
    my $out = qx{/bin/ps -U gcore -u gcore u } ;
    return $out ;
    }

sub send_table {
    my $body = shift ;
    my $date = DateTime->now()->iso8601() ;
    my $msg;
    $msg->{ identity } = 'example' ;
    $msg->{ subject } = qq{High Load on $host: $date} ;
    $msg->{ to } = 'varlogrant@example.com' ;
    $msg->{ body } = $body ;
    $msg->{ attachments } = [] ;
    send_mail($msg) ;
    }
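The script reads its logging setup from /home/djacoby/.log4perl.conf, which isn't shown here. Below is a minimal sketch of a configuration that would produce log lines in the "date time host : ..." shape that the parsing script further down splits apart. It is passed to Log::Log4perl::init as a reference to a string rather than a file name. The appender name, the file location, and the TRACE level are assumptions. With a ConversionPattern of %d %m%n, %d uses its default yyyy/MM/dd HH:mm:ss format.

```perl
#!/usr/bin/env perl
use strict ;
use warnings ;
use File::Temp qw{ tempdir } ;
use Log::Log4perl ;

# Hypothetical stand-in for ~/.log4perl.conf; the real file isn't shown.
# The category matches the script's get_logger('varlogrant.uptime').
my $dir     = tempdir( CLEANUP => 1 ) ;
my $logfile = "$dir/uptime.log" ;    # assumed location; use your own

my $conf = <<"END" ;
log4perl.logger.varlogrant.uptime                 = TRACE, Uptime
log4perl.appender.Uptime                          = Log::Log4perl::Appender::File
log4perl.appender.Uptime.filename                 = $logfile
log4perl.appender.Uptime.mode                     = append
log4perl.appender.Uptime.autoflush                = 1
log4perl.appender.Uptime.layout                   = Log::Log4perl::Layout::PatternLayout
log4perl.appender.Uptime.layout.ConversionPattern = %d %m%n
END

# init() accepts a reference to a config string as well as a file name
Log::Log4perl::init( \$conf ) ;
my $logger = Log::Log4perl::get_logger('varlogrant.uptime') ;
$logger->trace('genomics : 0.15 0.10 0.05') ;

# Show what landed in the log: a date, a time, then the message
open my $fh, '<', $logfile or die "Cannot read $logfile: $!" ;
print while <$fh> ;
close $fh ;
```

Because the trace message starts with the host name, splitting each line on whitespace yields the date, the time, and the host as the first three fields.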

Eventually, those issues got worked out. The remaining evidence of file system hinkiness is that, on occasion, saving or opening a file takes a few minutes (I have learned from experience that mkdir does not behave atomically), but we no longer see the high load averages and catastrophically long file access times of a few months ago.

But the logging never left my crontab.

I started looking at and playing with new things, and I wrote an API that several of my machines curl once an hour as a heartbeat, so I would get Pushover notifications when a machine was down.
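The API itself isn't shown, but the down-machine check behind those notifications can be sketched on its own: keep each host's most recent heartbeat time, and flag any host whose last check-in is older than a cutoff. This is a minimal sketch under that assumption. The function stale_hosts, the host names, and the timestamps are all hypothetical.

```perl
#!/usr/bin/env perl
use feature qw{ say } ;
use strict ;
use warnings ;

# Hypothetical sketch: given each host's last heartbeat (epoch seconds),
# return the hosts that haven't checked in within $max_age seconds.
sub stale_hosts {
    my ( $last_seen, $now, $max_age ) = @_ ;
    my @stale = grep { $now - $last_seen->{$_} > $max_age } keys %$last_seen ;
    return sort @stale ;
    }

# Invented hosts and times, for illustration only; real code would use time()
my $now       = 1_469_700_000 ;
my %last_seen = (
    web   => $now - 300,      # checked in five minutes ago
    db    => $now - 7_500,    # has missed an hourly check-in
    build => $now - 3_500,
    ) ;
say join ' ', stale_hosts( \%last_seen, $now, 3_600 ) ;    # prints "db"
```

In practice you'd probably set the cutoff a bit above an hour, so that a curl arriving a few seconds late doesn't trigger an alert.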

(You can really thank Phil Sturgeon and his excellent book, Build APIs You Won't Hate, for that. I'm not quite there with my API, though. It'd probably make an interesting blog post, but it's built on pre-MVC technology.)

(And yes, I really like Pushover. In general, I turn off notifications for most apps and only pay attention to what I have Pushover tell me.)

Anyway, I'd get notifications telling me my web server was down, then pull out my phone and find the web server up and responsive. I'm storing the heartbeats in MySQL, and a query showed that in some hours I'd get five heartbeats, in some four, and in some three, so I was sure the issue wasn't with the API.
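That per-hour tally is what a GROUP BY on the hour does in SQL. Here is the same bucketing in Perl as a minimal sketch. The timestamp format and the rows are invented for illustration, not taken from the real table.

```perl
#!/usr/bin/env perl
use strict ;
use warnings ;

# Hypothetical sketch: bucket heartbeat timestamps ("YYYY-MM-DD HH:MM:SS")
# by date and hour, and count how many arrived in each bucket.
sub heartbeats_per_hour {
    my %count ;
    for my $stamp (@_) {
        my ($hour) = $stamp =~ m{^(\d{4}-\d\d-\d\d \d\d)} or next ;
        $count{$hour}++ ;
        }
    return \%count ;
    }

# Invented rows; real ones would come from the heartbeat table
my $count = heartbeats_per_hour(
    '2016-07-23 01:00:02', '2016-07-23 01:00:03', '2016-07-23 01:00:05',
    '2016-07-23 02:00:01', '2016-07-23 02:00:04',
    ) ;
printf "%s:00 %d\n", $_, $count->{$_} for sort keys %$count ;
# prints "2016-07-23 01:00 3", then "2016-07-23 02:00 2"
```

An hour with fewer entries than the number of reporting machines is an hour in which at least one of them never checked in.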

I also log and get Pushover notifications from the @reboot section of my crontab, and those hadn't fired recently, so I knew the machines hadn't rebooted; they were up, but not responding.

Then I remembered that I had never stopped monitoring load averages, and I started looking at those logs.


#!/usr/bin/env perl

# reads and parses the uptime log

use feature qw{ say } ;
use strict ;
use warnings ;
use utf8 ;
use DateTime ;

my $file = q{/home/jacoby/mnt/rcac/.uptime.log} ;

if ( -f $file && open my $fh, '<', $file ) {
    my $data ;
    my $last ;
    my $obj ;
    while (<$fh>) {
        chomp ;
        my ( $date, $time, $server ) = split m{\s+} ;
        next unless $server ;
        # say $server ;
        my ( $year, $month, $day ) = split m{/}, $date ;
        my ( $hour, $minute, $second ) = split m{:}, $time ;
        my $latest = DateTime->new(
            year      => $year,
            month     => $month,
            day       => $day,
            hour      => $hour,
            minute    => $minute,
            second    => 0,
            time_zone => 'UTC',
            ) ;
        my $diff = 0 ;
        # next if $year != 2016 ;
        # next if $month < 7 ;
        my $ymd = $latest->ymd ;
        my $hms = $latest->hms ;
        next if $ymd !~ /^2016-07/ ;
        push @{ $obj->{$ymd}{$hms} }, $server ;
        }
    my @hosts = sort qw{ genomics genomics-test genomics-apps genomics-db } ;
    for my $y ( sort keys %$obj ) {
        my $day = $obj->{$y} ;
        for my $h ( sort keys %$day ) {
            my @list = @{ $obj->{$y}{$h} } ;
            my %list = map { $_ => 1 } @list ;
            my @down = grep { !$list{$_} } @hosts ;
            next if !scalar @down ;
            say join ' ', $y, $h, @down ;
            }
        }
    }

__DATA__
two days results:
2016-07-23 01:00:00 genomics-test
2016-07-23 02:00:00 genomics-test
2016-07-23 08:00:00 genomics genomics-db
2016-07-23 13:00:00 genomics-apps
2016-07-23 16:00:00 genomics-apps
2016-07-23 18:00:00 genomics-db
2016-07-23 19:00:00 genomics-test
2016-07-23 21:00:00 genomics-apps
2016-07-24 05:00:00 genomics genomics-apps
2016-07-24 07:00:00 genomics
2016-07-24 10:00:00 genomics-apps
2016-07-24 10:01:00 genomics genomics-db genomics-test
2016-07-24 11:00:00 genomics-db
2016-07-24 13:00:00 genomics-apps
2016-07-24 18:00:00 genomics genomics-apps
2016-07-24 23:00:00 genomics genomics-apps genomics-db

Above, we see that each of the four VMs I monitor fails to log multiple times, and more than once, three of the four fail to run their crontabs at the same time. Since I had something more solid than "Hey, that's funny," I took this to my admins. It looks like the VMs are failing to authenticate with the LDAP server, and my admins are taking it up the chain.

So, beyond how I make and parse logs, which might not be the best examples you can find, the message here is that it's hard to identify a problem unless you're tracking it, and even tracking something else might help you find one.

2016/07/27

Net::Twitter Cookbook: How to Tweet



The first line between Twitter use and Twitter obsession is TweetDeck. That's when the update-on-demand, single-threaded web page gives way to multiple, constantly updated streams of the stream-of-consciousness ramblings of the Internet.

That's the first line.

The second line between Twitter use and Twitter obsession is when you want to automate the work. If you're an R person, that's twitteR. If you work in Python, that's tweepy.

And, if you're like me, and you normally use Perl, we're talking Net::Twitter.

What follows is the simplest possible Net::Twitter program.

#!/usr/bin/env perl
use feature qw{ say } ;
use strict ;
use warnings ;
use Net::Twitter ;

# the consumer key and secret identify you as a service. 
# you register your service at https://apps.twitter.com/
# and receive the key and secret

# you really don't want to have these written into your script

my $consumer_key    = 'ckckckckckckckckckckck' ;
my $consumer_secret = 'cscscscscscscscscscscscscscscscscscscscs' ;

my $twit = Net::Twitter->new(
    traits          => [qw/API::RESTv1_1/],
    consumer_key    => $consumer_key,
    consumer_secret => $consumer_secret,
    ssl             => 1,
    ) ;

# the access token and secret identify you as a user.
# the registration process takes place below.
# the first time you run this program, you will not be authorized,
# and the program will give you a URL to open in a browser where
# you are already logged into twitter.

# you really don't want to have these written into your script

my $access_token = '1111111111-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' ;
my $access_token_secret = 'zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz' ;

$twit->access_token($access_token) ;
$twit->access_token_secret($access_token_secret) ;

# everything interesting will occur inside this if statement
if ( $twit->authorized ) {
    if ( $twit->update('Hello World!') ) {
        say 'It worked!' ;
        }
    else {
        say 'Fail' ;
        }
    }
else {
    # You have no auth token
    # go to the auth website.
    # they'll ask you if you wanna do this, then give you a PIN
    # input it here and it'll register you.
    # then save your token vals.

    say "Authorize this app at ", $twit->get_authorization_url,
        ' and enter the PIN#' ;
    my $pin = <STDIN> ;    # wait for input
    chomp $pin ;
    my ( $access_token, $access_token_secret, $user_id, $screen_name ) =
        $twit->request_access_token( verifier => $pin ) ;

    say 'The following lines need to be copied and pasted above' ;
    say $access_token ;
    say $access_token_secret ;
    }

Again, this is about as simple as it can reasonably be without pulling the keys into a separate file, which I strongly recommend you do. (I personally use YAML to store and restore data such as access tokens and consumer keys. I will demonstrate that in a later post.)
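Until then, here is a minimal sketch of the idea, assuming the YAML::XS module and its LoadFile and DumpFile functions (any YAML module with file load and dump would do). The file location and key names are illustrative assumptions. The key names are chosen to match Net::Twitter->new's arguments so the loaded hash can be passed straight through.

```perl
#!/usr/bin/env perl
use feature qw{ say } ;
use strict ;
use warnings ;
use File::Temp qw{ tempdir } ;
use YAML::XS qw{ LoadFile DumpFile } ;

# Hypothetical sketch: the file location and key names are assumptions.
my $dir  = tempdir( CLEANUP => 1 ) ;
my $file = "$dir/twitter.yml" ;    # in real use, e.g. a dotfile in $HOME

# Save the keys once, e.g. right after request_access_token() succeeds
my %keys = (
    consumer_key        => 'ckckckckckckckckckckck',
    consumer_secret     => 'cscscscscscscscscscscscscscscscscscscscs',
    access_token        => '1111111111-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    access_token_secret => 'zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz',
    ) ;
DumpFile( $file, \%keys ) ;
chmod 0600, $file ;    # readable and writable only by you

# On later runs, load them back instead of hardcoding them
my ($config) = LoadFile($file) ;
say join ', ', sort keys %$config ;
# prints "access_token, access_token_secret, consumer_key, consumer_secret"

# The keys match Net::Twitter->new's argument names, e.g.:
#   Net::Twitter->new( traits => [qw/API::RESTv1_1/], ssl => 1, %$config ) ;
```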