Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to scrape some info from a large text file that has records separated by double carriage returns.

I'm basically trying to capture:
1. The ticket number (for example, ACCOUNT-09782).
2. The CreatedDate field and value mentioned in the Remedy Case data section (not the Remedy Account data section that appears earlier in the record).
3. The ClosedDate field and value mentioned in the Remedy Case data section.

I was thinking something along the lines of :

$/ = "\n\n";
while (<DATA>) {
    chomp;
    my $line = $_;
    $line =~ /(ACCOUNT-\d{5}).*Remedy Case data.*(ClosedDate.*)\\.*(CreatedDate.*)\\/;
    my $account = $1;
    my $closed  = $2;
    my $created = $3;
    . . .
}
Any tips much appreciated!
__DATA__
ACCOUNT-09782 comment \
Remedy Account data: \
IsDeleted = No\
Name = Singtel PMC\
Type = Existing Account\
Industry = Network Operator\
OwnerId = Foo Foobar (foo@whatever.com)\
CreatedDate = 2007-02-21 23:59:56\
\
Remedy Account Contact Role data: \
None found\
\
Remedy Case data: \
CaseNumber = 00013383\
ContactId = Sharon Tesla\
AccountId = Blah blah\
Type = Problem\
Status = Closed\
Reason = User create/change/delete from system\
IsVisibleInSelfService = 1\
Subject = Setup Login Credentials / Lando Franklin\
Priority = Medium\
Description = Please set up Lando Franklin with login credentials. You may mirror my profile. Her email address is 2326265@whatever.com.  Thank\
you!!!\
IsClosed = 1\
ClosedDate = 2007-10-26 16:31:53\
Actual Work Time = 1.00\
CreatedDate = 2007-10-23 21:10:10\
CreatedById = 00500000006onM1AAI\
Environment__c = Production\

Replies are listed 'Best First'.
Re: using regex with records
by JavaFan (Canon) on Jan 02, 2012 at 01:39 UTC
    Is there a hidden question? If your code is working, keep it. You're way better off with code that you've actually written yourself than copying lines from some random internet dude. I'd personally use indentation, but hey, if you prefer to not use indentation, more power to you.

    If you actually do have a problem (and at first glance, I've seen something that may be an issue), please provide us all the details, like what result you are getting and what result you are expecting. If you think I'm going to cut-and-paste your code, and run it just to see whether it works or not, and deduce a possible question from the result, you're mistaken.

Re: using regex with records
by Cristoforo (Curate) on Jan 02, 2012 at 02:09 UTC
    Instead of reading in a record at a time, this solution reads the file line by line. Within each record, the 'ACCOUNT-ddddd' line comes first, then the closed date, then the created date. So, if the data is as uniform as presented, you can enter the two dates into the %data hash for the current account number as soon as the created date is seen.
    #!/usr/bin/perl
    use strict;
    use warnings;
    my (%data, $acct, $close_date, $create_date);
    while (<DATA>) {
        # Skip everything from the Account section header through the
        # Case section header, so the Account CreatedDate is ignored.
        next if /^Remedy Account data:/ .. /^Remedy Case data:/;
        if (/^ACCOUNT-(\d+)/) {
            $acct = $1;
        }
        elsif (/^ClosedDate = ([-\d :]+)/) {
            $close_date = $1;
        }
        elsif (/^CreatedDate = ([-\d :]+)/) {
            # CreatedDate comes last, so both dates are known here.
            $create_date = $1;
            $data{ $acct } = { closed => $close_date, created => $create_date };
        }
    }
    use Data::Dumper;
    print Dumper \%data;

    This prints

    $VAR1 = {
              '09782' => {
                           'created' => '2007-10-23 21:10:10',
                           'closed' => '2007-10-26 16:31:53'
                         }
            };
    Hope this helps :-)
Re: using regex with records
by toolic (Bishop) on Jan 02, 2012 at 01:14 UTC
    The __DATA__ you posted does not have \n\n. It looks like you have a single backslash on a couple of lines.
      Sorry -- the \n\n isn't posted in the __DATA__ -- I'm just including one record for an example.
        .* does not match \n by default. You need to use //s. Read perlre.
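        A minimal sketch of that advice, keeping the record-at-a-time approach from the original post. It assumes that records really are separated by a blank line, that ClosedDate always precedes CreatedDate within the Remedy Case data section, and that dates always look like the ones in the sample. The helper name extract_case_dates is hypothetical, the first record is trimmed from the posted data, and the second record (ACCOUNT-00001) is invented purely to show the record separator at work.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Timestamp shape seen in the sample record, e.g. 2007-10-26 16:31:53.
my $date = qr/\d{4}-\d\d-\d\d \d\d:\d\d:\d\d/;

# Hypothetical helper: returns (ticket, closed, created) for one record,
# or an empty list if the record does not match.
sub extract_case_dates {
    my ($record) = @_;

    # /s lets "." match newlines, so ".*?" can span lines of the record.
    # /m lets "^" match at the start of every line inside the record.
    # /x allows the pattern to be laid out over several lines;
    # "[ ]" is a literal space under /x.
    my $re = qr/(ACCOUNT-\d{5})
                .*? ^Remedy[ ]Case[ ]data:
                .*? ^ClosedDate[ ]=[ ]($date)
                .*? ^CreatedDate[ ]=[ ]($date)/xms;

    return ($1, $2, $3) if $record =~ $re;
    return;
}

# Sample input: first record trimmed from the post, second one invented.
my $text = <<'END';
ACCOUNT-09782 comment \
Remedy Account data: \
CreatedDate = 2007-02-21 23:59:56\
\
Remedy Case data: \
CaseNumber = 00013383\
ClosedDate = 2007-10-26 16:31:53\
Actual Work Time = 1.00\
CreatedDate = 2007-10-23 21:10:10\

ACCOUNT-00001 comment \
Remedy Account data: \
CreatedDate = 2008-01-01 00:00:00\
\
Remedy Case data: \
ClosedDate = 2008-03-05 12:00:00\
CreatedDate = 2008-03-01 09:30:00\
END

my %data;
{
    # Read from the string as if it were a file.
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = "\n\n";    # one record per read, as in the original post
    while (my $record = <$fh>) {
        chomp $record;
        my ($ticket, $closed, $created) = extract_case_dates($record)
            or next;      # skip records that do not match
        $data{$ticket} = { closed => $closed, created => $created };
    }
}

for my $ticket (sort keys %data) {
    print "$ticket: created $data{$ticket}{created}, ",
          "closed $data{$ticket}{closed}\n";
}
```

        The non-greedy .*? matters here: with /s in effect, a greedy .* would run to the end of the record and backtrack, which is slower and can land on a later field than intended if a record ever contains more than one Case section.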