The branching structure can be quite complex, with one branch coming off another, and there is no way to detect this simply by looking at the file names, since the branching structure is flattened in them.

Nor is there any one file that is on all branches. There are dozens of applications in this source archive, and almost every branch involves just a single app.

In order to analyze the branching structure, I have to look at tens of thousands of branching records. Each record represents one file being branched in a single branching event. Creating one branch can generate hundreds or even thousands of these records, since that many files can be branched at a single time.

The best way to analyze the data is to simplify these records. If I can strip the directory and file information out of the branch names, I get a simple fromBranch->toBranch record. Throw away the duplicates, and I have maybe a few dozen records. Build a data tree from these records, and I have the branching structure.
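For illustration, here is a minimal sketch of the dedupe-and-build-tree step, assuming the directory and file information has already been stripped and each record is a plain [from, to] pair. The branch names and the helper names build_tree and render_tree are hypothetical, and the sketch assumes the branching data contains no cycles.

```perl
use strict;
use warnings;

# Hypothetical input: one [from, to] pair per branching record,
# with directory and file names already stripped out.
my @records = (
    [ 'main', 'rel1' ], [ 'main', 'rel1' ], [ 'main', 'rel2' ],
    [ 'rel1', 'rel1_fix' ], [ 'rel1', 'rel1_fix' ],
);

# Collapse duplicate pairs and index each branch's children.
sub build_tree {
    my @pairs = @_;
    my (%seen, %children, %has_parent);
    for my $pair (@pairs) {
        my ($from, $to) = @$pair;
        next if $seen{"$from\0$to"}++;    # skip duplicate records
        push @{ $children{$from} }, $to;
        $has_parent{$to} = 1;
    }
    # Roots are branches that are never the target of a branching event.
    my @roots = sort grep { !$has_parent{$_} } keys %children;
    return (\%children, \@roots);
}

# Return the subtree under $node as indented lines (assumes no cycles).
sub render_tree {
    my ($children, $node, $depth) = @_;
    my $indent = '  ' x $depth;
    my @lines  = ("$indent$node");
    push @lines, render_tree($children, $_, $depth + 1)
        for sort @{ $children->{$node} || [] };
    return @lines;
}

my ($children, $roots) = build_tree(@records);
print "$_\n" for map { render_tree($children, $_, 0) } @$roots;
```

The hash of seen pairs does the "throw away the duplicates" step in a single pass, so the tree is built from the few dozen distinct pairs rather than from every raw record.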

Where I am getting stuck is removing the directory and file names from the branch names. That's why I asked this particular question.

Even though looping isn't that efficient, I could easily have written a program with a loop in an hour or two, and I doubt the whole program would have taken more than a few minutes to run. I would have saved the time I spent researching this problem and asking for help. My problem would have been solved, I would have gotten the kudos of those around me, and at the end of the week, I would have collected my paycheck. What I wouldn't have done is improve my Perl hashing skills.

Instead, I decided there had to be a better way: manipulating the whole string at once instead of looping over it a single character at a time. Given Perl's toolkit of bitwise operations and regular expressions, I figured there must be some way to XOR or AND the strings together to separate the wheat from the chaff.
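One way the XOR idea could work, as a rough sketch rather than a definitive answer: XORing two strings yields a NUL byte wherever they agree, so a regex can measure the shared leading run, and the same trick on the reversed strings measures the shared trailing run. The sample strings and the helper names common_prefix_length and differing_parts are made up for illustration. The sketch assumes the strings contain no NUL bytes and that the 'bitwise' feature is not enabled, so that ^ operates on strings.

```perl
use strict;
use warnings;

# Length of the common leading run of two strings, found with string XOR.
# Assumes no NUL bytes in the input and that the 'bitwise' feature is off,
# so ^ is a string operation when both operands are strings.
sub common_prefix_length {
    my ($x, $y) = @_;
    my ($zeros) = ($x ^ $y) =~ /^(\0*)/;    # equal bytes XOR to "\0"
    return length $zeros;
}

# Hypothetical helper: strip the shared prefix and suffix, keep what differs.
sub differing_parts {
    my ($x, $y) = @_;
    my $prefix = common_prefix_length($x, $y);
    my $suffix = common_prefix_length(scalar reverse($x), scalar reverse($y));

    # Do not let the suffix overlap the prefix in the shorter string.
    my $shorter = length($x) < length($y) ? length($x) : length($y);
    $suffix = $shorter - $prefix if $suffix > $shorter - $prefix;

    return map { substr($_, $prefix, length($_) - $prefix - $suffix) } $x, $y;
}

# Made-up sample names; real branch names may look quite different.
my ($from, $to) = differing_parts('main/app/src/file.c', 'bugfix/app/src/file.c');
print "$from -> $to\n";    # prints "main -> bugfix"
```

Note that this finds the smallest differing middle, so two branch names that share leading or trailing characters (say, rel_1 and rel_2) would be reduced to 1 and 2. Real data might need the result widened back out to a path separator.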

Finding an answer improves my understanding of Perl. That's what I am really after.


In reply to Re^4: Selecting the difference between two strings by qazwart
in thread Selecting the difference between two strings by qazwart