in reply to Merging multiple variations of a serial number

Huge thanks to everyone who has commented! There are a lot of good suggestions and also some questions which I will do my best to work through.

The serial numbers are used on a piece of equipment that we source from an external company and deploy to engineers. We don't have any control over serial number generation and I don't know if there is any correlation between the core serial and the check digit.

My initial issue stems from the fact that since 2018, different people/departments have received, built, and added to, the various files using different formats of the serial numbers. This is likely what has caused issues such as duplicate entries and incorrect formats (Like the 13 digit version 2). It's a "Too many cooks spoil the broth" scenario which I have now been tasked with trying to fix. Going forward we want to implement unit tracking using any of the possible formats so a database looks like the right way to go.

Check digits are only over in the range 0-9.

Month numbers are 01 - 12 but as is apparent in our data, some entries miss the leading zero off of single digit months.

I'm going to go ahead and try some of the suggested solutions as the logic appears to be quite straight forward.

  • Comment on Re: Merging multiple variations of a serial number

Replies are listed 'Best First'.
Re^2: Merging multiple variations of a serial number
by LanX (Saint) on Jul 29, 2022 at 10:02 UTC
    > The serial numbers are used on a piece of equipment that we source from an external company and deploy to engineers. We don't have any control over serial number generation

    It really is important to know, if the serial numbers stay stable like IDs and how many there are.

    You can't seriously have 1e10 pieces of equipment°, so a lookup hash with correct numbers will help you filter out impossible matches.

    > different people/departments have received, built, and added to, the various files using different formats of the serial numbers.

    As I already said, it is very probable, that the effects of those people can be localized to certain files and time periods.

    Creating a histogram for each file will help you determine which 13number format was used and for which timestamps.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    °) 1e11 even with the check digit.

      Creating a histogram for each file will help you determine which 13number format was used and for which timestamps.

      Frequency analysis is definitely a good plan in this scenario. (++)


      🦛

      > It really is important to know, if the serial numbers stay stable like IDs and how many there are.

      They should now stay stable in the current format. We currently have around 3,500 units/serial numbers