How/where should I post the files? I need to heavily trim them, but the two of interest would be repomd.xml, which has the cpeid in it and the names of the other XML files in the group, and primary.xml, which has a list of all the rpms in the release.

Out of the 4 repos released per day (oss, non-oss, src-oss, src-non-oss), I've been using src-non-oss for recent test runs since it's the shortest, with repomd.xml at 8869 bytes and primary.xml at 41033 bytes.


For 'oss', by contrast, repomd.xml is about the same size, but primary.xml varies a lot depending on the individual update. For the same date as the src-non-oss run, primary.xml is 162MB, with 3.2M lines and 67370 different rpm descriptions.

Here is repomd.xml from its beginning through its cpeid entry, plus the listing for the primary.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<repomd xmlns="http://linux.duke.edu/metadata/repo" xmlns:rpm="http://linux.duke.edu/metadata/rpm">
  <revision>1625990264</revision>
  <tags>
    <content>pool</content>
    <content>gpg-pubkey-3dbdc284-53674dd4.asc?fpr=22C07BA534178CD02EFE22AAB88B2FD43DBDC284</content>
    <content>gpg-pubkey-39db7c82-5f68629b.asc?fpr=FEAB502539D846DB2C0961CA70AF9E8139DB7C82</content>
    <content>gpg-pubkey-307e3d54-5aaa90a5.asc?fpr=4E98E67519D98DC7362A5990E3A5C360307E3D54</content>
    <repo>obsproduct://build.opensuse.org/openSUSE:Factory/openSUSE/20210710/i586</repo>
    <repo>obsproduct://build.opensuse.org/openSUSE:Factory/openSUSE/20210710/x86_64</repo>
    <distro cpeid="cpe:/o:opensuse:opensuse:20210710">openSUSE Tumbleweed</distro>
  </tags>
  <data type="primary">
    <checksum type="sha256">60ac248489df31c61277a6872279561730d27d51b3bb7d15368d75b69d1ac80c</checksum>
    <open-checksum type="sha256">d101bad38f3a987c9a790f927031cfcc68c1598b4d6f329447c6fe338cfb7128</open-checksum>
    <location href="repodata/60ac248489df31c61277a6872279561730d27d51b3bb7d15368d75b69d1ac80c-primary.xml.gz"/>
    <timestamp>1625990264</timestamp>
    <size>18659084</size>
    <open-size>171435824</open-size>
  </data>

That gives me my distro version or date (the cpeid number) and the location of the first primary.xml file of rpms that have changed since "yesterday" (previous release).
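
As a rough illustration of that step, here is a minimal sketch that pulls the cpeid and the primary.xml location out of repomd.xml text. It assumes the machine-generated layout shown above and uses plain regexes with core Perl only; a real XML parser (XML::LibXML, for example) would be more robust. The sub name parse_repomd and its hash keys are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hashref with the cpeid, the release date
# taken from its last field, and the href of the "primary" data entry.
sub parse_repomd {
    my ($xml) = @_;
    my %info;

    # <distro cpeid="cpe:/o:opensuse:opensuse:20210710">...</distro>
    if ($xml =~ /<distro\b[^>]*\bcpeid="([^"]+)"/) {
        $info{cpeid} = $1;
        ($info{release}) = $info{cpeid} =~ /:(\d+)$/;
    }

    # Walk each <data type="...">...</data> block and keep the primary one.
    while ($xml =~ m{<data\s+type="([^"]+)"\s*>(.*?)</data>}gs) {
        my ($type, $body) = ($1, $2);
        next unless $type eq 'primary';
        ($info{primary_href}) = $body =~ /<location\b[^>]*\bhref="([^"]+)"/;
        last;
    }
    return \%info;
}

# Trimmed sample in the same shape as the listing above.
my $sample = <<'XML';
<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <tags>
    <distro cpeid="cpe:/o:opensuse:opensuse:20210710">openSUSE Tumbleweed</distro>
  </tags>
  <data type="primary">
    <location href="repodata/60ac248489df31c61277a6872279561730d27d51b3bb7d15368d75b69d1ac80c-primary.xml.gz"/>
  </data>
</repomd>
XML

my $info = parse_repomd($sample);
print "$info->{release} $info->{primary_href}\n";
```

The regex approach only holds up because the file is small and generated by a tool; anything hand-edited or differently formatted would call for a proper parser.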

The header and 1st package of a primary for an oss release are below:

<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://linux.duke.edu/metadata/common" xmlns:rpm="http://linux.duke.edu/metadata/rpm" packages="66746">
<package type="rpm">
  <name>2048-cli</name>
  <arch>i586</arch>
  <version epoch="0" ver="0.9.1+git.20181118" rel="1.11"/>
  <checksum type="sha256" pkgid="YES">310f3c8e912923da08eab8debafd6fc03afe9e1ae97304bcd029658959e099d0</checksum>
  <summary>A CLI version of the "2048" game</summary>
  <description>2048 is a mathematics-based puzzle game where the player has to slide tiles on a grid to combine them and create a tile with the number 2048. The player has to merge the similar number tiles (2n) by moving the arrow keys in four different directions. When two tiles with the same number touch, they will merge into one.</description>
  <packager>https://bugs.opensuse.org</packager>
  <url>https://github.com/tiehuis/2048-cli</url>
  <time file="1616702669" build="1616702650"/>
  <size package="20045" installed="26081" archive="27080"/>
  <location href="i586/2048-cli-0.9.1+git.20181118-1.11.i586.rpm"/>
  <format>
    <rpm:license>MIT</rpm:license>
    <rpm:vendor>openSUSE</rpm:vendor>
    <rpm:group>Amusements/Games/Strategy/Other</rpm:group>
    <rpm:buildhost>lamb25</rpm:buildhost>
    <rpm:sourcerpm>2048-cli-0.9.1+git.20181118-1.11.src.rpm</rpm:sourcerpm>
    <rpm:header-range start="5096" end="9153"/>
    <rpm:provides>
      <rpm:entry name="2048-cli" flags="EQ" epoch="0" ver="0.9.1+git.20181118" rel="1.11"/>
      <rpm:entry name="2048-cli(x86-32)" flags="EQ" epoch="0" ver="0.9.1+git.20181118" rel="1.11"/>
    </rpm:provides>
    <rpm:requires>
      <rpm:entry name="libncurses.so.6"/>
      <rpm:entry name="libncurses.so.6(NCURSEST6_5.7.20081102)"/>
      <rpm:entry name="libtinfo.so.6"/>
      <rpm:entry name="libtinfo.so.6(NCURSES6_TINFO_5.0.19991023)"/>
      <rpm:entry name="libtinfo.so.6(NCURSES6_TINFO_5.7.20081102)"/>
      <rpm:entry name="libc.so.6(GLIBC_2.7)"/>
    </rpm:requires>
    <file>/usr/bin/2048-cli</file>
  </format>
</package>

I'm NOT including most fields -- only the ones I need for downloading and sorting.

I'm also only downloading archs useful to me, as determined by my constants section:

use constant RepoNames    => qw(oss non-oss src-oss src-non-oss);
use constant ArchNames    => qw(noarch nosrc src x86_64);
use constant RepoMDFile   => 'repomd.xml';
use constant Wanted_Names => {qw(susedata 1 appdata 1 other 1 filelists 1 primary 1 appdata-icons 1)};
use constant RType        => { map { $_ => $_ } @{[RepoNames]} };
use constant Archt        => { map { $_ => $_ } @{[ArchNames]} };

sub Repo_valid($) { my $p = shift if HASH $_[0]; ErV RType, shift }
sub Arch_valid($) { my $p = shift if HASH $_[0]; ErV Archt, shift; }

our @EXPORT;
use mem(@EXPORT = (qw( RType Archt Repo_valid Arch_valid RepoMDFile Wanted_Names ) ) );
use Xporter;
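
To show how an arch list like that can drive the filtering, here is a minimal sketch that reads an unpacked primary.xml one package record at a time and keeps only name, arch and location for the wanted architectures. Setting the input record separator to "</package>" means the 162MB file never has to sit in memory all at once. Assumptions: core Perl only, regexes instead of an XML parser, and the layout shown earlier; wanted_packages, @ARCH_NAMES and %ARCH_OK are stand-ins invented for this example, not the Arch_valid code above, and the second sample record is made up.

```perl
use strict;
use warnings;

# Stand-in for the ArchNames constant (hypothetical local copy).
my @ARCH_NAMES = qw(noarch nosrc src x86_64);
my %ARCH_OK    = map { $_ => 1 } @ARCH_NAMES;

# Hypothetical helper: read <package> records from a filehandle one at a
# time and return name/arch/href only for the wanted architectures.
sub wanted_packages {
    my ($fh) = @_;
    local $/ = '</package>';    # one read per package record
    my @keep;
    while (my $rec = <$fh>) {
        next unless $rec =~ /<package\b/;    # skip the trailer after the last record
        my ($name) = $rec =~ m{<name>([^<]*)</name>};
        my ($arch) = $rec =~ m{<arch>([^<]*)</arch>};
        my ($href) = $rec =~ m{<location\b[^>]*\bhref="([^"]+)"};
        next unless defined $name && defined $arch && defined $href;
        next unless $ARCH_OK{$arch};
        push @keep, { name => $name, arch => $arch, href => $href };
    }
    return \@keep;
}

# Tiny in-memory sample; a real run would open the unpacked primary.xml.
# The x86_64 record is invented for illustration.
my $sample = <<'XML';
<metadata packages="2">
<package type="rpm"><name>2048-cli</name><arch>i586</arch>
  <location href="i586/2048-cli-0.9.1+git.20181118-1.11.i586.rpm"/></package>
<package type="rpm"><name>2048-cli</name><arch>x86_64</arch>
  <location href="x86_64/2048-cli-0.9.1+git.20181118-1.11.x86_64.rpm"/></package>
</metadata>
XML

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $pkgs = wanted_packages($fh);
close $fh;
print "$_->{arch} $_->{href}\n" for @$pkgs;
```

Splitting on the closing tag relies on "</package>" never appearing inside element text, which should hold for well-formed generated metadata; a streaming parser would remove that assumption.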
Hopefully that gives at least a bit more context. Can add more later if wanted, but already feel like I'm overwhelming....

In reply to Re^2: How to walk through convoluted data? by perl-diddler
in thread How to walk through convoluted data? by perl-diddler
