Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to extract the table results from this website, in CSV, TXT or any other human-readable format: https://desmace.com/provincia/asturias/
Following the table headings, I want my script to ask for two dates for the top-left column "FECHA DEL TRAMITE" (procedure date), and to set two fixed dates (always 01/01/1900 and 31/12/2000) for the "FECHA MATRICULA" (plate registration date) column.
You can also add more columns to the table by clicking "Columnas" at the top right. Doing this, I add "Prov. Matriculacion", and I want my script to ask for input for this field as well.
The table results are then displayed according to these criteria, and this is what I want to store in CSV, TXT or a similar format.
I have this code so far. It runs OK, but it does not store the data properly.
I would be super grateful for any help getting the script to work.
Many thanks in advance!

use strict;
use warnings;
use LWP::Simple;
use HTML::TreeBuilder;
use Text::CSV;

# URL of the website
my $url = 'https://desmace.com/provincia/asturias/';

# Filter criteria
my $fecha_tramite_min         = '24/11/2024';
my $fecha_tramite_max         = '27/11/2024';
my $fecha_matricula_min       = '01/01/1900';
my $fecha_matricula_max       = '31/12/2000';
my $prov_matriculacion_filtro = 'ASTURIAS';

# HTML page content
my $html = get($url) or die "No se pudo acceder a la URL: $!";

# Parse HTML
my $tree = HTML::TreeBuilder->new;
$tree->parse($html);

# Open csv file
my $csv = Text::CSV->new({ binary => 1, eol => "\n" });
open my $fh, ">", "resultados.csv" or die "No se pudo crear el archivo CSV: $!";

# Headings for csv file
$csv->print($fh, ['FECHA DEL TRÁMITE', 'TRÁMITE', 'FECHA MATRÍCULA', 'MARCA', 'MODELO', 'BASTIDOR (VIN)', 'PROV. MATRICULACIÓN']);

# This is the table that contains data
my @rows = $tree->look_down(_tag => 'tr');
foreach my $row (@rows) {
    my @columns = $row->look_down(_tag => 'td');
    my @data;

    # Extract values from columns
    foreach my $col (@columns) {
        push @data, $col->as_text;
    }

    # Row filtering
    if (@data >= 9) {
        my ($fecha_tramite, $fecha_matricula, $prov_matriculacion) = @data[0, 2, 7];
        if ($fecha_tramite ge $fecha_tramite_min && $fecha_tramite le $fecha_tramite_max &&
            $fecha_matricula ge $fecha_matricula_min && $fecha_matricula le $fecha_matricula_max &&
            $prov_matriculacion eq $prov_matriculacion_filtro) {
            $csv->print($fh, \@data);
        }
    }
}

# Close file
close $fh;
$tree->delete;
print "Datos filtrados guardados en 'resultados.csv'.\n";
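
One thing that may contribute to the wrong output, independent of how the page is fetched: the row filter compares dd/mm/yyyy strings with `ge` and `le`, which orders them by day first rather than chronologically. Below is a minimal sketch of a chronological comparison, assuming every date arrives as dd/mm/yyyy. The helper names `date_key` and `date_in_range` are hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: turn 'dd/mm/yyyy' into 'yyyymmdd' so that plain
# string comparison (ge / le) agrees with chronological order.
# Returns undef (in scalar context) if the input is not in that format.
sub date_key {
    my ($date) = @_;
    return unless defined $date;
    my ($d, $m, $y) = $date =~ m{^\s*(\d{1,2})/(\d{1,2})/(\d{4})\s*$}
        or return;
    return sprintf '%04d%02d%02d', $y, $m, $d;
}

# Hypothetical helper: true if $date lies within [$min, $max], inclusive.
# Any unparseable date makes the check fail instead of dying.
sub date_in_range {
    my ($date, $min, $max) = @_;
    my $k  = date_key($date);
    my $lo = date_key($min);
    my $hi = date_key($max);
    return 0 unless defined $k && defined $lo && defined $hi;
    return ($k ge $lo && $k le $hi) ? 1 : 0;
}

# 25/12/2024 is outside 24/11/2024 .. 27/11/2024, although a raw string
# comparison of the dd/mm/yyyy values would accept it.
for my $date ('25/11/2024', '25/12/2024') {
    my $verdict = date_in_range($date, '24/11/2024', '27/11/2024')
        ? 'in range'
        : 'out of range';
    print "$date: $verdict\n";
}
```

In the filter, each `ge`/`le` pair could then be replaced by a call such as `date_in_range($fecha_tramite, $fecha_tramite_min, $fecha_tramite_max)`. This only addresses the comparison; it does not change which rows the script manages to read from the page.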
Replies are listed 'Best First'.

Re: Script to scrap data
by marto (Cardinal) on Dec 02, 2024 at 10:29 UTC
by harangzsolt33 (Deacon) on Dec 02, 2024 at 13:25 UTC
by marto (Cardinal) on Dec 02, 2024 at 13:40 UTC
by harangzsolt33 (Deacon) on Dec 02, 2024 at 17:36 UTC
by marto (Cardinal) on Dec 03, 2024 at 10:18 UTC
by Anonymous Monk on Dec 07, 2024 at 20:12 UTC

Re: Script to scrape data
by hippo (Archbishop) on Dec 02, 2024 at 09:42 UTC

Re: Script to scrap data
by soonix (Chancellor) on Dec 02, 2024 at 15:14 UTC

Re: Script to scrap data
by 1nickt (Canon) on Dec 02, 2024 at 11:17 UTC

Re: Script to scrap data
by harangzsolt33 (Deacon) on Dec 02, 2024 at 01:22 UTC