in reply to getting text from HTML
I believe what you're seeing is the Document Object Model at work: a "text node" is any run of character data that isn't itself an element, and that includes everything between <script> tags (and <style> tags, etc.). One easy workaround is to remove all the elements you don't want before extracting the text:
    use Mojo::Base -strict;
    use open qw/:std :utf8/;
    use Mojo::UserAgent;

    # Fetch the page, following up to three redirects
    my $ua  = Mojo::UserAgent->new( max_redirects => 3 );
    my $res = $ua->get('http://www.spacex.com/webcast')->result;
    die $res->message unless $res->is_success;

    # Remove the elements whose text we don't want, then extract the rest
    my $dom = $res->dom;
    $dom->find('script, style')->map('remove');
    my $text = $dom->at('body')->all_text;

    # Collapse runs of whitespace into a single space
    1 while $text =~ s/\s{2,}/ /g;
    say $text;

    __END__
    Jump to navigation Falcon 9 Falcon Heavy Dragon Starship Updates About SpaceX Careers Shop You are hereHome STARLINK MISSION On Wednesday, April 22 at 3:30 p.m. EDT, or 19:30 p.m. UTC, SpaceX launched its seventh Starlink mission. Falcon 9 lifted off from Launch Complex 39A (LC-39A) at NASA’s Kennedy Space Center in Florida.Falcon 9’s first stage previously supported Crew Dragon’s first flight to the International Space Station, launch of the RADARSAT Constellation Mission, and the fourth Starlink mission. Following stage separation, SpaceX landed Falcon 9’s first stage on the “Of Course I Still Love You” droneship, which was stationed in the Atlantic Ocean. Falcon 9’s fairing previously supported the AMOS-17 mission. You can watch a replay of the launch below and learn more about the mission here. | Twitter YouTube Flickr Instagram Privacy © 2020 Space Exploration Technologies Corp.
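If you want to see the effect without hitting the network, here is a minimal offline sketch of the same idea using Mojo::DOM directly. The HTML snippet is a made-up sample (not taken from the SpaceX page), and the variable names `$before` and `$after` are only for illustration. It extracts the body text once before and once after removing the script and style elements, so you can compare the two.

```perl
use Mojo::Base -strict;
use Mojo::DOM;

# Hypothetical sample markup, invented for this illustration
my $html = <<'HTML';
<html><head><style>p { color: red }</style></head>
<body><p>Hello,   <b>world</b>!</p><script>var x = 1;</script></body></html>
HTML

my $dom = Mojo::DOM->new($html);

# Before removal: the script's source is part of the body's text
my $before = $dom->at('body')->all_text;

# Remove the unwanted elements in place, then extract again
$dom->find('script, style')->map('remove');
my $after = $dom->at('body')->all_text;

# Collapse runs of whitespace into a single space
$after =~ s/\s{2,}/ /g;

say "before: $before";
say "after:  $after";
```

The "before" line should still contain the JavaScript source, while the "after" line should contain only the visible text.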
Re^2: getting text from HTML
by IB2017 (Pilgrim) on May 04, 2020 at 08:43 UTC
by haukex (Archbishop) on May 04, 2020 at 09:46 UTC
Re^2: getting text from HTML
by IB2017 (Pilgrim) on May 03, 2020 at 22:58 UTC