http://qs1969.pair.com?node_id=11123653
User since: Nov 15, 2020 at 00:48 UTC (3 years ago)
Last here: Mar 01, 2024 at 23:49 UTC (20 hours ago)
Experience: 7731
Level:Parson (16)
Writeups: 1201
CPAN ID:BOD
Location:Coventry, UK
User's localtime: Mar 02, 2024 at 19:37 UTC
Member of: pmdev, SiteDocClan

Long-time amateur coder since growing up with a ZX Spectrum and BBC Micro...

Introduced to Perl in the early 1990s, which quickly became my language of choice. Built many websites and backend applications using Perl, including the sites for my property business:
Lets Delight - company site
Lets Stay - booking site
Also a few simple Tk-based desktop apps to speed things up.

Guilty of only learning what I need to get the job done - a recipe for propagating bad practice and difficult to maintain code...difficult for me so good luck to anyone else!

Now (Nov 2020) I've decided to improve my coding skills, although I'm not really sure what "improve" means in this context. It seems Perl and best practice have come a long way since I last checked in, and my programming approach is stuck in the last decade.

Onwards and upwards...

20th October 2021 - added to Saints in our Book 😀
2nd October 2022 - promoted to Priest
7th July 2023 - promoted to Vicar
15th December 2023 - promoted to Parson


Find me on LinkedIn, or on Twitter


CPAN Releases

Business::Stripe::WebCheckout
Business::Stripe::Subscription
Business::Stripe::Webhook

AI::Embedding


Nodes I find helpful

Modules

Re: What do I use to release a module to CPAN for the first time?
Basic Testing Tutorial


Posts by Bod
Module to extract text from HTML in Seekers of Perl Wisdom
by Bod
on Feb 27, 2024 at 06:10

    I've been searching unsuccessfully for a module to extract just the text from an HTML webpage...
    Any suggestions?

    Ideally, I want to feed in a URL and return the page's text as plain text - no formatting, tags, etc.

    Even most of the text would suffice.

    I'm currently using HTML::TreeBuilder and just extracting the p tags, which is not quite good enough:

        use HTTP::Tiny;
        use HTML::TreeBuilder;

        my $http = HTTP::Tiny->new;
        my $resp = $http->get($url);
        my $tree = HTML::TreeBuilder->new;
        $tree->parse($resp->{'content'});
        $tree->eof;    # finish parsing before searching the tree
        my @paragraph = $tree->look_down('_tag', 'p');
        print "Content-type: text/plain\n\n";
        foreach my $line (@paragraph) {
            print $line->as_trimmed_text . "\n";
        }

    I thought I'd found a solution with HTML::Extract. But when the sample code in the documentation didn't compile, I knew I was heading down a dead end!

    Do you know of a module to extract just the text?
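    One possible approach uses HTML::TreeBuilder, which is already in use above. It walks the whole tree rather than only the p elements, skips elements whose content isn't visible text, and starts a new line at block-level tags. The tag lists and the html_to_text name are my own assumptions, not part of any module's API. HTML::FormatText may also be worth a look as a ready-made alternative.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Tags whose contents are not visible page text (an assumed list; adjust to taste).
my %SKIP  = map { $_ => 1 } qw(head script style noscript);

# Tags that normally start a new line when rendered (an assumed, incomplete list).
my %BLOCK = map { $_ => 1 }
    qw(p div br li tr td th h1 h2 h3 h4 h5 h6 pre blockquote section article);

# Recursively append the text under $node to the string referenced by $buf.
sub _collect_text {
    my ($node, $buf) = @_;
    for my $child ($node->content_list) {
        if (!ref $child) {              # text nodes are stored as plain strings
            $$buf .= $child;
            next;
        }
        my $tag = $child->tag;
        next if $SKIP{$tag};
        $$buf .= "\n" if $BLOCK{$tag};
        _collect_text($child, $buf);
        $$buf .= "\n" if $BLOCK{$tag};
    }
    return;
}

# Hypothetical helper: the visible text of an HTML document, one block per line.
sub html_to_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);   # parses and calls eof
    my $buf  = '';
    _collect_text($tree, \$buf);
    $tree->delete;                      # free the tree explicitly

    my @lines;
    for my $line (split /\n/, $buf) {
        $line =~ s/\s+/ /g;             # collapse runs of whitespace
        $line =~ s/\A //;
        $line =~ s/ \z//;
        push @lines, $line if length $line;
    }
    return join "\n", @lines;
}
```

    With the HTTP::Tiny code above, it would be called as print html_to_text($resp->{'content'}) if $resp->{'success'};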

Bot vs human User Agent strings in Seekers of Perl Wisdom
by Bod
on Feb 09, 2024 at 13:42

    We want to supplement Google Analytics for a few reasons, not least because we want to hold site traffic information in our own database so we can interrogate it automagically. We've created a database table to hold this data.

    Within the common header method, we've added some code that sets a cookie with a max age of 2 hours or refreshes the cookie if it is already set. If the cookie isn't already there, we write a row to the database table with the entry time, entry page, etc. If the cookie exists we update the row with exit page, exit time and bump the page count.

    This approach is working and it's been running for a week.

    But it is reporting about 11 times more site traffic than Google Analytics. I'd expect some discrepancy, but not that much. Looking at the visits, we are getting quite a few with the same or very close timestamps, so my best guess is that it's a client that isn't accepting the cookie - perhaps a web crawler. To check this, I've added IP and User Agent to the database table, and sure enough these have the user agent of a crawler/bot.

    To solve this, I've added a condition to the statement that writes the new row to the database:

        $dbh->do("INSERT INTO Site_Visit SET firstVisit = NOW(), lastPage = ?, firstPage = ?, IP = ?, userAgent = ?, orsa = ?, orta = ?, Person_idPerson = ?",
            undef,
            $ENV{'REQUEST_URI'}, $ENV{'REQUEST_URI'}, $ENV{'REMOTE_ADDR'},
            $ENV{'HTTP_USER_AGENT'}, $cookie{'orsa'}, $data{'orta'}, $user)
          unless $ENV{'HTTP_USER_AGENT'} =~ /bot/i
              or $ENV{'HTTP_USER_AGENT'} =~ /facebook/i
              or $ENV{'HTTP_USER_AGENT'} =~ /dataprovider/i;

    This seems to be working...but...the list of 'blocked' user agent strings could get quite large.

    Is there a more Perlish way to write this condition?
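    A more Perlish option might be to keep the fragments in a list and compile them once into a single case-insensitive regex, so adding a crawler means adding one word. A minimal sketch, assuming the three fragments from the condition above. The is_bot name is hypothetical, and treating a missing user agent as a bot is my assumption:

```perl
use strict;
use warnings;

# User-agent fragments to keep out of the visit log (taken from the condition
# above; extend as new crawlers turn up).
my @BOT_FRAGMENTS = qw(bot facebook dataprovider);

# Compile one case-insensitive alternation; quotemeta makes each entry literal.
my $BOT_RE = do {
    my $alternation = join '|', map { quotemeta } @BOT_FRAGMENTS;
    qr/$alternation/i;
};

# Hypothetical helper: true if the user agent looks like a crawler.
# Assumption: a missing or empty user agent is treated as a bot.
sub is_bot {
    my ($ua) = @_;
    return 1 unless defined $ua && length $ua;
    return $ua =~ $BOT_RE ? 1 : 0;
}
```

    The INSERT would then be guarded with unless (is_bot($ENV{'HTTP_USER_AGENT'})) { ... }. Because quotemeta escapes each entry, the list holds plain substrings. Drop the quotemeta if you want each entry to be a regex in its own right. The list could also be loaded from a database table, which keeps substring matching and avoids storing exact user agent strings.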

    I did think of putting them all in a database table for querying the user string against this table:

    SELECT ? IN ( SELECT userAgent FROM Blocked_Users )
    untested

    But, that would mean having the full and exact user agent strings instead of using a regexp.

    Note that I don't want to block crawlers, I just don't want them written to the site visit logs. This makes it quite difficult to Google because most articles are about blocking crawlers and bots from a website.

Is require still required? in Seekers of Perl Wisdom
by Bod
on Jan 31, 2024 at 17:48

    I've been looking at a question I asked 3 years ago in Refactoring webcode to use templates

    How things have changed since then. We've closed down the part of the business that I refactored all the code for, but I certainly learnt a lot in the process.

    One of the things I refactored, and now do as standard, is to put pretty much all common code in modules. Although require was common in my code until a few years back, I now never use the keyword. This got me thinking...is require still ever required, or is it obsolete in the modern world?
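    require is still what use does under the hood: use Module LIST; is equivalent to BEGIN { require Module; Module->import(LIST); }. On its own, require still seems useful when the decision is made at run time. Below is a short sketch of two such cases. The JSON modules are just an example of an optional dependency, and decode_json_text and load_class are hypothetical helper names:

```perl
use strict;
use warnings;

# 1. Optional dependency with a fallback: prefer the XS JSON module if installed.
#    require runs at run time, so a missing module can be trapped with eval.
my $JSON_CLASS = eval { require Cpanel::JSON::XS; 'Cpanel::JSON::XS' }
              || do { require JSON::PP; 'JSON::PP' };   # JSON::PP is core since 5.14

sub decode_json_text {
    my ($text) = @_;
    return $JSON_CLASS->new->decode($text);
}

# 2. Loading a class whose name is only known at run time (e.g. a plugin).
sub load_class {
    my ($class) = @_;
    die "Invalid class name: $class\n"
        unless $class =~ /\A[A-Za-z_]\w*(?:::\w+)*\z/;
    (my $file = "$class.pm") =~ s{::}{/}g;   # require needs a path for a variable
    require $file;
    return $class;
}
```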

Persistent data in Seekers of Perl Wisdom
by Bod
on Jan 31, 2024 at 17:37

    I'm writing an XML Sitemap generator based around WWW::Crawl

    I want to record the priority to set for each entry in the sitemap. My first thought was to use a CSV or similar text file, but it could become huge and cumbersome. So what are the alternatives?

    I could write this server-side, where I have a MariaDB instance running, so storage is no problem. But I'm thinking I want to run it client-side, although I don't really know why. So my choices seem to be: hold the data in a Storable object, run MariaDB, MySQL or similar locally, or use DBD::SQLite from within Perl. No doubt there are other choices...

    Which would you do and why?

    What would you definitely avoid doing and why?
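    For the Storable option, here is a minimal sketch. It assumes the data is a simple hash of URL => priority, and the helper names are hypothetical. The whole hash is read into memory on each run. That is the main trade-off against DBD::SQLite, where rows can be read and updated individually:

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Hypothetical helper: persist a hash of URL => priority between runs.
# nstore writes in network byte order, so the file stays portable.
sub save_priorities {
    my ($file, $priorities) = @_;
    nstore($priorities, $file) or die "Could not store priorities in $file\n";
    return;
}

# Hypothetical helper: returns a hash ref, or an empty one if nothing was saved yet.
sub load_priorities {
    my ($file) = @_;
    return {} unless -e $file;
    return retrieve($file);
}
```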

[OT] - Mutilated email addresses (not Perl) in Seekers of Perl Wisdom
by Bod
on Jan 15, 2024 at 14:08

    This has absolutely nothing to do with Perl but I know the vast range of knowledge and experience in The Monastery might save me a lot of searching...

    A client is using the Ecwid e-commerce platform and wants to send the data to a Go High Level (GHL) CRM. I've set up a webhook that notifies GHL when an order is placed. GHL then calls the Ecwid API to get the email address, name, etc. Everything works fine except that the email addresses get mutilated in the process!

        abc123@example.com -> abc123@example.comabc
        andrew.test@gmail.com -> andrew.test@gmail.comandrew.test
        a.test123@example.com -> a.test123@example.coma.test

    From the limited dataset I have it appears that everything from the start of the email address to the @ or the first numeric digit is appended to the end of the email address...

    Have you ever seen anything like this before?
    Do you have any suggestions before I spend too long trying to work out where the problem is and what is causing it, then dealing with two lots of customer support teams who are likely to blame each other.
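    While the root cause is tracked down, a small check like this could help gauge how many stored addresses are affected. It only encodes the pattern inferred from the three examples above (the local part up to the first digit or @ appended after the domain), so treat it as a guess. The strip_appended_prefix name is hypothetical, and the function leaves an address untouched unless what remains still looks like local@domain.tld:

```perl
use strict;
use warnings;

# Hypothetical helper based only on the three examples in the post: the local
# part up to the first digit or '@' appears to be appended after the domain.
sub strip_appended_prefix {
    my ($addr) = @_;
    my ($prefix) = $addr =~ /\A([^\d\@]+)/;
    return $addr unless defined $prefix;

    my $len = length $prefix;
    return $addr unless length($addr) > $len
                    && substr($addr, -$len) eq $prefix;

    my $candidate = substr($addr, 0, -$len);
    # Only accept the repair if the remainder still looks like local@domain.tld.
    return $candidate =~ /\A[^\@\s]+\@[^\@\s]+\.[A-Za-z]{2,}\z/ ? $candidate : $addr;
}
```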

Where to place POD in Seekers of Perl Wisdom
by Bod
on Jan 14, 2024 at 16:06

    I've seen various options for where to place POD:

    • Next to the methods
    • At the end of the module
    • In a separate file
    I've yet to see the POD at the start of the module but I guess that someone, somewhere, has done that!

    For a module I am going to publish on CPAN, I put the POD at the end of the file. I figure that the current version of the module will be 'finished' before it gets uploaded to CPAN.

    I assume that a separate POD file is only really helpful for large modules that have several packages that need to be covered by one POD file.

    But, for one project, I have a module of helper functions - this is a private module, not something I will publish. The module deals with things I either want to access in different places or that it makes sense to take out of the main scripts. Currently, this file has comments to document the methods, which is OK for a short module. But it's now exceeded 700 lines with a couple of dozen methods, and finding my way around it is becoming frustrating and time-consuming.

    I have started to create POD for this module to make it easier for me to find the method and syntax I need when I add some new functionality or update existing functionality.

    Because the module is never really 'finished' and gets added to whenever I need a new method, it seems sensible to add the POD next to each method. But I am sure there is more to it than this...am I opening myself up to future problems if I spread the POD through the module and document each method next to that method's code?

    Where do you place your POD, and why do you do it that way?
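    For the private helper module, interleaving works fine in practice. perldoc and pod2text only read the POD blocks, and each =cut hands control back to the code. A minimal sketch of the layout; My::Helpers, slugify and trim are hypothetical names:

```perl
package My::Helpers;    # hypothetical private helper module
use strict;
use warnings;

=head1 NAME

My::Helpers - private helper functions (hypothetical example)

=head1 FUNCTIONS

=head2 slugify($text)

Returns a lowercase, hyphen-separated version of C<$text>.

=cut

sub slugify {
    my ($text) = @_;
    (my $slug = lc $text) =~ s/[^a-z0-9]+/-/g;   # non-alphanumeric runs become '-'
    $slug =~ s/\A-+|-+\z//g;                     # trim leading/trailing hyphens
    return $slug;
}

=head2 trim($text)

Returns C<$text> without leading and trailing whitespace.

=cut

sub trim {
    my ($text) = @_;
    (my $trimmed = $text) =~ s/\A\s+|\s+\z//g;
    return $trimmed;
}

1;
```

    perldoc path/to/Helpers.pm renders the assembled documentation, and podchecker will flag malformed POD in the module.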

Outlier test fail in Seekers of Perl Wisdom
by Bod
on Jan 11, 2024 at 16:34

    I've been checking CPAN Testers results for one of my modules and noticed a single failure amid a sea of passes. On checking it, Test::More reports that the test got 0.99999999999999996 instead of 1.

    My best guess is that the machine doing the testing has a peculiar build somewhere that is affecting the floating point arithmetic.

    • Is that a fair assumption?
    • Is there anything I can do about that?
    • Is it worth doing anything about this?
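    One plausible explanation: on a perl using standard 8-byte doubles, 0.99999999999999996 is not representable and rounds to exactly 1. Seeing that value suggests the tester's perl was built with long doubles or quadmath. The perl -V output in the report should confirm this through nvsize and uselongdouble. Either way, comparing with a tolerance rather than exact equality makes the test independent of the build. A sketch, with within and nv_info as hypothetical helpers and 1e-9 as an arbitrary default tolerance:

```perl
use strict;
use warnings;
use Config;

# Report how this perl stores floating-point numbers (NVs). A perl built with
# long doubles or quadmath keeps more digits than a standard 8-byte double.
sub nv_info {
    return sprintf "nvsize=%s uselongdouble=%s usequadmath=%s",
        $Config{nvsize},
        $Config{uselongdouble} // 'undef',
        $Config{usequadmath}   // 'undef';
}

# Compare floats within a tolerance rather than with exact equality.
sub within {
    my ($got, $want, $tol) = @_;
    $tol //= 1e-9;
    return abs($got - $want) <= $tol;
}
```

    In a .t file this would read ok( within($got, 1), 'result is 1' ); and Test::Number::Delta on CPAN provides delta_ok for the same purpose.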

Another Unicode/emoji question in Seekers of Perl Wisdom
by Bod
on Dec 21, 2023 at 19:31

    I realise that we've recently had a very long thread about Unicode...sorry if this drags the issue out further! However, I have little understanding of Unicode, and I have rarely needed to know. But I could do with some help, please...

    My partner runs a dog care business and I have built the booking platform for her. Part of this provides a URL to Google Calendar to update our mobile calendars from the booking system. This all works fine but I'll ask a couple of more general questions at the end.

    I decided it would be nice to have a Dog Face Emoji as the first character of the title of the calendar entry. But I cannot get this to display. The script uses Template to generate the ICS feed for Google Calendar.

        BEGIN:VCALENDAR
        VERSION:2.0
        PRODID:Pawsies Calendar 1.0//EN
        CALSCALE:GREGORIAN
        METHOD:PUBLISH
        [% FOREACH event IN day %]BEGIN:VEVENT
        SUMMARY:\x{e052} [% event.type %][% IF event.dog %] - [% event.dog %][% END %]
        [% IF event.note %]DESCRIPTION: [% event.note %]
        [% END %]UID:pawsies[% event.idBooking %][% event.id %]@pawsies.uk
        SEQUENCE:[% event.sequence %]
        DTSTAMP:[% event.dtstart %]
        DTSTART:[% event.dtstart %]
        DTEND:[% event.dtend %]
        URL:https://www.pawsies.uk/admin/calendar/?day=[% event.date %]
        COLOR:[% event.color %]
        END:VEVENT
        [% END %]END:VCALENDAR

    Everything works as expected except the emoji is printed as a literal \x{e052} instead of a 🐶 - I have use utf8; at the top of the script and the HTTP Header is:

    Content-type: text/calendar; charset=utf-8
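    Two things seem to be going on here. First, Template Toolkit doesn't interpret Perl escapes such as \x{e052} in template text, so they come out literally. Second, U+E052 is a Private Use Area code point rather than the dog face, which is U+1F436. A minimal sketch of one workaround is to build the string in Perl and encode the output as UTF-8. The summary_for name, the type and dog fields, and the sample values are assumptions based on the template above:

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+1F436 is DOG FACE. U+E052 sits in the Private Use Area, so even if it were
# decoded it would not be a standard dog emoji.
my $DOG_FACE = "\x{1F436}";

# Hypothetical helper: build the SUMMARY text in Perl, where "\x{...}" escapes
# are interpreted. Written inside the template, \x{...} stays literal text.
sub summary_for {
    my ($event) = @_;
    my $summary = "$DOG_FACE $event->{type}";
    $summary .= " - $event->{dog}" if $event->{dog};
    return $summary;
}

# Example: print one SUMMARY line, encoding characters to UTF-8 bytes on output.
binmode STDOUT, ':encoding(UTF-8)';
print 'SUMMARY:', summary_for({ type => 'Walk', dog => 'Rex' }), "\n";
```

    The template line would then become SUMMARY:[% event.summary %], with summary added to each event hash before calling process. Saving the template as UTF-8 with a literal 🐶 and passing ENCODING => 'UTF-8' to Template->new should also work, though that is untested here.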

    A couple of extra questions if you are experienced at feeding data to Google Calendar:
    1 - Is it possible to force Google to refresh the feed? Waiting for 24 hours or so makes debugging slow and tedious.
    2 - Is it possible to set the colour of the event from the ICS feed so we can have multiple colours from one feed? Currently, we have two feeds just to get two different colours. I've tried the COLOR property but it seems to be ignored.

    Updated to correct MIME type

CPAN Testers shows N/A in Seekers of Perl Wisdom
by Bod
on Dec 19, 2023 at 18:24

    I've released an updated version of AI::Embedding - version 1.1

    The CPAN documentation shows the latest version. However, CPAN Testers is showing N/A for all Perl versions, not just the ones before the minimum Perl version. This seems very strange. Have you come across this before?

    Also, when I try to upgrade to the latest version, nothing happens:

        C:\Users\bod>cpanm AI::Embedding
        AI::Embedding is up to date. (1.01)

    I've checked that I haven't inadvertently specified a minimum Perl version that doesn't yet exist.

    Any suggestions for what to check that might be causing this?
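    A couple of checks might narrow it down: which version perl actually loads locally, and how version.pm orders the version strings involved. Decimal versions compare as numbers, so 1.1 is 1.100 and is newer than 1.01. A sketch using the core version module; installed_version and newer_than are hypothetical helper names:

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: the VERSION of an installed module, or undef if it won't load.
sub installed_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    return $module->VERSION;
}

# Hypothetical helper: true if version string $x is newer than $y (version.pm rules).
sub newer_than {
    my ($x, $y) = @_;
    return version->parse($x) > version->parse($y);
}
```

    For example, installed_version('AI::Embedding') shows what is installed, and newer_than('1.1', '1.01') returns true.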

Path to prove in Seekers of Perl Wisdom
by Bod
on Dec 18, 2023 at 18:12