>Thread drift is allowed. For good netiquette, also change the title in the reply form.
OK then. First of all, I am not a staff developer at Wikipedia, just one of the volunteer editors. We needed a script for a set of users who want notifications about upcoming internal elections. It acts like a daemon: every 24 hours it checks a certain place and notifies them if there is something there.
tools.wmflabs.org gives you the language of your choice (Perl, PHP, Python, C#, you name it) in the latest stable versions. I don't like Python, have no idea about C#, and remember something about Perl, so I went with Perl.

This is to make it clear that the list=allusers query has nothing to do with the actual task. It is only to show the exact data format to query and to expect. The full MediaWiki API help is here: https://ru.wikipedia.org/w/api.php?action=help&uselang=en

Now... The script has to handle Unicode/UTF-8 literals in the code, so I needed use utf8; It also has to output them in HTML, so I needed binmode STDOUT, ':utf8';
It also has to receive JSON, decode it, slice it, and do string comparison, replacement and everything else, all with Cyrillic in the data. I dropped all the (en|de)coding calls that were called unnecessary in this thread and came to:

#!/usr/bin/perl

use strict;
use warnings;

use utf8;
use Encode;

use LWP::UserAgent;
use HTTP::Request::Common;
use HTTP::Cookies;

use JSON;

my $browser = LWP::UserAgent->new;

# they ask for a descriptive user agent, not the LWP default
# w:ru:User:Bot_of_the_Seven = https://ru.wikipedia.org/wiki/Участник:Bot_of_the_Seven
$browser->agent('w:ru:User:Bot_of_the_Seven (LWP like Gecko) We come in peace');

# I need cookie exchange enabled for auth
# it doesn't matter here, but it completes the LWP picture:
$browser->cookie_jar({});

# read-only queries also work with GET, but login and write actions require POST
# so I POST everywhere rather than remember where GET is allowed:
my $response = $browser->request(POST 'https://ru.wikipedia.org/w/api.php',
        {
            'format' => 'json',
            'formatversion' => 2,
            'errorformat' => 'bc',
                
            'action' => 'query',
            'list' => 'allusers',
            'auactiveusers' => 1,
            'aulimit' => 10,
            'aufrom' => 'Б'
        }
    );

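# stop early on HTTP errors instead of feeding an error page to the JSON decoder
die 'API request failed: ' . $response->status_line unless $response->is_success;

# decode_json expects raw UTF-8 bytes, which is what ->content returns
my $data = decode_json($response->content);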

my $test_scalar = $data->{query}->{allusers}[0]->{name};

my @test_array = @{$data->{query}->{allusers}}[0..2];

display_html($test_array[1]->{name});


sub display_html {

    my @html = (
        '<!DOCTYPE html>',
        '<html>',
        '<head>',
        '<meta charset="UTF-8">',
        '<title>Мой тест</title>',
        '</head>',
        '<body>',
        shift // 'Статус — ОК', # defined-or: the default is used only for undef, so 0 and '' are kept
        '</body>',
        '</html>'
    );
    
    # to avoid "wide character" warnings:
    binmode STDOUT, ':utf8';
    
    print "Content-Type: text/html; charset=utf-8\n\n";
    
    print join("\n", @html);
}
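
The decoding step can be checked offline, without touching the API. This is only a sketch under assumptions: the JSON sample is hand-written for the test (the names and user IDs are made up, it is not real API output), and JSON::PP stands in for JSON because it ships with Perl and exports the same decode_json. It shows that decode_json takes UTF-8 octets and returns character strings, so length, eq and case-insensitive matching work on Cyrillic without any extra decode call.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                        # Cyrillic literals below are character strings

use Encode qw(encode);
use JSON::PP qw(decode_json);    # core module; assumed equivalent to JSON here

# Hypothetical sample reply, hand-written for this test (not real API output).
my $json_text = '{"query":{"allusers":[{"userid":1,"name":"Борис"},{"userid":2,"name":"Вера"}]}}';

# Simulate what $response->content hands over: raw UTF-8 octets.
my $octets = encode('UTF-8', $json_text);

# decode_json expects octets and returns decoded character strings.
my $data  = decode_json($octets);
my @names = map { $_->{name} } @{ $data->{query}{allusers} };

# Character semantics: length counts letters, not bytes.
my $chars = length $names[0];
my $bytes = length encode('UTF-8', $names[0]);

# Comparison and case-insensitive matching against literals in the source.
my $is_boris    = $names[0] eq 'Борис' ? 1 : 0;
my $starts_with = $names[1] =~ /^в/i   ? 1 : 0;

print "chars=$chars bytes=$bytes eq=$is_boris match=$starts_with\n";
# prints: chars=5 bytes=10 eq=1 match=1
```

If the input were already a decoded character string (for example from $response->decoded_content), from_json would be the matching call instead of decode_json.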

Is there anything that might go badly wrong concerning Cyrillic in Unicode/UTF-8?
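
One thing I can see going wrong is double encoding: if a string is encoded by hand and then also printed through an encoding layer, every non-ASCII byte gets encoded a second time. A minimal sketch of the effect, using an in-memory filehandle so the written octets can be counted; octets_written is a helper name made up for this test, and I use the stricter ':encoding(UTF-8)' layer here rather than the ':utf8' from my script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

use Encode qw(encode decode);

my $text = 'Мой тест';    # 8 characters: 7 Cyrillic letters and a space

# Hypothetical helper: print a string to an in-memory handle opened
# with the given layer and return the raw octets that were written.
sub octets_written {
    my ($layer, $string) = @_;
    my $buffer = '';
    open my $fh, ">$layer", \$buffer
        or die "cannot open in-memory handle: $!";
    print {$fh} $string;
    close $fh;
    return $buffer;
}

# Correct: character string in, the layer encodes it exactly once.
my $good = octets_written(':encoding(UTF-8)', $text);

# Wrong: encoded by hand and then encoded again by the layer.
my $double = octets_written(':encoding(UTF-8)', encode('UTF-8', $text));

my $good_len   = length $good;      # 7 letters * 2 bytes + 1 space
my $double_len = length $double;    # each of the 14 high bytes doubled, + 1 space
my $round_trip = decode('UTF-8', $good) eq $text ? 1 : 0;

print "good=$good_len double=$double_len round_trip=$round_trip\n";
# prints: good=15 double=29 round_trip=1
```

So as long as everything inside the script stays a character string and the only encoding happens in the STDOUT layer, the output should be fine.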


In reply to Proper Unicode handling in Perl by VK
in thread Is there some universal Unicode+UTF8 switch? by VK
