Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I am looking into ways that my script (run as a non-privileged user) can determine whether or not a user exists on the system. I have come up with the following ideas:

1. Check in /etc/passwd for the account*
2. Check to see if a homedir exists using getpwnam.
3. Use a text file to log users added*

I have got option 2 working, but I am looking for a better solution. I like the idea of a text file, because then I can manually add usernames that are not on my system (as a kind of blacklist). My only worry is that this system is going to have around 5,000-6,000 users, so I'm not sure whether this solution will be too slow.

Any Ideas?

Re: User Existence?
by Limbic~Region (Chancellor) on May 17, 2003 at 17:35 UTC
    Anonymous Monk,
    There are a few things in your post that make providing a solution difficult.

  • If you are worried about speed, it usually comes at the cost of space (in this case memory).
  • If you are worried about memory, it usually comes at a cost of speed.
  • You haven't indicated how long your script is going to be running. Sometimes it is worth it to spend time at the beginning of the script if subsequent lookups will outweigh that time.
  • You haven't indicated how often your script is going to be checking for users. It may be a moot point (premature optimization) to even be worried.
  • You haven't indicated if you only want to know if the user exists, or if you also want some other information such as their home directory (which you mentioned as a validation means).

    You also indicate that you would like to add users that are not on your system. It is for this reason that I suggest using a hash lookup (which is very fast) in this untested code.

    #!/usr/bin/perl -w
    use strict;

    my %User;
    {
        local @ARGV;
        @ARGV = qw(/etc/passwd /usr/defined/file);
        while (<>) {
            chomp;
            my @field = split /:/;
            unless (exists $User{$field[0]}) {
                $User{$field[0]} = \@field;
            }
            else {
                print "WARNING: Duplicate found for $field[0]\n";
            }
        }
    }

    my $testuser = "blah";
    if (exists $User{$testuser}) {
        print "$testuser exists on system\n";
        print "Field 4 for $testuser is $User{$testuser}->[3]\n";
    }
    Of course /usr/defined/file should be set up like /etc/passwd if you care about the extra fields. Additionally, if you don't want to use /etc/passwd, the same framework lets you keep an "ok" %User and a "blacklist" %BadUser.
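A minimal sketch of the "ok" %User / "blacklist" %BadUser split, assuming a hypothetical blacklist file /etc/adduser/blacklist with one name per line (passwd-style lines also work, since only the text before the first colon is kept). The helper names load_names and name_taken are made up for illustration. Both hashes are built once at startup, so each later lookup is a single hash probe. Like the code above, it reads /etc/passwd directly and will not see accounts served from NIS or LDAP.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read each file and record the text before the first
# ':' on every line as a name. Unreadable or missing files are skipped.
sub load_names {
    my @files = @_;
    my %names;
    for my $file (@files) {
        open my $fh, '<', $file or next;
        while (my $line = <$fh>) {
            chomp $line;
            next if $line eq '' or $line =~ /^#/;   # skip blanks and comments
            my ($name) = split /:/, $line, 2;
            $names{$name} = 1;
        }
        close $fh;
    }
    return %names;
}

# Built once; every later lookup is a single hash probe.
my %User    = load_names('/etc/passwd');
my %BadUser = load_names('/etc/adduser/blacklist');   # hypothetical path

# A name is unavailable if it is a real account or is blacklisted.
sub name_taken {
    my ($name) = @_;
    return (exists $User{$name} || exists $BadUser{$name}) ? 1 : 0;
}

my $testuser = "blah";
print "$testuser is already taken\n" if name_taken($testuser);
```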

    Cheers - L~R

      Hi there, thanks for that. This script is going to be run every time a new user registers (roughly every 2 minutes), and the user will be waiting on the script, so however long it takes is however long they will be sitting at a blank web page...
        Anonymous Monk,
        This might be a great deal more complicated than you think. It might also be a lot more simple.

        If the script is going to be called from scratch each time - then it seems moot to try and speed up a single lookup. You do have other problems to consider though.

        If the script will always be running, then it makes sense to keep that information in memory if you can afford the memory. Then you run into the problem of what happens if multiple copies are running at the same time. How do you share the memory? How do you avoid race conditions on files on the system?

        You very well may want to consider a database instead. I can't offer any suggestions for how to do this, as I don't do web/database programming. Consider, for instance, updating both your hash and your external file. What happens if the system crashes after the user has been added, but before you have had a chance to update your text file?
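On the race-condition question: if you stay with a plain text file, one common approach is to hold an exclusive flock while you check for the name and append it, so two simultaneous registrations cannot both claim the same name. This sketch assumes a local filesystem where flock is honoured (the lock is advisory, and often unreliable over NFS), a log whose first comma-separated field is the username, and a hypothetical helper name, register_name. It does nothing about the crash-consistency problem; that is where a real database helps.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);

# Hypothetical helper: "check then add" $name to $file under one lock.
# Returns 1 if the name was appended, 0 if it was already present.
sub register_name {
    my ($file, $name) = @_;
    # '+>>' opens for read/append and creates the file if needed.
    open my $fh, '+>>', $file or die "Cannot open $file: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $file: $!";
    seek($fh, 0, SEEK_SET)    or die "Cannot rewind $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($existing) = split /,/, $line, 2;   # first field = username
        if (defined $existing && $existing eq $name) {
            close $fh;                          # closing releases the lock
            return 0;
        }
    }
    seek($fh, 0, SEEK_END)    or die "Cannot seek $file: $!";
    print {$fh} "$name\n"     or die "Cannot write $file: $!";
    close $fh                 or die "Cannot close $file: $!";  # flush, then unlock
    return 1;
}
```

A registration script would call register_name once per request and report "name taken" when it returns 0.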

        There are a lot of sage monks here who have probably done exactly what you are trying to do, or something very similar. Take the time to explain the entire process and I am sure you will get the advice you need, if not working code.

        Cheers and good luck - L~R

      Why am I getting an error when trying to run this?
      if getpwnam($username) { $message = $message.'<p>The Chosen Username Already Exists.</p>'; $found_err = 1; }
      Also, if I run mad-hatter's code and change the username to root, it does not report that the username is in use!
        What's the error? As for not complaining about root, perldoc -f getpwnam mentions that in scalar context the return value is the uid corresponding to the name, or undef if the username does not exist. Root is uid 0, which is false, so you really need to check the definedness of the return value:
        if(defined(getpwnam($username))) ...
        Post the error message and we'll figure it out.
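Two separate things are probably going wrong in the snippet above. First, Perl's block form of if requires parentheses around its condition (only the statement-modifier form, STATEMENT if COND;, can omit them), so if getpwnam($username) { ... } will not compile. Second, root's uid is 0, which is false. A minimal sketch combining both fixes; user_exists is a hypothetical helper name, and $message and $found_err follow the original snippet:

```perl
use strict;
use warnings;

# getpwnam in scalar context returns the uid, or undef if there is no such
# user. root's uid is 0 (false), so test definedness rather than truth.
sub user_exists {
    my ($username) = @_;
    return defined(scalar getpwnam($username)) ? 1 : 0;
}

my $username  = 'root';
my $message   = '';
my $found_err = 0;

# Block-form if: the parentheses around the condition are mandatory.
if (user_exists($username)) {
    $message .= '<p>The Chosen Username Already Exists.</p>';
    $found_err = 1;
}
print "$message\n" if $found_err;
```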

        --isotope
Re: User Existence?
by pzbagel (Chaplain) on May 17, 2003 at 18:01 UTC

    Well, let's benchmark and see. I created an /etc/passwd file with over 6000 users, each named x followed by a number. Here I search for the users x1, x50, x1000, x3000, and x6000, using both getpwnam and a regex. Each test is run 1000 times and the output follows (all tests run on a PII-450 with 384MB RAM running Redhat 9):

    #!/usr/bin/perl -w
    use Benchmark;

    for $x (1, 50, 1000, 3000, 6000){
        timethese (1000,{
            "getpwnam${x}" => q { $there=getpwnam("x${x}") },
            "regex${x}"    => q {open(PASSWD,"/etc/passwd");
                                 while(<PASSWD>) {
                                     if (/^x${x}:/){$alsothere=1};
                                     last if $alsothere == 1;
                                 }
                                 close PASSWD;},
        });
    }

    Benchmark: timing 1000 iterations of getpwnam1, regex1...
    getpwnam1: 0 wallclock secs ( 0.13 usr + 0.05 sys = 0.18 CPU) @ 5555.56/s (n=1000)
               (warning: too few iterations for a reliable count)
    regex1: 0 wallclock secs ( 0.10 usr + 0.03 sys = 0.13 CPU) @ 7692.31/s (n=1000)
               (warning: too few iterations for a reliable count)
    Benchmark: timing 1000 iterations of getpwnam50, regex50...
    getpwnam50: 1 wallclock secs ( 0.29 usr + 0.02 sys = 0.31 CPU) @ 3225.81/s (n=1000)
               (warning: too few iterations for a reliable count)
    regex50: 0 wallclock secs ( 0.08 usr + 0.05 sys = 0.13 CPU) @ 7692.31/s (n=1000)
               (warning: too few iterations for a reliable count)
    Benchmark: timing 1000 iterations of getpwnam1000, regex1000...
    getpwnam1000: 2 wallclock secs ( 2.36 usr + 0.24 sys = 2.60 CPU) @ 384.62/s (n=1000)
    regex1000: 0 wallclock secs ( 0.10 usr + 0.04 sys = 0.14 CPU) @ 7142.86/s (n=1000)
               (warning: too few iterations for a reliable count)
    Benchmark: timing 1000 iterations of getpwnam3000, regex3000...
    getpwnam3000: 8 wallclock secs ( 7.26 usr + 0.43 sys = 7.69 CPU) @ 130.04/s (n=1000)
    regex3000: 1 wallclock secs ( 0.12 usr + 0.05 sys = 0.17 CPU) @ 5882.35/s (n=1000)
               (warning: too few iterations for a reliable count)
    Benchmark: timing 1000 iterations of getpwnam6000, regex6000...
    getpwnam6000: 16 wallclock secs (14.73 usr + 0.67 sys = 15.40 CPU) @ 64.94/s (n=1000)
    regex6000: 0 wallclock secs ( 0.13 usr + 0.06 sys = 0.19 CPU) @ 5263.16/s (n=1000)
               (warning: too few iterations for a reliable count)

    Surprisingly, the regex wins hands down. Not too shabby. I don't think you need to worry about your script being slow at all.

    HTH

      I apologize, my while loop exit code was wrong. No wonder the regex beat getpwnam. Here is the corrected code and results. Stick with getpwnam().

      #!/usr/bin/perl -w
      use Benchmark;

      for $x (1, 50, 1000, 3000, 6000){
          timethese (1000,{
              "getpwnam${x}" => q { $there=getpwnam("x${x}") },
              "regex${x}"    => q {open(PASSWD,"/etc/passwd");
                                   while(<PASSWD>) {
                                       if (/^x${x}:/){$alsothere=1; last};
                                   }
                                   close PASSWD;},
          });
      }

      Benchmark: timing 1000 iterations of getpwnam1, regex1...
      getpwnam1: 0 wallclock secs ( 0.12 usr + 0.06 sys = 0.18 CPU) @ 5555.56/s (n=1000)
                 (warning: too few iterations for a reliable count)
      regex1: 0 wallclock secs ( 0.41 usr + 0.02 sys = 0.43 CPU) @ 2325.58/s (n=1000)
      Benchmark: timing 1000 iterations of getpwnam50, regex50...
      getpwnam50: 1 wallclock secs ( 0.28 usr + 0.02 sys = 0.30 CPU) @ 3333.33/s (n=1000)
                 (warning: too few iterations for a reliable count)
      regex50: 1 wallclock secs ( 0.80 usr + 0.04 sys = 0.84 CPU) @ 1190.48/s (n=1000)
      Benchmark: timing 1000 iterations of getpwnam1000, regex1000...
      getpwnam1000: 2 wallclock secs ( 2.41 usr + 0.20 sys = 2.61 CPU) @ 383.14/s (n=1000)
      regex1000: 9 wallclock secs ( 8.52 usr + 0.13 sys = 8.65 CPU) @ 115.61/s (n=1000)
      Benchmark: timing 1000 iterations of getpwnam3000, regex3000...
      getpwnam3000: 9 wallclock secs ( 7.36 usr + 0.38 sys = 7.74 CPU) @ 129.20/s (n=1000)
      regex3000: 28 wallclock secs (24.94 usr + 0.54 sys = 25.48 CPU) @ 39.25/s (n=1000)
      Benchmark: timing 1000 iterations of getpwnam6000, regex6000...
      getpwnam6000: 16 wallclock secs (14.44 usr + 0.77 sys = 15.21 CPU) @ 65.75/s (n=1000)
      regex6000: 50 wallclock secs (49.24 usr + 1.17 sys = 50.41 CPU) @ 19.84/s (n=1000)

      Forgiveness Please.

      Beating myself with a bamboo cane as I speak.

      Thanks for that. Using the regexp method, how would I apply it to this custom-made file:
      user1
      user2
      user3
      user4
      user5,pass5,email5,ip5
      user5,pass5,email5,ip5
      user5,pass5,email5,ip5
      user5,pass5,email5,ip5
      user5,pass5,email5,ip5
      As you can see, the line format varies slightly. So I want the regexp to open the file (located at /etc/adduser/user.log) and match $username against the first part (userX) of each line. If there is a match, print an error.

      Cheers peeps, the help is massively appreciated :)
        Probably a better way to do this, but here's one way:

        Update: So getpwnam is faster? OK...

        my $username = "usernamehere"; print "Duplicate!\n" if getpwnam($username);
        my $username = "user1";
        while (<DATA>) {
            chomp;
            my @info = split /,/;
            # only the first field of each line is the username
            print "Duplicate!\n" if @info and $info[0] eq $username;
        }
        __DATA__
        user1
        user2
        user3
        user4
        user5,pass5,email5,ip5
        user5,pass5,email5,ip5
        user5,pass5,email5,ip5
        user5,pass5,email5,ip5
        user5,pass5,email5,ip5
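A sketch that ties the two checks together, assuming the comma-separated /etc/adduser/user.log format shown above and treating a missing log file as empty (an assumption; you may prefer to die instead). in_user_log and username_taken are hypothetical names. It compares the first field with eq rather than a regex, stops reading at the first match, and uses defined() on getpwnam so that uid 0 (root) counts as taken.

```perl
use strict;
use warnings;

# Returns 1 if $username is the first comma-separated field of any line
# in $logfile, else 0. Stops reading at the first match.
sub in_user_log {
    my ($logfile, $username) = @_;
    open my $fh, '<', $logfile or return 0;   # assumption: missing log = no entries
    while (my $line = <$fh>) {
        chomp $line;
        my ($name) = split /,/, $line, 2;
        if (defined $name && $name eq $username) {
            close $fh;
            return 1;
        }
    }
    close $fh;
    return 0;
}

# Taken if the system knows the name (including uid 0) or the log lists it.
sub username_taken {
    my ($username, $logfile) = @_;
    return 1 if defined(scalar getpwnam($username));
    return in_user_log($logfile, $username);
}

my $username = 'user1';
print "Duplicate!\n" if username_taken($username, '/etc/adduser/user.log');
```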
Re: User Existence?
by mr_mischief (Monsignor) on May 17, 2003 at 19:38 UTC
    If you wish this to be easily scalable, use the tools the system supports or go with a real database.

    In many server setups, the users are not listed in /etc/passwd or /etc/shadow at all. It is even more common for a user who doesn't have shell access to not have a home directory either.

    Your system's getpwnam(), which Perl will call when you use the Perl function of the same name, knows how to get the info about your users whether you use /etc/passwd, an LDAP server, PAM with an SQL server backend, etc. If you don't want to use the system call, then store your user data in a real database, or at least in a DBD::SQLite, DBD::CSV, or DBD::RAM database, for future scalability and ease of changing the backend.

    One advantage of using a DBI midend for this is that you can change from a plain file to MySQL, PostgreSQL, DBD::LDAP, Oracle, Sybase, etc. with little or no change to the code. One advantage of using getpwnam() is that you can change your system from password files to LDAP, or to whatever other authentication backend your system supports, with little or no change to the code.

    Christopher E. Stith
    use coffee;
      Hi there, thanks for that. How do I go about using getpwnam to determine whether a user exists? I am currently using it to tell whether they have a home directory, and basing whether they exist on that, but it would be much better if I could just use it to find out whether the account exists.
        As I said earlier,
        my $username = "someusername"; print "Username '$username' exists!\n" if getpwnam($username);
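To separate "does the account exist?" from "does it have a home directory?", getpwnam can be called in list context: it returns the whole passwd entry, or an empty list if there is no such user, and the home directory is element 7 of that list. Existence is then just "got a non-empty list", and the directory check becomes optional extra information (accounts without shell access may have no home directory on disk). A minimal sketch; lookup_user and the hash keys it returns are hypothetical:

```perl
use strict;
use warnings;

# In list context getpwnam returns
#   ($name, $passwd, $uid, $gid, $quota, $comment, $gcos, $dir, $shell, ...)
# or an empty list when the user does not exist.
sub lookup_user {
    my ($username) = @_;
    my @pw = getpwnam($username);
    return unless @pw;                        # no such account
    my ($name, $uid, $dir) = @pw[0, 2, 7];
    return {
        name       => $name,
        uid        => $uid,
        dir        => $dir,
        # The account can exist even if this is 0.
        dir_exists => (defined $dir && -d $dir) ? 1 : 0,
    };
}

for my $u ('root', 'no_such_user_zz9') {
    my $info = lookup_user($u);
    if ($info) {
        print "$u exists (uid $info->{uid}, home $info->{dir}, ",
              ($info->{dir_exists} ? "present" : "missing"), " on disk)\n";
    } else {
        print "$u does not exist\n";
    }
}
```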