ofer has asked for the wisdom of the Perl Monks concerning the following question:

Hi, my problem is that I want to use the getdents() syscall from Perl to retrieve a file list. Since the h2ph interface feels a bit awkward to me, I'm trying to use an Inline C function and get the contents of the directory back into a variable, because speed is an issue. This is my first time using Inline C in Perl, there isn't a lot of documentation about its usage, and it seems that Inline has some internal variables for pushing the data back into Perl, ideally as a scalar.
#!/usr/bin/env perl
use 5.012;
use strict;
use warnings;

my $str = &listfiles('<Directory>');
print $str;

use Inline C => <<'END_OF_C_CODE';

#include <dirent.h>     /* Defines DT_* constants */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>

#define handle_error(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

struct linux_dirent {
    long           d_ino;
    off_t          d_off;
    unsigned short d_reclen;
    char           d_name[];
};

#define BUF_SIZE 1024*1024*5

int listfiles(int argc, char *argv[]) {
    int fd, nread;
    char buf[BUF_SIZE];
    struct linux_dirent *d;
    int bpos;
    char d_type;

    fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        handle_error("open");

    for ( ; ; ) {
        nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
        if (nread == -1)
            handle_error("getdents");
        if (nread == 0)
            break;

        for (bpos = 0; bpos < nread;) {
            d = (struct linux_dirent *) (buf + bpos);
            if (d->d_ino != 0)
                printf("%s\n", (char *) d->d_name);
            bpos += d->d_reclen;
        }
    }
    exit(EXIT_SUCCESS);
}

END_OF_C_CODE

Replies are listed 'Best First'.
Re: using syscalls in perl through inline c
by afoken (Chancellor) on Jan 16, 2017 at 14:17 UTC
      The reason behind using Inline C is speed: readdir is not fast enough to read millions of files.

        Have you compared the C code to pure Perl code? Most often, it's the I/O of the disk subsystem that is the bottleneck, not the code itself.

        I'm not saying that's absolute in all cases, but it may be prudent to do a one-off test between the two before making any decisions.
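A one-off test of the kind suggested above could be sketched in plain C before involving Inline at all. The helper names (`count_readdir`, `count_getdents64`, `now_seconds`), the struct name `dent64` and the caller-chosen buffer size are assumptions for illustration, not anything from the thread. The sketch uses `SYS_getdents64` rather than `SYS_getdents`, because the 64-bit variant carries `d_type` as a regular field and is available on architectures where the older call is not.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Record layout filled in by the getdents64 syscall (see getdents(2)). */
struct dent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Count entries through the libc wrapper. Returns -1 if opendir fails. */
long count_readdir(const char *dir)
{
    DIR *dh = opendir(dir);
    if (dh == NULL)
        return -1;
    long n = 0;
    while (readdir(dh) != NULL)
        n++;
    closedir(dh);
    return n;
}

/* Count entries through the raw syscall with a caller-chosen buffer size.
   Returns -1 on any error. */
long count_getdents64(const char *dir, size_t bufsize)
{
    int fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return -1;
    char *buf = malloc(bufsize);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    long n = 0;
    for (;;) {
        long nread = syscall(SYS_getdents64, fd, buf, bufsize);
        if (nread == 0)
            break;
        if (nread < 0) {
            n = -1;
            break;
        }
        for (long pos = 0; pos < nread; ) {
            struct dent64 *d = (struct dent64 *)(buf + pos);
            n++;
            pos += d->d_reclen;     /* records are variable-length */
        }
    }
    free(buf);
    close(fd);
    return n;
}

/* Monotonic wall-clock time in seconds, for before/after measurements. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}
```

Taking `now_seconds()` before and after each counting call on the same directory gives the two timings. The second run usually benefits from cached directory data, so it seems wise to repeat each measurement and to alternate the order.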

      See "using Linux getdents syscall" and "failing to use getdents system call on Linux" for other futile attempts.

      No, it is not futile. See Linux::NFS::BigDir.

      It's harder to solve that way, but in the specific case I tested (large directories over NFS version 3), the performance difference between using readdir and going down to getdents can be measured in hours.

      Might it be a problem with the NFS server configuration? Yes, but I didn't have time to look into it, or even the root access to try.

      Alceu Rodrigues de Freitas Junior
      ---------------------------------
      "You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill
Re: using syscalls in perl through inline c
by stevieb (Canon) on Jan 16, 2017 at 15:09 UTC

    You didn't really ask a question, so I assumed that you were wondering how to fix the C code to get it to do what you want. It looks like you copied and pasted code from different sources or examples online. In your listfiles() function, you're using argc and argv, which is wrong: those are command-line arguments, which belong to a main() function in a program. They don't have any use inside a non-entry-point library, which is essentially what you've got here. I changed the parameter list to a single const char * dir, then changed argv[1] to dir. I've inserted comments in the C code to mark what was changed:

    #!/usr/bin/env perl
    use 5.012;
    use strict;
    use warnings;

    my $str = listfiles('/home/steve/test');
    print $str;

    use Inline C => <<'END_OF_C_CODE';

    #include <dirent.h>     /* Defines DT_* constants */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <sys/syscall.h>

    #define handle_error(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

    struct linux_dirent {
        long           d_ino;
        off_t          d_off;
        unsigned short d_reclen;
        char           d_name[];
    };

    #define BUF_SIZE 1024*1024*5

    /**** CHANGED LINE BELOW ***/
    int listfiles(const char * dir) {
        int fd, nread;
        char buf[BUF_SIZE];
        struct linux_dirent *d;
        int bpos;
        char d_type;

        /**** CHANGED LINE BELOW ****/
        fd = open(dir, O_RDONLY | O_DIRECTORY);
        if (fd == -1)
            handle_error("open");

        for ( ; ; ) {
            nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
            if (nread == -1)
                handle_error("getdents");
            if (nread == 0)
                break;

            for (bpos = 0; bpos < nread;) {
                d = (struct linux_dirent *) (buf + bpos);
                if (d->d_ino != 0)
                    printf("%s\n", (char *) d->d_name);
                bpos += d->d_reclen;
            }
        }
        exit(EXIT_SUCCESS);
    }

    END_OF_C_CODE

    Output:

    one.txt
    two.txt

    Now, if you want to be able to send in a directory as an argument, do it in Perl:

    if (! defined $ARGV[0]){
        print "Usage: script.pl <directory>\n";
        exit;
    }

    my $dir = $ARGV[0];
    my $str = listfiles($dir);
    ...

    You may want to do extra argument checking in Perl, or you can just let the back end report any issues with bad dir names.
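If the back end is to report bad directory names, it may be friendlier to hand the reason back to the caller than to call exit(), since exit() inside an Inline C function ends the whole Perl process. A minimal sketch in plain C, where the helper name `dir_problem` is hypothetical:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns NULL when `dir` can be opened as a directory,
   otherwise a human-readable reason taken from errno. */
const char *dir_problem(const char *dir)
{
    int fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return strerror(errno);     /* e.g. missing path, not a directory, no permission */
    close(fd);
    return NULL;
}
```

A wrapper could call this first and pass the message up to Perl, which can then decide whether to die, warn, or skip the directory.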

      Actually, what I was struggling with is how to get the data from the syscall as a scalar. I played with the original code so much that I just pasted it from the source without my changes by mistake... Thanks
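One way to picture "the data as a scalar" is a single newline-separated buffer built on the C side. The sketch below is plain C so it can be tried outside Perl; the function name `names_as_string` and the struct name `dent64` are assumptions. Inside Inline C, a buffer like this could presumably be handed to `newSVpvn(buf, len)`, which copies the bytes into a new SV, and then freed.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout filled in by the getdents64 syscall (see getdents(2)). */
struct dent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Returns a malloc'd, NUL-terminated, newline-separated list of the names in
   `dir` and stores its length in *len. Returns NULL on error. The caller
   frees the result. "." and ".." are included, as in the original code. */
char *names_as_string(const char *dir, size_t *len)
{
    uint64_t raw[8192];                 /* 64 KiB, aligned for struct dent64 */
    char *chunk = (char *)raw;
    size_t used = 0, cap = 4096;
    char *out;

    int fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return NULL;
    out = malloc(cap);
    if (out == NULL) {
        close(fd);
        return NULL;
    }
    for (;;) {
        long nread = syscall(SYS_getdents64, fd, chunk, sizeof raw);
        if (nread == 0)
            break;
        if (nread < 0) {
            free(out);
            close(fd);
            return NULL;
        }
        for (long pos = 0; pos < nread; ) {
            struct dent64 *d = (struct dent64 *)(chunk + pos);
            size_t n = strlen(d->d_name);
            if (used + n + 2 > cap) {   /* room for name, '\n' and final NUL */
                cap = 2 * (used + n + 2);
                char *grown = realloc(out, cap);
                if (grown == NULL) {
                    free(out);
                    close(fd);
                    return NULL;
                }
                out = grown;
            }
            memcpy(out + used, d->d_name, n);
            used += n;
            out[used++] = '\n';
            pos += d->d_reclen;
        }
    }
    close(fd);
    out[used] = '\0';
    *len = used;
    return out;
}
```

On the Perl side such a scalar could be split on newlines. Note that file names may themselves contain newlines, so a NUL separator might be the safer choice for untrusted directories.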
        This is what I ended up with. The C function returns a reference to a hash with all files listed. I might add a second for loop so it'll list the files recursively. Any suggestions welcome.
        #!/usr/bin/env perl
        use 5.012;
        use strict;
        use warnings;
        use Data::Dumper qw(Dumper);

        my $str = listfiles('<directory>');
        print Dumper $str;      # $str is already a hash reference

        use Inline C => <<'END_OF_C_CODE';

        #include <dirent.h>     /* Defines DT_* constants */
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>
        #include <stdlib.h>
        #include <string.h>     /* strlen */
        #include <sys/stat.h>
        #include <sys/syscall.h>

        #define handle_error(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

        struct linux_dirent {
            long           d_ino;
            off_t          d_off;
            unsigned short d_reclen;
            char           d_name[];
        };

        #define BUF_SIZE 1024*1024*5

        SV* listfiles(const char * dir) {
            int fd, nread;
            char buf[BUF_SIZE];
            struct linux_dirent *d;
            int bpos;
            HV* hash = newHV();

            fd = open(dir, O_RDONLY | O_DIRECTORY);
            if (fd == -1)
                handle_error("open");

            for ( ; ; ) {
                nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
                if (nread == -1)
                    handle_error("getdents");
                if (nread == 0)
                    break;

                for (bpos = 0; bpos < nread;) {
                    d = (struct linux_dirent *) (buf + bpos);
                    if (d->d_ino != 0) {
                        /* d_ino is a long, so format it with %ld rather than %d */
                        size_t nbytes = snprintf(NULL, 0, "%ld", d->d_ino) + 1;
                        char *inode = malloc(nbytes);
                        snprintf(inode, nbytes, "%ld", d->d_ino);
                        /* key: inode number as a string, value: file name */
                        hv_store(hash, inode, strlen(inode), newSVpvf("%s", (char *) d->d_name), 0);
                        free(inode);    /* hv_store keeps its own copy of the key */
                    }
                    bpos += d->d_reclen;
                }
            }
            close(fd);      /* release the directory descriptor */
            return newRV_noinc((SV*) hash);
        }

        END_OF_C_CODE
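For the recursive listing, one option is to recurse per directory instead of adding a second loop. The sketch below is plain C under several assumptions: the names `walk`, `entry_cb`, `count_entry` and `dent64` are hypothetical, paths longer than 4095 bytes are reported as an error, and `SYS_getdents64` is used so that `d_type` says whether an entry is a directory without a stat() per file. In the Inline C version the callback is where a call such as hv_store would go. Hard links share an inode number, and inode numbers can repeat across filesystems, so the full path may be a safer hash key than the inode once the listing spans several directories.

```c
#define _GNU_SOURCE
#include <dirent.h>         /* DT_DIR, DT_UNKNOWN */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout filled in by the getdents64 syscall (see getdents(2)). */
struct dent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

typedef void (*entry_cb)(const char *path, void *ctx);

#define WALK_BUF ((size_t)64 * 1024)

/* Example callback: counts the entries it is shown. */
void count_entry(const char *path, void *ctx)
{
    (void)path;
    ++*(int *)ctx;
}

/* Calls cb(path, ctx) for every entry below `dir`, descending into
   subdirectories. Returns 0 on success, -1 if anything could not be read. */
int walk(const char *dir, entry_cb cb, void *ctx)
{
    int rc = 0;
    int fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return -1;
    char *buf = malloc(WALK_BUF);       /* heap, so recursion stays cheap on stack */
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    for (;;) {
        long nread = syscall(SYS_getdents64, fd, buf, WALK_BUF);
        if (nread == 0)
            break;
        if (nread < 0) {
            rc = -1;
            break;
        }
        for (long pos = 0; pos < nread; ) {
            struct dent64 *d = (struct dent64 *)(buf + pos);
            pos += d->d_reclen;
            if (strcmp(d->d_name, ".") == 0 || strcmp(d->d_name, "..") == 0)
                continue;               /* skip self and parent to avoid looping */
            char path[4096];
            int n = snprintf(path, sizeof path, "%s/%s", dir, d->d_name);
            if (n < 0 || (size_t)n >= sizeof path) {
                rc = -1;                /* path too long for this sketch */
                continue;
            }
            int is_dir = (d->d_type == DT_DIR);
            if (d->d_type == DT_UNKNOWN) {  /* some filesystems leave d_type unset */
                struct stat st;
                is_dir = (lstat(path, &st) == 0 && S_ISDIR(st.st_mode));
            }
            cb(path, ctx);
            if (is_dir && walk(path, cb, ctx) == -1)
                rc = -1;
        }
    }
    free(buf);
    close(fd);
    return rc;
}
```

Symbolic links to directories are reported but not followed here, because their `d_type` is DT_LNK rather than DT_DIR. Each recursion level keeps one descriptor open, so very deep trees may need a different strategy.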