ls character range wildcard oddity?
Can someone explain this to my evidently inadequately caffeinated brain? I was trying to use a lower case character range wildcard to list all files not starting with an uppercase letter and I noticed it wasn't working as I'd expect. So I created this simple example and still don't understand what's happening.

Dir with two files:

brett@spider /tmp/test $ \ls
apple Berry

This is what I'd expect to see:

brett@spider /tmp/test $ \ls [a-b]*
apple

This makes no sense. Is this a bug?:

brett@spider /tmp/test $ \ls [a-c]*
apple Berry

From here on are just a few extra examples confirming the oddity of the above.

brett@spider /tmp/test $ \ls [a]*
apple
brett@spider /tmp/test $ \ls [b]*
ls: cannot access [b]*: No such file or directory
brett@spider /tmp/test $ \ls [c]*
ls: cannot access [c]*: No such file or directory
brett@spider /tmp/test $ \ls a*
apple
brett@spider /tmp/test $ \ls b*
ls: cannot access b*: No such file or directory
brett@spider /tmp/test $ \ls c*
ls: cannot access c*: No such file or directory
brett@spider /tmp/test $ dpkg -S `which ls`
coreutils: /bin/ls
brett@spider /tmp/test $ dpkg -s coreutils
Package: coreutils
Essential: yes
Status: install ok installed
Priority: required
Section: utils
Installed-Size: 6020
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Multi-Arch: foreign
Version: 8.21-1ubuntu5.1
[SNIP]

Thanks,
Brett
b != B

HTH,
Ted

On Fri, Jun 19, 2015 at 9:29 AM, Brett Russ <bruss@alum.wpi.edu> wrote:
> This makes no sense. Is this a bug?:
> brett@spider /tmp/test $ \ls [a-c]*
> apple Berry

_______________________________________________
Wlug mailing list
Wlug@mail.wlug.org
http://mail.wlug.org/mailman/listinfo/wlug
Disregard last transmission. I shoulda read your post more carefully.

Ted

On Fri, Jun 19, 2015 at 9:44 AM, Theodore Ruegsegger <gruntly@gmail.com> wrote:
> b != B
It looks like a locale issue. In the old "C" locale, sorting a list put all cap letters before all lowercase letters. In the "en_US.UTF-8" locale, apparently the Ubuntu default, they're sorted as in a regular dictionary, regardless of caps.

For example:

ls
apple berry Berry charlie

Hope that's more helpful than my first response!

Ted

On Fri, Jun 19, 2015 at 9:46 AM, Theodore Ruegsegger <gruntly@gmail.com> wrote:
> Disregard last transmission. I shoulda read your post more carefully.
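As a small sketch of the locale explanation in this thread (not something posted by the participants): forcing the "C" locale in the shell that expands the glob should make [a-c] a plain ASCII range, so Berry no longer matches. The scratch directory and the names demo_dir and c_matches are made up for illustration.

```shell
# Sketch only: demo_dir and c_matches are hypothetical names.
demo_dir=$(mktemp -d)
touch "$demo_dir/apple" "$demo_dir/Berry"

# LC_ALL has to be set in the shell that performs the expansion, before the
# pattern is expanded. The prefix form "LC_ALL=C ls [a-c]*" is not enough,
# because the shell expands the glob under its own locale first.
c_matches=$(cd "$demo_dir" && LC_ALL=C && echo [a-c]*)
echo "C locale, [a-c]*: $c_matches"

rm -r "$demo_dir"
```

On bash 4.3 or later, `shopt -s globasciiranges` is meant to give the same range behavior without changing the locale; it is worth checking the bash manual for the version in use.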
Well, this may not get you only a subset of lower-case letters, but it does at least limit to lower case: [[:lower:]]. More details below.

e.g.:

ls [[:lower:]]*

Regards,
Chris

See also:

http://www.tldp.org/LDP/GNU-Linux-Tools-Summar/html/x11655.htm#STANDARD-WILD...
man 7 glob

----------
Character classes and internationalization

Of course ranges were originally meant to be ASCII ranges, so that "[ -%]" stands for "[ !"#$%]" and "[a-z]" stands for "any lowercase letter". Some UNIX implementations generalized this so that a range X-Y stands for the set of characters with code between the codes for X and for Y. However, this requires the user to know the character coding in use on the local system, and moreover, is not convenient if the collating sequence for the local alphabet differs from the ordering of the character codes. Therefore, POSIX extended the bracket notation greatly, both for wildcard patterns and for regular expressions. In the above we saw three types of items that can occur in a bracket expression: namely (i) the negation, (ii) explicit single characters, and (iii) ranges. POSIX specifies ranges in an internationally more useful way and adds three more types:

(iii) Ranges X-Y comprise all characters that fall between X and Y (inclusive) in the current collating sequence as defined by the LC_COLLATE category in the current locale.

(iv) Named character classes, like

[:alnum:] [:alpha:] [:blank:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:]

so that one can say "[[:lower:]]" instead of "[a-z]", and have things work in Denmark, too, where there are three letters past 'z' in the alphabet. These character classes are defined by the LC_CTYPE category in the current locale.

On 06/19/2015 09:58 AM, Theodore Ruegsegger wrote:
> It looks like a locale issue. In the old "C" locale, sorting a list put all cap letters before all lowercase letters. In the "en_US.UTF-8" locale, apparently the Ubuntu default, they're sorted as in a regular dictionary, regardless of caps.
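For the original goal in this thread (files not starting with an uppercase letter), a negated named class may be closer than [[:lower:]], since it also keeps names that start with a digit. The following is only a sketch; the scratch directory, the file names, and the variables lower_only and not_upper are invented for illustration.

```shell
# Sketch only: all names below are hypothetical.
demo_dir=$(mktemp -d)
touch "$demo_dir/apple" "$demo_dir/Berry" "$demo_dir/cherry" "$demo_dir/1note"

# Names whose first character is a lowercase letter.
lower_only=$(cd "$demo_dir" && echo [[:lower:]]*)

# Names whose first character is anything except an uppercase letter;
# "!" right after the opening bracket negates the bracket expression.
not_upper=$(cd "$demo_dir" && echo [![:upper:]]*)

echo "[[:lower:]]*  -> $lower_only"
echo "[![:upper:]]* -> $not_upper"

rm -r "$demo_dir"
```

Because the named classes come from LC_CTYPE rather than from the collating sequence, neither pattern should pick up Berry the way [a-c]* did.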
Ted, Chris,

Thanks for straightening me out on this! Haven't had to pay much attention to locale before. Makes sense.

-Brett

On Fri, Jun 19, 2015 at 10:22 AM, Chris Thompson <wolcen@riseup.net> wrote:
> Well, this may not get you only a subset of lower-case letters, but it does at least limit to lower case: [[:lower:]]. More details below.
Interesting. I see the same behavior. Near as I can tell, bash decided that the alphabet is: aAbBcCdD... I wouldn't expect that to be the case, but when I played around with various file names, that was the behavior I saw. So [a-c] includes uppercase B while [a-b] does not. 'ls [A-c]*' will show Berry but not apple.

On Fri, 2015-06-19 at 09:29 -0400, Brett Russ wrote:
> This makes no sense. Is this a bug?:
> brett@spider /tmp/test $ \ls [a-c]*
> apple Berry
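The aAbBcC ordering described here looks like the locale's collating sequence rather than something specific to bash. One way to inspect it, as a rough sketch, is to sort a few letters under different locales. The variable names c_order and current_order are made up, and what the second command prints depends on which locale is active on the machine.

```shell
# Sketch only: c_order and current_order are hypothetical names.
# sort reads LC_ALL from its own environment, so the prefix form works here
# (unlike for globs, which the shell expands itself).
c_order=$(printf '%s\n' a A b B c C | LC_ALL=C sort | tr -d '\n')
echo "C locale order:       $c_order"

# Printed for comparison only; the result depends on the active locale.
current_order=$(printf '%s\n' a A b B c C | sort | tr -d '\n')
echo "current locale order: $current_order"
```

Under the "C" locale all uppercase letters come first, so a range such as [a-c] cannot contain B there.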
> brett@spider /tmp/test $ \ls
> apple Berry
>
> This is what I'd expect to see:
>
> brett@spider /tmp/test $ \ls [a-b]*
> apple
>
> This makes no sense. Is this a bug?:

This is probably more bash than ls. When you use `*' on a command line, the shell does filename glob expansion; the results of the glob expansion become arguments (to ls, in this case).

Out of the box, I've always seen bash do case sensitive globs, but you can change this.

# four files
$ touch ax Ax bx Bx

# all files starting with `a'
$ ls a*
ax

# make glob expansion case insensitive
$ shopt -s nocaseglob
$ ls a*
Ax ax

# make it case sensitive again
$ shopt -u nocaseglob
$ ls a*
ax

If you actually want to see how the glob is expanded

$ sh -x -c "ls [aA]*"
+ ls Ax ax      <<< glob expansion
Ax ax

Steve
participants (5)
- Brett Russ
- Chris Thompson
- Dennis Payne
- Steve Revilak
- Theodore Ruegsegger