数え上げに使えるcollections の Counter のお話

"collections" ライブラリの "Counter"。文字通り、数えるためにあります。

『数え上げ？普通にfor文回せばいいじゃんアゼルバイジャン』

という方もいらっしゃるでしょう。

お前それ競プロの前でも言えんの？

for + リストでの数え上げ
defaultdict をつかう数え上げ
Counter での数え上げ
Counterの便利な使い方
いいお話

))

では、for + リストでの数え上げがどの程度の愚行なのか実験していきましょう。

for + リストでの数え上げ

import time  # @1
summ = 0
for j in range(1, 11):  #@2
    start = time.time()
    for i in range(int(1e4)):  #@3
        count = [0 for m in range(200)]
        for i in """
        Beautiful is better than ugly.
        Explicit is better than implicit.
        Simple is better than complex.
        Complex is better than complicated.
        Flat is better than nested.
        Sparse is better than dense.
        Readability counts.
        Special cases aren't special enough to break the rules.
        Although practicality beats purity.
        Errors should never pass silently.
        Unless explicitly silenced.
        In the face of ambiguity, refuse the temptation to guess.
        There should be one-- and preferably only one --obvious way to do it.
        Although that way may not be obvious at first unless you're Dutch.
        Now is better than never.
        Although never is often better than *right* now.
        If the implementation is hard to explain, it's a bad idea.
        If the implementation is easy to explain, it may be a good idea.
        Namespaces are one honking great idea -- let's do more of those!
        """:  #@4
            count[ord(i)] += 1  #@5
    end = time.time()
    print("{}: {}  (sec)".format(j, end - start))  #@6
    summ += end - start
for i in range(len(count)):
    if count[i]:
        print("{!r}: {}, ".format(chr(i), count[i]), end="")  #@7
print("")
print("average: {}  (sec)".format(summ / 10))  #@8

@1: 時間を計りたいので、timeをimportします
@2 10回回して、平均を出そうと思います
@3 10000回計算します
@4 Python インタープリタで >>>import this と入力してみてください。同じものが出ます。これは、Pythonのエッセンスです
@5 それぞれの番号を決めるのは面倒なので、文字ごとの文字番号をインデックスにしています
@6 10000回計算するごとに、その回の計算時間を出力します
@7 改行を改行として出力してしまうので、{!r}を使うことにより、エスケープシーケンスをつけて出力できます
@8 10回の平均を出します

出力がこちらになります

1: 1.5162262916564941  (sec)
2: 1.5041000843048096  (sec)
3: 1.5104267597198486  (sec)
4: 1.574958086013794  (sec)
5: 1.4988515377044678  (sec)
6: 1.488734483718872  (sec)
7: 1.499678611755371  (sec)
8: 1.5089690685272217  (sec)
9: 1.5006461143493652  (sec)
10: 1.4950613975524902  (sec)
'\n': 20, ' ': 278, '!': 1, "'": 4, '*': 2, ',': 3, '-': 6, '.': 18, 'A': 3, 'B': 1, 'C': 1, 'D': 1, 'E': 2, 'F': 1, 'I': 3, 'N': 2, 'R': 1, 'S': 3, 'T': 1, 'U': 1, 'a': 50, 'b': 19, 'c': 16, 'd': 16, 'e': 86, 'f': 10, 'g': 11, 'h': 29, 'i': 49, 'k': 2, 'l': 33, 'm': 15, 'n': 38, 'o': 41, 'p': 20, 'r': 31, 's': 42, 't': 74, 'u': 20, 'v': 5, 'w': 4, 'x': 6, 'y': 15, 
average: 1.5097652435302735  (sec)

1.5秒ですか～…。

う～ん、遅い!ｗ

つぎに、ディクショナリを使ってみましょう。

defaultdict をつかう数え上げ

defaultdictを使えば、存在しない要素(key)を指定したときに、勝手に初期化してくれます。

import time
import collections
summ = 0
for j in range(1, 11):
    start = time.time()
    for i in range(int(1e4)):
        count = collections.defaultdict(int)
        for i in """
        Beautiful is better than ugly.
        Explicit is better than implicit.
        Simple is better than complex.
        Complex is better than complicated.
        Flat is better than nested.
        Sparse is better than dense.
        Readability counts.
        Special cases aren't special enough to break the rules.
        Although practicality beats purity.
        Errors should never pass silently.
        Unless explicitly silenced.
        In the face of ambiguity, refuse the temptation to guess.
        There should be one-- and preferably only one --obvious way to do it.
        Although that way may not be obvious at first unless you're Dutch.
        Now is better than never.
        Although never is often better than *right* now.
        If the implementation is hard to explain, it's a bad idea.
        If the implementation is easy to explain, it may be a good idea.
        Namespaces are one honking great idea -- let's do more of those!
        """:
            count[i] += 1
    end = time.time()
    print("{}: {}  (sec)".format(j, end - start))
    summ += end - start
print(count)
print("average: {}  (sec)".format(summ / 10))

こちらの出力はこうなります

1: 1.0501744747161865  (sec)
2: 1.0498456954956055  (sec)
3: 1.0457956790924072  (sec)
4: 1.050318956375122  (sec)
5: 1.0362188816070557  (sec)
6: 1.044149398803711  (sec)
7: 1.0807437896728516  (sec)
8: 1.0319969654083252  (sec)
9: 1.0468177795410156  (sec)
10: 1.0291187763214111  (sec)
defaultdict(<class 'int'>, {'\n': 20, ' ': 278, 'B': 1, 'e': 86, 'a': 50, 'u': 20, 't': 74, 'i': 49, 'f': 10, 'l': 33, 's': 42, 'b': 19, 'r': 31, 'h': 29, 'n': 38, 'g': 11, 'y': 15, '.': 18, 'E': 2, 'x': 6, 'p': 20, 'c': 16, 'm': 15, 'S': 3, 'o': 41, 'C': 1, 'd': 16, 'F': 1, 'R': 1, "'": 4, 'k': 2, 'A': 3, 'v': 5, 'U': 1, 'I': 3, ',': 3, 'T': 1, '-': 6, 'w': 4, 'D': 1, 'N': 2, '*': 2, '!': 1})
average: 1.046518039703369  (sec)

1秒ですか…そこそこですね…
まだまだいけそうですね()

では、大本命、Counterを使ってみましょう

Counter での数え上げ

import time
from collections import Counter
summ = 0
for j in range(1, 11):
    start = time.time()
    for i in range(int(1e4)):
        count = Counter("""
        Beautiful is better than ugly.
        Explicit is better than implicit.
        Simple is better than complex.
        Complex is better than complicated.
        Flat is better than nested.
        Sparse is better than dense.
        Readability counts.
        Special cases aren't special enough to break the rules.
        Although practicality beats purity.
        Errors should never pass silently.
        Unless explicitly silenced.
        In the face of ambiguity, refuse the temptation to guess.
        There should be one-- and preferably only one --obvious way to do it.
        Although that way may not be obvious at first unless you're Dutch.
        Now is better than never.
        Although never is often better than *right* now.
        If the implementation is hard to explain, it's a bad idea.
        If the implementation is easy to explain, it may be a good idea.
        Namespaces are one honking great idea -- let's do more of those!
        """)
    end = time.time()
    print("{}: {}  (sec)".format(j, end - start))
    summ += end - start
print(count)
print("average: {}  (sec)".format(summ / 10))

そして気になる実行時間ですが

1: 0.4078483581542969  (sec)
2: 0.41309595108032227  (sec)
3: 0.4018237590789795  (sec)
4: 0.40430498123168945  (sec)
5: 0.4073166847229004  (sec)
6: 0.424285888671875  (sec)
7: 0.4343554973602295  (sec)
8: 0.42177391052246094  (sec)
9: 0.43581199645996094  (sec)
10: 0.4025719165802002  (sec)
Counter({' ': 278, 'e': 86, 't': 74, 'a': 50, 'i': 49, 's': 42, 'o': 41, 'n': 38, 'l': 33, 'r': 31, 'h': 29, '\n': 20, 'u': 20, 'p': 20, 'b': 19, '.': 18, 'c': 16, 'd': 16, 'y': 15, 'm': 15, 'g': 11, 'f': 10, 'x': 6, '-': 6, 'v': 5, "'": 4, 'w': 4, 'S': 3, 'A': 3, 'I': 3, ',': 3, 'E': 2, 'k': 2, 'N': 2, '*': 2, 'B': 1, 'C': 1, 'F': 1, 'R': 1, 'U': 1, 'T': 1, 'D': 1, '!': 1})
average: 0.4153188943862915  (sec)

Yeah!! Pythonic!!

0.4秒ですって。

リストに比べて約1/4程になりましたね…

このように、非常に高速な演算が可能です。

ちなみに、Counterはdict型を返します。

Counterの便利な使い方

Counterには多くの便利な機能があります。

Counter.elements()

それぞれのkeyを、value個入れたリストを返してくれます。

from collections import Counter
let = Counter("abraham lincoln said\"abracadabra\"")
print(let)
print("".join(sorted(let.elements())))

とすれば、

Counter({'a': 9, 'b': 3, 'r': 3, ' ': 2, 'l': 2, 'i': 2, 'n': 2, 'c': 2, 'd': 2, '"': 2, 'h': 1, 'm': 1, 'o': 1, 's': 1})
  ""aaaaaaaaabbbccddhiillmnnorrrs

となります。

Counter.most_common()

出現回数の多い(valueの多い)ものを、上から引数個分だけ取り出してくれます。引数が無ければ、全て取り出してくれます。

sorted(list(let.items()), key=lambda x: x[1], reverse = True)[:引数]

と同じですね。

from collections import Counter
let = Counter("abraham lincoln said\"abracadabra\"")
print(let.most_common(3))

とすれば

[('a', 9), ('b', 3), ('r', 3)]

が返ってきます。

Counter.substract(another Counter)

Counterから、another Counterの値を引きます。破壊的な関数です。

from collections import Counter
leta = Counter("abraham lincoln said\"abracadabra\"")
letb = Counter("baby array and bad bully")
leta.subtract(letb)
print(leta)

答えがマイナスなどになることも十二分にあり得ます。

Counter({'a': 4, 'i': 2, 'c': 2, '"': 2, 'r': 1, 'h': 1, 'm': 1, 'n': 1, 'o': 1, 's': 1, 'l': 0, 'd': 0, 'b': -1, 'u': -1, ' ': -2, 'y': -3})

どうでしょうか？

非常に使いやすいと思います。

Pythonのボトルネックは速度なので、それさえ乗り越えられれば愛せるようになりますよ。すでに愛している方は、さらに愛せるように…

いいお話

あ、そうだ(唐突)

Pythonパーカーを作ろうと思っています。興味の湧いた方はどうぞご一報ください。

かう人が多ければ多いほど安くできるので、募集中です(宣伝)

ちなみに、私に利益な一切来ないようにします…(純粋な愛ゆえのパーカーなので、それで金儲けしようとは考えていません。)
なので、思われているより安いと思いますよ笑

#Python3 パーカーを作りたいと思っています。
購入したい!という人が多ければ安く作れるので、いいなぁ、と思えば、リプでもDMでもどうぞ！
I wanna make the "#Python hoody". And I want YOU to buy one, too. cuz' lots of buyer make the hoody more inexpensive. give me contact #パーカー pic.twitter.com/0ElGJetNcb
— ながたさん.py (@nagataaaas) 2018年2月10日

では皆さん、よいPythonライフを!!