How do I count how many times each value from one list occurs in a second list?

Example:
Suppose we have the list lst1 = [a, b, c] and a second list lst2 = [d, a, d, e, a, f, a, d, b, b, c, a, b, k, e, a, c, c, b].
For each value in lst1 (a, b, c), I need to count how many times it occurs in lst2, so that the output looks like this:
a: 5
b: 4
c: 3
How can this be implemented with a dictionary? Is there an elegant solution that avoids maintaining a separate counter for each value in lst1?
June 14th 19 at 18:22
2 answers
June 14th 19 at 18:24
Solution
from collections import Counter

lst1 = ["a", "b", "c"]
lst2 = ["d", "a", "d", "e", "a", "f", "a", "d", "b", "b", "c", "a", "b", "k", "e", "a", "c", "c", "b"]

counter = Counter(lst2)
for key in lst1:
    print(f'{key}: {counter[key]}')
Correction: it should be done without the 'k in lst1' check:
counter = Counter(lst2)
for key in lst1:
    print(f'{key}: {counter[key]}')
- Evangeline.Waters97 commented on June 14th 19 at 18:27
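Since the question asks for a dictionary, here is a minimal sketch that collects the counts into one instead of printing them. The name counts is only illustrative. A Counter returns 0 for keys it has not seen, so values of lst1 that never occur in lst2 would still appear in the result.

```python
from collections import Counter

lst1 = ["a", "b", "c"]
lst2 = ["d", "a", "d", "e", "a", "f", "a", "d", "b", "b", "c", "a", "b", "k", "e", "a", "c", "c", "b"]

# Count every element of lst2 in a single pass
counter = Counter(lst2)

# Keep only the keys from lst1; missing keys would get the value 0
counts = {key: counter[key] for key in lst1}

print(counts)  # {'a': 5, 'b': 4, 'c': 3}
```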
June 14th 19 at 18:26
Use a dict comprehension and you get the most elegant solution (and, most importantly, a fast one):
print({key: lst2.count(key) for key in lst1})
This is the slow solution: its complexity is O(n*m), where n is the length of lst2 and m is the length of lst1.
An efficient solution requires O(n) operations. - Evangeline.Waters97 commented on June 14th 19 at 18:29
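To illustrate what a single pass looks like without Counter, here is a rough sketch with a plain dict. The helper name count_wanted is made up for this example, and it assumes the values are hashable.

```python
def count_wanted(wanted, items):
    """Count how often each value from wanted occurs in items, in one pass."""
    # Start every wanted key at zero so absent values are still reported
    counts = dict.fromkeys(wanted, 0)
    for item in items:        # one pass over items: O(n)
        if item in counts:    # dict membership test is O(1) on average
            counts[item] += 1
    return counts


lst1 = ["a", "b", "c"]
lst2 = ["d", "a", "d", "e", "a", "f", "a", "d", "b", "b", "c", "a", "b", "k", "e", "a", "c", "c", "b"]
print(count_wanted(lst1, lst2))  # {'a': 5, 'b': 4, 'c': 3}
```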
And now run a performance test:
from collections import Counter

import datetime

lst1 = ["a", "b", "c"]
lst2 = ["d", "a", "d", "e", "a", "f", "a", "d", "b", "b", "c", "a", "b", "k", "e", "a", "c", "c", "b"]

time_start = datetime.datetime.now()
for k, v in {key: lst2.count(key) for key in lst1}.items():
    print("{}: {}".format(k, v))
print('Time2: ', datetime.datetime.now()-time_start)

time_start = datetime.datetime.now()
for k, v in Counter(lst2).items():
    if k in lst1:
        print(f"{k}: {v}")
print('Time: ', datetime.datetime.now()-time_start)

And the result is that the dict comprehension variant is 2-3 times faster than yours. If you don't believe it, you can check for yourself. - sister58 commented on June 14th 19 at 18:32
That is because the amount of data is small: creating the Counter object takes longer than walking the list.
The constant factors differ: k1*O(n*m) versus k2*O(n), where k1 < k2.

from collections import Counter
import string

import datetime

lst1 = list(string.ascii_lowercase)
lst2 = list(string.ascii_lowercase)*1000

time_start = datetime.datetime.now()
for k, v in {key: lst2.count(key) for key in lst1}.items():
    print("{}: {}".format(k, v))
print('Time2: ', datetime.datetime.now()-time_start)


time_start = datetime.datetime.now()
counter = Counter(lst2)
for k in lst1:
    print(f"{k}: {counter.get(k, 0)}")
print('Time: ', datetime.datetime.now()-time_start)


If you increase the number of elements, your method falls far behind. Counter passes over the list once. Your method iterates over the list as many times as there are elements in lst1. - miguel_Reiche commented on June 14th 19 at 18:35
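For anyone who wants to repeat the comparison with less noise, here is a sketch that uses timeit and keeps print out of the timed code, so only the counting is measured. The function names with_count and with_counter and the chosen sizes are arbitrary. No timings are claimed here; they depend on the machine and the data.

```python
import string
import timeit
from collections import Counter


def with_count(wanted, items):
    # Scans items once per wanted value: O(n*m)
    return {key: items.count(key) for key in wanted}


def with_counter(wanted, items):
    # Scans items once in total: O(n), plus m dictionary lookups
    counter = Counter(items)
    return {key: counter[key] for key in wanted}


lst1 = list(string.ascii_lowercase)
for repeat in (1, 100, 1000):
    lst2 = list(string.ascii_lowercase) * repeat
    # Both variants must agree before their speed is compared
    assert with_count(lst1, lst2) == with_counter(lst1, lst2)
    t_count = timeit.timeit(lambda: with_count(lst1, lst2), number=5)
    t_counter = timeit.timeit(lambda: with_counter(lst1, lst2), number=5)
    print(f"len(lst2)={len(lst2):>6}: count {t_count:.4f}s, Counter {t_counter:.4f}s")
```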
Your algorithm is only faster on sorted data.
Take any large text file (about 50 MB) as "lst2" and repeat the measurement.

import timeit
from collections import Counter
import string

with open("book.txt", "r") as f_in:
    lst1 = list(string.ascii_uppercase)

    lst2 = f_in.read().upper()
    #lst2 = list(string.ascii_lowercase) * 100000

    t = timeit.default_timer()
    counter = {i: lst2.count(i) for i in lst1}
    for k in lst1:
        print(f"{k} : {counter.get(k, 0)}")
    print("\nCode execution time: {0:.4} sec.\n".format(timeit.default_timer() - t))

    t = timeit.default_timer()
    counter = Counter(lst2)
    for k in lst1:
        print(f"{k} : {counter.get(k, 0)}")
    print("\nCode execution time: {0:.4} sec.\n".format(timeit.default_timer() - t))
- sister58 commented on June 14th 19 at 18:38
Sorting has nothing to do with this. Besides, list(string.ascii_lowercase)*1000 is not sorted.

Your code is wrong. Your lst2 is not a list but a string. Calling count on a string is not the same as calling count on a list, because a string is stored in memory as one contiguous chunk of data. Counter, meanwhile, treats it as an ordinary iterable.

lst2 = list(f_in.read().upper())
And with that change the numbers no longer look good. - ocie24 commented on June 14th 19 at 18:41
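The difference between the two kinds of count can be seen on a small in-memory sample. The text below is invented for illustration and stands in for the contents of book.txt. For single characters both forms return the same number, but str.count searches for a substring, while list.count compares its argument with each element in turn.

```python
from collections import Counter

text = "Hello, World".upper()   # a str, like f_in.read().upper()
chars = list(text)              # a list of one-character strings

# For a single character the two counts agree
print(text.count("L"))    # 3
print(chars.count("L"))   # 3

# str.count matches substrings; list.count only matches whole elements
print(text.count("LL"))   # 1
print(chars.count("LL"))  # 0

# Counter accepts any iterable, so both inputs give the same counts
print(Counter(text) == Counter(chars))  # True
```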
For strings it is better to use a defaultdict. - sister58 commented on June 14th 19 at 18:44
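The comment does not include code, so the following is only a guess at what is meant: counting characters with collections.defaultdict(int). The sample text is made up, and no claim about speed is implied.

```python
import string
from collections import defaultdict

text = "Hello, World".upper()
lst1 = list(string.ascii_uppercase)

# defaultdict(int) creates a missing key with the value 0 on first access
counts = defaultdict(int)
for ch in text:
    counts[ch] += 1

# Read with .get so that looking up absent letters does not add new keys
for k in lst1:
    if counts.get(k, 0):
        print(f"{k} : {counts[k]}")
```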
