We have a piece of code that is going to be run A LOT on a server infrastructure that needs to be fast. I know that I/O matters much more, but since I had the time, I wanted to figure out which of these is fastest:
def a(s, m):
    if len(s) > m:
        s = s[:m]
    return s
...or...
def b(s, m):
    return s[:m]
I wrote a simple benchmark that bombarded these two string manipulation functions with strings that had, on average, a 50% chance of being longer than the max length. In other words, half of the strings sent to these two functions were so short they didn't need to be truncated.
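A minimal sketch of that kind of benchmark (the 50/50 length mix is from the description above; the string lengths and iteration counts here are illustrative, not the original numbers):

```python
import random
import timeit

MAX_LEN = 100

def a(s, m):
    # Check first, then truncate only if needed.
    if len(s) > m:
        s = s[:m]
    return s

def b(s, m):
    # Always slice, unconditionally.
    return s[:m]

# Half the strings are shorter than MAX_LEN, half longer,
# mirroring the 50/50 mix described above.
strings = [
    "x" * random.choice((MAX_LEN // 2, MAX_LEN * 2))
    for _ in range(10000)
]

for func in (a, b):
    elapsed = timeit.timeit(
        lambda: [func(s, MAX_LEN) for s in strings],
        number=50,
    )
    print(func.__name__, elapsed)
```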
Turns out, there is absolutely no significant difference! I'm not even going to post the timings.
I could go on and repeat the iterations till my CPU starts to smoke, but by then I'm sure the benchmark would become invalid and need to be re-engineered, and the realm of the test would become surreal. Now, carry on with your life of writing real code.
Comments
It looks like Python (2.7 here) is optimizing this case by returning the original string:
>>> s = "abc"
>>> s[:50] is s
True
And here's the C code that makes it happen:
static PyObject *
string_slice(register PyStringObject *a, register Py_ssize_t i,
             register Py_ssize_t j)
...
    if (j > Py_SIZE(a))
        j = Py_SIZE(a);
    if (i == 0 && j == Py_SIZE(a) && PyString_CheckExact(a)) {
        /* It's the same as a */
        Py_INCREF(a);
        return (PyObject *)a;
That said, I'd predict a measurable difference in performance, due to the extra bytecodes and the len() function call in a, at least if you're using traditional cpython.
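The extra bytecodes are easy to see with the dis module (a sketch; the exact listing differs between CPython versions, so I won't paste it):

```python
import dis

def a(s, m):
    if len(s) > m:
        s = s[:m]
    return s

def b(s, m):
    return s[:m]

# a compiles to a len() call, a comparison, a conditional jump and
# a store before the slice; b is just the slice and a return.
dis.dis(a)
dis.dis(b)
```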
Thanks for that!
Yeah maybe you could see a measurable difference but that would require a workload that vastly exaggerates the conditions of my application.
I wonder if b(s,m) is guaranteed to return a new reference, or if id(b(s,m)) == id(s) when len(s) < m.
See Jeff's interesting response above.
Also,
>>> s='peter'
>>> m=3
>>> id(b(s,m)) == id(s)
False
In your tests, the Python function call overhead probably swamped any difference in the performance of the code inside the functions.
Swamped?
python -m timeit -r 10 -s "def myfunc(x): return x[:10]" "myfunc('abcdefg')"
100000 loops, best of 10: 2.75 usec per loop
./python -m timeit -r 10 "'abcdefg'[:10]"
1000000 loops, best of 10: 0.714 usec per loop
0.714 is 25% of 2.75. So "swamped" might appear to be an overstatement. ("Overwhelm with an excessive amount of something"). However, even with -r 10 I'm getting measurements for the smaller numbers that vary by as much as 50%, so I think swamped is probably appropriate. Better measurements could doubtless be obtained if one had access to a test machine that wasn't doing anything else (I'm measuring on my laptop).
Some additional measurements:
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:3]"
1000000 loops, best of 10: 1.41 usec per loop
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:3]"
1000000 loops, best of 10: 1.41 usec per loop
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:3]"
1000000 loops, best of 10: 1.42 usec per loop
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:10]"
1000000 loops, best of 10: 0.633 usec per loop
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:10]"
1000000 loops, best of 10: 0.714 usec per loop
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:10]"
1000000 loops, best of 10: 0.696 usec per loop
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:10]"
1000000 loops, best of 10: 0.319 usec per loop
rdmurray@hey:~/python/p33>./python -m timeit -r 10 "'abcdefg'[:3]"
1000000 loops, best of 10: 1.35 usec per loop
So, it looks like the slice is indeed faster on CPython when it's longer than the string (my measurements were on my current checkout of 3.3, by the way). That, however, is a (current) CPython implementation artifact, and not something to rely on (outside, possibly, of tuning a particular application for a particular machine/environment).
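For what it's worth, the same full-slice identity shortcut appears to survive in CPython 3's unicode object too; a quick check, again CPython-specific:

```python
s = "abc"

# On CPython, a slice covering the whole of an exact str returns the
# original object rather than a copy -- an implementation detail, not
# a language guarantee.
print(s[:50] is s)   # True on CPython
```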
There is indeed a solid difference, but yeah, it's tiny.
$ python --version
Python 2.7.1
$ python -m timeit -r16 -s 'd=["a"*i for i in range(10000)]; l=len' -- 'for x in d:' ' if l(x) > 5000: x[:5000]'
100 loops, best of 16: 4.94 msec per loop
$ python -m timeit -r16 -s 'd=["a"*i for i in range(10000)]' -- 'for x in d:' ' x[:5000]'
100 loops, best of 16: 4.14 msec per loop
The numbers were pretty stable (around +-.4 ms, so +- 40ns per actual call at 10000 calls per loop) for several runs. These were the two fastest measurements for each approach. Not worth worrying about because if you've got an inner loop that tight, you need to either switch to pypy or write it in C.
Overhead:
$ python -m timeit -r16 -s 'd=["a"*i for i in range(10000)]; l=len' -- 'for x in d: pass'
1000 loops, best of 16: 599 usec per loop