A long time ago I wrote an AngularJS app that was pleasantly straightforward. It loads all records from the server in one big fat AJAX GET. The data is large, ~550Kb as a string of JSON, but that's OK because it's a fat-client app and it's extremely unlikely to grow by any multiple of this. Yes, it'll some day go up to 1Mb but even that is fine.
Once ALL records are loaded with AJAX from the server, you can filter the whole set, paginate, etc. It feels really nice and snappy. However, the app is slightly smarter than that. It has three cool additional features...
- Every 10 seconds it does an AJAX query to ask "Have any records been modified since {{insert latest modify date of all known records}}?" and if there's stuff, it updates. (See the sketch after this list.)
- All AJAX responses from the server are cached in the browser's local storage (note, I didn't write localStorage; "local storage" encompasses multiple techniques). The purpose of that is that on the next full load of the app, we can at least display what we had last time whilst we wait for the server to return the latest and greatest via a slowish network request.
- Suppose we have a brand new browser with no local storage. Because the default sort order is always known, instead of doing a full AJAX GET of all records, it does a small one first: "Give me the top 20 records ordered by modify date", and once that's in, it does the big full AJAX request for all records. Thus bringing data to the eyes faster.
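As a minimal sketch of what that first trick might look like (the module name, the /api/events endpoint and the `modified` field are invented for illustration, not the app's real names):

// Hypothetical sketch of the 10-second "anything modified?" poll.
var app = angular.module('eventManagerApp', []);

app.controller('EventsController', function($scope, $http, $interval) {
  $scope.records = [];

  function latestModifyDate() {
    // the most recent modify date of all records we already know about
    return $scope.records.reduce(function(latest, record) {
      return record.modified > latest ? record.modified : latest;
    }, '');
  }

  $interval(function() {
    $http.get('/api/events', {params: {since: latestModifyDate()}})
      .then(function(response) {
        if (response.data.events.length) {
          // merge the updated records into $scope.records here
        }
      });
  }, 10 * 1000);
});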
All of these optimization tricks are accompanied by a flash message at the top, with a spinner, that says: "Currently using cached data. Loading all remaining records from server...".
When I built this I decided to use localForage, which is a convenience wrapper over localStorage AND IndexedDB that does it all asynchronously and with proper promises. And to make it work in AngularJS I used angular-localForage so it would play along with Angular's digest cycle without custom $scope.$apply() stuff. I thought the advantage of this was that, being async, it lets the main event loop continue doing important rendering stuff whilst the browser saves things to "disk" in the background.
Also, I was once told that localStorage, which is inherently blocking, has the risk that calling it for the first time in a while might force the browser to take a major break to boot data from actual disk into the browser's allocated memory. Turns out, that is extremely unlikely to be a problem (more about this in a future blog post). The warming up of fetching from disk and storing into the browser's memory happens when you start the browser the very first time. Chrome might be slightly different but I'm confident that this is how things work in Firefox, and it has been that way for many, many months.
What's very important to note is that, by default, localForage will use IndexedDB as the storage backend. It has the advantage that it's async to boot and it supports much larger data blobs.
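For what it's worth, you can inspect (or override) which backend localForage actually picked; a minimal sketch using the plain localforage API rather than the Angular wrapper:

// Sketch: see which storage driver localForage settled on.
localforage.ready().then(function() {
  // "asyncStorage" means IndexedDB; the other values are
  // "webSQLStorage" and "localStorageWrapper".
  console.log('localForage driver:', localforage.driver());
});

// If you wanted to force plain localStorage behind the same promise API:
// localforage.setDriver(localforage.LOCALSTORAGE);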
So I timed it: how long does it take for localForage to SET and GET the ~500Kb of JSON data? I did that like this, for example:
var t0 = performance.now();
$localForage.getItem('eventmanager')
  .then(function(data) {
    var t1 = performance.now();
    console.log('GET took', t1 - t0, 'ms');
    ...
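The SET side was timed the same way; presumably something along these lines (with data being whatever the AJAX request returned):

var t0 = performance.now();
$localForage.setItem('eventmanager', data)
  .then(function() {
    var t1 = performance.now();
    console.log('SET took', t1 - t0, 'ms');
  });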
The results are as follows:
Operation | Iterations | Average time |
---|---|---|
SET | 4 | 341.0ms |
GET | 4 | 184.0ms |
In all fairness, it doesn't actually matter how long it takes to save because my app doesn't depend on waiting for that promise to resolve. But it's an interesting number nevertheless.
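In other words, the save is fire-and-forget; roughly like this (endpoint and field names invented for illustration):

$http.get('/api/events').then(function(response) {
  $scope.records = response.data.events;                // render right away
  $localForage.setItem('eventmanager', response.data);  // cache in the background;
                                                        // nothing waits on this promise
});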
So, here's what I did. I decided to drop all of that fancy localForage stuff and go back to basics. All I really need is these two operations:
// set stuff
localStorage.setItem('mykey', JSON.stringify(data))
// get stuff
var data = JSON.parse(localStorage.getItem('mykey') || '{}')
So, after I refactored my code, deleted (6.33Kb + 22.3Kb) of extra .js files, and put some performance measurements in:
Operation | Iterations | Average time |
---|---|---|
SET | 4 | 5.9ms |
GET | 4 | 3.3ms |
Just WOW!
That is so much faster. Sure, the write operation is now blocking, but it only takes 6 milliseconds. And the fact that it took IndexedDB almost half a second probably also means it had a lot more hard work to sweat CPU over.
Sold? I am :)
Comments
Note that when using localStorage, it is still possible for you to block the main thread of your browser process to read the data. See my recent comment at http://nolanlawson.com/2015/09/29/indexeddb-websql-localstorage-what-blocks-the-dom/#comment-79519 (and the linked design doc for Firefox elsewhere in the article). The optimization that happens at browser startup-ish-time is knowing what domains use localStorage. If you want to observe the jank you'll want to put your read test in your very first script tag and probably test from a freshly booted Firefox. (Otherwise you'll want to wait 20 seconds after closing your tab and ensure it has been flushed from the bfcache in order to experience the disk reads. Which admittedly probably are going to be cached by SQLite in Gecko if you didn't do a cold start. And your OS may also have cached the data. And SQLite may have slurped the database contents if your testing profile doesn't have a lot of other real-world usage that results in local storage use.)
Only cookies are always kept loaded on Gecko, and that inherently means that if you (ab)use cookies to store data, you are making the user keep your data in memory all the time, forever.
I mention this (after seeing the reference from the "Extravaganza" post) mainly because if your data is scaling up to 500k and that wasn't just a test example, you've probably crossed the threshold of what is suitable for localStorage. It doesn't sound like custom IndexedDB code is merited here, but it sounds like your use case could be well-aligned with PouchDB for querying/filtering and replication/sync purposes. (Not that the IndexedDB code to store events using keys that order the records by time would be that egregious, but PouchDB also gets you fallback to localStorage/WebSQL/etc.)
Thank you! That's very insightful.
I get a feeling that the "theory" is that it might cause jank on the main thread but realistically it's just very unlikely to happen. Especially if you follow the best practice of loading your JavaScript late.
See http://www.peterbe.com/localvsxhr/
This experiment is actually about IDB vs. localStorage vs. XHR, but one of the outcomes is that I've managed to collect results from a lot of visitors when a .js file has these as its first couple of lines:
var a = performance.now();
localStorage.getItem('anything');
var b = performance.now();
That's code that is put at the bottom of the HTML document and loaded after loading some jQuery.
The result is that the "Time to Boot" median is extremely small. If you look carefully, there is a max number there of 11.72ms. That number I got when I booted a custom build of Nightly that Boris Zbarsky compiled for me. It was regular Nightly but with the code that pre-heats the disk -> localStorage on load commented out. In other words, even without that code, it's only 11.72ms.
What's really interesting is that the 500Kb has to be loaded for that domain ALL the time, even though it's not used. It might not matter a great deal on a fast desktop but it certainly matters for lower-power devices. Especially if it's 500Kb you don't need in order to get the main task done in that app.
Yeah, my only real concern with localStorage these days is excess use of it because of that memory cost and the potential for the read costs to add up to a real problem. On non-SSD storage, if you had multiple megabytes of data in there, especially sharded over a high number of keys, given our SQLite page size (32k everywhere but b2g), you could get into a large number of disk seeks.
I do believe that localStorage hits a sweet spot for helping get a populated first render to the user. Anything above the fold, really. For the Firefox OS email app we use localStorage to cache the HTML of the page (more details at http://www.visophyte.org/blog/2015/04/30/talk-script-firefox-os-email-performance-strategies/), which is something that we can hopefully replace with the new cache API at some point. And in a desktop-related effort, I use it for persisting splitter positions too (https://clicky.visophyte.org/files/screenshots/20150830-222533.png).
But once you get to anything that you'd call a database, where there's a growth factor expected, I think it's preferable to move to something IndexedDB-backed. Although obviously, there are always engineering trade-offs to be made. For a tool used by a limited user-base on desktop, where they probably have oodles of RAM, it's not a big deal. The bigger risk when working on Mozilla affiliated projects is the hazard where people replicate your implementation choices without doing a trade-off analysis of their own. (If localStorage is good enough for Mozilla, etc. ;)
I love this "debate"! These are really interesting questions to solve.
We're not disagreeing on anything, but one argument I'd like to make is this: if you look at http://www.peterbe.com/plog/lovefield-ajax-proxy, where I used a wrapper over IDB called Lovefield, it's a similar pattern: load from Lovefield first, then wait for AJAX to get fresher data. What I found was that loading stuff from Lovefield takes 400-500ms EVERY time, whereas it only takes 1-2ms with localStorage. (Granted, the point of something like Lovefield is not to select EVERY record every time, but we'll let that slide.) So my question is, what's it doing to my CPU/RAM during that half second? Whatever it is, even if it's off the main thread, it's bound to cause slowdown on the main thread and strain on the resources.
I.e. localStorage is like a buffalo, but it gets the job done. Perhaps that ultimately wins despite its shortcomings.
Yeah, 400-500ms is really bad, although I should note I just tried http://www.peterbe.com/ajaxornot/view7a several times in a row and on e10s Nightly the times reported were largely in the 100ms-150ms range. (I did get one outlier at 207ms.) Since you're loading all the rows, it's possible the improvement came from https://bugzilla.mozilla.org/show_bug.cgi?id=1168606 landing, which implements pre-fetching.
For FxOS we in general have been using the non-standard mozGetAll method to fetch things in batches, which avoids wasteful roundtrips, etc. (They really could get out of hand... IDB was erring on the side of not letting people footgun too much. And people did. Many a tale of woe can be told about overuse of mozGetAll by FxOS apps/custom web APIs!)
There is definitely a non-trivial overhead in the act of opening an IDB database. It looks like your call to connect() is part of your timed logic, and per https://github.com/google/lovefield/blob/master/docs/spec/03_life_of_db.md this would seem to involve the database opening, so I expect that could have a lot to do with it. It would be interesting to split the numbers out to separate the connect(), although obviously that number is very important/significant to user responsiveness at startup!
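To make that concrete, splitting the two timings apart might look roughly like this with the Lovefield API (the schema and column names here are invented for illustration):

// Sketch: time connect() separately from the actual SELECT.
var schemaBuilder = lf.schema.create('eventmanager', 1);
schemaBuilder.createTable('Event')
  .addColumn('id', lf.Type.INTEGER)
  .addColumn('modified', lf.Type.DATE_TIME)
  .addPrimaryKey(['id']);

var t0 = performance.now();
schemaBuilder.connect().then(function(db) {
  var t1 = performance.now();
  console.log('connect() took', t1 - t0, 'ms');
  var events = db.getSchema().table('Event');
  return db.select().from(events).exec().then(function(rows) {
    console.log('select all took', performance.now() - t1, 'ms');
  });
});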
And yes, absolutely there is a non-trivial cost to spinning up/and opening an IndexedDB database in terms of disk I/O etc. IDB is pretty good about spinning things back down, but it wouldn't surprise me if at 500K of data local storage was better on balance, especially if you're storing the data in only one or a few keys.
For what it's worth, I can't get the 400-500ms any more either. It's hovering around 190-230ms on my laptop.
It's perhaps silly to compare pure localStorage with Lovefield + connect() + IDB all in one lump but the pragmatist in me is only interested in "getting the data out" :)
I suspect a lot comes down to how localStorage is capped. I now understand that the reason it's capped is to prevent you from lugging around too much stuff in RAM. So, I'd welcome a nice cap that would be easy to business-logic around. It would be nice if the browser vendors could help tell web developers what's too much and too dangerous, for the greater benefit of the browser experience.