Pluses and Pitfalls of Repo.stream
October 24, 2018
Scenario: you are working on a Phoenix app that has seen a good deal of use, and you need to transform some tables that hold an exceptionally large number of rows, along with their relations. Obviously, some consideration for performance is necessary; if you can avoid loading an entire table into memory to do this, that would be ideal, right? Enter Ecto.Repo.stream: it turns that giant list into a lazily evaluated enumerable and loads rows as they are needed. Job done, right? Well, it depends.
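As a minimal sketch of the basic pattern, assuming a hypothetical repo module MyApp.Repo, a hypothetical schema MyApp.Foo, and a placeholder transform/1 function standing in for the real work:

```elixir
defmodule MyApp.FooTransform do
  # MyApp.Repo, MyApp.Foo and transform/1 are hypothetical names used
  # for illustration; substitute the modules from your own app.
  alias MyApp.{Foo, Repo}

  def run do
    # With the Postgres and MySQL adapters, the stream can only be
    # enumerated inside a transaction.
    Repo.transaction(fn ->
      Foo
      # :max_rows controls how many rows are fetched per round trip
      # (500 is also the default).
      |> Repo.stream(max_rows: 500)
      |> Enum.each(&transform/1)
    end)
  end

  # Placeholder for the per-row work.
  defp transform(_foo), do: :ok
end
```

Only one batch of rows is held in memory at a time, which is where the memory saving comes from.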
The good news is that you will definitely address the issue of memory use. However, it comes at the cost of time, which can increase greatly if, say, you need to access a number of rows in an associated table for every row you are streaming. For instance, you might stream every Foo and, for each one, open a second stream over its associated Bar rows.
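A sketch of what such nested streaming could look like, assuming hypothetical schemas MyApp.Foo and MyApp.Bar (with Bar carrying a foo_id column), a hypothetical MyApp.Repo, and a placeholder transform/2:

```elixir
defmodule MyApp.NestedStreamExample do
  # All module names, the foo_id column and transform/2 are assumptions
  # made for this illustration.
  import Ecto.Query, only: [from: 2]

  alias MyApp.{Bar, Foo, Repo}

  def run do
    Repo.transaction(fn ->
      Foo
      |> Repo.stream()
      |> Enum.each(fn foo ->
        # One extra streamed query per Foo row, issued on the same
        # connection that the outer stream is holding.
        from(b in Bar, where: b.foo_id == ^foo.id)
        |> Repo.stream()
        |> Enum.each(&transform(foo, &1))
      end)
    end)
  end

  # Placeholder for the per-row work.
  defp transform(_foo, _bar), do: :ok
end
```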
This might seem like a good idea if there are a large number of Bars for every Foo entry, but since the stream must be enumerated inside a transaction, you have a single connection to work with until the whole stream has been consumed, and all of that work has to finish before the timeout expires. The limit can be adjusted with the :timeout option on Repo.stream, which can be relaxed from its default of 15000 milliseconds all the way to :infinity (the enclosing Repo.transaction call accepts a :timeout option of its own). But if your streaming changes rely on a flaky connection or some other slow piece of code, you could run into the same issue again on that side. It is safer to avoid nesting streams if possible, or to find a different way of chunking the data.
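One possible way to chunk the data without nesting streams is to batch the outer stream and load the associated rows for each batch with a single query. This is only a sketch under the same assumptions as before (hypothetical MyApp.Repo, MyApp.Foo, MyApp.Bar with a foo_id column, and a placeholder transform/2); the chunk size of 100 is an arbitrary illustrative value, and :timeout is passed to both the transaction and the stream on the assumption that either limit could otherwise cut the job short.

```elixir
defmodule MyApp.ChunkedExample do
  # All module names, the foo_id column, transform/2 and the chunk size
  # are assumptions made for this illustration.
  import Ecto.Query, only: [from: 2]

  alias MyApp.{Bar, Foo, Repo}

  @chunk_size 100

  def run do
    Repo.transaction(
      fn ->
        Foo
        |> Repo.stream(max_rows: @chunk_size, timeout: :infinity)
        # Group the lazily loaded rows into lists of @chunk_size.
        |> Stream.chunk_every(@chunk_size)
        |> Enum.each(&process_chunk/1)
      end,
      timeout: :infinity
    )
  end

  defp process_chunk(foos) do
    ids = Enum.map(foos, & &1.id)

    # One query per chunk instead of one stream per Foo row.
    bars_by_foo_id =
      from(b in Bar, where: b.foo_id in ^ids)
      |> Repo.all()
      |> Enum.group_by(& &1.foo_id)

    Enum.each(foos, fn foo ->
      transform(foo, Map.get(bars_by_foo_id, foo.id, []))
    end)
  end

  # Placeholder for the per-row work.
  defp transform(_foo, _bars), do: :ok
end
```

The trade-off is that every Bar belonging to a chunk is held in memory at once, so the chunk size should shrink as the number of Bars per Foo grows.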
If memory is a more pressing constraint than time, Repo.stream is a pretty convenient way to manage how much is loaded into memory at a given time. Just remember to choose an appropriate timeout value before you start.