Size cost of using Facebook’s React javascript framework

Here at finderful.com we are obsessed with page size. We want pages to load fast and render fast. We were excited when we learned about React and how it speeds up DOM changes. We were also excited by create-react-app and how it comes with webpack, autoprefixer, and babel out of the box. But what we really wanted to know was: what is the cost of using this framework? We normally don’t use frameworks for client-side pages here at finderful.com because we find that they bloat the size quite a bit for the few features we want to use. For example, all but one of our webpages are currently (2017-05-14) smaller than the jquery library, probably the most popular javascript library, and we aren’t using it because it would blow out our page sizes.

React is a javascript library more than it is a framework. The main functions you would use are React.createElement() and ReactDOM.render(). There is also the class React.Component, which you are expected to subclass, filling in the render() function to return an HTML/JSX element or React component.
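To make the relationship between these three concrete, here is a DOM-free sketch. It builds a plain object tree rather than real DOM nodes, and the names (Greeting, the {tag, props, children} shape) are our own illustration, not React’s actual implementation:

```javascript
// Illustrative sketch only: a Component base class and a createElement that
// either instantiates-and-renders a component class or returns a plain
// element description. Real React does much more (reconciliation, events).
class Component {}

function createElement(type, props, ...children) {
    // a component class: instantiate it and render it
    if (type.prototype instanceof Component)
        return new type(props).render();
    // a tag name: return a plain element description
    return {tag: type, props: props || {}, children: children};
}

// hypothetical example component
class Greeting extends Component {
    constructor(props) {
        super();
        this.props = props || {};
    }
    render() {
        return createElement('span', null, 'Hello, ' + this.props.name);
    }
}

const tree = createElement('div', {className: 'greeting'},
                           createElement(Greeting, {name: 'finderful'}));
// tree.children[0].children[0] === 'Hello, finderful'
```

ReactDOM.render() would then be the step that turns such a tree into actual DOM nodes attached to the page.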

JSX is a core part of React, and the main reason we love React so much. JSX is like components for HTML. You can define your own components, made up of HTML, and then reuse those components just like you would use HTML elements.

create-react-app helps you to bootstrap a React-based project. It sets up a directory structure with some stub files that you can fill in. The main one you would focus on is App.js, though you are free to modify any HTML/CSS/JS file. create-react-app will take care of converting JSX to JS, ES6 to ES5, packing & minifying for production, and even sets up npm start for a quick development cycle (your app updates just by saving changes). No need to deal with grunt/gulp/brunch, etc.

So our goal is to be able to use create-react-app without React. We aim to achieve this by rewriting the library functions above to be smaller, albeit less featureful. You can see our currently supported list of React features in our github repository.

So let’s see how costly (in bytes) it is to use React, and if necessary, we can bring that size down to a size we’re happy with. Being a JS framework, we predict it’s larger than any of our pages, and as we hope to replace handlebars with JSX, we hope we can rewrite it with minimum functionality at a size smaller than handlebars’s runtime.

We start our experiments by using create-react-app and seeing how big the demo app is. create-react-app makes this easy for us when we ‘build’ the app by printing out the sizes for us.

Using create-react-app@1.3.0 and react@15.5.4:

  46.86 KB  build/static/js/main.3c4a7db0.js
  289 B     build/static/css/main.9a0fe4f1.css

Now let’s set up a baseline for what the absolute minimum would be if React was almost 0 bytes. Note that this wouldn’t actually create a usable webpage. We’ve just removed the import statements from the demo app’s .js files and used dummy functions that don’t do anything:

  5.12 KB  build/static/js/main.171258de.js
  289 B    build/static/css/main.9a0fe4f1.css

That leaves 41.74 KB (46.86 KB – 5.12 KB) for React. Being bigger than jquery 1.9.1 (32714 bytes), React is bigger than any of our pages, before we even add our content.

People don’t usually use React by itself, though. So let’s see the sizes if we use React and some common libraries.

Adding in react-router-dom@4.1.1 (Router, Route, and Link):

  57.52 KB  build/static/js/main.713de4d6.js
  289 B     build/static/css/main.9a0fe4f1.css

Adding in redux@3.6.0 (createStore):

  49.2 KB  build/static/js/main.fc7fd213.js
  289 B    build/static/css/main.9a0fe4f1.css

Adding in both (without react-router-redux):

  59.68 KB  build/static/js/main.899cc3c0.js
  289 B     build/static/css/main.9a0fe4f1.css

As above, the demo app without React is only 5.12 KB. We’re already using handlebars at 6656 bytes, so if we can use JSX with fewer bytes, that’ll be great. Even if it’s about the same number of bytes, we really like having the JSX inline in our javascript files instead of templates in separate files like we have now. Also we don’t have to create our own process for having the templates in the javascript files, as create-react-app has taken care of that for us.

So our target is 11776 bytes (5.12 KB * 1000 bytes / KB + 6656 bytes). We just have to go forth and rewrite a few React functions, like React.createElement and ReactDOM.render. While we have found a few existing rewrites of React.createElement, we usually prefer to write our own; besides, the ones we’ve found only handle basic HTML elements, not React components.

Below we provide readable versions of the files src/react.js and src/react-dom.js; in our github repo, we keep the minified version we used for calculating the sizes below.

src/react.js:

function isString(string) {
    return typeof(string) === 'string' || string instanceof String;
}

function isComponentClass(componentClass) {
    return componentClass.prototype instanceof Component;
}

function createNode(elementOrString) {
    if (isString(elementOrString))
    // if it's a string, we need to create a node
        return document.createTextNode(elementOrString);
    else
    // if it's an element, then it's already a node
        return elementOrString;
}


function createElement(tagName, attributesObject, children) {
    const element = document.createElement(tagName);
    const attributes = Object.entries(attributesObject);

    attributes.forEach(setAttribute);
    children.map(createNode).forEach(element.appendChild.bind(element));

    return element;

    function setAttribute(attribute) {
        element.setAttribute(attribute[0].replace("className", "class"), attribute[1]);
    }
}

class React {
    static createElement(componentClassOrTagName, attributesObject) {
        if (isComponentClass(componentClassOrTagName)) {
            // if it's a component, render the component
            return new componentClassOrTagName().render();
        } else if (isString(componentClassOrTagName)) {
            // if it's a string, then create an element
            const children = [].slice.call(arguments, 2);
            return createElement(componentClassOrTagName, attributesObject, children);
        } else {
            throw new TypeError("Don't know how to create element from " + componentClassOrTagName);
        }
    }
}

class Component {
}

export default React;
export {Component};

src/react-dom.js:

class ReactDOM {
  static render(child, parent) {
    parent.appendChild(child);
  }
}

export default ReactDOM;

So what’s the verdict? How big is the build? How much did it cost us?

  5.53 KB  build/static/js/main.21e4aa97.js
  289 B    build/static/css/main.9a0fe4f1.css

0.41 KB (5.53 KB – 5.12 KB)! That’s much smaller than our target, so we’re pretty happy to start rewriting all our HTML in JSX, and using create-react-app to speed up our development. We’re planning a new design anyway, so now is probably a good time to drop handlebars and our custom frontend build system (based on grunt) and switch to create-react-app. Be on the lookout for a new design on finderful.com. We’ll be updating the github repo as we build the new design, so be sure to watch it for improvements.

Memory-limited Java 8 parallelStream forEachOrdered

How can we process a stream in parallel and in order?  Sadly there’s no way to guarantee order without waiting for each process to finish before starting the next one.  So what can we do when the first part of the process is time consuming and can be done in parallel, and there’s a second part which is quick, but needs to be done in order?

Furthermore, what if using all available threads would cause you to run out of memory sometimes?  How can you process as much as possible as fast as possible, but still keep things in order for the second part?

Below we describe a method for achieving this, but it is not 100% foolproof, and should be avoided if at all possible. This method is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Before we describe our method, I’d like to suggest some alternatives.

Alternative #1: parallelStream forEach

Find a way to handle the unordered output.  Use an approximation if you have to.

Alternative #2: forEachOrdered

Forgo the parallelization.

Alternative #3: Array

Use an array to store the intermediate results. Do not use a thread-unsafe growing collection like ArrayList. This only works if all your intermediate results can fit in memory.  This requires you to have an integer index with your input.  If you need to maintain your own index, you’ll also need to maintain your own Executor.  Something like this:

final ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
int counter = 0;
final IntermediateResult[] intermediateResults = new IntermediateResult[sizeOverestimate(iterableInput)];
for (final Element element : iterableInput) {
  final int index = counter++;
  executor.submit(() -> {
    intermediateResults[index] = processPart1(element);
  });
}
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
for (IntermediateResult intermediateResult : intermediateResults) {
  processPart2(intermediateResult);
}
What if I have memory limitations?

Then use a batched version of the array method, where you run as much of part 1 as you can, then run part 2, then part 1 again, part 2, …
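The batching idea can be sketched as follows (in JavaScript for brevity, since our actual code is Java; part1 and part2 here are hypothetical stand-ins for your two phases):

```javascript
// Batched two-phase processing: run part1 on one batch in parallel,
// then run part2 over that batch sequentially and in input order, repeat.
// Peak memory is bounded by batchSize intermediates at a time.
async function forEachOrderedBatched(inputs, part1, part2, batchSize) {
    for (let start = 0; start < inputs.length; start += batchSize) {
        const batch = inputs.slice(start, start + batchSize);
        // phase 1: in parallel; completion order within the batch doesn't matter,
        // because Promise.all preserves input order in its result array
        const intermediates = await Promise.all(batch.map(input => part1(input)));
        // phase 2: sequential, and in input order
        for (const intermediate of intermediates)
            part2(intermediate);
    }
}
```

The trade-off is that phase 1 idles at the end of each batch while it waits for the slowest item, which is exactly what the Semaphore approach below avoids.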

What if I need to run part2 while part1 is running?

You can use a Semaphore to keep track of how much memory is being used/freed. It gets a little more complicated to keep part 2 running in order, though.

This is overly complicated and possibly has bugs. It definitely may deadlock. Use at your own risk.

    /**
     * @param part1      Time-consuming parallelizable part.  Should never return null.
     * @param part2      Fast sequential part.  Should be called in order.
     * @param permits    Number of permits you can fit in memory.  One permit could be for one intermediate, or could be for
     *                   one permit for each byte of memory an intermediate needs (e.g. file contents).
     * @param getPermits For when you have dynamic-sized Intermediates. (e.g. file contents).  If unknown, over-estimate so
     *                   you don't run out of memory.
     */
    public static <Input, Intermediate> void forEachOrdered(final List<Input> inputs,
                                                            final Function<Input, Intermediate> part1,
                                                            final Consumer<Intermediate> part2,
                                                            final int permits,
                                                            final ToIntFunction<Input> getPermits) {
        // This keeps track of how much memory is available for the intermediate results, so we don't start part1(input) if we
        // won't have enough memory for the Intermediate result.  It has to be Fair, meaning it doesn't allow a
        // less-memory-dependent intermediate to start before a more-memory-dependent intermediate
        final Semaphore semaphore = new Semaphore(permits, true);
        // This keeps track of the number of permits needed for each Input.  To save us from calling getPermits repeatedly.
        final List<Integer> ps = new ArrayList<>();
        // This keeps track of the submitted part1's and their order
        final List<Future<Intermediate>> futures = submitPart1(inputs, part1, getPermits, semaphore, ps);

        // part1 should now be running in parallel, and everything should be set to run part2 in order
        try {
            for (int i = 0, inputsSize = inputs.size(); i < inputsSize; i++) {
                // each Future with index i should correspond to running Input i and returning Intermediate i
                // also clear it to free memory
                final Future<Intermediate> future       = futures.set(i, null);
                final Intermediate         intermediate = future.get();
                // run part2!
                part2.accept(intermediate);
                // release the permits so more part1's can run
                semaphore.release(ps.get(i));
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Submits part1 to run in parallel, but only as many as are permitted can run simultaneously.
     * Also saves in the number of permits per input.
     *
     * @param part1      Time-consuming parallelizable part.  Should never return null.
     * @param getPermits For when you have dynamic-sized Intermediates. (e.g. file contents).  If unknown, over-estimate so you
     *                   don't run out of memory.
     * @param semaphore  This keeps track of how much memory is available for the intermediate results, so we don't start
     *                   part1(input) if we won't have enough memory for the Intermediate result.  It has to be Fair, meaning it
     *                   doesn't allow a less-memory-dependent intermediate to start before a more-memory-dependent intermediate
     * @param ps         This keeps track of the number of permits needed for each Input.  To save us from calling getPermits
     *                   repeatedly.
     * @return This keeps track of the submitted part1's and their order
     */
    @Nonnull
    private static <Input, Intermediate> List<Future<Intermediate>> submitPart1(final List<Input> inputs,
                                                                                final Function<Input, Intermediate> part1,
                                                                                final ToIntFunction<Input> getPermits,
                                                                                final Semaphore semaphore,
                                                                                final List<Integer> ps) {
        // This keeps track of the submitted part1's and their order
        final List<Future<Intermediate>> futures = new ArrayList<>();
        // We need to manage our own thread pool.  Use all available processors.
        final ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        for (final Input input : inputs) {
            final int p = getPermits.applyAsInt(input);
            ps.add(p);
            futures.add(executor.submit(() -> {
                // This is not guaranteed to be called in input order, but it works for me.
                // If it's too much out-of-order, we could fail to acquire the lock for an earlier input and be stalled.
                semaphore.acquire(p);
                return part1.apply(input);
            }));
        }
        executor.shutdown();
        return futures;
    }
What if I have a part 3 which can be parallel like part 1?

As if the above code wasn’t complicated enough, let’s call part 3 “part 2b”, and the sequential part “part 2a”. You still need part 2b to finish so you can release memory, I get it. So what we need to do is collect as much of part 1 as we can and send it in a batch to part 2; part 2 then runs part 2a sequentially and submits part 2b to a pool of threads:


    /**
     * For when batching part2 is good.
     *
     * @param part1        Time-consuming parallelizable part.  Should never return null.
     * @param part1Default Default intermediate, in case part1 throws an exception.
     * @param part2        Fast sequential part.  Should be called in order.
     * @param permits      Number of permits you can fit in memory.  One permit could be for one intermediate, or could be for
     *                     one permit for each byte of memory an intermediate needs (e.g. file contents).
     * @param getPermits   For when you have dynamic-sized Intermediates. (e.g. file contents).  If unknown, over-estimate so
     *                     you don't run out of memory.
     */
    public static <Input, Intermediate> void forEachOrdered(final List<Input> inputs,
                                                            final Function<Input, Intermediate> part1,
                                                            @NotNull final Intermediate part1Default,
                                                            final Consumer<List<Intermediate>> part2,
                                                            final int permits,
                                                            final ToIntFunction<Input> getPermits) {
        // This keeps track of how much memory is available for the intermediate results, so we don't start part1(input) if we
        // won't have enough memory for the Intermediate result.  It has to be Fair, meaning it doesn't allow a
        // less-memory-dependent intermediate to start before a more-memory-dependent intermediate
        final Semaphore semaphore = new Semaphore(permits, true);
        // This keeps track of the number of permits needed for each Input.  To save us from calling getPermits repeatedly.
        final List<Integer> ps = new ArrayList<>();
        // This keeps track of the submitted part1's and their order
        final List<Future<Intermediate>> futures = submitPart1(inputs, part1, getPermits, semaphore, ps);

        // part1 should now be running in parallel, and everything should be set to run part2 in order
        try {
            // each Future with index i should correspond to running Input i and returning Intermediate i
            for (int i = 0, inputsSize = inputs.size(); i < inputsSize; i++) {
                // wait until this one is done
                futures.get(i).get();
                // find the next one which is not yet done
                int f = i + 1;
                while (f < inputsSize && futures.get(f).isDone()) {
                    f++;
                }
                // run part2!
                part2.accept(IntStream.range(i, f).mapToObj(j -> {
                    try {
                        // each Future with index j should correspond to running Input j and returning Intermediate j
                        // also clear it to free memory
                        return futures.set(j, null).get();
                    } catch (InterruptedException | ExecutionException e) {
                        return part1Default;
                    }
                }).collect(Collectors.toList()));
                // release the permits so more part1's can run
                semaphore.release(ps.subList(i, f).parallelStream().mapToInt(Integer::intValue).sum());
                // skip the done ones
                i = f - 1;
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
What if I need to guarantee no deadlock?

In that case, you need to guarantee that the semaphore’s permits are acquired in order. You can probably do this if you can guarantee that only one thread is trying to acquire the permits and also guarantee that the thread trying to is the earliest one you queued. Something like this:

    /**
     * Submits part1 to run in parallel, but only as many as are permitted can run simultaneously.
     * Also saves in the number of permits per input.
     *
     * @param part1      Time-consuming parallelizable part.  Should never return null.
     * @param getPermits For when you have dynamic-sized Intermediates. (e.g. file contents).  If unknown, over-estimate so you
     *                   don't run out of memory.
     * @param semaphore  This keeps track of how much memory is available for the intermediate results, so we don't start
     *                   part1(input) if we won't have enough memory for the Intermediate result.  It has to be Fair, meaning it
     *                   doesn't allow a less-memory-dependent intermediate to start before a more-memory-dependent intermediate
     * @param ps         This keeps track of the number of permits needed for each Input.  To save us from calling getPermits
     *                   repeatedly.
     * @return This keeps track of the submitted part1's and their order
     */
    @Nonnull
    private static <Input, Intermediate> List<Future<Intermediate>> submitPart1(final List<Input> inputs,
                                                                                          final Function<Input, Intermediate> part1,
                                                                                          final ToIntFunction<Input> getPermits,
                                                                                          final Semaphore semaphore,
                                                                                          final List<Integer> ps) {
        // This keeps track of the submitted part1's and their order
        final List<Future<Intermediate>> futures = new ArrayList<>();
        // We need to manage our own thread pool.  Use all available processors.
        final ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        // used to keep track of which one should be allowed to start running
        final AtomicInteger   current  = new AtomicInteger(0);
        // used to make sure the current one is the one that is permitted to run
        final Object          lock     = new Object();
        for (int i = 0, inputsSize = inputs.size(); i < inputsSize; i++) {
            final int   finalI = i;
            final Input input  = inputs.get(i);
            final int   p      = getPermits.applyAsInt(input);
            ps.add(p);
            futures.add(executor.submit(() -> {
                // keep checking if we are the current one
                boolean isCurrent;
                do {
                    // make sure we are the only one checking, so if we are permitted to run, we get the permits
                    synchronized (lock) {
                        // are we permitted to run?
                        isCurrent = current.compareAndSet(finalI, finalI + 1);
                        if (isCurrent)
                            // get the permits
                            semaphore.acquire(p);
                    }
                } while (!isCurrent);
                // run part1!
                return part1.apply(input);
            }));
        }
        executor.shutdown();
        return futures;
    }

iOS Safari 10 address autofill

Why does the iOS Safari 10 address autofill not work well with autocomplete?

In an HTML form with a text input element for an address, you may find that Safari on iOS 10 automatically prompts the user to enter one of the addresses from their contact info, without needing the user to enter a single letter.  After tapping on one of the addresses, Safari fills the field with the address, and Safari seems to assume the user will not enter any more text into this field, and then removes the focus from the input field.  This prevents autocompleters from displaying completions as it appears the user has lost focus.

To replicate this, you need a contact in iOS 10 for yourself, with an address set:

ios contact with work address

You’ll also need to tell iOS that this is you (search Settings for “My Info”):

My Info in iOS Settings

Now when you focus on an address text input field in Safari, it’ll suggest autofill with your contact address(es):

autofill address suggestion

If you tap on the address, you’ll see that your address is autofilled without you needing to enter another single thing.  Unfortunately, on our site, you need to select an address from the autocomplete list in order for it to work.

If you connect desktop Safari’s Web Inspector, record a Timeline, and look at the JavaScript & Events, you’ll see that the input event is being triggered. So why isn’t the autocomplete showing? We couldn’t find any information on the web about it (maybe we’re using the wrong keywords), so we set about diagnosing and working around it ourselves.

We use two autocompleters on our site (at the moment): jQuery UI 1.10’s widget and Awesomplete. jQuery UI wouldn’t trigger the source function, even with delay set to 0 and the list ready.  Awesomplete checks that the text input element is still active (in focus). If the list were ready synchronously with the input event, then Awesomplete might show it, but we prepare the list asynchronously, so by the time we give it to Awesomplete, Safari has already taken away the focus (blur event).

Our fix for jQuery UI was two-fold:

  1. force the search when the input event is triggered:
    $addressInput.on('input', function () {
      $addressInput.autocomplete('search', this.value);
    });
    
  2. override the response so the search is un-cancelled:
     $.widget('custom.autocomplete', $.ui.autocomplete, {
      __response: function(content) {
        this.cancelSearch = false;
        return this._super(content);
      }
    });
    

Our fix for Awesomplete was a little more intrusive: we comment out the check that the input element is still active (yes, we know we could just call evaluate after setting the list):

// if (document.activeElement === this.input) {
   this.evaluate();
// }

So now, if the user appears to have lost focus (either because they have, or because Safari blurs the field), the autocomplete will still display:

autofilled address with autocomplete

Check if a point is in a polygon: javascript algorithm & test

Recently we needed a point-in-polygon algorithm in JavaScript for finderful.com.  There are many implementations out there, and there are more efficient ways of doing it, but we find the ray-casting method to be the simplest.

We recently submitted our JavaScript implementation to Rosetta Code, and you can find a commented version below:

/**
 * @return {boolean} true if (lng, lat) is in bounds
 */
function contains(bounds, lat, lng) {
    //https://rosettacode.org/wiki/Ray-casting_algorithm
    var count = 0;
    // iterate over the segments of the polygon and count how many times the point is west of the segment
    for (var b = 0; b < bounds.length; b++) {
        var vertex1 = bounds[b];
        var vertex2 = bounds[(b + 1) % bounds.length];
        if (west(vertex1, vertex2, lng, lat))
            ++count;
    }
    // if it is west of an even number of segments, then the point is outside the polygon
    return count % 2 === 1;

    /**
     * @return {boolean} true if (x,y) is west of the line segment connecting A and B
     */
    function west(A, B, x, y) {
        // make it so segment point A is not above segment point B
        if (A.y <= B.y) {
            // if the point is outside the y-range containing the segment then the point is definitely not west of the segment
            // or if the point is east of the x-range containing the segment then the point is definitely not west of the segment
            if (y <= A.y || y > B.y ||
                x >= A.x && x >= B.x) {
                return false;
            } else
            // if the point is west of the rectangle containing the segment then the point is definitely west of the segment
            if (x < A.x && x < B.x) {
                return true;
            } else
            // compare the slope of the line segment from A to the point and the slope of the segment from A to B
            // the first slope is higher if-and-only-if the point is west of the A-B segment
            {
                return (y - A.y) / (x - A.x) > (B.y - A.y) / (B.x - A.x);
            }
        } else {
            return west(B, A, x, y);
        }
    }
}
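As a quick sanity check, here is a compact, self-contained crossing-number test of the same ray-casting idea (equivalent in spirit to contains/west above, though not the exact same code; note the vertex convention, x = longitude and y = latitude):

```javascript
// Compact ray-casting check: count how many polygon edges an eastward
// ray from (lng, lat) crosses; an odd count means the point is inside.
function inPolygon(bounds, lat, lng) {
    var inside = false;
    for (var i = 0, j = bounds.length - 1; i < bounds.length; j = i++) {
        var A = bounds[i], B = bounds[j];
        // edge A-B straddles the point's latitude, and the crossing is east of the point
        if ((A.y > lat) !== (B.y > lat) &&
            lng < (B.x - A.x) * (lat - A.y) / (B.y - A.y) + A.x)
            inside = !inside;
    }
    return inside;
}

var square = [{x: 0, y: 0}, {x: 10, y: 0}, {x: 10, y: 10}, {x: 0, y: 10}];
inPolygon(square, 5, 5);   // true: inside the square
inPolygon(square, 5, 15);  // false: lng = 15 is east of the square
```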

Fast & efficient k-nearest neighbors (knn) search: algorithm analysis & example

Question:

What data structure (if any) is most efficient (fastest) for performing k-nearest neighbors (knn) search for finderful.com?

Experiment:

On a dataset of about 7500 WGS 84 Web Mercator coordinates, we attempt to find the k-nearest neighbors (where k=32) of each coordinate.  Each coordinate represents an available home in Australia.  Although we do not search for the neighbors of only these coordinates, it is a reasonably accurate estimate of our use case.  We time how long it takes to search for the neighbors of all coordinates and record our findings below.

We test the implementations found in weka 3.9.0: LinearNNSearch, CoverTree, BallTree, and KDTree.  LinearNNSearch appears to use a heap.  We also test our own implementation, which uses Java 1.8’s TreeMap (red-black trees).

We use Euclidean distance in degrees as our measure, and do not attempt to approximate meters.  For the weka NearestNeighbourSearches, we use weka’s EuclideanDistance class, and for our implementation we use our own Comparator.

Also for the weka NearestNeighbourSearches, we include the time of converting to and from Instances.  We repeated the weka implementations a few times.  We did not repeat our implementation.

Results:

Analysis:

Unsurprisingly, the Ball Tree and the KD-Tree did quite well, as they trim the search space significantly before they perform the search.

The Linear kNN Search iterates over the 7500 coordinates, calculates the distance, then stores the distance and coordinate in a heap. weka keeps the heap around size k by keeping track of the furthest coordinate in the heap, so that the heap always contains the k-nearest neighbors seen so far. The resulting algorithm complexity is O(n log(k)), and the distance is calculated once for each of the n coordinates.
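The bounded-heap approach can be sketched as follows (in JavaScript for illustration; weka’s LinearNNSearch is Java, and the heap details here are our own, not weka’s):

```javascript
// Linear kNN with a size-k max-heap: O(n log k) overall, and the distance
// is computed exactly once per candidate. The heap root is always the
// farthest of the k nearest seen so far, so each new candidate is compared
// against it before anything else.
function knn(points, query, k, distance) {
    const heap = []; // binary max-heap of {d, p}, ordered by d

    function siftUp(i) {
        for (let parent; i > 0 && heap[(parent = (i - 1) >> 1)].d < heap[i].d; i = parent) {
            [heap[i], heap[parent]] = [heap[parent], heap[i]];
        }
    }
    function siftDown(i) {
        for (;;) {
            let largest = i;
            const l = 2 * i + 1, r = 2 * i + 2;
            if (l < heap.length && heap[l].d > heap[largest].d) largest = l;
            if (r < heap.length && heap[r].d > heap[largest].d) largest = r;
            if (largest === i) return;
            [heap[i], heap[largest]] = [heap[largest], heap[i]];
            i = largest;
        }
    }

    for (const p of points) {
        const d = distance(p, query); // one distance computation per point
        if (heap.length < k) {
            heap.push({d, p});
            siftUp(heap.length - 1);
        } else if (d < heap[0].d) {
            heap[0] = {d, p}; // replace the current farthest neighbor
            siftDown(0);
        }
    }
    return heap.sort((a, b) => a.d - b.d).map(e => e.p);
}
```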

Our own custom implementation is very simple. We simply dump the coordinates into a TreeMap and pull out the first 32. The map is sorted by distance using a custom Comparator. As you can see from the results, this simple method does quite poorly. One reason is that the algorithm complexity is O(n log(n)); another is that we calculate the distance twice for each comparison, once for each side of the comparison.
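A cheaper variant of this sort-everything approach precomputes each distance once, then sorts the (distance, point) pairs and takes the first k. It is still O(n log n), but with n distance computations instead of two per comparison (sketched here in JavaScript for illustration; our actual implementation is Java):

```javascript
// decorate with the distance, sort, take k, undecorate
function knnBySort(points, query, k, distance) {
    return points
        .map(p => ({d: distance(p, query), p})) // distance computed once per point
        .sort((a, b) => a.d - b.d)              // comparisons are now cheap number compares
        .slice(0, k)
        .map(e => e.p);
}
```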