Not a fan of Musk at all, but lidar is quite expensive. A 64-line lidar with 100m+ range was about 30k+ a few years ago (not sure how prices have changed now). The long-range lidar on the top of the Waymo car is probably even higher resolution than this. It’s likely that the sensor suite + compute platform on the Waymo car costs way more than the actual Jaguar base vehicle itself, though Waymo manufactures its own lidars. I think it would have been impossible to keep the costs of Teslas within the general public’s reach if they had done that. Of course, deploying a self-driving/L2+ solution without this sensor fidelity is also questionable.
I agree that perception models will not be able to deal with this well for a while. They are just not good enough at estimating depth information. That being said, a few other companies also attempted “vision-only” solutions. TuSimple (the autonomous trucking company) argued at some point that lidar didn’t offer enough range for their solution, since semi trucks need a lot more time to slow down/react to events ahead because of their massive inertia.
I work in a related field to this, so I can try to guess at what’s happening behind the scenes. Initially, most companies had very complicated non-machine-learning algorithms (rule-based/hand-engineered) that solved the motion planning problem, i.e. how should a car move given its surroundings and its goal. This essentially means writing what is comparable to either a bunch of if-else statements, or a sort of weighted graph search (there are other ways, of course). This works well for, say, 95% of cases, but becomes dramatically harder to make work for the remaining 5% of cases (think drunk driver or similar rare or unusual events).
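To make the "weighted graph search" flavour a bit more concrete, here is a toy sketch in Python. Everything in it (the grid, the cost values, the name `plan_path`) is made up for illustration and is not how any particular company's planner works; it is just plain Dijkstra over a small grid where each cell has a hand-assigned cost.

```python
import heapq

def plan_path(grid, start, goal):
    """Dijkstra over a 2D grid (hypothetical toy planner).

    grid[r][c] is the hand-tuned cost of entering a cell, or None if the
    cell is blocked. Returns (total_cost, path), or (None, []) if the goal
    cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    best = {start: 0}          # cheapest known cost to each cell
    parent = {}                # back-pointers for path reconstruction
    frontier = [(0, start)]    # min-heap ordered by accumulated cost
    while frontier:
        cost, cell = heapq.heappop(frontier)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return cost, path[::-1]
        if cost > best.get(cell, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is not None:
                new_cost = cost + grid[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    parent[(nr, nc)] = (r, c)
                    heapq.heappush(frontier, (new_cost, (nr, nc)))
    return None, []

# Assumed example: the centre cell is blocked and the right-hand cell is
# "expensive" (say, too close to a parked car), so the planner goes left.
grid = [
    [1, 1,    1],
    [1, None, 5],
    [1, 1,    1],
]
cost, path = plan_path(grid, (0, 0), (2, 2))
```

The catch is the one described above: every number in `grid` is a rule somebody wrote by hand, and the rare cases are exactly the ones nobody thought to assign a cost to.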
Solving the final 5% was where most companies turned to machine learning - they were already collecting driving data for training their perception and prediction models, so it was not difficult at all to just repurpose that data for motion planning.
So when you look at the two kinds of approaches, they have quite distinct advantages over each other. Hand-engineered algorithms are very good at obeying rules - if you tell one to wait at a crosswalk or obey precedence at a stop sign, it will do that no matter what. They are not, however, great at situations where there is higher uncertainty/ambiguity. For example, a pedestrian starts crossing the road outside a crosswalk and waits at the median to allow you to pass before continuing on - it’s quite difficult to come up with a one-size-fits-all rule to cover these kinds of situations. Driving is a highly interactive behaviour (lane changes, yielding to pedestrians, etc.), and rule-based methods don’t do so well with this because there is little structure to this problem. Some machine-learning-based methods, on the other hand, are quite good at handling these kinds of uncertain situations, and Waymo has invested heavily in building these up. I’m guessing they’re trained with a mixture of human data + self-play (imitation learning and reinforcement learning), so they may learn some odd/undesirable behaviors. The problem with machine learning models is that they are ultimately a strong heuristic that cannot be trusted to produce a 100% correct answer.
I’m guessing that the way Waymo trains its motion planning model, or some bias in the training data, allows it to find some sort of exploit that makes it drive through crosswalks. Usually this kind of thing is solved by creating a hybrid system - a machine learning system underneath, with a rule-based system on top as a guard rail.
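A minimal sketch of what I mean by a guard rail, in Python. This is purely illustrative: `Scene`, `Plan`, `learned_planner`, `guard_rail` and the 30 m threshold are all names and numbers I made up, and I have no idea how Waymo actually structures this. The learned model proposes a plan, and a hard-coded rule gets the final say.

```python
from dataclasses import dataclass

@dataclass
class Scene:
    pedestrian_in_crosswalk: bool
    distance_to_crosswalk_m: float

@dataclass
class Plan:
    target_speed_mps: float
    source: str  # which layer produced the final plan

def learned_planner(scene: Scene) -> Plan:
    # Stand-in for a trained model (hypothetical). A real one would look at
    # the whole scene; this stub always proposes to keep driving.
    return Plan(target_speed_mps=8.0, source="learned")

def guard_rail(scene: Scene, plan: Plan, stop_within_m: float = 30.0) -> Plan:
    # Hard rule layered on top: if a pedestrian is in the crosswalk and we
    # are close to it, override whatever the model proposed and stop.
    if scene.pedestrian_in_crosswalk and scene.distance_to_crosswalk_m <= stop_within_m:
        return Plan(target_speed_mps=0.0, source="rule_override")
    return plan

def hybrid_plan(scene: Scene) -> Plan:
    return guard_rail(scene, learned_planner(scene))

blocked = hybrid_plan(Scene(pedestrian_in_crosswalk=True, distance_to_crosswalk_m=10.0))
clear = hybrid_plan(Scene(pedestrian_in_crosswalk=False, distance_to_crosswalk_m=10.0))
```

Here `blocked` comes back as a full stop from the rule layer, while `clear` passes the learned plan through untouched. The trade-off is that the guard rail is only as good as the rules in it, which brings back the hand-engineering problem for whatever cases the rules miss.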
(Apologies for the very long comment, probably the longest one I’ve ever left)