I’m a tech interested guy. I’ve touched SQL once or twice, but wasn’t able to really make sense of it. That combined with not having a practical use leaves SQL as largely a black box in my mind (though I am somewhat familiar with technical concepts in databasing).
With that, I keep seeing [pic related] as proof that Elon Musk doesn’t understand SQL.
Can someone give me a technical explanation for how one would come to that conclusion? I’d love if you could pass technical documentation for that.
139 comments and no one addresses his use of a slur.
Because that’s really just to be expected at this point, and what his audience would want…
Better to focus on constantly poking at him for being dumb, which he and his fans hate, rather than give them what they want, ie being upset at their hateful language
it seems that nobody really cares about the word retard anymore, it’s quite funny how it went from super common language, to being less common, to people just saying it again now.
I’m curious how many people actually consider the word a slur, and how many people even care these days.
He could also refer to the mere possibility of having duplicates which does not mean there are duplicates. And even then it could be by accident. Of course db design could prevent this. But I guess he is inflating the importance of this issue.
He’s just a permanent petulant child.
TIL Elon doesn’t know SQL or have any basic human decency.
J/K, I already knew he doesn’t have basic human decency.
If he knew anything about SQL, he could have run a quick search to see whether any SSNs are actually duplicated. (spoiler alert: they’re not, he’s just stupid).
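For anyone wondering what that kind of quick search looks like, here is a minimal sketch using Python's built-in sqlite3 module. The `people` table, its columns, and the numbers are all made up for illustration; nothing here reflects a real government schema.

```python
import sqlite3

# Hypothetical table and made-up numbers, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, ssn TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO people (ssn, name) VALUES (?, ?)",
    [
        ("000-00-0001", "Alice"),
        ("000-00-0002", "Bob"),
        ("000-00-0002", "Robert"),  # same SSN entered twice
        ("000-00-0003", "Carol"),
    ],
)

# Group rows by SSN and keep only the groups with more than one row.
duplicates = conn.execute(
    """
    SELECT ssn, COUNT(*) AS n
    FROM people
    GROUP BY ssn
    HAVING COUNT(*) > 1
    ORDER BY ssn
    """
).fetchall()

print(duplicates)
```

The `GROUP BY ... HAVING COUNT(*) > 1` pattern is the standard way to list values that occur more than once in a column. Whether a hit is actually a problem depends on what the table is supposed to hold.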
As a data engineer for the past 20+ years: There is absolutely no fucking way that the US gov doesn’t use SQL. This is what shows that he’s stupid not only in SQL but in data science in general.
Regarding duplications: it’s more nuanced than the statements each side has put out. There can be duplications in certain situations. In some situations there shouldn’t be. And I don’t really see how duplications in a DB open the door to fraud.
Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.
Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).
It doesn’t matter without scope. Are we looking at a database of SSNs? tax records? A sign in log? The social security number database might require uniques in some way, but tax records could be the same person over multiple years. A sign in gives a unique identifier but you could be signing in every day.
It’s like saying a car VIN shows up multiple times in a database. Where? What database? Was it sold? Tickets? Registered every year?
This is nothing more than an “assume I mean immigrants or tax fraud and get mad!” inflammatory statement with no proof or reason.
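To make the scope point concrete, here is a rough sketch with two hypothetical tables (names and columns are my own assumptions, not any real schema). In a one-row-per-person table a repeated SSN is rejected by a constraint; in a one-row-per-year table the same SSN is expected to repeat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per person: here a repeated SSN really would be an error.
    CREATE TABLE persons (
        id   INTEGER PRIMARY KEY,
        ssn  TEXT NOT NULL UNIQUE,
        name TEXT NOT NULL
    );
    -- One row per person per year: the same SSN is expected to repeat.
    CREATE TABLE tax_returns (
        ssn      TEXT NOT NULL,
        tax_year INTEGER NOT NULL,
        UNIQUE (ssn, tax_year)
    );
    """
)
conn.execute("INSERT INTO persons (ssn, name) VALUES ('000-00-0001', 'Alice')")
conn.executemany(
    "INSERT INTO tax_returns VALUES (?, ?)",
    [("000-00-0001", 2021), ("000-00-0001", 2022), ("000-00-0001", 2023)],
)

# The same SSN appears three times in tax_returns, and that is fine.
rows_for_alice = conn.execute(
    "SELECT COUNT(*) FROM tax_returns WHERE ssn = '000-00-0001'"
).fetchone()[0]

# A second person row with the same SSN is rejected by the UNIQUE constraint.
try:
    conn.execute("INSERT INTO persons (ssn, name) VALUES ('000-00-0001', 'Mallory')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows_for_alice, rejected)
```

So "this SSN shows up more than once" means nothing until you know which table, and which constraint, you are talking about.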
If it’s used as an identifier to link together rows from different tables. Also known as “joining” tables. SSN (with birthdate) is a unique identifier, and so it’s natural to choose as a primary/foreign key.
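A small sketch of what "joining" means here, mirroring the design described in this comment (SSN as primary key in one table and foreign key in another). Table names, columns, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE persons (ssn TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE payments (
        payment_id INTEGER PRIMARY KEY,
        ssn        TEXT REFERENCES persons(ssn),  -- foreign key
        amount     INTEGER
    );
    INSERT INTO persons VALUES ('000-00-0001', 'Alice'), ('000-00-0002', 'Bob');
    INSERT INTO payments (ssn, amount) VALUES
        ('000-00-0001', 100), ('000-00-0001', 150), ('000-00-0002', 200);
    """
)

# Match each payment to its person through the shared SSN column.
joined = conn.execute(
    """
    SELECT p.name, pay.ssn, pay.amount
    FROM persons AS p
    JOIN payments AS pay ON pay.ssn = p.ssn
    ORDER BY pay.payment_id
    """
).fetchall()

for row in joined:
    print(row)
```

Alice's SSN appears on two rows of the result because she has two payments. That is what a one-to-many relationship looks like, not a data error.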
It really is baffling trying to make sense of what he is saying. It’s like the only explanation that makes any sense at all is that he has no idea what he is talking about. Even with just cursory knowledge of database cardinality, you wouldn’t say stuff this stupid.
Well, we heard what the White House press secretary has to say about the fraud they found 2 days ago. They found massive amounts and she brought receipts! All of them were examples of money being spent that disagree with Trump’s new policies. Like money spent on DEI initiatives and aid sent to countries in Africa to help slow the spread of HIV. That receipt was for a laughable $57,000.
Then when asked how any of it was fraud she said, well they consider that fraud because it wasn’t used to help Americans.
So the 27 year old married to a billionaire 32 years older than her is complaining that the money wasn’t directly spent on her gold digging ass, and if it’s not spent directly on her, it’s fraud.
Biggest disgrace of a government that has ever existed.
Musk is the walking Dunning-Kruger; he is too stupid to realize how terrible he sounds.
I saw a comment about this in the last couple of days that was really interesting and educational. Unfortunately I can’t seem to find it again to link it, but the gist of it was that there would be two things wrong with using SSNs as primary keys in a SQL database:
- You should not use externally generated data as primary keys
- You should not use personally identifying data as primary keys
Using SSNs as keys would violate both.
I went looking for best practices regarding SQL primary keys and found this really interesting post and discussion on Stack Overflow:
https://stackoverflow.com/questions/337503/whats-the-best-practice-for-primary-keys-in-tables
My first thought was that people’s SSNs can and do change, and sometimes (rarely?) people may have more than one SSN. Like someone mentions in that link, human error would be another reason why you would not want to use external data and particularly SSNs as primary keys.
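Here is a minimal sketch of the alternative those best practices point to: an internal surrogate key, with the SSN stored as an ordinary unique column. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE persons (
        person_id INTEGER PRIMARY KEY,   -- internal surrogate key
        ssn       TEXT NOT NULL UNIQUE,  -- external value, allowed to change
        name      TEXT NOT NULL
    );
    CREATE TABLE payments (
        payment_id INTEGER PRIMARY KEY,
        person_id  INTEGER NOT NULL REFERENCES persons(person_id),
        amount     INTEGER NOT NULL
    );
    INSERT INTO persons (person_id, ssn, name) VALUES (1, '000-00-0001', 'Alice');
    INSERT INTO payments (person_id, amount) VALUES (1, 100), (1, 150);
    """
)

# Correcting the SSN touches exactly one row; payments still point at person_id 1.
conn.execute("UPDATE persons SET ssn = '000-00-0009' WHERE person_id = 1")

total = conn.execute(
    """
    SELECT SUM(pay.amount)
    FROM payments AS pay
    JOIN persons AS p ON p.person_id = pay.person_id
    WHERE p.ssn = '000-00-0009'
    """
).fetchone()[0]

print(total)
```

Because nothing else references the SSN directly, a changed or mistyped number is a one-row fix instead of a relinking job.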
From what I’m seeing in other comments, it seems SSNs aren’t used as primary keys, but they are part of generating the primary key. I haven’t seen anyone directly say it, but it sounds like the primary key is a hash of SSN + DOB (I hope with more data to add entropy, because that’s still a tiny bit of data to build a rainbow table from).
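To put a rough number on the entropy worry, here is a sketch. The key derivation (`record_key`, SHA-256 over "SSN|DOB") is purely my own assumption for illustration; nobody in this thread has confirmed how any real system derives its keys.

```python
import hashlib


def record_key(ssn: str, dob: str) -> str:
    """Hypothetical key derivation: SHA-256 over 'SSN|DOB'."""
    return hashlib.sha256(f"{ssn}|{dob}".encode("utf-8")).hexdigest()


# Made-up values.
target = record_key("000-00-1234", "1970-01-01")

# Rough upper bound on the input space: 10^9 nine-digit numbers times
# about 100 years of possible birth dates.
keyspace = 10**9 * 365 * 100


def recover_serial(target_key: str, prefix: str, dob: str):
    """Try all 10,000 four-digit serials for a known prefix and birth date."""
    for serial in range(10_000):
        candidate = f"{prefix}-{serial:04d}"
        if record_key(candidate, dob) == target_key:
            return candidate
    return None


recovered = recover_serial(target, "000-00", "1970-01-01")
print(keyspace, recovered)
```

With an unsalted hash, the whole input space is on the order of 10^13 combinations, and it shrinks fast once an attacker knows part of the input, as the 10,000-guess loop shows. A secret key (for example an HMAC) or a random surrogate ID would avoid that.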
Still, assuming we haven’t begun re-using SSNs, it seems concerning to me that a SSN is appearing multiple times in the database. It seems a safe assumption that the uniqueness of a SSN should make the resultant hash unique, so a SSN appearing as associated to multiple primary keys should be a concern, right?
Other comments have led me to believe the “duplicate SSNs” are probably appearing in “different fields” (e.g. a dead man’s SSN would appear directly associated to him, but also as a sort of “collecting payments from” entry in his living wife’s entry). That would be a misrepresentation of the facts (which we know Vice Bro, Elon Musk the Wise and Honest would never do). Occam’s Razor though has me leaning in that direction.
I think the thing that’s catching you up the most is that you’re assuming Elon has the slightest clue what he’s talking about. In your mind, you’ve read the words “the social security database” from his post and have made assumptions about what that means.
I’ve worked with databases for 20+ years, several of those being years working on federal government systems. Each agency has dozens or possibly hundreds of databases all used for different purposes. Saying “the social security database” is so fucking general that it’s basically nonsensical. It’d be like saying “Ford’s car database”.
Elon clearly heard someone technical talking about something, then misinterpreted it for his own purposes to justify what he is doing by destroying our government institutions. His follow up of saying the government doesn’t use SQL just reinforces that point.
Trying to logically backtrack into what he actually meant - and what the primary keys should be - is just sane washing an insane statement.
That all makes sense, except if someone’s SSN changes (which happens under certain circumstances), doesn’t that invalidate their primary key or require a much more complicated operation of issuing a new record and relinking all the existing relationships?
I can imagine an SSN existing in more than one primary key due to errors. If they use SSNs in the primary key at all, but combined with something else, that leads me to believe that the designers felt that SSNs weren’t reliable enough to be a primary key on their own.
I agree with you about Occam’s Razor. The guy has demonstrated multiple times that he’s a dishonest moron.
I’m not familiar with cases where someone’s SSN could change. Could you link to resources on when that would happen?
I don’t have any resources handy, but I do know someone who this happened to: they were an immigrant who got an SSN the first time they migrated to the US, went back to live in their country for a number of years, then returned to the US and I guess applied for an SSN again. Voilà, two SSNs and a mess.
Yeah, I can imagine that’d be an administrative headache. I do not envy them the opportunity of sorting that out.
Thanks for the example though. That makes sense.
I don’t envy either party either. You’re welcome!
That all makes sense, except if someone’s SSN changes (which happens under certain circumstances), doesn’t that invalidate their primary key or require a much more complicated operation of issuing a new record and relinking all the existing relationships?
Yes, in the case of duplicate SSN assignments for two people (rare), you would need to change their records to align with the new SSN while not changing the records that go to the person who keeps the SSN. We do it with state identifiers and it is a gigantic pain in the ass.
If two numbers are assigned to the same person, merging them into one of the two is far easier.
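The easier case (one person, two numbers) can be sketched like this, assuming a hypothetical schema where child rows reference the SSN directly. `merge_ssn` is an illustrative helper, not anything from a real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE persons (ssn TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE payments (
        payment_id INTEGER PRIMARY KEY,
        ssn        TEXT REFERENCES persons(ssn),
        amount     INTEGER
    );
    -- The same person was issued two numbers.
    INSERT INTO persons VALUES ('000-00-0001', 'Alice'), ('000-00-0002', 'Alice');
    INSERT INTO payments (ssn, amount) VALUES
        ('000-00-0001', 100), ('000-00-0002', 150);
    """
)


def merge_ssn(conn, keep, retire):
    """Repoint every child row from `retire` to `keep`, then drop `retire`."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("UPDATE payments SET ssn = ? WHERE ssn = ?", (keep, retire))
        conn.execute("DELETE FROM persons WHERE ssn = ?", (retire,))


merge_ssn(conn, keep="000-00-0001", retire="000-00-0002")

remaining = conn.execute("SELECT ssn FROM persons").fetchall()
payments = conn.execute(
    "SELECT ssn, amount FROM payments ORDER BY payment_id"
).fetchall()
print(remaining, payments)
```

The hard case (two people sharing one number) cannot be done with a blanket `UPDATE`, because someone has to decide row by row which person each record belongs to. That is where the pain comes from.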
I can definitely imagine all that. Thanks!
It may be bad practice to use SSN as a primary key, but that won’t deter thousands of companies from doing exactly that.
Oh, I hear you!
He is saying the US government doesn’t use structured databases.
At least 90% of all databases have a structure.
As someone explained in another comment, you often duplicate information due to rules around cardinality to gain improvements in retrieval and structure. I would be pretty worried if SSNs were being used as a widespread primary key in any set of tables. Those should generally be UUIDs that can be optimized for hashing while avoiding collisions.
Even if we are being generous to Elon, we could assume that social security payments are processed on mainframes given how many have to go out and the legacy nature of the program. Most mainframe shops I know have adapted an SQL interface for records in some capacity, but who knows what he is looking at.
Federal government IT is done on a per-agency basis. I would say Oracle Database is pretty much the most licensed piece of software the government uses outside of Red Hat Linux and Windows desktop.
Hanlon’s razor. He’s obviously referring to himself lol.
Clearly the solution is to just use a big Excel spreadsheet.
In our company I’m friends with one of the lead devs. He once told me “no matter what way you look at it, Excel is never the answer”, lol. I’m sure he was a bit biased, but I’ve seen my fair share of macro-ridden abominations over the years.
It makes a pretty good calculator. 🧮
Excel is accounting workbook software, it is not suitable for data storage. Although people certainly use it that way.
It’s an amazing tool if only one person is updating / maintaining the file. The moment collaboration starts, you’re all fucked. I’m currently maintaining one that I inherited that is at least 10 years old and comes with a 50 page instruction manual on how to run it every month… that then gets posted to a shared drive where anyone can edit.
And then the rest of the month is spent explaining to the end users how they fucked it up this time.
On the flip side, I’ve also built sheets that could parse data between Nav, MySQL, and SQL ERP systems with tables of over 5 million rows each on a single button refresh that ran flawlessly for years… because I was the only maintainer and the sheets were locked from accepting changes from other users.
I think a lot of comments here miss the mark, it’s not really just about stating the gov does not use SQL.
Deduplication is generally part of a compression strategy and has nothing to do with SQL. If we’re being generous he may have been talking about normalization, but no one I have ever met has confused the two terms (they are distinctly different from an engineering perspective).
There are degrees of normalization too, so it may make total sense to normalize 3NF (third normal form) rather than say 6NF.
That’s interesting. I didn’t know anything about normal forms, but a quick glance at G4G has some interesting information. I don’t have the time to go through their full article at the moment, but it’s been added to my to-do list.
Link for the lazy: https://www.geeksforgeeks.org/types-of-normal-forms-in-dbms/
This is it. Relational databases are normalized according to normal forms; “deduplicate” is usually a term used when talking about a concrete data set pulled from a source like a database, not the relational data model in the database itself.
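A rough sketch of the difference between the two terms, using made-up tables. Normalization restructures the schema so each fact is stored once. Deduplication drops repeated rows from a concrete data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Flat export: the person's name is repeated on every payment row.
    CREATE TABLE flat_payments (ssn TEXT, name TEXT, tax_year INTEGER, amount INTEGER);
    INSERT INTO flat_payments VALUES
        ('000-00-0001', 'Alice', 2022, 100),
        ('000-00-0001', 'Alice', 2023, 150),
        ('000-00-0002', 'Bob',   2023, 200),
        ('000-00-0002', 'Bob',   2023, 200);  -- accidental exact repeat

    -- Normalization: split the schema so each name is stored once.
    CREATE TABLE persons (ssn TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE payments (
        ssn      TEXT REFERENCES persons(ssn),
        tax_year INTEGER,
        amount   INTEGER
    );
    INSERT INTO persons SELECT DISTINCT ssn, name FROM flat_payments;
    INSERT INTO payments SELECT ssn, tax_year, amount FROM flat_payments;
    """
)

# Deduplication: drop repeated rows from the data set itself.
deduplicated = conn.execute(
    """
    SELECT DISTINCT ssn, name, tax_year, amount
    FROM flat_payments
    ORDER BY ssn, tax_year
    """
).fetchall()

person_count = conn.execute("SELECT COUNT(*) FROM persons").fetchone()[0]
payment_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(len(deduplicated), person_count, payment_count)
```

Even in the normalized version the SSN still repeats in `payments`, once per payment. A normalized database is not one where no value ever appears twice.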
It doesn’t matter anymore to the trumpers. They are eating this shit up like it’s thanksgiving
The ignorance of Elon is truly concerning, but somehow the worst part to me is Elon calling someone a retard for pointing that out.
Ableist, racist white supremacist doing their ableist-racist-white-supremacist thing.
He called a rescuer a pedophile for trying to rescue children…
The US government pays lots of money to Oracle to use their database. And it’s not for Berkeley DB either. (Poor sleepy cat). Oracle provides them support for their relational databases… and those databases use… SQL.
Now if Musk tries to end the Oracle contracts, then Oracle’s lawyers will go after his lawyers and I’m a gonna get me some popcorn. (But we all know that won’t happen in any timeline… Elon gotta keep Larry happy.)
Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database
Formally, changing someone’s identity would be a very explicit reason to keep a “duplicate” SSN entry, if purely for historical reasons for example. I’m sure there are a myriad of technical reasons to be doing this.
It’s an insanely idiotic thing to say. Federal government IT is myriad, and done at a per-agency level. Any relational database system, which the federal government uses plenty of, uses SQL in one way or another. Elon doesn’t know what he is talking about at all, and is being an ultimate idiot about this. Even in the context of the mainframe projects that, if we are giving Elon the benefit of the doubt, he might be referring to, most COBOL shops I know have adapted to addressing internal data records using an SQL interface, although obviously in that legacy world it is insanely fractured and arcane.
Another commenter pointed out a legitimate use case, but it’s not even worth thinking about that much. De-duplicated is usually a word you use in data science to talk about making sure your dataset is “hygienic” and that you aren’t duplicating data points. A database is much different because it is less about representing data, and more about storing it in a way that allows you to perform transactions at scale: retrieval, storage, modification, etc. Relational databases are analyzed in terms of data cardinality, which essentially describes tradeoffs in representation between speed of retrieval (duplications good) vs storage efficiency (duplications bad).
The issue is that Elon is so vague and so off the mark that it is very hard to believe that he even has the first clue about what he is talking about. Even you are confused just by reading it. It is all a tactic to convince others that he is smarter than he is while doing extreme damage to the hardworking people that actually make this stuff possible. Have you noticed that the man has never come to a conclusion that wasn’t in his interests? This is not honest intellectualism, or discussion based on technical merit. It’s self-serving propaganda.
Well, if someone changes their name you’d add a new record with the same SSN to hold their new name, that way it keeps the records consistent with the paperwork; old papers say their old name and reference the retired record, new papers use their new name and reference the new record.
You can use the SSN as the key to find all records associated with a person, it doesn’t have to be a single row per SSN, in fact that would make the data harder to manage and less accurate.
E.g. if someone changes their last name after getting married, it could be useful to be able to have their current and former name in the database for reference.
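A minimal sketch of that name-history idea, with a hypothetical `person_names` table where the same SSN deliberately appears on more than one row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person_names (
        ssn        TEXT NOT NULL,
        name       TEXT NOT NULL,
        valid_from TEXT NOT NULL,   -- ISO dates sort correctly as text
        valid_to   TEXT,            -- NULL marks the current record
        PRIMARY KEY (ssn, valid_from)
    );
    INSERT INTO person_names VALUES
        ('000-00-0001', 'Alice Smith', '1990-01-01', '2015-06-30'),
        ('000-00-0001', 'Alice Jones', '2015-07-01', NULL);
    """
)

# Every name this SSN has been recorded under, oldest first.
history = conn.execute(
    "SELECT name FROM person_names WHERE ssn = ? ORDER BY valid_from",
    ("000-00-0001",),
).fetchall()

# Only the record that is currently in effect.
current = conn.execute(
    "SELECT name FROM person_names WHERE ssn = ? AND valid_to IS NULL",
    ("000-00-0001",),
).fetchone()[0]

print(history, current)
```

A naive "count rows per SSN" query against a table like this would report a duplicate, even though the table is doing exactly what it was designed to do.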