I should preface this by saying that I love Project Euler--I spent a ton of time there while learning to program. I also am impressed by anyone who volunteers to create something for the community, and invests effort in maintaining it.
However, not storing emails, and thereby giving up account recovery with the explanation that it's about security is a shit sandwich.
My email is <myfirstname>.<mylastname>@gmail.com, a pattern I share with millions of people. This is public information. I could spray paint my email address on local bridges without in any way making my email less secure (cops might complain, though).
I understand that some people have reasons to have private email addresses that they don't want released (they'll give them to family, but not the general public). They should never sign up for anything with those email addresses, because the moment you sign up for things, you will almost certainly be entered in a database somewhere, and eventually be spammed or subjected to whatever other bad consequences you're concerned about.
Account recovery is a basic feature of a website (except those that contain data too sensitive to have account recovery), and they're giving it up for phantom security.
It's one thing to decide not to store emails (sure, why not?) but account recovery shouldn't even require one to store email addresses.
Check the email the user provides via the recovery form against a hash of the email saved during registration; if it matches, send the reset link. This way, when data is breached, figuring out the original email should be hard (if not impossibly hard, depending on how they hash it).
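A minimal sketch of that check, in Python. The function names are made up for illustration, the hash is plain unsalted SHA-256 just to keep it short, and lowercasing the whole address is an assumption about how you'd normalize it:

```python
import hashlib
import hmac


def email_digest(email: str) -> bytes:
    """Hash a normalized email; only this digest is stored, never the address."""
    normalized = email.strip().lower()  # assumption: treat addresses case-insensitively
    return hashlib.sha256(normalized.encode("utf-8")).digest()


def matches_stored_email(submitted: str, stored_digest: bytes) -> bool:
    """Return True if the submitted address hashes to the stored digest."""
    return hmac.compare_digest(email_digest(submitted), stored_digest)


# At registration: keep only the digest.
stored = email_digest("Foo@Example.com")

# At recovery: the user types the address again; send the reset link only on a match.
print(matches_stored_email("foo@example.com", stored))  # True
print(matches_stored_email("bar@example.com", stored))  # False
```

The site never needs the address at rest: the user supplies it again at recovery time, and that submitted copy is what the reset link gets mailed to.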
A salted hash would completely eliminate any ability to look up accounts by email address, since you would have to hash the email against the salt for every account in the database until you landed on the correct one.
You could have one of two approaches.
1. Don't use a per user salt and just use a global salt. You can counteract this decrease (a bit) in security by increasing the key stretching part of your hashing algorithm.
or
2. Require the user to submit their email address AND username and store the salt in the user's record.
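Approach 2 might look roughly like this. It's a sketch, not anyone's actual implementation: the in-memory `users` dict stands in for the user table, and the PBKDF2 iteration count is a placeholder you'd tune yourself.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # hypothetical work factor, tune for your hardware

users = {}  # stand-in for the user table: username -> {"salt", "email_hash"}


def hash_email(email: str, salt: bytes) -> bytes:
    """Stretch the normalized email with PBKDF2 and the given salt."""
    data = email.strip().lower().encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", data, salt, ITERATIONS)


def register(username: str, email: str) -> None:
    """Store a random per-user salt and the salted hash, not the email itself."""
    salt = os.urandom(16)
    users[username] = {"salt": salt, "email_hash": hash_email(email, salt)}


def may_send_reset(username: str, email: str) -> bool:
    """Find the salt via the username, then check the submitted email against it."""
    record = users.get(username)
    if record is None:
        return False
    candidate = hash_email(email, record["salt"])
    return hmac.compare_digest(candidate, record["email_hash"])


register("Arnor", "foo@example.com")
print(may_send_reset("Arnor", "foo@example.com"))   # True
print(may_send_reset("Arnor", "evil@example.com"))  # False
```

Because the username locates the row, recovery costs exactly one hash no matter how many accounts exist. Approach 1 would swap the per-user salt for a single site-wide value and make up the difference with a higher iteration count.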
I agree with someone up there that email address != password. It's refreshing to see someone that gives a crap about my privacy though.
I had a visceral reaction to "global salt" (I've always heard this called: "pepper") because it's so insecure for passwords, but I guess it's not as bad for email addresses. In the case of passwords, we find that a global salt is fairly ineffective because too many people use common (stupid) passwords like "password." If 10% of the hashes are the same, you can probably figure out what that hash means pretty quickly. We don't have this overlap problem with emails so it's less scary.
Still, simply storing the create time or a randomly generated salt right on the user table is more secure than using a global salt.
Right but in this case you would essentially have to use every salt from every user to hash the submitted email and say "aha user 123,456 has a salt and hashed email which matches the submitted email of foo@example.com" which is why I suggested method 2.
I am user "Arnor" and my email is "foo@example.com". Ok, your salt is #@$%@#$%DGDFfdgdawer.
If your hashing algorithm is appropriately "expensive", the scan of all user salts would not work.
This thread is dishearteningly full of cargo-culted cryptography. You would think HN would do better. A global salt is not a thing. It's a misapplication of a cryptographic component. The appropriate tool when you think you need something like this is an HMAC and a secret HMAC key.
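For the curious, here is a rough sketch of what the HMAC version could look like with Python's standard library. The key value, the dict standing in for an indexed column, and the function names are all invented for illustration; in practice the key would live somewhere other than the database.

```python
import hashlib
import hmac

# Hypothetical secret: in a real deployment this is loaded from outside the database.
SECRET_KEY = b"hypothetical-key-kept-outside-the-database"

accounts_by_tag = {}  # stand-in for an indexed column: HMAC tag -> username


def email_tag(email: str) -> str:
    """Keyed, deterministic tag for an email; useless to an attacker without the key."""
    data = email.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()


def register(username: str, email: str) -> None:
    accounts_by_tag[email_tag(email)] = username


def find_account(email: str):
    """Single indexed lookup, no per-row scan."""
    return accounts_by_tag.get(email_tag(email))


register("Arnor", "foo@example.com")
print(find_account("foo@example.com"))  # Arnor
print(find_account("bar@example.com"))  # None
```

The tag is the same every time for a given address, so it can be indexed and looked up directly, while someone holding only a database dump can't test guessed addresses against it without the key.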
Once again we are talking about having an email retrieval function while not storing the email address in plaintext. A "global salt" or "pepper" (as we are apparently calling it now) just prevents enormous pre-generated rainbow tables, but admittedly in today's GPU-dominated cracking environment it probably doesn't get you much.
Personally I think not having a password retrieval function while simultaneously forcing all of your users to reset their password is a pretty user unfriendly tactic for the protection of an ostensibly public piece of information.
Hello? Let's say you have 5 billion accounts, in salted sha1. According to openssl speed sha1, you'd take less than 38 minutes on my ancient desktop to look up an email, on average half that. If the shoe fits, send the email; if not, don't?
Sure, having another field to match on (e.g. username) for locating the correct salt would be good -- but it's certainly not infeasible to do a brute-force search (you'd probably want to queue up password reset requests, though). Now, if you went with bcrypt or scrypt -- things would, by design, break down a bit. I still think you'd be able to send a reset mail within 24 hours for most reasonable configurations and numbers of users...
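To make the brute-force search concrete, here is a toy version that counts how many hashes a lookup costs. The table layout, the helper names and the 1000 fake accounts are all assumptions for the sake of the example:

```python
import hashlib
import os


def salted_sha1(email: str, salt: bytes) -> bytes:
    """Salted SHA-1 of an email, as in the scenario above."""
    return hashlib.sha1(salt + email.encode("utf-8")).digest()


def build_table(emails):
    """One (salt, digest) row per account; the email itself is not kept."""
    rows = []
    for email in emails:
        salt = os.urandom(16)
        rows.append((salt, salted_sha1(email, salt)))
    return rows


def scan(table, email):
    """Try every row's salt. Returns (row index or None, hashes computed)."""
    for index, (salt, digest) in enumerate(table):
        if salted_sha1(email, salt) == digest:
            return index, index + 1
    return None, len(table)


table = build_table(f"user{i}@example.com" for i in range(1000))
print(scan(table, "user500@example.com"))  # (500, 501)
print(scan(table, "nobody@example.com"))   # (None, 1000)
```

A hit stops at the matching row, so it costs about half the table on average. A miss has to hash against every row, which is why the cost grows linearly with the number of accounts.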
Taking unnecessarily long to handle a lookup request might leave the server very vulnerable to DDoS attacks leveraging this "account recovery" option, I think.
Even worse, an invalid email would take the longest possible time, every time.
And since this is only an email address we are talking about, a global salt + more stretching (like runamok mentioned above) could be secure enough while still providing faster lookups.
Of course, you could protect from the DDoS by maintaining a secondary application server which connects to a slave database. Then the requests for account recovery wouldn't impact the rest of the system. :)
That's why I suggested a queue, so you'd only ever need to have a maximum of <total number of accounts> pending. I missed the part about using this for login as well as recovery though (but also, note that the numbers are for 5 billion accounts, and scale linearly with accounts -- so divide by 2000 for half a million accounts).
38 minutes to log into your account seems excessive. Remember, this isn't just for password resets — it's to look up an account in the database by email address.
Ah, fair point. So we agree they need a (possibly not unique) user name too. [edit: note this is for 5 billion accounts, on a 10-year-old CPU. So for 500,000 accounts, it would be ~1 second (average 0.5). Still probably too long for log-in (or particularly log-in failure feedback).]
You do not appear to understand the purpose of a salt. A salt should never be derived from the data it is to be used with. The entire purpose of a salt is to cause two identical inputs to a hash function to produce distinct outputs.
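A tiny demonstration of that property, using random salts and SHA-256 from Python's standard library (the helper name is just for illustration):

```python
import hashlib
import os


def salted_hash(value: str, salt: bytes) -> str:
    """Hash the salt followed by the value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


# Two users pick the same input, but each gets an independent random salt.
salt_a = os.urandom(16)
salt_b = os.urandom(16)
hash_a = salted_hash("password", salt_a)
hash_b = salted_hash("password", salt_b)

print(hash_a == hash_b)                          # False: identical inputs, distinct outputs
print(salted_hash("password", salt_a) == hash_a)  # True: same salt reproduces the hash
```

If the salt were derived from the input itself, two users with the same input would get the same salt and therefore the same hash, which defeats the point.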
> figuring out what the original email should be hard (if not impossibly hard, depending on how they hash it)
I mean, passwords are way more sensitive than emails, especially given that many people re-use them. So, how you hash passwords is more critical than how you hash emails (which is rarely done, I guess).
On the other hand, there is no reason to not have the same level of protection for emails, if you are already following best practices for passwords anyway (PBKDF2, bcrypt, scrypt etc.).
A little off topic, but is there any reason to still be talking about salted hashes when we have bcrypt and scrypt these days? Seems like an anachronism.
Of course you're right, and I can't believe I didn't realize that. I think my point still stands that it's a bit silly to worry about storing emails, but you're right that you can even avoid that risk by encrypting them if desired.
I think this is the key point. Security for relatively obscure and unimportant sites isn't really about those sites, it's about other sites. People reuse passwords a lot. They shouldn't, of course, but you can shout that from the rooftops all week and it won't change the fact that a lot of people do. If you suffer a significant breach, then a decent percentage of users will have their bank accounts put at risk from it. You can simply put the blame on the dumb users who reused passwords, but it's reasonable to want to do more.
This is a good point, and a reason to do the right thing with regard to emails--which is to store a safe version of them (bcrypt).
Because while an email and a password aren't public information, a username and a password aren't public information either. If you don't trust yourself to store the former, you shouldn't trust yourself to store the latter much either.
Using bcrypt on email addresses is pants-on-head retarded. Please stop cargo-culting cryptography.
How do you propose to look up accounts by email address if they use a salted hash? You would have to bcrypt the email against every row in the database until you found the correct one. If you use a username to do the lookup instead, why store the email address at all? You can't use it for anything.
You're right and wrong. Right because it's a crazy idea.
Wrong, because it's the logical conclusion of the belief that emails must be treated with as much care as passwords. If you really think that, then you need to encrypt them, and therefore you have to give up the ability to look up user accounts by email address. All you could do is verify that a user-submitted email is associated with a user-submitted account. That's where you end up when you have that sort of paranoia about email addresses.
But that conclusion is, like you said, absurd, and I never should've implied otherwise. I wasn't thinking when I wrote it.
Particularly, the same kind of threat to other accounts belonging to the same person exists with username/password combinations as with email/password combinations; after all, people reuse username/password combos as much as email/password combos.
So, the same general class of people who you endanger by not storing email/password securely are endangered if you stop storing email and just have username/password.
And a lot of that class will have emails that can be quickly guessed by appending one of "outlook.com", "gmail.com" or some other popular free-webmail provider to the username, because if they reuse usernames and passwords, it's quite likely they do it on their mail site and that they have a webmail provider. So while what Euler has done clearly has a significant convenience impact, it has negligible security impact.
Hm... Couldn't you just sign in by using only your email without any password or any other extra stuff? I mean, it's not like there's any sensitive information there. Could lead to some trolling, but I think trolls and Project Euler don't have much overlap. In some cases I think it's valid to ask "why security?".
A site that purports to teach is incapable of learning how to strike a balance between securing confidential information and making it possible to recover an account. This is a solved problem. If my bank can have a password recovery system, a site about numbers can have one too.
> " With respect to this issue it is quite possible that some members will have genuinely forgotten passwords."
To be fair, your bank likely has a bit more money to throw at this problem.
I would think in this case the entire point is not so much to help them secure stuff, but an attempt to remove them as a target for hacking in the first place.
> but an attempt to remove them as a target for hacking in the first place.
This is very short-sighted. As long as you have a popular site, you're a target for defacement. And the convenience expense is enormous. As others have mentioned, OAuth or a Twitter or Facebook login alternative would have been a sane choice. What they've decided wasn't sane; it's embarrassing for them and frustrating for users who trusted the site.
Inconveniencing users to this degree is probably causing the hackers to laugh, this is in effect a huge win for them they can go brag about now in addition to accessing sensitive information.
I think you are at least slightly overstating how inconvenient this is. I mean, yes, I could wish it was easier. No, this isn't going to stop me from getting back on the site.
And how many answers did you lose? Because I lost a bunch. I'm not overstating anything, I'm honestly frustrated and dispirited because of a high degree of incompetence and bad judgment.
1) had solved a bunch of Project Euler problems, but fewer than 200 (account recovery is still available for those folks),
2) lost/forgot your signon information, and
3) lost/deleted all the code you used to find the answers?
You, sir, are in a very small boat. A frustrating boat, to be sure, but I suspect that virtually none of their users share your fate.
I'm in pretty much the same boat. The actual problems I don't really mind (I have the code to some, and it wouldn't hurt to revisit the rest), but I'd very much like to have my username back.
OK, so it's an extremely minor issue, but given that the reason for it is so silly, it's still kinda irritating.
You mean you didn't save all your results? Don't a lot of the problems build on previous results? Why would one lose anything? FYI I was interested but did not start down the projecteuler rabbit hole myself, so perhaps I'm missing something.
> To be fair, your bank likely has a bit more money to throw at this problem.
I suppose they could charge 1 USD for (lifetime) membership and store the last four digits of your credit card in lieu of a username, so that they could easily look up the salt with which they've hashed your email... ;-)
(Would require that you could supply the last digits of your possibly expired credit card, when you lost the password ten years hence ...)
The last four digits of your credit card should not be considered secure information. It's printed on all of your receipts. You carry it on your person in plain text. Many of your online accounts will display it in your account settings without an additional login. It's probably in both your mail and your email.
Once someone has it, they can use it for years for recovery on any service that accepts it, and I know some will allow full account recovery using it alone.
> [...] because the moment you sign up for things, you will almost certainly be entered in a database somewhere, [...]
Oh, you've lost the game long before that. Grandma's email chain? Welcome to the database as soon as anyone on that list gets their email compromised. Apologies to all the grandmothers out there who know how to use the BCC field.
As far as I'm aware, Project Euler doesn't make any actual money, so you have to give the team behind it a lot of kudos for actually taking the time to get it back up and running.
Must have been really tempting to just sack it off as a bad job. Congrats to the team!
If Project Euler is trying to make itself less interesting to hackers/less vulnerable by storing less information (email), why don't they consider OAuth for login?
I know OAuth has its own warts, but isn't part of the point to offload the burden of authentication to someone else?
Also, feel free to replace OAuth with Mozilla Persona or OpenID.
[edit] - s/storing less password/storing less information\(email\)/
Also, I've been thinking of this for a while, but Project Euler needs to be open-sourced. I think this would help people who don't necessarily want to contribute money. I thought rather than just making suggestions, I could make a pull request for implementing OAuth/Persona/OpenID login -- then I realized it wasn't open source...
I've been keeping this idea close to the chest, mostly because it's something I want to do, but Project Euler could easily become a great training tool, an easy-to-install packaged Django application (I mention Django for its nice out-of-the-box admin interfaces; it doesn't matter what it is as long as it's easy to manage for admins and users).
> The decision to no longer store any private/personal information in no way reflects a lack in confidence of the steps we have taken to make the new website secure, but if history teaches us one thing it is that for every "unsinkable" Titanic built there will always be icebergs.
I love PE and I don't intend this question snarkily at all, but am genuinely curious why securing a database of emails for a site as simple as PE would be such a perilous problem? I know security in general is always more difficult than it appears, but in this case I would have thought we were dealing with a solved problem. I'd love to hear about why my assumptions are wrong.
Security is never a 'solved problem'. There is always a trade-off between usability, performance and security.
Think of it this way: You run a server on hardware that no single human understands fully. On top of that, you have some devices for which pretty much the same applies. On top of that you have an operating system consisting of millions of lines of code, and again, nobody can fully grasp everything. On top of that, you have your webstack which adds even more complexity. You are hooked up 24/7 to a network filled with criminals.
Security is not just intrusion prevention, it's also detection and recovery. PE chose to reduce the negative effects of a successful intrusion.
Not storing personal information now removes a whole class of work you have to do. While you store personal information, you need to spend time keeping on top of security patches and issues. It means you need to worry about legal obligations.
If you just don't store it, you don't have to worry about that. You need to spend less time per month maintaining the project. If you're a volunteer project, you might not have the time available to keep on top of it.
Unfortunately security is never a solved problem, as there are always new attack vectors being discovered. I think the statement just reflects their desire to take zero risk. The only way to do so is to avoid storing this information entirely.
I'd have to agree, PE isn't a huge website and it isn't really storing sensitive data. I don't see why they're acting like it's impossible to secure their data.
I'd say 100 solved problems is pretty impressive. I have about 75-ish done with no math background (other than a few classes for my CS degree).
The first 50 should be doable for most people in my opinion. After that you need to start being really clever or actually going and researching the problem at hand.
It's easier than you might think. Only read this comment as far as you need to make a little more progress. These hints only apply if you're starting with a dynamic programming approach. Also, they're only helpful if you do the work, so I don't think they violate the spirit of Project Euler.
Is your dynamic programming table 2D? Maybe it should be.
Stop reading if that's progress.
Have you looked closely at the table content?
Stop reading.
Have you noticed how similar many parts are, under perhaps simple transforms?
Stop reading.
Try adding or subtracting your coordinates, to see more of the pattern.
Stop reading.
There's not just a within-row pattern.
Stop reading.
In the end, dynamic programming might not be the main trick. But you can make your own way from here.
It's really tough to say. Some people have awesome math degrees, some people only do it casually, some don't do it at all.
Keep doing the problems until you get stuck. Then learn, then solve a few more. Repeat as long as you're happy with your progress. I wouldn't put a number on it.
In fact, do some easy ones in a completely new language!
Neither the news page, nor the "about" page, nor the front page of "Project Euler" care to explain what this website is all about. Of course, I can guess that it has to do with mathematical problems of some sort.
It is sad if you have to turn to Wikipedia to find out the basic details about a website. A sentence or two of introduction would have made everything better :-)
The site used to have a front page that clearly explained what it was all about. I assume that it will be restored along with the rest of the functionality on Saturday.
On the off chance that other comments here haven't made it clear what it's about, or that you haven't already looked it up, they apparently have a Wikipedia article about them:
"""
Project Euler (named after Leonhard Euler) is a website dedicated to a series of computational problems intended to be solved with computer programs.... Problems are of varying difficulty but each is solvable in less than a minute using an efficient algorithm on a modestly powered computer. A forum specific to each question may be viewed after the user has correctly answered the given question.
"""
There are also several GitHub repos out there that have both the problems and hashes of the answers. (Some have the actual answers, as well, or used to in the git history, but presumably anyone interested in solving the problems is more interested in the process than the score.)
I created an account but couldn't log in.
As I've had the same happen before, I tried using only the first 32 characters of my password when logging in. That worked.
Remember kids: Most software development isn't about puzzle solving and algorithms, it's about making stuff like forms work properly.
Of course the puzzles and algorithms are fun, which is why I'm signing up for PE again!
Bourbaki was a French collective of mathematicians, working/publishing together pseudonymously. I would therefore guess that the same kind of structure is in place at Project Euler: It's not one guy, but a joint effort operating under one name.
Btw, does anyone have a list of the subset of Project Euler tasks that are more related to pure CS/algorithms than to math? Preferably one that mentions the level of experience needed. As far as I can see, it is aimed at complete beginners, right?
Anyone interested in creating an open source version? It could have more features - such as running the code online, and more topics - such as non-math challenges.
I created http://www.learneroo.com which lets people solve programming challenges (and other challenges) online. It's not currently open-source, though if there was interest I would consider open-sourcing it. (I would first need to clean up some code that I didn't think anyone would see!)
There are a number of sites that are not open source that use coding tasks like this and tie them to leaderboards and recruiters. As well as HackerRank, there is CodeEval, and there are some others whose names escape me at the moment.
But that doesn't really address a question about putting together an open-source one.
There's also at least one similar-to-Euler one -- rosalind.info (like Euler, but with a bioinformatics focus) -- which might be closer to responsive, since even though it's not open source, their FAQ says they intend to open-source it...
Well, I'm glad I have a git repo of all my solutions, so I can get back up to my original 102 problems solved. And then go back to not doing it again because it is too hard now.
It's a site full of problems, which generally require some mixture of math and programming skills to solve. You can trade off between the two. If your math is good enough, some problems can be solved with pencil and paper. If your programming is decent, some can be solved by brute force search. There's no time limit, you don't show anyone your code, you just type a brief answer into a text field, so the only constraint on the efficiency of your code is how long you're willing to leave it running.
It's a lot of fun; the math involved can get pretty advanced on some of the problems.