#archiveteam 2014-02-02,Sun

Time Nickname Message
00:23 Nemo_bis no repo to clone? http://archiveteam.org/index.php?title=AOL_Music
02:27 chfoo i added a link to an archive on that aol music page. not sure if the project is finished though...
02:29 chfoo i also have a list of 50 items in archive.org that aren't in the archiveteam collection too
08:52 arkiver hmm
08:52 arkiver wallbase.cc is harder than I thought
08:52 arkiver I can quickly download 100 pages in the beginning, but then it slows down
08:53 arkiver I think they have some kind of security which makes it slow down... :(
08:55 aggrosk Yeah. You'll have to apply some cleverness. The admin has put at least some work into making scraping difficult. I remember, when I was trying to grab some wallpapers from a specific search, that the server seemed to check the Referer header value to make sure you got to the raw link from the respective page, and wouldn't return the image otherwise. In any case, you'd want to rate limit your script.
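
[Editor's sketch] The two points aggrosk raises, sending a Referer header that names the page linking to the image and rate limiting the script, could look roughly like this in Python. The names RateLimiter, build_image_request and fetch_all are hypothetical, and nothing here reflects wallbase.cc's actual checks or arkiver's actual script.

```python
import time
import urllib.request


class RateLimiter:
    """Allow at most one call per `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_image_request(image_url, page_url):
    # "Referer" is the spelling used by HTTP; it names the page that links
    # to the image (assumed here to be what the server checks).
    return urllib.request.Request(image_url, headers={"Referer": page_url})


def fetch_all(pairs, limiter, opener=urllib.request.urlopen):
    """Yield (image_url, bytes) for each (image_url, page_url) pair."""
    for image_url, page_url in pairs:
        limiter.wait()  # space out requests before each download
        with opener(build_image_request(image_url, page_url)) as resp:
            yield image_url, resp.read()
```

With RateLimiter(1.0) this stays at or below 60 requests per minute, close to the rates arkiver reports later in the log.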
09:00 arkiver yeah...
09:00 arkiver well
09:00 arkiver it is going
09:01 aggrosk Is wallbase shutting down? Or is this just a pre-emptive grab?
09:02 arkiver well
09:02 arkiver their forums have shut down
09:03 arkiver they say their website is staying online, but... yeah
09:03 aggrosk Which might not bode well for the rest of the service. Huh.
09:03 arkiver just in case
09:03 arkiver I'm now doing 58 per minute
09:03 arkiver links
09:03 aggrosk Is there a repo up or a page on the wiki?
09:03 arkiver 64 per minute
09:04 arkiver I don't know...
09:04 arkiver you can check for that
09:04 aggrosk Don't see anything in either spot, though it does look like there was a recent "panic" grab here: https://archive.org/details/wallpapers.wallbase.cc-rozne-wallpaper-jpg-1-to-100000-20140130
09:04 arkiver yes
09:04 arkiver that one was done by godane
09:05 arkiver just a small portion of the website
09:05 arkiver I'm doing everything
09:05 aggrosk Cool. I'll add something to the wiki at least.
09:05 arkiver :)
09:05 arkiver thank you!
09:08 arkiver yipdw: zoom works!!! :D
09:24 aggrosk http://archiveteam.org/index.php?title=Wallbase ; consider that a starting point. Y'all can add what you need to it.
09:31 arkiver Thank you aggrosk
09:32 aggrosk Np. Just updated with some info from the FB and twitter pages.
09:32 aggrosk Looks like the owner is MIA.
09:33 arkiver average of 59 pages per minute now
09:34 aggrosk You ought to upload your code to the wiki at least. Or at least link to it.
12:41 Nemo_bis https://archive.org/post/1010731/check-for-hash-md5-or-sha1-to-search-item-or-verify-s3-upload
13:02 joepie91 Nemo_bis: please update me on that, if there's any responses
13:02 joepie91 :p
13:02 joepie91 I kinda need hash search
13:06 arkiver would be very helpful indeed
13:15 Nemo_bis joepie91: add a comment and you'll be notified by email if there are replies, no? ;)
13:31 joepie91 Nemo_bis: oh, no idea, not familiar with how IA forums work
13:31 joepie91 :P
13:34 joepie91 Nemo_bis:
13:34 joepie91 "I'd like to receive email when someone responds to this post"
13:34 joepie91 seems to only apply to replies to me
13:34 joepie91 not to eht thread
13:34 joepie91 the *
14:13 arkiver I will be able to do around 40000 links every 12 hours from wallbase.cc
14:13 arkiver that's because of their "limit"
14:15 Dud2 What size will that archive be in total? and how many links are there in total?
15:56 Nemo_bis hashlib in python seems awfully slow
15:57 Nemo_bis Not that I'm surprised, but I hoped at most one order of magnitude slower than md5sum.
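
[Editor's sketch] The log does not say how Nemo_bis was calling hashlib. One common cause of slow or memory-hungry hashing in Python is reading files in tiny pieces or all at once, so a chunked read is a reasonable baseline to compare against md5sum. The function names and the 1 MiB chunk size below are illustrative assumptions.

```python
import hashlib


def stream_md5(fileobj, chunk_size=1024 * 1024):
    """Return the hex MD5 of a binary file object, read in fixed-size chunks."""
    digest = hashlib.md5()
    # iter() with a sentinel stops when read() returns b"" at end of file.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of the file at `path` without loading it whole."""
    with open(path, "rb") as f:
        return stream_md5(f, chunk_size)
```

The hex string returned has the same form as the checksum md5sum prints, which makes it usable for the upload verification discussed in the linked forum post.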
17:26 Nemo_bis :( <Error><Code>SlowDown</Code><Message>Please reduce your request rate.</Message><Resource/><RequestId>5a9f5f99-5658-4959-ab01-ac958cb203eb</RequestId></Error>
17:31 joepie91 Nemo_bis: aws?
17:32 Nemo_bis joepie91: no, IA
17:32 joepie91 ah
17:32 joepie91 similar
17:32 joepie91 didn't know they had rate limiting though
17:32 joepie91 :P
17:36 Nemo_bis Not the first time I hit their limit, but it's rare enough.
17:38 Nemo_bis Most often, it happens because there are too many waiting tasks on the item in question.
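
[Editor's sketch] A client that receives the SlowDown error document quoted at 17:26 can wait and retry with a growing delay. The sketch below assumes the response body has that XML shape. `send` is a hypothetical callable standing in for whatever performs the upload and returns the response body as text, and the attempt count and delays are arbitrary.

```python
import time
import xml.etree.ElementTree as ET


def is_slow_down(body):
    """Return True if `body` is an error document whose Code is SlowDown."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # not XML at all, so not the error document
    return root.tag == "Error" and root.findtext("Code") == "SlowDown"


def upload_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it stops returning a SlowDown body.

    The delay doubles after every rate-limited attempt.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        body = send()
        if not is_slow_down(body):
            return body
        if attempt < max_attempts:
            sleep(delay)
            delay *= 2
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

If the limit is triggered by too many waiting tasks on the item, as Nemo_bis suggests, delays of minutes rather than seconds may be more appropriate.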
17:39 Nemo_bis The queue is half of what it was a couple days ago, why complain? :P https://archive.org/~tracey/mrtg/derivesg.html
18:32 Nemo_bis joepie91: there's already an answer by the awesome Jeff. :) Though I'm not sure the docs he linked contain an answer to the first question.
18:32 joepie91 Nemo_bis: thanks!
19:31 arkiver Dud2: I'm not sure how big it will be... Maybe 300GB to 1 TB?
19:31 arkiver something around that I think
19:31 arkiver but I'll make it
19:31 arkiver they say they don't have plans to shut the website down
19:31 arkiver so there should be some time
19:32 Dud1 Okay, I was going to offer to help, but with that size I don't think I would be able.
19:42 arkiver Dud1: ah, thank you anyway! :)
19:42 arkiver you are experienced with making crawls?
19:44 Dud1 Nope, but willing to learn.
20:34 Nemo_bis eek, by checking hash I found 123 wikis where upload had failed. :(
20:43 Nemo_bis Ouch, many of those are readonly servers.
