#archiveteam 2014-06-13,Fri


Time Nickname Message
01:41 🔗 dashcloud well, the answer to that question is yes- which sucked for me
01:42 🔗 dashcloud also, apparently you can't do --output-document and --truncate-output together- wget doesn't recognize the combination, but just fails as if you left off the URL
01:42 🔗 dashcloud (for wget that is)
06:28 🔗 Nemo_bis Uh, still no updates for http://archiveteam.org/index.php?title=Ancestry.com ? Does it have a channel or something?
06:29 🔗 joepie91 if not, I propose #gravedigging as channel name
06:46 🔗 joepie91 goddamnit
06:46 🔗 joepie91 anybody have a copy of HipHop, that music thing?
06:46 🔗 joepie91 site is dead, repo is dead
06:46 🔗 joepie91 no forks
06:47 🔗 joepie91 AUR package refers to source tarball on (now-defunct) site
06:47 🔗 joepie91 no matches in wayback
06:47 🔗 joepie91 well the site is in wayback, but the tarballs are not
06:50 🔗 joepie91 ffs
06:50 🔗 joepie91 how can there be no copies of this thing
06:56 🔗 ivan` https://github.com/rhumlover/hiphop
06:57 🔗 ivan` https://github.com/liamja/hiphop
06:58 🔗 ivan` https://encrypted.google.com/search?q=%22forked%20from%20hiphopapp%2Fhiphop%22%20site%3Agithub.com
07:08 🔗 joepie91 ivan`: wtf? github search gave me 0 results :|
07:08 🔗 joepie91 anyway, thanks
07:08 🔗 yipdw ivan` has Github Search+
07:09 🔗 midas he paid for fastlane
07:09 🔗 trs80 how did hiphop actually work? youtube?
07:10 🔗 midas oh wait, the ISP's dont like that name, you have fastlane, but for faster you need hyperspeedlane
07:11 🔗 joepie91 trs80: node-webkit application, searched on youtube, then (presumably) grabbed cover art for it
07:11 🔗 joepie91 and displayed the list
07:11 🔗 joepie91 as far as I can tell
07:11 🔗 joepie91 so nothing too terribly new (because Tomahawk) but still
07:11 🔗 joepie91 also Tomahawk is buggy as shit, so yeagh
07:11 🔗 joepie91 yeah *
07:12 🔗 yipdw the only Tomahawk I know is the BT + Adam K track, which makes that sentence really funny
07:12 🔗 joepie91 lol
07:12 🔗 joepie91 yipdw: http://www.tomahawk-player.org/
07:12 🔗 yipdw oh
07:12 🔗 trs80 yeah, I could see cover art came from itunes/last.fm
07:13 🔗 joepie91 Tomahawk is a great idea, but last time I tried to use it, I gave up after an hour of chasing inexplicable bugs
07:16 🔗 yipdw huh, Tomahawk looks pretty neat
10:55 🔗 schbirid http://www.minuszerodegrees.net/manuals/
11:01 🔗 midas https://archive.org/post/1018340/add-website
11:01 🔗 midas should I? yes i should
11:16 🔗 nico lol
11:21 🔗 midas archivebot for the win right :p
12:37 🔗 Nemo_bis Ah it's nothing new. :( http://www.reddit.com/r/DataHoarder/comments/27y8ux/standing_up_40tbs_of_data_for_fun_times/ci601wk
13:33 🔗 ohhdemgir midas, what would the archivebot get if you set it off on http://hardforum.com/
13:33 🔗 ohhdemgir off site images from posts included or not?
13:38 🔗 ivan` ohhdemgir: embedded images are grabbed from all domains
13:39 🔗 ivan` links are not
13:39 🔗 ohhdemgir wonder how big hf is
13:42 🔗 ivan` Google thinks 3.5M pages
13:43 🔗 ivan` # of threads adds up to 949,580
13:44 🔗 ohhdemgir what command does the bot run to do the grab?
13:46 🔗 ivan` see https://github.com/ArchiveTeam/ArchiveBot/blob/master/pipeline/pipeline.py#L94
13:59 🔗 Arkiver2 Is the upload speed to the IA very slow today?
13:59 🔗 Arkiver2 It is uploading very slow right now
14:00 🔗 ohhdemgir Arkiver2, same here, very
14:00 🔗 ohhdemgir midas, ^^
14:00 🔗 ohhdemgir random pickups here and there but generally it's been silly slow
14:00 🔗 Arkiver2 Will be fixed in a bit of time probably
14:01 🔗 Arkiver2 I just had to make sure it wasn't anything with me
14:01 🔗 ohhdemgir where are you uploading from?
14:01 🔗 Arkiver2 The Netherlands
14:05 🔗 Nemo_bis Arkiver2: do a traceroute to s3.us.archive.org?
14:08 🔗 Nemo_bis eek, I have 350 ms ping from Amsterdam via HE but 170 ms from Helsinki via ISC. Usual stuff.
14:11 🔗 Arkiver2 yeah
14:11 🔗 Arkiver2 got the same here
14:11 🔗 Arkiver2 but I think it will be fixed in some hours, so I'll just wait
14:14 🔗 Arkiver2 Nemo_bis: ah look: https://monitor.archive.org/weathermap/weathermap.html
14:14 🔗 Arkiver2 HE uploading is 85 to 100%
14:15 🔗 joepie91 damn, congested
14:15 🔗 Co2e35_ 365ms delay here also netherlands
14:19 🔗 Arkiver2 joepie91: could it be some kind of hacking thing?
14:19 🔗 joepie91 Arkiver2: what?
14:19 🔗 midas Arkiver2: from online,bet is slow, OVH was hitting 11MB/s
14:20 🔗 midas s/bet/very
14:20 🔗 Arkiver2 joepie91: nah, nevermind
14:21 🔗 Nemo_bis Arkiver2: that's downloading
14:21 🔗 Nemo_bis but maybe they cap the sum of up/down or something
14:22 🔗 Arkiver2 Nemo_bis: yeah, just maybe someone is flooding it with download requests
14:22 🔗 Nemo_bis nah, IA via HE.net is always slow for me
14:23 🔗 Nemo_bis try to route via Japan if necessary :P to avoid HE
14:23 🔗 joepie91 the only statement more insane than that would be "try to route via Australia, to get better speeds"
14:23 🔗 joepie91 :P
14:24 🔗 Arkiver2 I'll just wait for some hours, speed will be better then hopefully
14:37 🔗 underscor Yeah, our HE uplink is generally pretty saturated
14:37 🔗 joepie91 hai underscor
14:37 🔗 underscor https://monitor.archive.org/pubaccess/graph_2064.html
14:38 🔗 joepie91 damn
14:38 🔗 joepie91 HE must either love or hate you
14:38 🔗 underscor haha
14:38 🔗 Arkiver2 underscor: but it will get back to normal in a while right?
14:38 🔗 joepie91 depending on what you pay :P
14:38 🔗 underscor well, I mean, we just pay for the 10gbit port unmetered, so they agreed to it originally
14:38 🔗 underscor haha
14:39 🔗 underscor Arkiver2: Nothing looks particularly busier than it usually is; how long has it been slow?
14:39 🔗 Arkiver2 not sure
14:39 🔗 Arkiver2 a few hours at least
14:39 🔗 Arkiver2 I was not home
14:40 🔗 Arkiver2 When I came back home it was very slow and according to the total uploaded bytes it was slow for a few hours already
14:40 🔗 joepie91 underscor: heh, probably hate then
14:40 🔗 joepie91 "10gbps unmetered? whatever, they'll never use all of it anyway"
14:40 🔗 Arkiver2 yesterday it was around 8 to 10 times as fast
14:40 🔗 joepie91 ".... oh...."
14:40 🔗 underscor :D
14:40 🔗 underscor that's basically what happened with isc, haha
14:40 🔗 joepie91 lol, really?
14:40 🔗 underscor although in the end it netted them some really sweet peering agreements
14:41 🔗 underscor yeah, we like, quadrupled their internet footprint XD
14:41 🔗 joepie91 hahaha
14:42 🔗 underscor Arkiver2: Looks like the intake racks are a bit busy today (ia9025) which is probably why it's slow at the moment
14:42 🔗 underscor That's the only thing I see that explains it; catalog seems to be operating normally and the link saturation is in the other direction
14:42 🔗 Arkiver2 underscor: ah, thank you!
14:43 🔗 underscor If you give me your IP I can do a traceroute back to see what route your return traffic is taking, also
14:43 🔗 Arkiver2 hmm
14:43 🔗 Arkiver2 I will be back here in 2 hours ok?
14:43 🔗 underscor (or one near your netblock)
14:43 🔗 underscor sure, just pm me :)
14:43 🔗 Arkiver2 thank underscor!
14:44 🔗 Arkiver2 see you guys later
14:46 🔗 joepie91 sorry, this live CD has a very dubious web browser
14:46 🔗 joepie91 underscor: what did you say?
14:47 🔗 underscor lol
14:47 🔗 underscor <underscor> Arkiver2: Looks like the intake racks are a bit busy today (ia9025) which is probably why it's slow at the moment
14:47 🔗 underscor <underscor> If you give me your IP I can do a traceroute back to see what route your return traffic is taking, also
14:47 🔗 underscor was basically it
14:47 🔗 joepie91 I se
14:47 🔗 joepie91 see *
14:48 🔗 joepie91 also, boot repair just finished
14:48 🔗 joepie91 so, time to reboot, and probably watch it not boot
14:48 🔗 underscor gl;hf
14:48 🔗 joepie91 not going to be much fun here
14:48 🔗 joepie91 lol
14:48 🔗 joepie91 I hate suicidal bootloeaders
14:48 🔗 underscor haha
14:48 🔗 joepie91 bootloaders *
14:48 🔗 joepie91 :(
14:48 🔗 underscor boatleaders
14:49 🔗 underscor buttloafers
14:49 🔗 joepie91 okay
14:49 🔗 joepie91 time to reboot
14:50 🔗 joepie91 back in anywhere between 5 minutes best case to 5 days worst case
14:50 🔗 joepie91 :P
14:50 🔗 underscor haha
14:53 🔗 Co2e35_ guys whats cacti?
14:54 🔗 underscor a tree based monitoring/graphing application named after the infamous desert plant
14:55 🔗 underscor For router graphs and stuff
14:58 🔗 joepie91 well, that was quite clearly a failure.
14:58 🔗 SN4T14 Co2e35_, plural of cactus.
15:15 🔗 Co2e35_ thanks @underscor
15:25 🔗 Co2e35_ joepie91 u are dutch right?
15:25 🔗 joepie91 Co2e35_: correct
15:25 🔗 Co2e35_ great can u check on #angerthehyve?
15:26 🔗 joepie91 Co2e35_: um, what needs checking?
15:26 🔗 joepie91 because I'm kinda trying to recover my bootloader :P
15:27 🔗 Co2e35_ for what os?
15:27 🔗 joepie91 Co2e35_: openSUSE
15:27 🔗 joepie91 my GRUB killed itself (again)
15:28 🔗 Co2e35_ cant u run it on 2 drives and copying the working one to the broken one sorry my programmer english is horrible
15:28 🔗 Co2e35_ xD
15:29 🔗 joepie91 :P
15:29 🔗 joepie91 Co2e35_: that's implying I have two drives
15:29 🔗 joepie91 I only have one usable HDD
15:29 🔗 Co2e35_ virtual drive?
15:29 🔗 joepie91 my old HDD is 40GB
15:29 🔗 joepie91 and the setup on that is -definitely- broken
15:29 🔗 Co2e35_ let me guess too small :P
15:29 🔗 joepie91 well yes, and it has a broken opensuse and bootloader on it
15:29 🔗 joepie91 let's just say that I was a bit too careless while cloning my disk
15:30 🔗 joepie91 when I got a new HD
15:30 🔗 joepie91 HDD *
15:30 🔗 Co2e35_ xD
15:30 🔗 joepie91 and since then, I've had no end of problems with bootloaders :p
15:30 🔗 joepie91 but yeah, time to reboot and see if my stuff works now
15:30 🔗 joepie91 so, brb
15:30 🔗 Co2e35_ k gl
15:47 🔗 Co2e35_ keep getting
15:47 🔗 Co2e35_ https://monitor.archive.org/weathermap/weathermap.html
15:47 🔗 Co2e35_ no item received
15:49 🔗 yipdw Co2e35_: for hyves, that project ended a long time ago
15:52 🔗 Co2e35_ no for project justin
15:53 🔗 Co2e35_ about hyves i was wondering if i can get a specific profile in the archive
15:57 🔗 yipdw Co2e35_: yeah, there's no items currently out for justin.tv either
16:01 🔗 Co2e35_ ok
17:28 🔗 schbirid http://blog.earbits.com/online_radio/earbits-will-be-shutting-down-june-16th/
17:28 🔗 schbirid 4 days notice
17:30 🔗 Co2e35_ and did it work joepie?
17:34 🔗 schbirid started a wget from http://www.earbits.com/artists/
17:35 🔗 schbirid pretty slow
17:41 🔗 schbirid looks like indie music
17:41 🔗 schbirid worth trying to save music?
17:48 🔗 joepie91 Co2e35_: oh, yes, it did
17:48 🔗 joepie91 after approximately 4290938 attempts
17:48 🔗 joepie91 and then figuring out that my kernel was half-installed, which I suppose is a pretty valid reason for not booting
17:48 🔗 schbirid it just threw json with the mp3 urls at me but only for a minute or so
17:48 🔗 joepie91 but further discussion of that belongs in #archiveteam-bs ;)
17:49 🔗 schbirid streaming is served like http://media-http-prod-0.earbits.com/0be2946d74dbf54b808ce58333bf1bcc.mp3
17:49 🔗 schbirid that's for http://www.earbits.com/collections/indie-rock/tracks/50243b8680eb5b00020015f6
17:57 🔗 schbirid ok there is some kind of api, but it requires a "Cookie: client_token"
18:01 🔗 schbirid on it
18:02 🔗 schbirid is there a good json parser on debian? i need to extract specific fields
18:02 🔗 schbirid grepping json_pp output works but ...
18:03 🔗 ivan` jq?
18:08 🔗 is4 schbirid: beat me to it
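
The field extraction schbirid asks about can be sketched with Python's standard `json` module instead of grepping `json_pp` output. The payload shape and the names `tracks`, `id` and `name` are hypothetical; the log never shows the actual earbits JSON.

```python
import json

# Hypothetical payload: the real earbits JSON is not shown in the log.
SAMPLE = '{"tracks": [{"id": "a1", "name": "Song A"}, {"id": "b2", "name": "Song B"}]}'


def extract_field(raw, list_key, field):
    """Return `field` from every object in the list stored under `list_key`."""
    data = json.loads(raw)
    return [item[field] for item in data.get(list_key, []) if field in item]


if __name__ == "__main__":
    print(extract_field(SAMPLE, "tracks", "id"))
```

For this assumed shape, the jq filter ivan` hints at would be `.tracks[].id`.
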
18:09 🔗 schbirid i am writing a script to download tracks per artist
18:09 🔗 schbirid my warc will be erroneous, can't hurt to redo it :)
18:20 🔗 schbirid sometimes i hate bash
18:21 🔗 schbirid https://gist.github.com/SpiritQuaddicted/ebc149573e68ef2da33a
18:21 🔗 schbirid i need to have quotes for curl's -H parameters in the end but of course they get "stripped"
18:22 🔗 schbirid there are variables inside, so no ' '
18:22 🔗 * schbirid feels like a noob
18:23 🔗 schbirid the "while read trackid" is failing, before it is fine
18:23 🔗 yipdw use python, save your lifespan
18:23 🔗 yipdw there are many reasons why the Warrior code moved from bash to python :P
18:24 🔗 schbirid :)
18:24 🔗 schbirid i use "\", works alright
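
The quoting problem with curl's `-H` values goes away if the command is built as an argument list, which is one concrete form of yipdw's "use python" advice. This is a minimal sketch: `build_curl_argv` is a hypothetical helper, and the `client_token=<value>` cookie format is an assumption, since the log only names the cookie.

```python
import shlex


def build_curl_argv(url, client_token, extra_headers=None):
    """Build an argv list for curl.

    Each header is a single list element, so no shell quoting is needed
    even though the values are filled in from variables.
    """
    # Assumption: the cookie is sent as "client_token=<value>".
    argv = ["curl", "--silent", url, "-H", f"Cookie: client_token={client_token}"]
    for name, value in (extra_headers or {}).items():
        argv += ["-H", f"{name}: {value}"]
    return argv


if __name__ == "__main__":
    argv = build_curl_argv("http://www.earbits.com/", "TOKEN",
                           {"Accept": "application/json"})
    # shlex.join shows the quoting a shell would need for the same command.
    print(shlex.join(argv))
    # To actually run it: subprocess.run(argv, check=True)
```

Passing the list straight to `subprocess.run` skips the shell entirely, so nothing gets "stripped".
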
18:34 🔗 schbirid i give up, can't find the proper trackids to query the API with. they are not the ones in the json
18:40 🔗 schbirid probably some md5
18:49 🔗 Arkiver2 underscor: the upload speed is slowly getting better now...
18:50 🔗 schbirid if someone wants to help with earbits: https://gist.github.com/SpiritQuaddicted/af2e2479aec6aa4eda25
18:53 🔗 SN4T14 schbirid, could you give me an example link with an ID?
18:53 🔗 SN4T14 And do they change if you play the same track twice?
18:54 🔗 schbirid good question
18:54 🔗 schbirid pmed you a URL
18:55 🔗 SN4T14 Will take a look at it in a bit
18:56 🔗 schbirid hm, can't re-trigger the website calling the API. it seems to remember what it looked up
18:56 🔗 Arkiver2 the songs are just downloadable
18:56 🔗 Arkiver2 here is a list with the songs:
18:56 🔗 Arkiver2 http://media-http-prod-0.earbits.com/
18:56 🔗 schbirid maybe https://chrome.google.com/webstore/detail/earbits-radio-free-music/mgkjffcdjblaipglnmhanakilfbniihj/details will help
18:57 🔗 schbirid lol
18:57 🔗 schbirid dumbest trick in the book, thanks
18:57 🔗 joepie91 bahaha
18:57 🔗 Arkiver2 :P
18:57 🔗 joepie91 been a while since I've seen a site do that one wrong
18:57 🔗 Arkiver2 glad I could help
18:57 🔗 Arkiver2 lol yes
18:57 🔗 joepie91 amazed
18:57 🔗 joepie91 misconfigured s3, probably
18:57 🔗 Arkiver2 they aren't the best at security...
18:57 🔗 schbirid i love those
18:57 🔗 joepie91 :P
18:57 🔗 joepie91 schbirid: goodies!
18:57 🔗 Arkiver2 hmm
18:58 🔗 Arkiver2 is there only http://media-http-prod-0.earbits.com/
18:58 🔗 Arkiver2 or also others?
18:58 🔗 Arkiver2 there is no http://media-http-prod-1.earbits.com/
18:58 🔗 schbirid only saw that one so far
18:58 🔗 Arkiver2 hmm
18:58 🔗 Arkiver2 going to create a list now of the mp3
18:59 🔗 schbirid hm, just 800 files
18:59 🔗 joepie91 https://www.google.com/search?q=%22media-http-prod-*.earbits.com%22&oq=%22media-http-prod-*.earbits.com%22&aqs=chrome..69i57.2247j0j4&sourceid=chrome&es_sm=0&ie=UTF-8
18:59 🔗 joepie91 helpful google is helpful
19:00 🔗 Arkiver2 there are also log files in the list
19:00 🔗 Arkiver2 http://media-http-prod-0.earbits.com/003b0478d0d4465f2a3da911e8b9f2ee.log
19:00 🔗 Arkiver2 not sure if it's useful
19:00 🔗 Arkiver2 but I thought I'd post it
19:01 🔗 joepie91 https://www.google.com/search?q=site%3A%22*.earbits.com%22+-site%3Awww.earbits.com+-site%3Aemail.earbits.com+-site%3Ablog.earbits.com+-site%3Ahelp.earbits.com&oq=site%3A%22*.earbits.com%22+-site%3Awww.earbits.com+-site%3Aemail.earbits.com+-site%3Ablog.earbits.com+-site%3Ahelp.earbits.com&aqs=chrome..69i57j69i58.647j0j9&sourceid=chrome&es_sm=0&ie=UTF-8
19:01 🔗 joepie91 not much interesting otherwise
19:02 🔗 schbirid http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html "This implementation of the GET operation returns some or all (up to 1000) of the objects in a bucket. You can use the request parameters as selection criteria to return a subset of the objects in a bucket."
19:02 🔗 schbirid that makes sense
19:02 🔗 schbirid so how do we query that bucket for more
19:03 🔗 Smiley hmmm it'll be something like Start=1001 or something
19:03 🔗 Smiley http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html
19:03 🔗 Smiley errr maker
19:04 🔗 Smiley Specifies the key to start with when listing objects in a bucket. Amazon S3 lists objects in alphabetical order.
19:04 🔗 Smiley marker*; marker
19:04 🔗 Smiley Type: String
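
The `marker` parameter quoted above can be used to walk a listing page by page. Note that it is a key name (a string), so the next page starts after the last key already seen rather than at a numeric offset. The sketch below assumes the endpoint answers anonymous listing requests with the standard `ListBucketResult` XML; `list_all_keys` and the injected `fetch` callable are hypothetical names.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# XML namespace used by S3 bucket listing responses.
NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def listing_url(base_url, marker=None):
    """URL for one page of a listing; `marker` is a key name, not an offset."""
    if marker is None:
        return base_url
    return base_url + "?" + urllib.parse.urlencode({"marker": marker})


def parse_listing(xml_text):
    """Return (keys, next_marker); next_marker is None on the last page."""
    root = ET.fromstring(xml_text)
    keys = [el.text for el in root.findall("s3:Contents/s3:Key", NS)]
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=NS) == "true"
    # Without a delimiter, the next marker is simply the last key returned.
    next_marker = keys[-1] if truncated and keys else None
    return keys, next_marker


def list_all_keys(base_url, fetch):
    """Collect every key by following markers; `fetch(url)` returns XML text."""
    marker, all_keys = None, []
    while True:
        keys, marker = parse_listing(fetch(listing_url(base_url, marker)))
        all_keys.extend(keys)
        if marker is None:
            return all_keys
```

For a real run, `fetch` could be `lambda u: urllib.request.urlopen(u).read().decode()` after importing `urllib.request`; whether the earbits endpoint honors it is untested here.
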
19:07 🔗 Arkiver2 oh shit
19:07 🔗 joepie91 ALL THE S3
19:07 🔗 Arkiver2 http://media-http-prod-0.earbits.com/ doesn't have the full list
19:07 🔗 Arkiver2 wait
19:08 🔗 schbirid someone needs an aws account, then we should be able to query it with s3cmd
19:08 🔗 Arkiver2 here
19:08 🔗 Arkiver2 http://pastebin.com/wwQgpWmw
19:08 🔗 schbirid eg s3cmd ls s3://earbits-media-production-0
19:08 🔗 Arkiver2 it goes up to 0078be8d5ad0a3189be4bfbb38422082.log
19:08 🔗 schbirid that's just the 1000 files, yeah :)
19:09 🔗 Arkiver2 songs like http://media-http-prod-0.earbits.com/0be2946d74dbf54b808ce58333bf1bcc.mp3 aren't there
19:09 🔗 Arkiver2 schbirid: do you know how to get the full list?
19:09 🔗 schbirid scroll up!
19:09 🔗 joepie91 there's flacs and wavs there!
19:09 🔗 Arkiver2 schbirid: sorry... I see
19:09 🔗 schbirid how much will it cost them if we do this :(
19:10 🔗 Arkiver2 why do we care about that? They only said their website is going away 4 days before it is going away...
19:11 🔗 joepie91 not our problem
19:11 🔗 joepie91 they can save costs by shipping us a HDD
19:11 🔗 joepie91 lol
19:11 🔗 schbirid i am a nice person
19:11 🔗 schbirid that's why
19:11 🔗 Arkiver2 lol
19:12 🔗 Arkiver2 so, any solution for the max keys = 1000 problem?
19:12 🔗 * joepie91 points at the sign on the front door saying "Archive Team, Band of Rogue Archivists"
19:13 🔗 schbirid ok should not be much money if it is say 300k tracks at 5mb each. < 200usd unless i calced wrong
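
schbirid's back-of-the-envelope figure can be reproduced as below. The 300k tracks and 5 MB average come from the message above; the per-GB transfer price of 0.12 USD is an assumption chosen for illustration and is not stated anywhere in the log.

```python
def transfer_cost_usd(num_files, avg_mb, usd_per_gb):
    """Estimate outbound transfer cost, using decimal units (1 GB = 1000 MB)."""
    total_gb = num_files * avg_mb / 1000
    return total_gb * usd_per_gb


if __name__ == "__main__":
    # 300k tracks at 5 MB each is about 1.5 TB in total.
    # ASSUMPTION: 0.12 USD per GB transferred out.
    print(round(transfer_cost_usd(300_000, 5, 0.12), 2))
```

Under that assumed rate the estimate lands under 200 USD, consistent with the figure in the chat.
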
19:13 🔗 Smiley you need to just set the marker to 1001 and it'll return 1001-2001
19:14 🔗 schbirid Smiley: what marker:P
19:14 🔗 Smiley schbirid: not sure exactly how the GET works
19:14 🔗 Smiley but it has an attribute called marker, which you can set to start the index at an arbitrary point.
19:14 🔗 schbirid Arkiver2: we need someone with aws credentials i think. i dont want to get my personal info involved so i wont do that. then s3cmd or similar tools will allow querying
19:14 🔗 Smiley as the site says D:
19:14 🔗 schbirid yeah, via s3
19:15 🔗 Arkiver2 hmm
19:19 🔗 Arkiver2 doing a crawl for what we have already http://pastebin.com/TpbTuTEH Crawling with Heritrix.
19:19 🔗 Arkiver2 discovering more urls!!!!
19:19 🔗 Arkiver2 http://cdn-1.earbits.com/
19:19 🔗 Arkiver2 the same
19:21 🔗 schbirid yuck, private information in there
19:22 🔗 Arkiver2 oh shit
19:22 🔗 Arkiver2 yeah
19:22 🔗 schbirid this is something one should tell them :(
19:22 🔗 Arkiver2 hmm
19:23 🔗 Arkiver2 Maybe it's weird, but shouldn't we first archive it? Because if they know of it they will also remove access to that url with all the songs
19:24 🔗 yipdw oh hahaha
19:24 🔗 yipdw that's an open S3 bucket
19:24 🔗 yipdw gg earbits
19:24 🔗 schbirid we should totally try to archive it first
19:24 🔗 pft wow, yeah
19:24 🔗 pft names and email addresses
19:24 🔗 pft hurrr
19:25 🔗 Arkiver2 this one is not accessible: cdn-100.earbits.com
19:25 🔗 pft archive it but flag it as dark
19:25 🔗 Arkiver2 while it does have some urls
19:25 🔗 schbirid no one has aws credentials?
19:26 🔗 yipdw I have AWS accounts, but you don't need them
19:26 🔗 yipdw the bucket is public
19:26 🔗 schbirid how can we get the full list then?
19:26 🔗 schbirid i don't know how to query without access credentials
19:31 🔗 joepie91 wow, this is so bad
19:34 🔗 schbirid i am only getting ~15MBit/s from those buckets anyways .(
19:39 🔗 Arkiver2 any solution for the 1000 max links?
19:40 🔗 joepie91 Arkiver2: the solution has been mentioned several times, but won't get anywhere until somebody uses an Amazon account to actually do it
19:41 🔗 midas do we have chan yet?
19:41 🔗 midas or too small?
19:42 🔗 Arkiver2 what about #earbite ?
19:43 🔗 midas im down with that
19:44 🔗 Arkiver2 ok good I'm in it
19:44 🔗 schbirid ok
20:46 🔗 schbirid yipdw: if you know more about accessing public buckets, please help us in #earbite
21:56 🔗 Arkiver2 underscor: uploadspeed is back to normal again, thanks man!
22:35 🔗 dashcloud if I want to exclude club.angelfire.com while still grabbing everything else in the angelfire domain, should I use --reject-regex=club.angelfire.com* or --exclude-domains=club.angelfire.com ?
22:45 🔗 dashcloud ddrescue is out with a new release- changes to the copy algorithm and trimming: http://lists.gnu.org/archive/html/info-gnu/2014-06/msg00009.html
22:45 🔗 Smiley oo
22:48 🔗 SN4T14 dashcloud, about the exclude thing, they should both work, but I'd use exclude-domains because it's simpler.
22:49 🔗 dashcloud thanks!
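
One detail about the regex option in dashcloud's question: `club.angelfire.com*` reads like a shell glob, but as a regular expression the dots match any character and the trailing `*` means "zero or more `m`". The sketch below uses Python's `re` only to illustrate those semantics; it does not reproduce how wget applies the pattern to a URL, and the test URLs are made up.

```python
import re

GLOB_STYLE = r"club.angelfire.com*"   # the pattern as typed in the question
ESCAPED = r"club\.angelfire\.com"     # dots escaped, no trailing *


def matches(pattern, url):
    """True if `pattern` is found anywhere in `url`."""
    return re.search(pattern, url) is not None


if __name__ == "__main__":
    for url in ("http://club.angelfire.com/x", "http://clubxangelfire.co/"):
        print(url, matches(GLOB_STYLE, url), matches(ESCAPED, url))
```

The unescaped pattern also matches the second, unrelated host, which is one more reason the domain-based option is the simpler choice.
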
23:44 🔗 Nemo_bis underscor: MOAR cacti links
