[01:41] well, the answer to that question is yes - which sucked for me
[01:42] also, apparently you can't do --output-document and --truncate-output together - wget doesn't recognize the combination, but just fails as if you left off the URL
[01:42] (for wget that is)
[06:28] Uh, still no updates for http://archiveteam.org/index.php?title=Ancestry.com ? Does it have a channel or something?
[06:29] if not, I propose #gravedigging as channel name
[06:46] goddamnit
[06:46] anybody have a copy of HipHop, that music thing?
[06:46] site is dead, repo is dead
[06:46] no forks
[06:47] AUR package refers to source tarball on (now-defunct) site
[06:47] no matches in wayback
[06:47] well the site is in wayback, but the tarballs are not
[06:50] ffs
[06:50] how can there be no copies of this thing
[06:56] https://github.com/rhumlover/hiphop
[06:57] https://github.com/liamja/hiphop
[06:58] https://encrypted.google.com/search?q=%22forked%20from%20hiphopapp%2Fhiphop%22%20site%3Agithub.com
[07:08] ivan`: wtf? github search gave me 0 results :|
[07:08] anyway, thanks
[07:08] ivan` has Github Search+
[07:09] he paid for fastlane
[07:09] how did hiphop actually work? youtube?
[07:10] oh wait, the ISPs don't like that name, you have fastlane, but for faster you need hyperspeedlane
[07:11] trs80: node-webkit application, searched on youtube, then (presumably) grabbed cover art for it
[07:11] and displayed the list
[07:11] as far as I can tell
[07:11] so nothing too terribly new (because Tomahawk) but still
[07:11] also Tomahawk is buggy as shit, so yeah
[07:12] the only Tomahawk I know is the BT + Adam K track, which makes that sentence really funny
[07:12] lol
[07:12] yipdw: http://www.tomahawk-player.org/
[07:12] oh
[07:12] yeah, I could see the cover art came from itunes/last.fm
[07:13] Tomahawk is a great idea, but last time I tried to use it, I gave up after an hour of chasing inexplicable bugs
[07:16] huh, Tomahawk looks pretty neat
[10:55] http://www.minuszerodegrees.net/manuals/
[11:01] https://archive.org/post/1018340/add-website
[11:01] should I? yes I should
[11:16] lol
[11:21] archivebot for the win, right :p
[12:37] Ah, it's nothing new. :( http://www.reddit.com/r/DataHoarder/comments/27y8ux/standing_up_40tbs_of_data_for_fun_times/ci601wk
[13:33] midas, what would the archivebot get if you set it off on http://hardforum.com/
[13:33] off-site images from posts included or not?
[13:38] ohhdemgir: embedded images are grabbed from all domains
[13:39] links are not
[13:39] wonder how big hf is
[13:42] Google thinks 3.5M pages
[13:43] # of threads adds up to 949,580
[13:44] what command does the bot run to do the grab?
[13:46] see https://github.com/ArchiveTeam/ArchiveBot/blob/master/pipeline/pipeline.py#L94
[13:59] Is the upload speed to the IA very slow today?
[13:59] It is uploading very slowly right now
[14:00] Arkiver2, same here, very
[14:00] midas, ^^
[14:00] random pickups here and there but generally it's been silly slow
[14:00] Will be fixed in a bit of time probably
[14:01] I just had to make sure it wasn't anything with me
[14:01] where are you uploading from?
[14:01] The Netherlands
[14:05] Arkiver2: do a traceroute to s3.us.archive.org?
[14:08] eek, I have 350 ms ping from Amsterdam via HE but 170 ms from Helsinki via ISC. Usual stuff.
[14:11] yeah
[14:11] got the same here
[14:11] but I think it will be fixed in some hours, so I'll just wait
[14:14] Nemo_bis: ah look: https://monitor.archive.org/weathermap/weathermap.html
[14:14] HE uploading is 85 to 100%
[14:15] damn, congested
[14:15] 365ms delay here, also Netherlands
[14:19] joepie91: could it be some kind of hacking thing?
[14:19] Arkiver2: what?
[14:19] Arkiver2: from online it's very slow, OVH was hitting 11MB/s
[14:20] joepie91: nah, nevermind
[14:21] Arkiver2: that's downloading
[14:21] but maybe they cap the sum of up/down or something
[14:22] Nemo_bis: yeah, just maybe someone is flooding it with download requests
[14:22] nah, IA via HE.net is always slow for me
[14:23] try to route via Japan if necessary :P to avoid HE
[14:23] the only statement more insane than that would be "try to route via Australia, to get better speeds"
[14:23] :P
[14:24] I'll just wait for some hours, speed will be better then hopefully
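A quick way to compare latency and routing to the endpoint discussed above; s3.us.archive.org is from the channel, and these exact commands are just one way to check (mtr may need installing):

    # round-trip times to the IA S3 endpoint
    ping -c 5 s3.us.archive.org
    # the hop-by-hop path, to see whether traffic goes via HE
    traceroute s3.us.archive.org
    # mtr combines both in a single report, if available
    mtr --report s3.us.archive.org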
[14:37] Yeah, our HE uplink is generally pretty saturated
[14:37] hai underscor
[14:37] https://monitor.archive.org/pubaccess/graph_2064.html
[14:38] damn
[14:38] HE must either love or hate you
[14:38] haha
[14:38] underscor: but it will get back to normal in a while, right?
[14:38] depending on what you pay :P
[14:38] well, I mean, we just pay for the 10gbit port unmetered, so they agreed to it originally
[14:38] haha
[14:39] Arkiver2: Nothing looks particularly busier than it usually is; how long has it been slow?
[14:39] not sure
[14:39] a few hours at least
[14:39] I was not home
[14:40] When I came back home it was very slow, and according to the total uploaded bytes it had been slow for a few hours already
[14:40] underscor: heh, probably hate then
[14:40] "10gbps unmetered? whatever, they'll never use all of it anyway"
[14:40] yesterday it was around 8 to 10 times as fast
[14:40] ".... oh...."
[14:40] :D
[14:40] that's basically what happened with isc, haha
[14:40] lol, really?
[14:40] although in the end it netted them some really sweet peering agreements
[14:41] yeah, we like, quadrupled their internet footprint XD
[14:41] hahaha
[14:42] Arkiver2: Looks like the intake racks are a bit busy today (ia9025) which is probably why it's slow at the moment
[14:42] That's the only thing I see that explains it; catalog seems to be operating normally and the link saturation is in the other direction
[14:42] underscor: ah, thank you!
[14:43] If you give me your IP I can do a traceroute back to see what route your return traffic is taking, also
[14:43] hmm
[14:43] I will be back here in 2 hours ok?
[14:43] (or one near your netblock)
[14:43] sure, just pm me :)
[14:43] thanks, underscor!
[14:44] see you guys later
[14:46] sorry, this live CD has a very dubious web browser
[14:46] underscor: what did you say?
[14:47] lol
[14:47] Arkiver2: Looks like the intake racks are a bit busy today (ia9025) which is probably why it's slow at the moment
[14:47] If you give me your IP I can do a traceroute back to see what route your return traffic is taking, also
[14:47] was basically it
[14:47] I see
[14:48] also, boot repair just finished
[14:48] so, time to reboot, and probably watch it not boot
[14:48] gl;hf
[14:48] not going to be much fun here
[14:48] lol
[14:48] I hate suicidal bootloaders
[14:48] haha
[14:48] :(
[14:48] boatleaders
[14:49] buttloafers
[14:49] okay
[14:49] time to reboot
[14:50] back in anywhere between 5 minutes best case to 5 days worst case
[14:50] :P
[14:50] haha
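For reference, the usual live-CD rescue procedure for a broken GRUB2 install on openSUSE looks roughly like this; /dev/sda and the partition layout are assumptions about the machine, not something stated in the channel:

    # mount the installed root filesystem (assumed to be /dev/sda2) and chroot in
    mount /dev/sda2 /mnt
    for fs in dev proc sys; do mount --bind /$fs /mnt/$fs; done
    chroot /mnt
    # reinstall the bootloader to the disk's MBR and regenerate its config
    grub2-install /dev/sda
    grub2-mkconfig -o /boot/grub2/grub.cfg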
[14:53] guys, what's cacti?
[14:54] a tree-based monitoring/graphing application named after the infamous desert plant
[14:55] For router graphs and stuff
[14:58] well, that was quite clearly a failure.
[14:58] Co2e35_, plural of cactus.
[15:15] thanks @underscor
[15:25] joepie91, you are Dutch, right?
[15:25] Co2e35_: correct
[15:25] great, can you check on #angerthehyve?
[15:26] Co2e35_: um, what needs checking?
[15:26] because I'm kinda trying to recover my bootloader :P
[15:27] for what OS?
[15:27] Co2e35_: openSUSE
[15:27] my GRUB killed itself (again)
[15:28] can't you run it on 2 drives and copy the working one to the broken one? sorry, my programmer English is horrible
[15:28] xD
[15:29] :P
[15:29] Co2e35_: that's implying I have two drives
[15:29] I only have one usable HDD
[15:29] virtual drive?
[15:29] my old HDD is 40GB
[15:29] and the setup on that is -definitely- broken
[15:29] let me guess, too small :P
[15:29] well yes, and it has a broken openSUSE and bootloader on it
[15:29] let's just say that I was a bit too careless while cloning my disk
[15:30] when I got a new HDD
[15:30] xD
[15:30] and since then, I've had no end of problems with bootloaders :p
[15:30] but yeah, time to reboot and see if my stuff works now
[15:30] so, brb
[15:30] k gl
[15:47] keep getting
[15:47] https://monitor.archive.org/weathermap/weathermap.html
[15:47] no item received
[15:49] Co2e35_: for hyves, that project ended a long time ago
[15:52] no, for project justin
[15:53] about hyves, i was wondering if i can get a specific profile in the archive
[15:57] Co2e35_: yeah, there are no items currently out for justin.tv either
[16:01] ok
[17:28] http://blog.earbits.com/online_radio/earbits-will-be-shutting-down-june-16th/
[17:28] 4 days notice
[17:30] and did it work, joepie?
[17:34] started a wget from http://www.earbits.com/artists/
[17:35] pretty slow
[17:41] looks like indie music
[17:41] worth trying to save the music?
[17:48] Co2e35_: oh, yes, it did
[17:48] after approximately 4290938 attempts
[17:48] and then figuring out that my kernel was half-installed, which I suppose is a pretty valid reason for not booting
[17:48] it just threw json with the mp3 urls at me, but only for a minute or so
[17:48] but further discussion of that belongs in #archiveteam-bs ;)
[17:49] streaming is served like http://media-http-prod-0.earbits.com/0be2946d74dbf54b808ce58333bf1bcc.mp3
[17:49] that's for http://www.earbits.com/collections/indie-rock/tracks/50243b8680eb5b00020015f6
[17:57] ok, there is some kind of api, but it requires a "Cookie: client_token"
[18:01] on it
[18:02] is there a good json parser on debian? i need to extract specific fields
[18:02] grepping json_pp output works but ...
[18:03] jq?
[18:08] schbirid: beat me to it
[18:09] i am writing a script to download tracks per artist
[18:09] my warc will be erroneous, can't hurt to redo it :)
[18:20] sometimes i hate bash
[18:21] https://gist.github.com/SpiritQuaddicted/ebc149573e68ef2da33a
[18:21] i need to have quotes for curl's -H parameters in the end, but of course they get "stripped"
[18:22] there are variables inside, so no ' '
[18:22] * schbirid feels like a noob
[18:23] the "while read trackid" part is failing; before it, it's fine
[18:23] use python, save your lifespan
[18:23] there are many reasons why the Warrior code moved from bash to python :P
[18:24] :)
[18:24] i use "\", works alright
[18:34] i give up, can't find the proper trackids to query the API with.
they are not the ones in the json
[18:40] probably some md5
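The quoting problem from the gist above usually comes down to keeping the whole -H value in double quotes, so the variable expands but the header stays a single argument; the $token variable, API path, JSON field name, and trackids.txt file here are illustrative assumptions, not the real earbits schema:

    token="..."  # the client_token cookie value (assumed)
    while read -r trackid; do
        # double quotes keep the header as one argument while still expanding $token
        curl -s -H "Cookie: client_token=$token" \
            "http://www.earbits.com/api/tracks/$trackid" |
            jq -r '.stream_url'   # hypothetical field name
    done < trackids.txt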
[18:49] underscor: the upload speed is slowly getting better now...
[18:50] if someone wants to help with earbits: https://gist.github.com/SpiritQuaddicted/af2e2479aec6aa4eda25
[18:53] schbirid, could you give me an example link with an ID?
[18:53] And do they change if you play the same track twice?
[18:54] good question
[18:54] pmed you a URL
[18:55] Will take a look at it in a bit
[18:56] hm, can't re-trigger the website calling the API. it seems to remember what it looked up
[18:56] the songs are just downloadable
[18:56] here is a list with the songs:
[18:56] http://media-http-prod-0.earbits.com/
[18:56] maybe https://chrome.google.com/webstore/detail/earbits-radio-free-music/mgkjffcdjblaipglnmhanakilfbniihj/details will help
[18:57] lol
[18:57] dumbest trick in the book, thanks
[18:57] bahaha
[18:57] :P
[18:57] been a while since I've seen a site do that one wrong
[18:57] glad I could help
[18:57] lol yes
[18:57] amazed
[18:57] misconfigured s3, probably
[18:57] they aren't the best at security...
[18:57] i love those
[18:57] :P
[18:57] schbirid: goodies!
[18:57] hmm
[18:58] is there only http://media-http-prod-0.earbits.com/
[18:58] or also others?
[18:58] there is no http://media-http-prod-1.earbits.com/
[18:58] only saw that one so far
[18:58] hmm
[18:58] going to create a list now of the mp3s
[18:59] hm, just 800 files
[18:59] https://www.google.com/search?q=%22media-http-prod-*.earbits.com%22&oq=%22media-http-prod-*.earbits.com%22&aqs=chrome..69i57.2247j0j4&sourceid=chrome&es_sm=0&ie=UTF-8
[18:59] helpful google is helpful
[19:00] there are also log files in the list
[19:00] http://media-http-prod-0.earbits.com/003b0478d0d4465f2a3da911e8b9f2ee.log
[19:00] not sure if it's useful
[19:00] but I thought I'd post it
[19:01] https://www.google.com/search?q=site%3A%22*.earbits.com%22+-site%3Awww.earbits.com+-site%3Aemail.earbits.com+-site%3Ablog.earbits.com+-site%3Ahelp.earbits.com&oq=site%3A%22*.earbits.com%22+-site%3Awww.earbits.com+-site%3Aemail.earbits.com+-site%3Ablog.earbits.com+-site%3Ahelp.earbits.com&aqs=chrome..69i57j69i58.647j0j9&sourceid=chrome&es_sm=0&ie=UTF-8
[19:01] not much interesting otherwise
[19:02] http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html "This implementation of the GET operation returns some or all (up to 1000) of the objects in a bucket. You can use the request parameters as selection criteria to return a subset of the objects in a bucket."
[19:02] that makes sense
[19:02] so how do we query that bucket for more
[19:03] hmmm it'll be something like Start=1001 or something
[19:03] http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html
[19:04] err, marker
[19:04] Specifies the key to start with when listing objects in a bucket. Amazon S3 lists objects in alphabetical order.
[19:04] Type: String
[19:07] oh shit
[19:07] ALL THE S3
[19:07] http://media-http-prod-0.earbits.com/ doesn't have the full list
[19:07] wait
[19:08] someone needs an aws account, then we should be able to query it with s3cmd
[19:08] here
[19:08] http://pastebin.com/wwQgpWmw
[19:08] eg s3cmd ls s3://earbits-media-production-0
[19:08] it goes up to 0078be8d5ad0a3189be4bfbb38422082.log
[19:08] that's just the first 1000 files, yeah :)
[19:09] songs like http://media-http-prod-0.earbits.com/0be2946d74dbf54b808ce58333bf1bcc.mp3 aren't there
[19:09] schbirid: do you know how to get the full list?
[19:09] scroll up!
[19:09] there's flacs and wavs there!
[19:09] schbirid: sorry... I see
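Since the bucket is public, the 1000-key limit can be paged past without AWS credentials by using the marker parameter from the S3 docs quoted above; this loop is a sketch of the idea rather than a tested crawler, and it assumes the CNAME'd bucket passes query parameters through:

    marker=""
    while :; do
        # each response lists up to 1000 <Key> entries
        curl -s "http://media-http-prod-0.earbits.com/?marker=$marker" > page.xml
        grep -o '<Key>[^<]*</Key>' page.xml | sed 's/<[^>]*>//g' >> keys.txt
        # <IsTruncated>true</IsTruncated> means there are more keys to fetch
        grep -q '<IsTruncated>true</IsTruncated>' page.xml || break
        # the next page starts after the last key we received
        marker=$(tail -n 1 keys.txt)
    done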
[19:09] how much will it cost them if we do this :(
[19:10] why do we care about that? They only said their website is going away 4 days before it is going away...
[19:11] not our problem
[19:11] they can save costs by shipping us a HDD
[19:11] lol
[19:11] i am a nice person
[19:11] that's why
[19:11] lol
[19:12] so, any solution for the max keys = 1000 problem?
[19:12] * joepie91 points at the sign on the front door saying "Archive Team, Band of Rogue Archivists"
[19:13] ok, should not be much money if it is, say, 300k tracks at 5MB each. < 200usd unless i calced wrong
[19:13] you just need to set the marker to 1001 and it'll return 1001-2001
[19:14] Smiley: what marker :P
[19:14] schbirid: not sure exactly how the GET works
[19:14] but it has an attribute called marker, which you can set to start the index at an arbitrary point.
[19:14] Arkiver2: we need someone with aws credentials i think. i dont want to get my personal info involved so i wont do that. then s3cmd or similar tools will allow querying
[19:14] as the site says D:
[19:14] yeah, via s3
[19:15] hmm
[19:19] doing a crawl for what we have already: http://pastebin.com/TpbTuTEH Crawling with Heritrix.
[19:19] discovering more urls!!!!
[19:19] http://cdn-1.earbits.com/
[19:19] the same
[19:21] yuck, private information in there
[19:22] oh shit
[19:22] yeah
[19:22] this is something one should tell them :(
[19:22] hmm
[19:23] Maybe it's weird, but shouldn't we first archive it? Because if they know of it they will also remove access to that url with all the songs
[19:24] oh hahaha
[19:24] that's an open S3 bucket
[19:24] gg earbits
[19:24] we should totally try to archive it first
[19:24] wow, yeah
[19:24] names and email addresses
[19:24] hurrr
[19:25] this one is not accessible: cdn-100.earbits.com
[19:25] archive it but flag it as dark
[19:25] while it does have some urls
[19:25] no one has aws credentials?
[19:26] I have AWS accounts, but you don't need them
[19:26] the bucket is public
[19:26] how can we get the full list then?
[19:26] i don't know how to query without access credentials
[19:31] wow, this is so bad
[19:34] i am only getting ~15MBit/s from those buckets anyway :(
[19:39] any solution for the 1000 max links?
[19:40] Arkiver2: the solution has been mentioned several times, but won't get anywhere until somebody uses an Amazon account to actually do it
[19:41] do we have a chan yet?
[19:41] or too small?
[19:42] what about #earbite ?
[19:43] im down with that
[19:44] ok good, I'm in it
[19:44] ok
[20:46] yipdw: if you know more about accessing public buckets, please help us in #earbite
[21:56] underscor: upload speed is back to normal again, thanks man!
[22:35] if I want to exclude club.angelfire.com while still grabbing everything else in the angelfire domain, should I use --reject-regex=club.angelfire.com* or --exclude-domains=club.angelfire.com ?
[22:45] ddrescue is out with a new release - changes to the copy algorithm and trimming: http://lists.gnu.org/archive/html/info-gnu/2014-06/msg00009.html
[22:45] oo
[22:48] dashcloud, about the exclude thing, they should both work, but I'd use exclude-domains because it's simpler.
[22:49] thanks!
[23:44] underscor: MOAR cacti links
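A sketch of the simpler option from the angelfire exchange above; club.angelfire.com and angelfire.com are from the channel, while the recursion flags are an assumed baseline for the rest of the grab:

    # --exclude-domains takes plain domain names, so there is no regex
    # anchoring to get wrong, unlike --reject-regex
    wget --recursive --level=inf --page-requisites \
         --span-hosts --domains=angelfire.com \
         --exclude-domains=club.angelfire.com \
         http://www.angelfire.com/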