[00:00] garyrh, hmm, just had ia upload the same file twice.. [00:00] with the --checksum flag [00:00] didn't create a duplicate file or anything, but it transferred the data [00:01] eh.. it's doing it for everything [00:02] was the filename the same as the filename in the item? [00:05] yeah.. basically running "ia upload ident * --checksum" twice on the same files and it just starts from the top again [00:06] I'm guessing it doesn't work with multiple arguments? [00:06] was there a task running while you were uploading? (check https://archive.org/history/ident) [00:06] ia & robots.txt http://web.archive.org/web/*/https://groups.google.com/forum/#!topic/linux-sunxi/NKyOR4gxYgY [00:06] the robots.txt allow this page to be crawled [00:06] first rule to match [00:07] garyrh, don't see anything in there [00:08] huh, odd. [00:09] yeah, checksum just isn't working full stop [00:09] the first time I tested it it did tell me that a file already existed though, which is odd [00:09] is it https://archive.org/details/MadrotterTreasureHuntBlogDump ? [00:09] but now it's not doing it anymore [00:10] garyrh, yep [00:10] I'm probably going to use the --delete flag from now on, but would be good not to have to upload everything again pointlessly [00:10] ah, there are a bunch of archive tasks that haven't started yet. [00:10] those need to finish before --checksum will work [00:11] looks like something went wrong here too: https://catalogd.archive.org/log/397144129 [00:11] ouch.. what are those tasks? [00:13] is it not liking my cancelling an upload part way through or something? [00:14] it would be great if I could just disable all deriving until I'm actually done, seeing as I just want to get the data onto their servers as quickly as possible at this stage [00:15] canceling shouldn't affect it. 
[00:15] *** toad1 has joined #archiveteam [00:15] you can disable derives with ia upload ident file --no-derive [00:15] and then run a derive after you're done https://archive.org/manage/ident [00:15] oh nice! [00:16] so I'm thinking --checksum --delete --no-derive [00:17] garyrh, could I just scrap this one and make a new one? [00:18] I would wait until either the tasks finish and/or an admin fixes it. [00:19] It's just that I have limited disk space, so I have to get the file uploaded and deleted before I can download more :( [00:20] usually someone notices it within a few hours. If not, you can email info@archive.org. [00:42] *** johtso has quit IRC (Remote host closed the connection) [00:52] *** johtso has joined #archiveteam [00:58] *** mistym has quit IRC (Remote host closed the connection) [01:00] *** johtso_ has joined #archiveteam [01:02] *** johtso_ has quit IRC (Remote host closed the connection) [01:16] *** s0ph0s has joined #archiveteam [01:18] Hey, I was looking at the deathwatch page and I noticed that it says AOL is slated to kill several sites, including TUAW. I noticed the other day that Engadget and TUAW were mirroring each other in my RSS feeds, so I checked and the TUAW domain now redirects to Engadget. Could someone make that edit please? [01:31] s0ph0s: the wiki password to create an account is yahoosucks [01:33] Ah, okay. Thanks! [01:48] I made the edits. 
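Editor's note: the flag combination settled on in the exchange above (`--checksum --delete --no-derive`) could be wrapped roughly as follows. This is only a sketch. The function name `upload_batch`, the `IA` override variable and the identifier in the usage line are assumptions made for illustration; the three flags are the ones named in the chat. As noted in the conversation, `--checksum` only skips files once the item's pending archive tasks have finished, and the derive can be started later from the item's manage page.

```shell
#!/bin/sh
# upload_batch is a hypothetical wrapper, not part of the ia tool.
# IA is an assumed override (for example IA=echo) so the command line can be
# inspected without uploading anything; it defaults to the real ia command.
upload_batch() {
    ident=$1
    shift
    # --checksum:  skip files the item already holds with the same checksum
    # --delete:    delete each local file after its checksum has been verified
    # --no-derive: do not derive the uploaded files
    "${IA:-ia}" upload "$ident" "$@" --checksum --delete --no-derive
}
```

Usage, with a made-up identifier: `IA=echo; upload_batch my-test-item "part 1.rar" part2.rar` prints the command that would run. Quoting `"$@"` keeps file names with spaces intact.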
[02:00] *** Froggypwn has joined #archiveteam [02:00] *** mistym has joined #archiveteam [02:04] *** s0ph0s has left Textual IRC Client: www.textualapp.com [02:29] *** dashcloud has quit IRC (Read error: Operation timed out) [02:36] *** dashcloud has joined #archiveteam [02:45] *** Ymgve has quit IRC () [03:02] *** schbirid2 has joined #archiveteam [03:03] *** schbirid has quit IRC (Read error: Operation timed out) [04:10] *** aaaaaaaaa has quit IRC (Leaving) [04:30] *** mistym has quit IRC (Remote host closed the connection) [04:53] *** VADemon has quit IRC (Read error: Connection reset by peer) [04:57] *** mistym has joined #archiveteam [05:28] *** Jonimus has quit IRC (Ping timeout: 370 seconds) [05:42] *** Jonimus has joined #archiveteam [05:52] *** Jonimus has quit IRC (Ping timeout: 370 seconds) [06:04] *** Jonimus has joined #archiveteam [06:59] johtso: this is why I was doing https://github.com/espes/jdget [07:58] *** scyther has joined #archiveteam [08:33] *** dashcloud has quit IRC (Read error: Operation timed out) [08:39] *** dashcloud has joined #archiveteam [08:50] *** khaoohs_ has joined #archiveteam [08:50] *** khaoohs has quit IRC (Read error: Connection reset by peer) [09:00] *** mistym has quit IRC (Remote host closed the connection) [09:04] espes__, great idea! I would definitely use that [09:06] *** T31M has joined #archiveteam [09:28] is anyone familiar with gnu parallel? 
[09:28] struggling to get something like this to work "parallel unrar t {} && mv {} ../outbox ::: ./**/*.rar" [09:29] seems like it's putting an extra argument at the very end of the command [09:49] *** dashcloud has quit IRC (Read error: Operation timed out) [09:55] *** dashcloud has joined #archiveteam [10:00] urgh, i'll just stick with xargs [10:02] your && splits the commands in two [10:03] i would not recommend parallel unrarring unless you are on ssd [10:03] you could write a simple script instead and call that with parallel [10:04] file="$1" [10:04] unrar t "${file}" && mv "${file}" ../outbox/ [10:04] then "parallel thatscript.sh ::: ./**/*.rar" [10:04] i'd recommend using absolute paths just to be safe from any shenanigans [10:06] johtso: ^ [10:07] schbirid2, this is what I came up with in the end: find . -name "*.rar" -print0 | xargs -0 -n 1 -I {} sh -c "unrar t '{}' && mv '{}' ../outbox" [10:08] i like mine better! but whatever works works in hacky bash land [10:08] I am on an ssd though so parallel would be nice :) [10:17] schbirid2, think it's struggling with spaces in the file names.. [10:18] mine wouldn't [10:18] schbirid2, I mean with yourse [10:19] *yours [10:19] waaaat :D [10:19] shit =) [10:19] oh i forgot to tell parallel the filename [10:19] single quotes behaving different from double quotes? [10:19] "parallel thatscript.sh {} ::: ./**/*.rar" [10:19] oh! :D [10:20] in bash yes, inside single quotes variables are not expanded [10:20] sorry, we should go -bs, i thought this was [10:20] /join #archiveteam-bs [10:21] ... [10:21] are you sure -bs is a good tag? [10:21] I mean, I'm not the kind of guy to point out the obvious... [10:22] what do you mean? [10:22] but that's VERY misleading [10:22] unless it means EXACTLY what I think it means... [10:22] it means bullshit :) [10:22] Aaaah! [10:22] our channel for random things so we do not pollute this one with unimportant chat [10:22] Then it's not misleading at all! 
:D [10:23] nevermind, thankyoudrivetrough [11:04] It would be nice if the ia tool could watch a directory and upload anything put in it [11:04] I suppose I could just run it in a loop with --delete [11:19] *** Ymgve has joined #archiveteam [11:20] *** habi has joined #archiveteam [11:23] *** habi has quit IRC (Client Quit) [11:33] *** dashcloud has quit IRC (Read error: Operation timed out) [11:44] *** dashcloud has joined #archiveteam [11:46] *** habi has joined #archiveteam [12:07] *** habi has quit IRC (Quit: Leaving.) [12:18] *** BlueMaxim has quit IRC (Quit: Leaving) [12:32] *** Stiletto has quit IRC (Read error: Operation timed out) [12:55] *** schbirid2 has quit IRC (Read error: Operation timed out) [12:58] *** schbirid2 has joined #archiveteam [13:09] *** dashcloud has quit IRC (Read error: Operation timed out) [13:16] *** dashcloud has joined #archiveteam [13:25] *** primus104 has joined #archiveteam [13:36] *** pir^2 has joined #archiveteam [13:36] Hi. Did I do https://archive.org/details/dmoz-rdf-20150327 right? [13:40] If I have "server readonly -- tasks waiting for harddrive fix" is it still okay to uploading using ia? The uploads still seem to work [13:40] and the tasks are building up [13:40] *to upload [13:48] Can https://archive.org/details/pdp10nocrew be added to wayback too? [13:57] johtso: i think it's fine, it'll just upload to a different server [13:58] Kazzy, yeah, I see it's uploading to some kind of an s3 staging server.. and then a task rsyncs it over [13:58] I'm sure that's got to be a core part of the design.. once you upload something and get a 200 response, it should be safe [14:23] *** Stiletto has joined #archiveteam [14:26] *** pir^2 has quit IRC (Ping timeout: 370 seconds) [14:27] *** pir^2 has joined #archiveteam [14:28] Hi, is it possible to add https://archive.org/details/dmoz-rdf-20150327 to Wayback Machine? [14:34] getting really slow speeds uploading to IA now :( [14:37] *** primus104 has quit IRC (Leaving.) 
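Editor's note: the idea floated at 11:04, running the uploader in a loop with `--delete` over a drop directory, might look something like the sketch below. It is an assumption-laden illustration, not a feature of the ia tool: `upload_outbox_once`, the `IA` override, the identifier and the 60-second interval are all made up here. Each file is uploaded on its own so that names containing spaces are passed through unchanged.

```shell
#!/bin/sh
# upload_outbox_once is a hypothetical helper, not part of the ia tool.
# IA is an assumed override (for example IA=echo) for a dry run.
upload_outbox_once() {
    ident=$1
    outbox=$2
    for f in "$outbox"/*; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$f" ] || continue
        # --delete removes the local copy after its checksum has been verified.
        "${IA:-ia}" upload "$ident" "$f" --checksum --delete || return 1
    done
}

# Polling loop, left commented out so that sourcing this file has no side effects:
# while true; do upload_outbox_once my-test-item ./outbox; sleep 60; done
```

One caveat with this approach: a file that is still being written when the loop picks it up would be uploaded incomplete, so it is probably safer to finish files elsewhere on the same filesystem and `mv` them into the outbox once they are done.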
[14:49] http://didgoogleshutdown.com/ "Did Google Shutdown ___________ Yet?" [14:50] *** signius has quit IRC (Read error: Operation timed out) [14:51] trs80: looks appropriate for #froogle [14:52] no gmail, knol, google video? [15:05] *** signius has joined #archiveteam [15:16] *** DopefishJ has joined #archiveteam [15:19] *** VonGuard_ has joined #archiveteam [15:19] *** serapeum has quit IRC (Read error: Connection reset by peer) [15:20] *** deathy_ has joined #archiveteam [15:20] *** DFJustin has quit IRC (Ping timeout: 417 seconds) [15:20] *** fresco___ has quit IRC (Ping timeout: 417 seconds) [15:20] *** torvik has quit IRC (Ping timeout: 417 seconds) [15:20] *** LittUp has quit IRC (Ping timeout: 417 seconds) [15:20] *** deathy has quit IRC (Ping timeout: 417 seconds) [15:20] *** VonGuard has quit IRC (Ping timeout: 417 seconds) [15:20] *** mrfoo has quit IRC (Ping timeout: 417 seconds) [15:20] *** fresco___ has joined #archiveteam [15:20] *** VonGuard_ is now known as VonGuard [15:20] *** deathy_ is now known as deathy [15:20] *** mrfoo has joined #archiveteam [15:20] *** pfallenop has quit IRC (Read error: Connection reset by peer) [15:20] *** hive-mind has quit IRC (Read error: Connection reset by peer) [15:20] *** russss_ has joined #archiveteam [15:20] *** lhobas_ has joined #archiveteam [15:20] *** pfallenop has joined #archiveteam [15:20] *** Zebranky has joined #archiveteam [15:20] *** Zebranky_ has quit IRC (Read error: Connection reset by peer) [15:25] *** pir^2 has quit IRC (bye) [15:27] *** bmcginty has joined #archiveteam [15:27] *** sivoais_ has joined #archiveteam [15:27] *** russss has quit IRC (Ping timeout: 266 seconds) [15:27] *** lhobas has quit IRC (Ping timeout: 266 seconds) [15:27] *** mietek has quit IRC (Ping timeout: 266 seconds) [15:27] *** sivoais has quit IRC (Write error: Connection reset by peer) [15:27] *** russss_ has quit IRC (Ping timeout: 393 seconds) [15:27] *** lhobas_ has quit IRC (Ping timeout: 393 seconds) [15:27] *** 
mrfoo has quit IRC (Ping timeout: 393 seconds) [15:27] *** fresco___ has quit IRC (Ping timeout: 393 seconds) [15:27] *** deathy has quit IRC (Ping timeout: 393 seconds) [15:27] *** VonGuard has quit IRC (Ping timeout: 393 seconds) [15:27] *** johtso has quit IRC (Ping timeout: 393 seconds) [15:27] *** balrog has quit IRC (Ping timeout: 393 seconds) [15:27] *** Rickster has quit IRC (Ping timeout: 393 seconds) [15:27] *** marvinw has quit IRC (Ping timeout: 393 seconds) [15:27] *** kniffy has quit IRC (Ping timeout: 393 seconds) [15:27] *** johtso has joined #archiveteam [15:27] *** mietek has joined #archiveteam [15:27] *** kniffy has joined #archiveteam [15:27] *** russss_ has joined #archiveteam [15:27] *** VonGuard has joined #archiveteam [15:27] *** lhobas_ has joined #archiveteam [15:27] *** deathy has joined #archiveteam [15:27] *** Rickster` has joined #archiveteam [15:27] *** Rickster` is now known as Rickster [15:27] *** mrfoo has joined #archiveteam [15:28] *** fresco___ has joined #archiveteam [15:29] *** bmcginty_ has quit IRC (Ping timeout: 261 seconds) [15:30] *** torvik has joined #archiveteam [15:38] *** serapeum has joined #archiveteam [15:38] *** fresco___ has quit IRC (Ping timeout: 498 seconds) [15:38] *** mrfoo has quit IRC (Ping timeout: 498 seconds) [15:38] *** lhobas_ has quit IRC (Ping timeout: 498 seconds) [15:38] *** VonGuard has quit IRC (Ping timeout: 498 seconds) [15:38] *** russss_ has quit IRC (Ping timeout: 498 seconds) [15:38] *** deathy has quit IRC (Ping timeout: 498 seconds) [15:38] *** jk[SVP] has quit IRC (Ping timeout: 498 seconds) [15:38] *** thefinn93 has quit IRC (Ping timeout: 498 seconds) [15:38] *** fresco___ has joined #archiveteam [15:38] *** lhobas_ has joined #archiveteam [15:38] *** deathy has joined #archiveteam [15:38] *** mrfoo has joined #archiveteam [15:39] *** thefinn93 has joined #archiveteam [15:39] *** LittUp has joined #archiveteam [15:39] *** VonGuard has joined #archiveteam [15:39] *** russss_ has 
joined #archiveteam [15:40] *** marvinw has joined #archiveteam [15:46] *** jk[SVP] has joined #archiveteam [15:48] *** russss_ has quit IRC (Ping timeout: 488 seconds) [15:48] *** VonGuard has quit IRC (Ping timeout: 488 seconds) [15:48] *** mrfoo has quit IRC (Ping timeout: 488 seconds) [15:48] *** LittUp has quit IRC (Ping timeout: 488 seconds) [15:48] *** deathy has quit IRC (Ping timeout: 488 seconds) [15:48] *** lhobas_ has quit IRC (Ping timeout: 488 seconds) [15:48] *** fresco___ has quit IRC (Ping timeout: 488 seconds) [15:48] *** Kazzy has quit IRC (Ping timeout: 488 seconds) [15:48] *** Jon has quit IRC (Ping timeout: 488 seconds) [15:48] *** LittUp has joined #archiveteam [15:48] *** jmtd has joined #archiveteam [15:48] *** jmtd is now known as Jon [15:48] *** russss_ has joined #archiveteam [15:48] *** fresco___ has joined #archiveteam [15:48] *** mrfoo has joined #archiveteam [15:48] *** VonGuard has joined #archiveteam [15:48] *** lhobas_ has joined #archiveteam [15:48] *** deathy has joined #archiveteam [15:48] *** aaaaaaaaa has joined #archiveteam [15:49] *** Kazzy has joined #archiveteam [15:51] *** balrog has joined #archiveteam [15:51] *** swebb sets mode: +o balrog [15:54] *** hive-mind has joined #archiveteam [15:58] *** SoJa has joined #archiveteam [16:02] *** Nertsy has quit IRC (Quit: Nertsy) [16:04] So, I'm now to the point with the screenshotting project that I have a page where I drop the ones that didn't preview correctly. [16:08] *** Nertsy has joined #archiveteam [16:20] Which project currently uses the most bandwidth? 
[16:20] I have a dedi I would like to put toward a project, and it has a fat pipe, so I would like to help on a project that can use the speed [16:29] SoJa, I'd guess rapidshare [16:29] johtso, yeah that's what I guessed, thanks [16:31] I helped out on the twitch archive a while back, boy was that one fun :P [16:33] *** primus104 has joined #archiveteam [16:58] Oh wow I never realised Rapidshare was going [16:59] I need to keep up :) [17:00] seems like they have the help they need on it, I was mostly getting rate-limited responses :P [17:00] How much disk space would running a script require? [17:00] most of the good stuff is gone already :( [17:00] will, depends on which files you get, and how many concurrent you do [17:00] SoJa: Right okay [17:01] as soon as they are done downloading, they are rsynced up to a server and then deleted, I think [17:30] *** mistym has joined #archiveteam [17:35] *** primus104 has quit IRC (Leaving.) [17:50] *** pir^2 has joined #archiveteam [17:52] Is there a process to request adding a WARC to Wayback Machine? [17:56] i *think* it just gets derived into the wayback [17:57] but I've never tried it/uploaded warcs [18:00] I think you have to set the type as "web" or something like that, but someone who knows more will show up eventually [18:01] the item with the warc needs to have mediatype:web and if it's in the archiveteam collection, it will get ingested much quicker. [18:06] can we rewrite history? 
kinda scary to think about [18:08] *** pir^2 has quit IRC (Ping timeout: 370 seconds) [18:24] *** habi has joined #archiveteam [18:24] *** habi has left [18:27] *** Jonimus has quit IRC (Ping timeout: 370 seconds) [18:37] *** primus104 has joined #archiveteam [18:58] *** DopefishJ is now known as DFJustin [19:06] *** Jonimus has joined #archiveteam [19:25] *** mistym_ has joined #archiveteam [19:31] *** mistym has quit IRC (Read error: Operation timed out) [19:47] *** SN4T14_ has joined #archiveteam [19:55] *** SN4T14__ has quit IRC (Ping timeout: 512 seconds) [20:10] *** habi has joined #archiveteam [20:33] shhhhh [20:34] you need to set the mediatype to 'web', but only some users have privileges to do that [20:34] *** habi has left [20:34] *** Start has quit IRC (Read error: Connection reset by peer) [20:35] *** Start has joined #archiveteam [20:42] *** DFJustin has quit IRC (Quit: IMHOSTFU) [20:42] *** DFJustin has joined #archiveteam [20:42] *** swebb sets mode: +o DFJustin [20:56] *** Stiletto has quit IRC (Remote host closed the connection) [20:57] *** BlueMaxim has joined #archiveteam [20:58] *** Stiletto has joined #archiveteam [21:00] *** edsu_ has quit IRC (Quit: leaving) [21:00] *** edsu has joined #archiveteam [21:01] *** edsu has quit IRC (Client Quit) [21:01] *** edsu has joined #archiveteam [21:02] *** edsu has quit IRC (Client Quit) [21:03] *** edsu has joined #archiveteam [21:17] *** Emcy_ has joined #archiveteam [21:23] *** Emcy has quit IRC (Ping timeout: 512 seconds) [21:47] *** Jon has quit IRC (Ping timeout: 265 seconds) [21:47] *** Jon has joined #archiveteam [22:18] *** pir^2 has joined #archiveteam [22:19] https://archive.org/details/pdp10nocrew What more do I need to do to get it in the Wayback Machine? [22:20] @ xmc, Kazzy, aaaaaaaaa, et al. [22:21] also re garyrh [22:23] and can you add https://archive.org/details/dmoz-rdf-20150327 too? [22:26] retrospring.net keeps blocking archivebot, what should we do? 
[22:29] ip ban, or useragent ban etc? [22:30] slow down, more delay, less workers? [22:30] *** pir^2 has quit IRC (Ping timeout: 370 seconds) [22:32] ip ban likely [22:32] but i don't know [22:33] don't know how to test with archivebot [22:44] *** scyther has quit IRC (Leaving) [23:22] *** Emcy_ has quit IRC (Read error: Connection reset by peer) [23:25] *** Emcy has joined #archiveteam [23:42] *** c_b has joined #archiveteam [23:43] *** NovaKing_ has quit IRC (Read error: Operation timed out) [23:43] *** cadbury_ has quit IRC (Read error: Operation timed out) [23:43] *** aNthraXx_ has quit IRC (Read error: Operation timed out) [23:43] *** antomatic has quit IRC (Read error: Operation timed out) [23:43] *** ats has quit IRC (Read error: Operation timed out) [23:43] *** thefinn93 has quit IRC (Read error: Operation timed out) [23:43] *** ats has joined #archiveteam [23:43] *** Sk1d has quit IRC (Read error: Operation timed out) [23:43] *** antomatic has joined #archiveteam [23:43] *** svchfoo1 sets mode: +o antomatic [23:43] *** primus has quit IRC (Read error: Operation timed out) [23:44] *** Sk1d has joined #archiveteam [23:44] *** fenn has quit IRC (Read error: Operation timed out) [23:44] *** primus has joined #archiveteam [23:44] *** fenn has joined #archiveteam [23:44] *** thefinn93 has joined #archiveteam [23:44] *** nox has quit IRC (Read error: Operation timed out) [23:45] *** caber has quit IRC (Read error: Operation timed out) [23:45] *** filippo has quit IRC (Read error: Operation timed out) [23:46] *** w0rp has quit IRC (Read error: Operation timed out) [23:46] *** w0rp has joined #archiveteam [23:47] *** nox has joined #archiveteam [23:48] *** brayden has quit IRC (Read error: Operation timed out) [23:48] *** filippo has joined #archiveteam [23:50] *** PepsiMax_ has joined #archiveteam [23:50] *** PepsiMax has quit IRC (Read error: Connection reset by peer) [23:53] *** serapeum has quit IRC (Ping timeout: 606 seconds)