#archiveteam 2015-03-28,Sat

Time Nickname Message
00:00 🔗 johtso garyrh, hmm, just had ia upload the same file twice..
00:00 🔗 johtso with the --checksum flag
00:00 🔗 johtso didn't create a duplicate file or anything, but it transferred the data
00:01 🔗 johtso eh.. it's doing it for everything
00:02 🔗 garyrh was the filename the same as the filename in the item?
00:05 🔗 johtso yeah.. basically running "ia upload ident * --checksum" twice on the same files and it just starts from the top again
00:06 🔗 johtso I'm guessing it doesn't work with multiple arguments?
00:06 🔗 garyrh was there a task running while you were uploading? (check https://archive.org/history/ident)
00:06 🔗 nico_32 ia & robots.txt http://web.archive.org/web/*/https://groups.google.com/forum/#!topic/linux-sunxi/NKyOR4gxYgY
00:06 🔗 nico_32 the robots.txt allows this page to be crawled
00:06 🔗 nico_32 first rule to match
00:07 🔗 johtso garyrh, don't see anything in there
00:08 🔗 garyrh huh, odd.
00:09 🔗 johtso yeah, checksum just isn't working full stop
00:09 🔗 johtso the first time I tested it it did tell me that a file already existed though, which is odd
00:09 🔗 garyrh is it https://archive.org/details/MadrotterTreasureHuntBlogDump ?
00:09 🔗 johtso but now it's not doing it anymore
00:10 🔗 johtso garyrh, yep
00:10 🔗 johtso I'm probably going to use the --delete flag from now on, but would be good not to have to upload everything again pointlessly
00:10 🔗 garyrh ah, there are a bunch of archive tasks that haven't started yet.
00:10 🔗 garyrh those need to finish before --checksum will work
00:11 🔗 garyrh looks like something went wrong here too: https://catalogd.archive.org/log/397144129
00:11 🔗 johtso ouch.. what are those tasks?
00:13 🔗 johtso is it not liking my cancelling an upload part way through or something?
00:14 🔗 johtso it would be great if I could just disable all deriving until I'm actually done, seeing as I just want to get the data onto their servers as quickly as possible at this stage
00:15 🔗 garyrh canceling shouldn't affect it.
00:15 🔗 toad1 has joined #archiveteam
00:15 🔗 garyrh you can disable derives with ia upload ident file --no-derive
00:15 🔗 garyrh and then run a derive after you're done https://archive.org/manage/ident
00:15 🔗 johtso oh nice!
00:16 🔗 johtso so I'm thinking --checksum --delete --no-derive
00:17 🔗 johtso garyrh, could I just scrap this one and make a new one?
00:18 🔗 garyrh I would wait until either the tasks finish and/or an admin fixes it.
00:19 🔗 johtso It's just that I have limited disk space, so I have to get the file uploaded and deleted before I can download more :(
00:20 🔗 garyrh usually someone notices it within a few hours. If not, you can email info@archive.org.
00:42 🔗 johtso has quit IRC (Remote host closed the connection)
00:52 🔗 johtso has joined #archiveteam
00:58 🔗 mistym has quit IRC (Remote host closed the connection)
01:00 🔗 johtso_ has joined #archiveteam
01:02 🔗 johtso_ has quit IRC (Remote host closed the connection)
01:16 🔗 s0ph0s has joined #archiveteam
01:18 🔗 s0ph0s Hey, I was looking at the deathwatch page and I noticed that it says AOL is slated to kill several sites, including TUAW. I noticed the other day that Engadget and TUAW were mirroring each other in my RSS feeds, so I checked and the TUAW domain now redirects to Engadget. Could someone make that edit please?
01:31 🔗 trs80 s0ph0s: the wiki password to create an account is yahoosucks
01:33 🔗 s0ph0s Ah, okay. Thanks!
01:48 🔗 s0ph0s I made the edits.
02:00 🔗 Froggypwn has joined #archiveteam
02:00 🔗 mistym has joined #archiveteam
02:04 🔗 s0ph0s has left Textual IRC Client: www.textualapp.com
02:29 🔗 dashcloud has quit IRC (Read error: Operation timed out)
02:36 🔗 dashcloud has joined #archiveteam
02:45 🔗 Ymgve has quit IRC ()
03:02 🔗 schbirid2 has joined #archiveteam
03:03 🔗 schbirid has quit IRC (Read error: Operation timed out)
04:10 🔗 aaaaaaaaa has quit IRC (Leaving)
04:30 🔗 mistym has quit IRC (Remote host closed the connection)
04:53 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
04:57 🔗 mistym has joined #archiveteam
05:28 🔗 Jonimus has quit IRC (Ping timeout: 370 seconds)
05:42 🔗 Jonimus has joined #archiveteam
05:52 🔗 Jonimus has quit IRC (Ping timeout: 370 seconds)
06:04 🔗 Jonimus has joined #archiveteam
06:59 🔗 espes__ johtso: this is why I was doing https://github.com/espes/jdget
07:58 🔗 scyther has joined #archiveteam
08:33 🔗 dashcloud has quit IRC (Read error: Operation timed out)
08:39 🔗 dashcloud has joined #archiveteam
08:50 🔗 khaoohs_ has joined #archiveteam
08:50 🔗 khaoohs has quit IRC (Read error: Connection reset by peer)
09:00 🔗 mistym has quit IRC (Remote host closed the connection)
09:04 🔗 johtso espes__, great idea! I would definitely use that
09:06 🔗 T31M has joined #archiveteam
09:28 🔗 johtso is anyone familiar with gnu parallel?
09:28 🔗 johtso struggling to get something like this to work "parallel unrar t {} && mv {} ../outbox ::: ./**/*.rar"
09:29 🔗 johtso seems like it's putting an extra argument at the very end of the command
09:49 🔗 dashcloud has quit IRC (Read error: Operation timed out)
09:55 🔗 dashcloud has joined #archiveteam
10:00 🔗 johtso urgh, i'll just stick with xargs
10:02 🔗 schbirid2 your && splits the command in two
10:03 🔗 schbirid2 i would not recommend parallel unrarring unless you are on ssd
10:03 🔗 schbirid2 you could write a simple script instead and call that with parallel
10:04 🔗 schbirid2 file="$1"
10:04 🔗 schbirid2 unrar t "${file}" && mv "${file}" ../outbox/
10:04 🔗 schbirid2 then "parallel thatscript.sh ::: ./**/*.rar"
10:04 🔗 schbirid2 i'd recommend using absolute paths just to be safe from any shenanigans
10:06 🔗 schbirid2 johtso: ^
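The helper-script approach schbirid2 describes can be sketched as follows. This is a minimal sketch: `thatscript.sh` is the name used in the log, while `OUTBOX` and its default path are placeholders (an absolute path, per the advice above), and `check_and_move` is a hypothetical function name chosen so the logic can be reused or tested on its own.

```bash
#!/usr/bin/env bash
# thatscript.sh: test a single RAR archive and move it to OUTBOX only if
# `unrar t` reports success. OUTBOX is a placeholder; an absolute path avoids
# surprises about which directory parallel runs the script from.
OUTBOX="${OUTBOX:-/absolute/path/to/outbox}"

check_and_move() {
    local file="$1"                 # the archive path passed in by parallel
    # The quotes keep names with spaces intact; mv runs only if the test passed.
    unrar t "$file" && mv -- "$file" "$OUTBOX"/
}

# When run as a script with an argument (as parallel does), process it.
if [ "$#" -gt 0 ]; then
    check_and_move "$1"
fi
```

Possible usage, assuming bash 4 or later, where `**` only recurses after `shopt -s globstar`: make the script executable, then run `shopt -s globstar; parallel ./thatscript.sh {} ::: ./**/*.rar`. The `{}` marks where GNU parallel puts each filename (it appends the argument itself when the command has no replacement string), and `-j 2` would cap the number of concurrent jobs, which may matter on a spinning disk.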
10:07 🔗 johtso schbirid2, this is what I came up with in the end: find . -name "*.rar" -print0 | xargs -0 -n 1 -I {} sh -c "unrar t '{}' && mv '{}' ../outbox"
10:08 🔗 schbirid2 i like mine better! but whatever works works in hacky bash land
10:08 🔗 johtso I am on an ssd though so parallel would be nice :)
10:17 🔗 johtso schbirid2, think it's struggling with spaces in the file names..
10:18 🔗 schbirid2 mine wouldnt
10:18 🔗 johtso schbirid2, I mean with yourse
10:19 🔗 johtso *yours
10:19 🔗 schbirid2 waaaat :D
10:19 🔗 schbirid2 shit =)
10:19 🔗 schbirid2 oh i forgot to tell parallel the filename
10:19 🔗 johtso single quotes behaving different from double quotes?
10:19 🔗 schbirid2 "parallel thatscript.sh {} ::: ./**/*.rar"
10:19 🔗 johtso oh! :D
10:20 🔗 schbirid2 in bash yes, inside single quotes variables are not expanded
10:20 🔗 schbirid2 sorry, we should go -bs, i thought this was
10:20 🔗 schbirid2 /join #archiveteam-bs
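The quoting issue discussed above also affects the xargs one-liner: pasting `{}` into a `sh -c` string breaks as soon as a filename contains a single quote. A sketch of a safer variant passes each path to `sh -c` as a positional parameter instead. The function name `test_and_move_rars` and the `JOBS` variable are hypothetical; `-P` runs several unrar processes at once and is supported by GNU and BSD xargs.

```bash
#!/usr/bin/env bash
# Test every .rar below the current directory and move the ones that pass to
# ../outbox/. Each path is handed to `sh -c` as a positional parameter ("$1")
# rather than pasted into the command string, so spaces and quote characters
# in filenames reach unrar and mv unchanged.
test_and_move_rars() {
    # -print0 / -0: NUL-separated names survive spaces and newlines.
    # -n 1: one archive per sh invocation; -P: run up to JOBS at once.
    # The trailing "sh" becomes $0 inside the inline script.
    find . -name '*.rar' -print0 |
        xargs -0 -n 1 -P "${JOBS:-4}" \
            sh -c 'unrar t "$1" && mv -- "$1" ../outbox/' sh
}
```

Run it from the directory that holds the downloads, with `../outbox` already created; setting `JOBS=1` keeps it sequential if parallel unrarring turns out to be disk-bound.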
10:21 🔗 JMC ...
10:21 🔗 JMC are you sure -bs is a good tag?
10:21 🔗 JMC I mean, I'm not the kind of guy to point out the obvious...
10:22 🔗 schbirid2 what do you mean?
10:22 🔗 JMC but that's VERY misleading
10:22 🔗 JMC unless it means EXACTLY what I think it means...
10:22 🔗 schbirid2 it means bullshit :)
10:22 🔗 JMC Aaaah!
10:22 🔗 schbirid2 our channel for random things so we do not pollute this one with unimportant chat
10:22 🔗 JMC Then its not misleading at all! :D
10:23 🔗 JMC nevermind, thankyoudrivetrough
11:04 🔗 johtso It would be nice if the ia tool could watch a directory and upload anything put in it
11:04 🔗 johtso I suppose I could just run it in a loop with --delete
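A minimal sketch of that loop, assuming the `ia` command-line tool is installed and configured. `IDENT`, `INBOX`, `INTERVAL` and the function names are placeholders. Two caveats from the log apply: `--checksum` only skips already-uploaded files once the item's pending archive tasks have finished, and a file should only land in the inbox once it is complete (for example by `mv`-ing it in after a successful `unrar t`), otherwise a half-written download could be uploaded.

```bash
#!/usr/bin/env bash
# Hypothetical "watch folder" uploader: every INTERVAL seconds, upload each
# file found in INBOX to the item IDENT. `--delete` makes ia remove the local
# copy after a successful, verified upload, freeing disk space; --checksum
# and --no-derive are the flags discussed earlier in the log.
# IDENT, INBOX and INTERVAL are placeholders.
IDENT="${IDENT:-your-item-identifier}"
INBOX="${INBOX:-$HOME/ia-inbox}"
INTERVAL="${INTERVAL:-60}"

upload_pending() {
    local f
    for f in "$INBOX"/*; do
        [ -f "$f" ] || continue     # skips subdirectories and the unexpanded glob
        # cd into INBOX so the name passed to ia is just the file's basename
        ( cd "$INBOX" && ia upload "$IDENT" "${f##*/}" --checksum --delete --no-derive )
    done
}

watch_inbox() {
    while true; do
        upload_pending
        sleep "$INTERVAL"
    done
}
```

After sourcing the script, `watch_inbox` runs until interrupted with Ctrl-C; a derive can then be started from the item's manage page once everything is up.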
11:19 🔗 Ymgve has joined #archiveteam
11:20 🔗 habi has joined #archiveteam
11:23 🔗 habi has quit IRC (Client Quit)
11:33 🔗 dashcloud has quit IRC (Read error: Operation timed out)
11:44 🔗 dashcloud has joined #archiveteam
11:46 🔗 habi has joined #archiveteam
12:07 🔗 habi has quit IRC (Quit: Leaving.)
12:18 🔗 BlueMaxim has quit IRC (Quit: Leaving)
12:32 🔗 Stiletto has quit IRC (Read error: Operation timed out)
12:55 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
12:58 🔗 schbirid2 has joined #archiveteam
13:09 🔗 dashcloud has quit IRC (Read error: Operation timed out)
13:16 🔗 dashcloud has joined #archiveteam
13:25 🔗 primus104 has joined #archiveteam
13:36 🔗 pir^2 has joined #archiveteam
13:36 🔗 pir^2 Hi. Did I do https://archive.org/details/dmoz-rdf-20150327 right?
13:40 🔗 johtso If I have "server readonly -- tasks waiting for harddrive fix" is it still okay to uploading using ia? The uploads still seem to work
13:40 🔗 johtso and the tasks are building up
13:40 🔗 johtso *to upload
13:48 🔗 pir^2 Can https://archive.org/details/pdp10nocrew be added to wayback too?
13:57 🔗 Kazzy johtso: i think it's fine, it'll just upload to a different server
13:58 🔗 johtso Kazzy, yeah, I see it's uploading to some kind of an s3 staging server.. and then a task rsyncs it over
13:58 🔗 johtso I'm sure that's got to be a core part of the design.. once you upload something and get a 200 response, it should be safe
14:23 🔗 Stiletto has joined #archiveteam
14:26 🔗 pir^2 has quit IRC (Ping timeout: 370 seconds)
14:27 🔗 pir^2 has joined #archiveteam
14:28 🔗 pir^2 Hi, is it possible to add https://archive.org/details/dmoz-rdf-20150327 to Wayback Machine?
14:34 🔗 johtso getting really slow speeds uploading to IA now :(
14:37 🔗 primus104 has quit IRC (Leaving.)
14:49 🔗 trs80 http://didgoogleshutdown.com/ "Did Google Shutdown ___________ Yet?"
14:50 🔗 signius has quit IRC (Read error: Operation timed out)
14:51 🔗 pir^2 trs80: looks appropriate for #froogle
14:52 🔗 pir^2 no gmail, knol, google video?
15:05 🔗 signius has joined #archiveteam
15:16 🔗 DopefishJ has joined #archiveteam
15:19 🔗 VonGuard_ has joined #archiveteam
15:19 🔗 serapeum has quit IRC (Read error: Connection reset by peer)
15:20 🔗 deathy_ has joined #archiveteam
15:20 🔗 DFJustin has quit IRC (Ping timeout: 417 seconds)
15:20 🔗 fresco___ has quit IRC (Ping timeout: 417 seconds)
15:20 🔗 torvik has quit IRC (Ping timeout: 417 seconds)
15:20 🔗 LittUp has quit IRC (Ping timeout: 417 seconds)
15:20 🔗 deathy has quit IRC (Ping timeout: 417 seconds)
15:20 🔗 VonGuard has quit IRC (Ping timeout: 417 seconds)
15:20 🔗 mrfoo has quit IRC (Ping timeout: 417 seconds)
15:20 🔗 fresco___ has joined #archiveteam
15:20 🔗 VonGuard_ is now known as VonGuard
15:20 🔗 deathy_ is now known as deathy
15:20 🔗 mrfoo has joined #archiveteam
15:20 🔗 pfallenop has quit IRC (Read error: Connection reset by peer)
15:20 🔗 hive-mind has quit IRC (Read error: Connection reset by peer)
15:20 🔗 russss_ has joined #archiveteam
15:20 🔗 lhobas_ has joined #archiveteam
15:20 🔗 pfallenop has joined #archiveteam
15:20 🔗 Zebranky has joined #archiveteam
15:20 🔗 Zebranky_ has quit IRC (Read error: Connection reset by peer)
15:25 🔗 pir^2 has quit IRC (bye)
15:27 🔗 bmcginty has joined #archiveteam
15:27 🔗 sivoais_ has joined #archiveteam
15:27 🔗 russss has quit IRC (Ping timeout: 266 seconds)
15:27 🔗 lhobas has quit IRC (Ping timeout: 266 seconds)
15:27 🔗 mietek has quit IRC (Ping timeout: 266 seconds)
15:27 🔗 sivoais has quit IRC (Write error: Connection reset by peer)
15:27 🔗 russss_ has quit IRC (Ping timeout: 393 seconds)
15:27 🔗 lhobas_ has quit IRC (Ping timeout: 393 seconds)
15:27 🔗 mrfoo has quit IRC (Ping timeout: 393 seconds)
15:27 🔗 fresco___ has quit IRC (Ping timeout: 393 seconds)
15:27 🔗 deathy has quit IRC (Ping timeout: 393 seconds)
15:27 🔗 VonGuard has quit IRC (Ping timeout: 393 seconds)
15:27 🔗 johtso has quit IRC (Ping timeout: 393 seconds)
15:27 🔗 balrog has quit IRC (Ping timeout: 393 seconds)
15:27 🔗 Rickster has quit IRC (Ping timeout: 393 seconds)
15:27 🔗 marvinw has quit IRC (Ping timeout: 393 seconds)
15:27 🔗 kniffy has quit IRC (Ping timeout: 393 seconds)
15:27 🔗 johtso has joined #archiveteam
15:27 🔗 mietek has joined #archiveteam
15:27 🔗 kniffy has joined #archiveteam
15:27 🔗 russss_ has joined #archiveteam
15:27 🔗 VonGuard has joined #archiveteam
15:27 🔗 lhobas_ has joined #archiveteam
15:27 🔗 deathy has joined #archiveteam
15:27 🔗 Rickster` has joined #archiveteam
15:27 🔗 Rickster` is now known as Rickster
15:27 🔗 mrfoo has joined #archiveteam
15:28 🔗 fresco___ has joined #archiveteam
15:29 🔗 bmcginty_ has quit IRC (Ping timeout: 261 seconds)
15:30 🔗 torvik has joined #archiveteam
15:38 🔗 serapeum has joined #archiveteam
15:38 🔗 fresco___ has quit IRC (Ping timeout: 498 seconds)
15:38 🔗 mrfoo has quit IRC (Ping timeout: 498 seconds)
15:38 🔗 lhobas_ has quit IRC (Ping timeout: 498 seconds)
15:38 🔗 VonGuard has quit IRC (Ping timeout: 498 seconds)
15:38 🔗 russss_ has quit IRC (Ping timeout: 498 seconds)
15:38 🔗 deathy has quit IRC (Ping timeout: 498 seconds)
15:38 🔗 jk[SVP] has quit IRC (Ping timeout: 498 seconds)
15:38 🔗 thefinn93 has quit IRC (Ping timeout: 498 seconds)
15:38 🔗 fresco___ has joined #archiveteam
15:38 🔗 lhobas_ has joined #archiveteam
15:38 🔗 deathy has joined #archiveteam
15:38 🔗 mrfoo has joined #archiveteam
15:39 🔗 thefinn93 has joined #archiveteam
15:39 🔗 LittUp has joined #archiveteam
15:39 🔗 VonGuard has joined #archiveteam
15:39 🔗 russss_ has joined #archiveteam
15:40 🔗 marvinw has joined #archiveteam
15:46 🔗 jk[SVP] has joined #archiveteam
15:48 🔗 russss_ has quit IRC (Ping timeout: 488 seconds)
15:48 🔗 VonGuard has quit IRC (Ping timeout: 488 seconds)
15:48 🔗 mrfoo has quit IRC (Ping timeout: 488 seconds)
15:48 🔗 LittUp has quit IRC (Ping timeout: 488 seconds)
15:48 🔗 deathy has quit IRC (Ping timeout: 488 seconds)
15:48 🔗 lhobas_ has quit IRC (Ping timeout: 488 seconds)
15:48 🔗 fresco___ has quit IRC (Ping timeout: 488 seconds)
15:48 🔗 Kazzy has quit IRC (Ping timeout: 488 seconds)
15:48 🔗 Jon has quit IRC (Ping timeout: 488 seconds)
15:48 🔗 LittUp has joined #archiveteam
15:48 🔗 jmtd has joined #archiveteam
15:48 🔗 jmtd is now known as Jon
15:48 🔗 russss_ has joined #archiveteam
15:48 🔗 fresco___ has joined #archiveteam
15:48 🔗 mrfoo has joined #archiveteam
15:48 🔗 VonGuard has joined #archiveteam
15:48 🔗 lhobas_ has joined #archiveteam
15:48 🔗 deathy has joined #archiveteam
15:48 🔗 aaaaaaaaa has joined #archiveteam
15:49 🔗 Kazzy has joined #archiveteam
15:51 🔗 balrog has joined #archiveteam
15:51 🔗 swebb sets mode: +o balrog
15:54 🔗 hive-mind has joined #archiveteam
15:58 🔗 SoJa has joined #archiveteam
16:02 🔗 Nertsy has quit IRC (Quit: Nertsy)
16:04 🔗 SketchCow So, I'm now to the point with the screenshotting project that I have a page where I drop the ones that didn't preview correctly.
16:08 🔗 Nertsy has joined #archiveteam
16:20 🔗 SoJa Which project currently uses the most bandwidth?
16:20 🔗 SoJa I have a dedi I would like to put toward a project, and it has a fat pipe, so I would like to help on a project that can use the speed
16:29 🔗 johtso SoJa, I'd guess rapidshare
16:29 🔗 SoJa johtso, yeah that's what I guessed, thanks
16:31 🔗 SoJa I helped out on the twitch archive a while back, boy was that one fun :P
16:33 🔗 primus104 has joined #archiveteam
16:58 🔗 will Oh wow I never realised Rapidshare was going
16:59 🔗 will I need to keep up :)
17:00 🔗 SoJa seems like they have the help they need on it, I was mostly getting rate limited responses :P
17:00 🔗 will How much disk space would running a script require?
17:00 🔗 johtso most of the good stuff is gone already :(
17:00 🔗 SoJa will, depends on which files you get, and how many concurrent you do
17:00 🔗 will SoJa: Right okay
17:01 🔗 SoJa as soon as they are done downloading, they are rsynced up to a server and then deleted, I think
17:30 🔗 mistym has joined #archiveteam
17:35 🔗 primus104 has quit IRC (Leaving.)
17:50 🔗 pir^2 has joined #archiveteam
17:52 🔗 pir^2 Is there a process to request adding a WARC to Wayback Machine?
17:56 🔗 Kazzy i *think* it just gets derived into the wayback
17:57 🔗 Kazzy but I've never tried it/uploaded warcs
18:00 🔗 aaaaaaaaa I think you have to set the type as "web" or something like that, but someone who knows more will show up eventually
18:01 🔗 garyrh the item with the warc needs to have mediatype:web and if it's in the archiveteam collection, it will get ingested much quicker.
18:06 🔗 schbirid2 can we rewrite history? kinda scary to think about
18:08 🔗 pir^2 has quit IRC (Ping timeout: 370 seconds)
18:24 🔗 habi has joined #archiveteam
18:24 🔗 habi has left
18:27 🔗 Jonimus has quit IRC (Ping timeout: 370 seconds)
18:37 🔗 primus104 has joined #archiveteam
18:58 🔗 DopefishJ is now known as DFJustin
19:06 🔗 Jonimus has joined #archiveteam
19:25 🔗 mistym_ has joined #archiveteam
19:31 🔗 mistym has quit IRC (Read error: Operation timed out)
19:47 🔗 SN4T14_ has joined #archiveteam
19:55 🔗 SN4T14__ has quit IRC (Ping timeout: 512 seconds)
20:10 🔗 habi has joined #archiveteam
20:33 🔗 xmc shhhhh
20:34 🔗 xmc you need to set the mediatype to 'web', but only some users have privileges to do that
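A sketch of an upload that sets that mediatype, assuming the `ia` tool's `--metadata="key:value"` option. The function name `upload_warcs` is hypothetical, and as noted above only some accounts may set `mediatype:web`; this sketch does not check for that.

```bash
#!/usr/bin/env bash
# Upload WARC files to an item while setting its mediatype to "web", the
# metadata value mentioned above for getting WARCs into the Wayback Machine.
# Whether the account is allowed to set this mediatype is not checked here.
upload_warcs() {
    local ident="$1"; shift
    local f
    for f in "$@"; do
        case "$f" in
            *.warc|*.warc.gz) ;;    # plausible WARC filename, upload it
            *) echo "skipping non-WARC file: $f" >&2; continue ;;
        esac
        ia upload "$ident" "$f" --metadata="mediatype:web"
    done
}
```

For example, `upload_warcs my-crawl-item crawl-00000.warc.gz` (identifier and filename hypothetical).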
20:34 🔗 habi has left
20:34 🔗 Start has quit IRC (Read error: Connection reset by peer)
20:35 🔗 Start has joined #archiveteam
20:42 🔗 DFJustin has quit IRC (Quit: IMHOSTFU)
20:42 🔗 DFJustin has joined #archiveteam
20:42 🔗 swebb sets mode: +o DFJustin
20:56 🔗 Stiletto has quit IRC (Remote host closed the connection)
20:57 🔗 BlueMaxim has joined #archiveteam
20:58 🔗 Stiletto has joined #archiveteam
21:00 🔗 edsu_ has quit IRC (Quit: leaving)
21:00 🔗 edsu has joined #archiveteam
21:01 🔗 edsu has quit IRC (Client Quit)
21:01 🔗 edsu has joined #archiveteam
21:02 🔗 edsu has quit IRC (Client Quit)
21:03 🔗 edsu has joined #archiveteam
21:17 🔗 Emcy_ has joined #archiveteam
21:23 🔗 Emcy has quit IRC (Ping timeout: 512 seconds)
21:47 🔗 Jon has quit IRC (Ping timeout: 265 seconds)
21:47 🔗 Jon has joined #archiveteam
22:18 🔗 pir^2 has joined #archiveteam
22:19 🔗 pir^2 https://archive.org/details/pdp10nocrew What more do I need to do to get it in the Wayback Machine?
22:20 🔗 pir^2 @ xmc, Kazzy, aaaaaaaaa, et al.
22:21 🔗 pir^2 also re garyrh
22:23 🔗 pir^2 and can you add https://archive.org/details/dmoz-rdf-20150327 too?
22:26 🔗 Sanqui retrospring.net keeps blocking archivebot, what should we do?
22:29 🔗 Kazzy ip ban, or useragent ban etc?
22:30 🔗 Kazzy slow down, more delay, less workers?
22:30 🔗 pir^2 has quit IRC (Ping timeout: 370 seconds)
22:32 🔗 Sanqui ip ban likely
22:32 🔗 Sanqui but i don't know
22:33 🔗 Sanqui don't know how to test with archivebot
22:44 🔗 scyther has quit IRC (Leaving)
23:22 🔗 Emcy_ has quit IRC (Read error: Connection reset by peer)
23:25 🔗 Emcy has joined #archiveteam
23:42 🔗 c_b has joined #archiveteam
23:43 🔗 NovaKing_ has quit IRC (Read error: Operation timed out)
23:43 🔗 cadbury_ has quit IRC (Read error: Operation timed out)
23:43 🔗 aNthraXx_ has quit IRC (Read error: Operation timed out)
23:43 🔗 antomatic has quit IRC (Read error: Operation timed out)
23:43 🔗 ats has quit IRC (Read error: Operation timed out)
23:43 🔗 thefinn93 has quit IRC (Read error: Operation timed out)
23:43 🔗 ats has joined #archiveteam
23:43 🔗 Sk1d has quit IRC (Read error: Operation timed out)
23:43 🔗 antomatic has joined #archiveteam
23:43 🔗 svchfoo1 sets mode: +o antomatic
23:43 🔗 primus has quit IRC (Read error: Operation timed out)
23:44 🔗 Sk1d has joined #archiveteam
23:44 🔗 fenn has quit IRC (Read error: Operation timed out)
23:44 🔗 primus has joined #archiveteam
23:44 🔗 fenn has joined #archiveteam
23:44 🔗 thefinn93 has joined #archiveteam
23:44 🔗 nox has quit IRC (Read error: Operation timed out)
23:45 🔗 caber has quit IRC (Read error: Operation timed out)
23:45 🔗 filippo has quit IRC (Read error: Operation timed out)
23:46 🔗 w0rp has quit IRC (Read error: Operation timed out)
23:46 🔗 w0rp has joined #archiveteam
23:47 🔗 nox has joined #archiveteam
23:48 🔗 brayden has quit IRC (Read error: Operation timed out)
23:48 🔗 filippo has joined #archiveteam
23:50 🔗 PepsiMax_ has joined #archiveteam
23:50 🔗 PepsiMax has quit IRC (Read error: Connection reset by peer)
23:53 🔗 serapeum has quit IRC (Ping timeout: 606 seconds)