[00:07] X done: 3.5-> 1.3
[00:12] *** JetBalsa has quit IRC (Read error: Operation timed out)
[00:13] *** JetBalsa has joined #archiveteam-bs
[00:22] Anyone got an rsync target we could use for the livejournal discovery? Only 5GB or so at max I think, mainly because my target is dying
[00:35] Y done: 6 -> 2.2
[00:35] HCross yo
[00:36] HCross: i have plenty of space, i'm in
[00:37] bsmith093, pm your target to arkiver please
[00:38] Z done: 0.333 -> 0.121
[00:38] Total (excluding H, N, T and misc) is only 75G -- zip compression works well, apparently.
[00:42] Now pushing them up to FOS
[00:43] JesseW: damn, that much better
[00:43] JesseW: how do i set myself up as an rsync target?
[00:44] well, don't forget that there's about 100GB remaining in those last 3 letters
[00:44] bsmith093: IDK -- I haven't done this. Probably ask arkiver or HCross
[00:45] http://www.archiveteam.org/index.php?title=Dev/Staging everything until Megawarc factory
[00:47] HCross thanks
[00:53] HCross arkiver ready
[00:54] I think hes asleep now, but will get it tomorrow
[00:54] HCross can you test if it works?
[00:55] Sure
[00:56] PM me the info
[01:26] *** balrog has quit IRC (Bye)
[01:37] *** balrog has joined #archiveteam-bs
[01:37] *** swebb sets mode: +o balrog
[02:25] OK, now generating extracting the metadata from the whole grab
[02:29] is there anyone here using debian sid, and if so, how unstable i it
[02:29] Kubuntu's inability to reliably reboot or restore touchpad settings has finally pissed me off
[02:40] *** Microguru has joined #archiveteam-bs
[02:47] HCross i could just pull the data using rsync, you dont have to push
[03:08] *** bwn has quit IRC (Ping timeout: 492 seconds)
[03:17] *** ppsym has joined #archiveteam-bs
[03:18] *** altlabel has quit IRC (Ping timeout: 258 seconds)
[03:21] *** PurpleSym has quit IRC (Ping timeout: 506 seconds)
[03:23] *** ppsym is now known as PurpleSym
[03:36] Microguru: yeah, #archiveteam uses IRC slightly unusually, I think.
[03:37] We kinda need to, considering that this is where most of the coordination happens.
[03:46] *** PotcFdk has quit IRC (Remote host closed the connection)
[03:49] there's a post on the AT wiki about how to use statistics to estimate the number of pages on a site using repeated clicks of the random button or something like that. I have a site I'm starting to gather data on for archival, and I want to verify my previous estimate. I can't find that page. anyone know what it was again
[03:49] *** PotcFdk has joined #archiveteam-bs
[03:50] huh interesting
[03:50] sounds like a straightforward application of the math used to solve the German Tank Problem
[03:50] thank you. that's what I was thinking of.
[03:55] well, I'll be uploading stuff to FOS for a while -- I have about 75GB to upload, and I'm getting ~ 0.3MB/s. :-/
[04:01] *** PurpleSym has quit IRC (*)
[04:01] *** ppsym has joined #archiveteam-bs
[04:01] *** ppsym is now known as PurpleSym
[04:11] *** bwn has joined #archiveteam-bs
[04:41] *** JesseW has quit IRC (Quit: Leaving.)
[04:44] SketchCow: i'm up to 2012-05-15 with kpfa
[04:44] i think i went thur 2 months in one day
[04:45] and you may get a third before i go to bed
[05:07] JesseW 75GB/300KBps = 250 000 seconds or about 2.894 days
[05:10] maybe I've just been watching a lot of Star Trek lately, but a lot of the recent chatter sounds Vulcan
[05:10] and I don't mean that in a positive way
[05:13] wat
[05:15] it's probably just the Star Trek, carry on
[05:32] *** JesseW has joined #archiveteam-bs
[05:54] JesseW: how goes the csv script?
[05:56] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:57] bsmith093: currently crunching through the 187,361 files in Naruto
[05:57] seems to be working well
[05:57] Fanfiction_B.zip has an ETA of about 3 more hours.
[05:58] jesus, that's almost athrid as poopular as harry potter with 600K
[05:58] wow, typos :P
[05:59] JesseW: so all the zips together , how big?
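The German Tank Problem idea mentioned at [03:50] can be sketched roughly as follows. This is only an illustration, under the assumptions that the site numbers its pages sequentially from 1 and that the random button picks uniformly among them; the function name `estimate_total` and the sample IDs are made up, not taken from the wiki page being discussed. Repeated clicks can return the same page twice, so the sketch keeps only the distinct IDs before applying the standard estimator max + max/k - 1.

```python
def estimate_total(observed_ids):
    """Estimate how many pages exist from page IDs seen via a random button.

    Assumes (hypothetically) that IDs run 1..N with no gaps and that each
    click is uniform over them. Uses the classic German Tank estimator
    m + m/k - 1, where m is the largest ID seen and k is the number of
    distinct IDs seen.
    """
    distinct = set(observed_ids)  # repeated clicks may hit the same page
    k = len(distinct)
    if k == 0:
        raise ValueError("need at least one observed ID")
    m = max(distinct)
    return m + m / k - 1


# Made-up sample of IDs gathered from a few clicks (40 was hit twice).
sample = [19, 40, 42, 60, 40]
print(estimate_total(sample))
```

If the site's IDs have gaps (deleted pages, for example), the result estimates the highest ID in use, not the number of live pages.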
[06:01] 75GB
[06:01] (and remember, I still don't have the last big 3, which are about ~100GB (uncompressed))
[06:02] *** Sk1d has joined #archiveteam-bs
[06:04] *** JesseW has quit IRC (Quit: Leaving.)
[06:04] oh, right
[06:16] JesseW needs a bouncer :p
[06:16] you guys maybe getting a sean hannity collection at some point
[06:32] *** bwn has quit IRC (Ping timeout: 492 seconds)
[06:51] *** metalcamp has joined #archiveteam-bs
[07:31] *** bwn has joined #archiveteam-bs
[07:48] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[08:04] *** bsmith093 has quit IRC (Ping timeout: 370 seconds)
[08:19] *** bsmith093 has joined #archiveteam-bs
[08:30] SketchCow: looks like i found more Premiere Interactive radio shows
[08:31] one called Jim Rome and it goes back to 2005
[08:31] and is open
[08:40] so Jim Rome Show is a sports show
[08:40] *** DFJustin has quit IRC (Read error: Connection reset by peer)
[08:40] *** DFJustin has joined #archiveteam-bs
[08:40] *** swebb sets mode: +o DFJustin
[08:40] plus side we will get a intervew with Armstrong on jan 3 2005 hour 3
[08:54] *** superkuh has joined #archiveteam-bs
[09:06] *** lytv has quit IRC (Ping timeout: 244 seconds)
[09:07] *** JetBalsa has quit IRC (Read error: Operation timed out)
[09:07] *** lytv has joined #archiveteam-bs
[09:07] *** JetBalsa has joined #archiveteam-bs
[09:11] *** schbirid has joined #archiveteam-bs
[09:32] *** RichardG has joined #archiveteam-bs
[09:56] *** BlueMaxim has quit IRC (Quit: Leaving)
[10:18] *** brayden has quit IRC (Quit: Leaving)
[10:19] *** brayden has joined #archiveteam-bs
[10:19] *** swebb sets mode: +o brayden
[11:37] *** dan- has quit IRC (Quit: Nyan nyan)
[12:30] *** HCross2 has quit IRC ()
[12:34] *** dan- has joined #archiveteam-bs
[12:53] *** metalcamp has joined #archiveteam-bs
[13:15] *** HCross2 has joined #archiveteam-bs
[13:20] bsmith093, ping
[14:02] *** pgoetz has quit IRC (Remote host closed the connection)
[14:19] *** Start has quit IRC (Quit: Disconnected.)
[15:34] *** Start has joined #archiveteam-bs
[15:55] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[15:59] *** RichardG has joined #archiveteam-bs
[16:07] *** Start has quit IRC (Quit: Disconnected.)
[16:13] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[16:28] *** JesseW has joined #archiveteam-bs
[16:40] *** metalcamp has joined #archiveteam-bs
[16:49] *** JesseW has quit IRC (Quit: Leaving.)
[17:39] HCross pong
[17:41] Anyone good at dealing with really strange errors? http://paste.nerds.io/ucimumopul.avrasm
[17:43] *** Start has joined #archiveteam-bs
[17:44] HCross: What's the context of the error? rsyncing to a newsbuddy worker?
[17:44] yeah
[17:45] *** metalcamp has quit IRC (Ping timeout: 250 seconds)
[17:47] Is it reproducible?
[17:47] Yes, its happening every time it rsyncs to a worker
[17:47] A specific worker?
[17:47] all of them
[17:48] Can you reproduce the error with a manual command?
[17:50] Also, perhaps try reinstalling the rsync package through your package manager.
[17:52] *** signius_ has quit IRC (Read error: Operation timed out)
[18:02] *** Start has quit IRC (Quit: Disconnected.)
[18:06] *** signius_ has joined #archiveteam-bs
[18:07] phuzion, works with other rsyncs manually
[18:10] *** metalcamp has joined #archiveteam-bs
[18:10] *** JW_work has joined #archiveteam-bs
[18:13] HCross: Can you figure out the EXACT command that is being issued by the newsbuddy script and replicate that on the command line?
[18:15] rsync -avz --no-o --no-g --progress --remove-source-files list-videos_temp" + str(i) + " " + rsync_targets[i]
[18:15] when sourceforge responds
[18:15] will test, ty
[18:16] worked fine on the command line
[18:17] Strange.
[18:17] well, try reinstalling
[18:17] rsync ^
[18:17] ok, can you pause the livejournal stuff please arkiver
[18:18] *** Start has joined #archiveteam-bs
[18:18] paused. what's the problem with it?
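One way to act on the [18:13] suggestion is to have the script log the exact rsync command before running it, so the same line can be pasted into a shell. The sketch below is not the newsbuddy code: the helper names `build_rsync_command` and `run_rsync` are hypothetical, and only the flags, the `list-videos_temp` prefix and the `rsync_targets[i]` lookup come from the command pasted at [18:15]. It passes the arguments as a list, which avoids the shell quoting problems that string concatenation can cause.

```python
import shlex
import subprocess


def build_rsync_command(i, rsync_targets):
    """Build the rsync argument list for worker i (hypothetical helper)."""
    return [
        "rsync", "-avz", "--no-o", "--no-g", "--progress",
        "--remove-source-files",
        "list-videos_temp" + str(i),  # source name as in the pasted command
        rsync_targets[i],             # destination for this worker
    ]


def run_rsync(i, rsync_targets):
    """Run rsync for worker i and print the exact command for manual replay."""
    cmd = build_rsync_command(i, rsync_targets)
    print("running:", shlex.join(cmd))  # copy-pasteable into a shell
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Show rsync's own error output instead of failing silently.
        print("rsync exited with", result.returncode)
        print(result.stderr)
    return result.returncode
```

`shlex.join` needs Python 3.8 or newer; on older versions the list can be printed as is.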
[18:18] its going to the same server, thats having the rsync issues
[18:21] reinstalled, now we wait
[18:26] arkiver, ready for an unpause
[18:27] right
[18:27] do !start
[18:28] on livejournal?
[18:28] oh sorry
[18:29] to unpause newsbuddy do !start
[18:29] I'll restart livejournal
[18:29] I didnt pause it
[18:29] restarted!
[18:29] I timed it so it wasnt rsyncing out
[18:29] Right
[18:30] HCross so am I still needed?
[18:34] I should ask - any rsync experts around to give bsmith093 a hand? Trying to get an rsync target setup and his network is being strange
[18:35] Could someone add https://github.com/matteobrusa/TumblrToStaticExporter to the wiki page for Tumblr?
[18:38] o know port forwarding works, i have another one open just fine.
[18:43] arkiver: help with my rsyns settings please, i cant seem to get the port to open on my linksys ea6500 router
[18:44] phuzion, its still doing it http://paste.nerds.io/owomiyunub.erl
[18:48] HCross: rsync --version
[18:49] rsync version 3.1.1 protocol version 31
[18:49] phuzion: rsync version 3.1.1 protocol version 31
[18:49] ditto for me
[18:49] bsmith093: Are you experiencing the same error that HCross is?
[18:50] phuzion: he's the other end of the connection i'm trying to make
[18:50] phuzion, hes having NAT issues with getting files in, he isnt having the issue
[18:50] bsmith093, im talking about something different
[18:50] Oh. Let's troubleshoot one thing at a time.
[18:50] yeah, good plan
[18:50] HCross sorry
[18:50] HCross: what distro?
[18:50] Debian 8
[18:51] phuzion, works fine on command line, when newsbuddy runs it, it falls over
[18:53] HCross anyway, now port 873 tests as open, so go nuts
[18:59] HCross: are all of your python packages installed with apt or do you have some that were installed via easy_install or pip?
[18:59] phuzion, pip
[19:00] its worked fine for ages, and then suddenly brokwe
[19:00] broke
[19:00] pip freeze?
[19:00] http://paste.nerds.io/cukamufuwo.mel
[19:02] Honestly, my next suggestion is pip --force-reinstall all those packages
[19:02] ( -r requirements.txt obviously)
[19:02] all those work fine, its just rsync being a pest
[19:02] well, you reinstalled rsync
[19:03] ok, ill try that
[19:03] so, you theoretically know that the rsync binaries aren't corrupted or anything.
[19:03] and the problem doesn't happen when you run the command from a shell (I assume bash?)
[19:03] yeah
[19:04] So, it leads me to believe that it's either a problem with your python environment, or a bug in the newsbuddy code.
[19:05] There was an update recently
[19:05] Which packages updated?
[19:05] I meant a code update
[19:05] Oh.
[19:05] Thats probably killed it
[19:06] phuzion, thanks for dealing with my noobishness
[19:08] No problem. If you suspect you know which commit caused the problem, find the one before it and do 'git checkout abcd1234' or whatever the first 6-8 chars of the commit ID are, and try re-running the code.
[19:16] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[19:23] *** Honno has joined #archiveteam-bs
[19:40] *** bwn has quit IRC (Ping timeout: 246 seconds)
[19:43] *** tomwsmf-a has joined #archiveteam-bs
[19:44] *** Start has quit IRC (Quit: Disconnected.)
[20:14] *** bwn has joined #archiveteam-bs
[20:42] lo all, I wanna do a games torrent in the excess of 600GB. how the hell do I get the ball rolling? Need a server to start seeding it at decent rates right? What's a good, cost effective service for that?
[20:43] Games torrent as in, a collection of games
[20:44] o_O???
[20:44] yeah uh
[20:44] IA?
[20:44] they all freeware?
[20:44] Internet Archive, I mean?
[20:44] yeah all freeware games
[20:44] oooo
[20:44] nice
[20:44] From http://www.archiveteam.org/index.php?title=GameMaker_Sandbox
[20:44] I think it doesn't handle torrents that large, though.
[20:45] IA does torrent hosting heh?
[20:45] All items on IA are also available as torrents, yes.
[20:45] And pretty much anywhere can act as a webseed
[20:45] That's the point of webseeds
[20:45] Anyway, I'm just looking into the warcs and stuff, what I want to do is compile all the game downloads into their own seperate folders, with a text file that contains meta data I strip from the archive
[20:45] JW_work: your presence would be good in #urlteam if you have a moment
[20:46] Interested in how that works JM_work
[20:46] JW even, too tired, been downloading games all day lol
[20:48] Er there is a concern I haven't really thought about
[20:48] We love archives and all right, but we archive in their original form
[20:48] What I'd be doing is taking thousands of developers work, even if free, and compiling them to a bundle
[20:49] They're all freeware btw
[20:49] The problem is, using the Internet Archive to browse the site is slow as hell, and using these big warcs is pretty hard for newbies
[20:50] I also think just some massive torrent of games will gain some traction in communities, which is important because I want people to really get some enjoyment out of this stuff
[20:51] First off, ethically how do you guys think about that, and secondly am I breaking some legal stuff
[20:52] It's unreasonable to contact all the devs for their consent, I would image most if not all wouldn't mind this project however
[20:52] I could use an opt-out system, where devs can contact me to remove their game from this listing, but then I can't really use torrents because they would be variable to be redundant
[20:52] oh man the geocities torrent died? Where is the data now then?
[20:53] Frogging: in lots of places — neocities (AFAIK), on IA (I am pretty certain), likely many other places too.
[20:53] Just none of them happen to be seeding the original format of the torrent
[20:54] ah
[20:54] https://archive.org/details/2009-archiveteam-geocities-part1
[20:57] thanks
[21:08] HCross it's working now
[21:10] *** RichardG has quit IRC (Ping timeout: 260 seconds)
[21:16] *** schbirid has quit IRC (Quit: Leaving)
[21:26] *** RichardG has joined #archiveteam-bs
[21:36] *** xXx_ndidd has joined #archiveteam-bs
[21:43] *** ndiddy has quit IRC (Read error: Operation timed out)
[21:48] *** xXx_ndidd is now known as ndiddy
[22:04] SketchCow: kpfa is up to 2012-06-30
[22:07] Thank you!
[22:07] There's a redone one
[22:08] (Geocities torrent)
[22:08] From Dragan.
[22:09] *** RichardG has quit IRC (Read error: Connection reset by peer)
[22:12] i'm starting to upload The Jim Rome Show: https://archive.org/details/The_Jim_Rome_Show_Podcast-2005-01-03
[22:12] *** RichardG has joined #archiveteam-bs
[22:25] *** RichardG has quit IRC (Ping timeout: 244 seconds)
[22:31] *** Honno has quit IRC (Ping timeout: 492 seconds)
[22:33] Microguru: there are captcha solving plugins & services- check out the list from plowshare: https://github.com/mcrapet/plowshare/blob/master/docs/plowdown.1
[22:33] *** RichardG has joined #archiveteam-bs
[23:38] *** Start has joined #archiveteam-bs
[23:54] phuzion, thanks. We've jumped back to an older version and it seems to be working
[23:56] note to self (or SketchCow) — the last comment on http://ascii.textfiles.com/archives/875 is spam.
[23:58] *** RedType_ has left