#archiveteam-bs 2016-03-21,Mon

Time Nickname Message
00:07 🔗 JesseW X done: 3.5-> 1.3
00:12 🔗 JetBalsa has quit IRC (Read error: Operation timed out)
00:13 🔗 JetBalsa has joined #archiveteam-bs
00:22 🔗 HCross Anyone got an rsync target we could use for the livejournal discovery? Only 5GB or so at max I think, mainly because my target is dying
00:35 🔗 JesseW Y done: 6 -> 2.2
00:35 🔗 bsmith093 HCross yo
00:36 🔗 bsmith093 HCross: i have plenty of space, i'm in
00:37 🔗 HCross bsmith093, pm your target to arkiver please
00:38 🔗 JesseW Z done: 0.333 -> 0.121
00:38 🔗 JesseW Total (excluding H, N, T and misc) is only 75G -- zip compression works well, apparently.
00:42 🔗 JesseW Now pushing them up to FOS
00:43 🔗 bsmith093 JesseW: damn, that much better
00:43 🔗 bsmith093 JesseW: how do i set myself up as an rsync target?
00:44 🔗 JesseW well, don't forget that there's about 100GB remaining in those last 3 letters
00:44 🔗 JesseW bsmith093: IDK -- I haven't done this. Probably ask arkiver or HCross
00:45 🔗 HCross http://www.archiveteam.org/index.php?title=Dev/Staging everything until Megawarc factory
00:47 🔗 bsmith093 HCross thanks
00:53 🔗 bsmith093 HCross arkiver ready
00:54 🔗 HCross I think hes asleep now, but will get it tomorrow
00:54 🔗 bsmith093 HCross can you test if it works?
00:55 🔗 HCross Sure
00:56 🔗 HCross PM me the info
01:26 🔗 balrog has quit IRC (Bye)
01:37 🔗 balrog has joined #archiveteam-bs
01:37 🔗 swebb sets mode: +o balrog
02:25 🔗 JesseW OK, now extracting the metadata from the whole grab
02:29 🔗 yipdw is there anyone here using debian sid, and if so, how unstable is it
02:29 🔗 yipdw Kubuntu's inability to reliably reboot or restore touchpad settings has finally pissed me off
02:40 🔗 Microguru has joined #archiveteam-bs
02:47 🔗 bsmith093 HCross i could just pull the data using rsync, you dont have to push
03:08 🔗 bwn has quit IRC (Ping timeout: 492 seconds)
03:17 🔗 ppsym has joined #archiveteam-bs
03:18 🔗 altlabel has quit IRC (Ping timeout: 258 seconds)
03:21 🔗 PurpleSym has quit IRC (Ping timeout: 506 seconds)
03:23 🔗 ppsym is now known as PurpleSym
03:36 🔗 JesseW Microguru: yeah, #archiveteam uses IRC slightly unusually, I think.
03:37 🔗 Microguru We kinda need to, considering that this is where most of the coordination happens.
03:46 🔗 PotcFdk has quit IRC (Remote host closed the connection)
03:49 🔗 Microguru there's a post on the AT wiki about how to use statistics to estimate the number of pages on a site using repeated clicks of the random button or something like that. I have a site I'm starting to gather data on for archival, and I want to verify my previous estimate. I can't find that page. anyone know what it was again
03:49 🔗 PotcFdk has joined #archiveteam-bs
03:50 🔗 xmc huh interesting
03:50 🔗 xmc sounds like a straightforward application of the math used to solve the German Tank Problem
03:50 🔗 Microguru thank you. that's what I was thinking of.
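The estimate xmc refers to can be sketched as follows. This is a minimal illustration, not the method from the wiki page Microguru was looking for. It assumes the site numbers its pages sequentially from 1 to N and that each click of the random button returns one of those numbers; the function name `estimate_total` is hypothetical.

```python
def estimate_total(samples):
    """Estimate N from observed serial numbers (German Tank Problem).

    Assumes serials run 1..N. Repeated hits from the random button are
    collapsed with set(), so the sample is treated as drawn without
    replacement.
    """
    distinct = set(samples)
    if not distinct:
        raise ValueError("need at least one observed page number")
    k = len(distinct)   # number of distinct pages seen
    m = max(distinct)   # largest page number seen
    # Frequentist minimum-variance unbiased estimator: m + m/k - 1
    return m + m / k - 1


# Four observed page IDs with a maximum of 60 give 60 + 60/4 - 1
estimate = estimate_total([19, 40, 42, 60])
print(estimate)  # 74.0
```

If page IDs are not sequential, or have gaps from deleted pages, this estimate would not apply directly.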
03:55 🔗 JesseW well, I'll be uploading stuff to FOS for a while -- I have about 75GB to upload, and I'm getting ~ 0.3MB/s. :-/
04:01 🔗 PurpleSym has quit IRC (*)
04:01 🔗 ppsym has joined #archiveteam-bs
04:01 🔗 ppsym is now known as PurpleSym
04:11 🔗 bwn has joined #archiveteam-bs
04:41 🔗 JesseW has quit IRC (Quit: Leaving.)
04:44 🔗 godane SketchCow: i'm up to 2012-05-15 with kpfa
04:44 🔗 godane i think i went through 2 months in one day
04:45 🔗 godane and you may get a third before i go to bed
05:07 🔗 bsmith093 JesseW 75GB/300KBps = 250 000 seconds or about 2.894 days
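The arithmetic bsmith093 gives can be checked with a short sketch. It assumes decimal units (1 GB = 1,000,000 KB) and a steady 300 KB/s, which matches the ~0.3MB/s JesseW reported; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def transfer_time_seconds(size_gb, rate_kb_per_s):
    """Time to move size_gb gigabytes at a constant rate in KB/s (decimal units)."""
    return size_gb * 1_000_000 / rate_kb_per_s


seconds = transfer_time_seconds(75, 300)
days = seconds / SECONDS_PER_DAY
print(seconds, round(days, 3))  # 250000.0 2.894
```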
05:10 🔗 yipdw maybe I've just been watching a lot of Star Trek lately, but a lot of the recent chatter sounds Vulcan
05:10 🔗 yipdw and I don't mean that in a positive way
05:13 🔗 Frogging wat
05:15 🔗 yipdw it's probably just the Star Trek, carry on
05:32 🔗 JesseW has joined #archiveteam-bs
05:54 🔗 bsmith093 JesseW: how goes the csv script?
05:56 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
05:57 🔗 JesseW bsmith093: currently crunching through the 187,361 files in Naruto
05:57 🔗 JesseW seems to be working well
05:57 🔗 JesseW Fanfiction_B.zip has an ETA of about 3 more hours.
05:58 🔗 bsmith093 jesus, that's almost athrid as poopular as harry potter with 600K
05:58 🔗 bsmith093 wow, typos :P
05:59 🔗 bsmith093 JesseW: so all the zips together , how big?
06:01 🔗 JesseW 75GB
06:01 🔗 JesseW (and remember, I still don't have the last big 3, which are about ~100GB (uncompressed))
06:02 🔗 Sk1d has joined #archiveteam-bs
06:04 🔗 JesseW has quit IRC (Quit: Leaving.)
06:04 🔗 bsmith093 oh, right
06:16 🔗 Frogging JesseW needs a bouncer :p
06:16 🔗 godane you guys maybe getting a sean hannity collection at some point
06:32 🔗 bwn has quit IRC (Ping timeout: 492 seconds)
06:51 🔗 metalcamp has joined #archiveteam-bs
07:31 🔗 bwn has joined #archiveteam-bs
07:48 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
08:04 🔗 bsmith093 has quit IRC (Ping timeout: 370 seconds)
08:19 🔗 bsmith093 has joined #archiveteam-bs
08:30 🔗 godane SketchCow: looks like i found more Premiere Interactive radio shows
08:31 🔗 godane one called Jim Rome and it goes back to 2005
08:31 🔗 godane and is open
08:40 🔗 godane so Jim Rome Show is a sports show
08:40 🔗 DFJustin has quit IRC (Read error: Connection reset by peer)
08:40 🔗 DFJustin has joined #archiveteam-bs
08:40 🔗 swebb sets mode: +o DFJustin
08:40 🔗 godane plus side we will get an interview with Armstrong on jan 3 2005 hour 3
08:54 🔗 superkuh has joined #archiveteam-bs
09:06 🔗 lytv has quit IRC (Ping timeout: 244 seconds)
09:07 🔗 JetBalsa has quit IRC (Read error: Operation timed out)
09:07 🔗 lytv has joined #archiveteam-bs
09:07 🔗 JetBalsa has joined #archiveteam-bs
09:11 🔗 schbirid has joined #archiveteam-bs
09:32 🔗 RichardG has joined #archiveteam-bs
09:56 🔗 BlueMaxim has quit IRC (Quit: Leaving)
10:18 🔗 brayden has quit IRC (Quit: Leaving)
10:19 🔗 brayden has joined #archiveteam-bs
10:19 🔗 swebb sets mode: +o brayden
11:37 🔗 dan- has quit IRC (Quit: Nyan nyan)
12:30 🔗 HCross2 has quit IRC ()
12:34 🔗 dan- has joined #archiveteam-bs
12:53 🔗 metalcamp has joined #archiveteam-bs
13:15 🔗 HCross2 has joined #archiveteam-bs
13:20 🔗 HCross bsmith093, ping
14:02 🔗 pgoetz has quit IRC (Remote host closed the connection)
14:19 🔗 Start has quit IRC (Quit: Disconnected.)
15:34 🔗 Start has joined #archiveteam-bs
15:55 🔗 RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
15:59 🔗 RichardG has joined #archiveteam-bs
16:07 🔗 Start has quit IRC (Quit: Disconnected.)
16:13 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
16:28 🔗 JesseW has joined #archiveteam-bs
16:40 🔗 metalcamp has joined #archiveteam-bs
16:49 🔗 JesseW has quit IRC (Quit: Leaving.)
17:39 🔗 bsmith093 HCross pong
17:41 🔗 HCross Anyone good at dealing with really strange errors? http://paste.nerds.io/ucimumopul.avrasm
17:43 🔗 Start has joined #archiveteam-bs
17:44 🔗 phuzion HCross: What's the context of the error? rsyncing to a newsbuddy worker?
17:44 🔗 HCross yeah
17:45 🔗 metalcamp has quit IRC (Ping timeout: 250 seconds)
17:47 🔗 phuzion Is it reproducible?
17:47 🔗 HCross Yes, its happening every time it rsyncs to a worker
17:47 🔗 phuzion A specific worker?
17:47 🔗 HCross all of them
17:48 🔗 phuzion Can you reproduce the error with a manual command?
17:50 🔗 phuzion Also, perhaps try reinstalling the rsync package through your package manager.
17:52 🔗 signius_ has quit IRC (Read error: Operation timed out)
18:02 🔗 Start has quit IRC (Quit: Disconnected.)
18:06 🔗 signius_ has joined #archiveteam-bs
18:07 🔗 HCross phuzion, works with other rsyncs manually
18:10 🔗 metalcamp has joined #archiveteam-bs
18:10 🔗 JW_work has joined #archiveteam-bs
18:13 🔗 phuzion HCross: Can you figure out the EXACT command that is being issued by the newsbuddy script and replicate that on the command line?
18:15 🔗 arkiver rsync -avz --no-o --no-g --progress --remove-source-files list-videos_temp" + str(i) + " " + rsync_targets[i]
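The line arkiver pasted is a fragment of Python string concatenation. A rough sketch of how such a command could be assembled and printed for manual testing is shown below. It is not the actual newsbuddy code: the function name and the example target are hypothetical, and only the rsync flags and the `list-videos_temp` prefix come from the pasted line.

```python
import shlex


def build_rsync_command(i, rsync_targets):
    """Build the rsync argument list for worker i (hypothetical helper)."""
    return [
        "rsync", "-avz", "--no-o", "--no-g", "--progress",
        "--remove-source-files",
        "list-videos_temp" + str(i),   # source path, as in the pasted fragment
        rsync_targets[i],              # destination for this worker
    ]


# Placeholder target, not a real host
cmd = build_rsync_command(0, ["rsync://example.invalid/module/"])

# Print a shell-quoted form that can be pasted into a terminal to
# reproduce exactly what the script would run.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` rather than a concatenated string avoids shell quoting differences between the script and a manual run.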
18:15 🔗 HCross when sourceforge responds
18:15 🔗 HCross will test, ty
18:16 🔗 HCross worked fine on the command line
18:17 🔗 phuzion Strange.
18:17 🔗 arkiver well, try reinstalling
18:17 🔗 arkiver rsync ^
18:17 🔗 HCross ok, can you pause the livejournal stuff please arkiver
18:18 🔗 Start has joined #archiveteam-bs
18:18 🔗 arkiver paused. what's the problem with it?
18:18 🔗 HCross its going to the same server, thats having the rsync issues
18:21 🔗 HCross reinstalled, now we wait
18:26 🔗 HCross arkiver, ready for an unpause
18:27 🔗 arkiver right
18:27 🔗 arkiver do !start
18:28 🔗 HCross on livejournal?
18:28 🔗 arkiver oh sorry
18:29 🔗 arkiver to unpause newsbuddy do !start
18:29 🔗 arkiver I'll restart livejournal
18:29 🔗 HCross I didnt pause it
18:29 🔗 arkiver restarted!
18:29 🔗 HCross I timed it so it wasnt rsyncing out
18:29 🔗 arkiver Right
18:30 🔗 bsmith093 HCross so am I still needed?
18:34 🔗 HCross I should ask - any rsync experts around to give bsmith093 a hand? Trying to get an rsync target setup and his network is being strange
18:35 🔗 JW_work Could someone add https://github.com/matteobrusa/TumblrToStaticExporter to the wiki page for Tumblr?
18:38 🔗 bsmith093 i know port forwarding works, i have another one open just fine.
18:43 🔗 bsmith093 arkiver: help with my rsync settings please, i cant seem to get the port to open on my linksys ea6500 router
18:44 🔗 HCross phuzion, its still doing it http://paste.nerds.io/owomiyunub.erl
18:48 🔗 phuzion HCross: rsync --version
18:49 🔗 HCross rsync version 3.1.1 protocol version 31
18:49 🔗 bsmith093 phuzion: rsync version 3.1.1 protocol version 31
18:49 🔗 bsmith093 ditto for me
18:49 🔗 phuzion bsmith093: Are you experiencing the same error that HCross is?
18:50 🔗 bsmith093 phuzion: he's the other end of the connection i'm trying to make
18:50 🔗 HCross phuzion, hes having NAT issues with getting files in, he isnt having the issue
18:50 🔗 HCross bsmith093, im talking about something different
18:50 🔗 phuzion Oh. Let's troubleshoot one thing at a time.
18:50 🔗 HCross yeah, good plan
18:50 🔗 bsmith093 HCross sorry
18:50 🔗 phuzion HCross: what distro?
18:50 🔗 HCross Debian 8
18:51 🔗 HCross phuzion, works fine on command line, when newsbuddy runs it, it falls over
18:53 🔗 bsmith093 HCross anyway, now port 873 tests as open, so go nuts
18:59 🔗 phuzion HCross: are all of your python packages installed with apt or do you have some that were installed via easy_install or pip?
18:59 🔗 HCross phuzion, pip
19:00 🔗 HCross its worked fine for ages, and then suddenly brokwe
19:00 🔗 HCross broke
19:00 🔗 phuzion pip freeze?
19:00 🔗 HCross http://paste.nerds.io/cukamufuwo.mel
19:02 🔗 phuzion Honestly, my next suggestion is pip --force-reinstall all those packages
19:02 🔗 phuzion ( -r requirements.txt obviously)
19:02 🔗 HCross all those work fine, its just rsync being a pest
19:02 🔗 phuzion well, you reinstalled rsync
19:03 🔗 HCross ok, ill try that
19:03 🔗 phuzion so, you theoretically know that the rsync binaries aren't corrupted or anything.
19:03 🔗 phuzion and the problem doesn't happen when you run the command from a shell (I assume bash?)
19:03 🔗 HCross yeah
19:04 🔗 phuzion So, it leads me to believe that it's either a problem with your python environment, or a bug in the newsbuddy code.
19:05 🔗 HCross There was an update recently
19:05 🔗 phuzion Which packages updated?
19:05 🔗 HCross I meant a code update
19:05 🔗 phuzion Oh.
19:05 🔗 HCross Thats probably killed it
19:06 🔗 HCross phuzion, thanks for dealing with my noobishness
19:08 🔗 phuzion No problem. If you suspect you know which commit caused the problem, find the one before it and do 'git checkout abcd1234' or whatever the first 6-8 chars of the commit ID are, and try re-running the code.
19:16 🔗 metalcamp has quit IRC (Ping timeout: 244 seconds)
19:23 🔗 Honno has joined #archiveteam-bs
19:40 🔗 bwn has quit IRC (Ping timeout: 246 seconds)
19:43 🔗 tomwsmf-a has joined #archiveteam-bs
19:44 🔗 Start has quit IRC (Quit: Disconnected.)
20:14 🔗 bwn has joined #archiveteam-bs
20:42 🔗 Honno lo all, I wanna do a games torrent in the excess of 600GB. how the hell do I get the ball rolling? Need a server to start seeding it at decent rates right? What's a good, cost effective service for that?
20:43 🔗 Honno Games torrent as in, a collection of games
20:44 🔗 Smiley o_O???
20:44 🔗 Honno yeah uh
20:44 🔗 JW_work IA?
20:44 🔗 Smiley they all freeware?
20:44 🔗 JW_work Internet Archive, I mean?
20:44 🔗 Honno yeah all freeware games
20:44 🔗 Smiley oooo
20:44 🔗 Smiley nice
20:44 🔗 Honno From http://www.archiveteam.org/index.php?title=GameMaker_Sandbox
20:44 🔗 JW_work I think it doesn't handle torrents that large, though.
20:45 🔗 Honno IA does torrent hosting heh?
20:45 🔗 JW_work All items on IA are also available as torrents, yes.
20:45 🔗 JW_work And pretty much anywhere can act as a webseed
20:45 🔗 JW_work That's the point of webseeds
20:45 🔗 Honno Anyway, I'm just looking into the warcs and stuff, what I want to do is compile all the game downloads into their own separate folders, with a text file that contains metadata I strip from the archive
20:45 🔗 xmc JW_work: your presence would be good in #urlteam if you have a moment
20:46 🔗 Honno Interested in how that works JM_work
20:46 🔗 Honno JW even, too tired, been downloading games all day lol
20:48 🔗 Honno Er there is a concern I haven't really thought about
20:48 🔗 Honno We love archives and all right, but we archive in their original form
20:48 🔗 Honno What I'd be doing is taking thousands of developers work, even if free, and compiling them to a bundle
20:49 🔗 Honno They're all freeware btw
20:49 🔗 Honno The problem is, using the Internet Archive to browse the site is slow as hell, and using these big warcs is pretty hard for newbies
20:50 🔗 Honno I also think just some massive torrent of games will gain some traction in communities, which is important because I want people to really get some enjoyment out of this stuff
20:51 🔗 Honno First off, ethically how do you guys think about that, and secondly am I breaking some legal stuff
20:52 🔗 Honno It's unreasonable to contact all the devs for their consent, I would imagine most if not all wouldn't mind this project however
20:52 🔗 Honno I could use an opt-out system, where devs can contact me to remove their game from this listing, but then I can't really use torrents because they would be variable to be redundant
20:52 🔗 Frogging oh man the geocities torrent died? Where is the data now then?
20:53 🔗 JW_work Frogging: in lots of places — neocities (AFAIK), on IA (I am pretty certain), likely many other places too.
20:53 🔗 JW_work Just none of them happen to be seeding the original format of the torrent
20:54 🔗 Frogging ah
20:54 🔗 xmc https://archive.org/details/2009-archiveteam-geocities-part1
20:57 🔗 Frogging thanks
21:08 🔗 bsmith093 HCross it's working now
21:10 🔗 RichardG has quit IRC (Ping timeout: 260 seconds)
21:16 🔗 schbirid has quit IRC (Quit: Leaving)
21:26 🔗 RichardG has joined #archiveteam-bs
21:36 🔗 xXx_ndidd has joined #archiveteam-bs
21:43 🔗 ndiddy has quit IRC (Read error: Operation timed out)
21:48 🔗 xXx_ndidd is now known as ndiddy
22:04 🔗 godane SketchCow: kpfa is up to 2012-06-30
22:07 🔗 SketchCow Thank you!
22:07 🔗 SketchCow There's a redone one
22:08 🔗 SketchCow (Geocities torrent)
22:08 🔗 SketchCow From Dragan.
22:09 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
22:12 🔗 godane i'm starting to upload The Jim Rome Show: https://archive.org/details/The_Jim_Rome_Show_Podcast-2005-01-03
22:12 🔗 RichardG has joined #archiveteam-bs
22:25 🔗 RichardG has quit IRC (Ping timeout: 244 seconds)
22:31 🔗 Honno has quit IRC (Ping timeout: 492 seconds)
22:33 🔗 dashcloud Microguru: there are captcha solving plugins & services- check out the list from plowshare: https://github.com/mcrapet/plowshare/blob/master/docs/plowdown.1
22:33 🔗 RichardG has joined #archiveteam-bs
23:38 🔗 Start has joined #archiveteam-bs
23:54 🔗 HCross phuzion, thanks. We've jumped back to an older version and it seems to be working
23:56 🔗 JW_work note to self (or SketchCow) — the last comment on http://ascii.textfiles.com/archives/875 is spam.
23:58 🔗 RedType_ has left
