#archiveteam-bs 2016-09-11,Sun

Time Nickname Message
00:07 🔗 BlueMaxim has joined #archiveteam-bs
00:33 🔗 Start has quit IRC (Quit: Disconnected.)
01:43 🔗 ranma xmc: here's some more juiciness from the kittyforums: http://www.howardforums.com/showthread.php/1864458-Verizon-MVNO-Puppy-Wirelss-Discussion?p=15935702#post15935702
01:43 🔗 ranma let me see if i can find the beginning of it all
01:44 🔗 xmc hum
01:45 🔗 ranma http://www.howardforums.com/showthread.php/1857887-Customer-Service-with-the-new-Page-Plus-support-center?p=15826603#post15826603
01:45 🔗 xmc my goodness that's a long fucking thread
01:45 🔗 ranma ignore that thread
01:45 🔗 ranma but the second is a history on i think the owner of kittywireless
01:45 🔗 xmc fair
01:46 🔗 xmc ugh, i don't care enough to read up on this shit
01:46 🔗 ranma well, from someone that sounds like they have a bit of an axe to gring
01:46 🔗 xmc not now, and maybe not ever
01:46 🔗 ranma grind
01:46 🔗 xmc yea
01:46 🔗 ranma yep
01:46 🔗 ranma drama drama drama
01:46 🔗 ranma you sounded half intrigued about it one time :)
01:46 🔗 xmc mhm
01:46 🔗 ranma or was it just about it shutting down?
01:50 🔗 ranma just 10 more seconds of your time: you should see the eyecancer (and dodginess) the Prepaid PIN reseller site was xD
01:50 🔗 ranma https://web.archive.org/web/20140104141402/http://kittywireless.com/
01:50 🔗 ranma @ xmc
01:50 🔗 xmc i mean, i'm interested in archiving these things
01:50 🔗 xmc but not so much reading them :)
01:50 🔗 ranma ah
01:50 🔗 xmc yow
01:51 🔗 ranma yep
01:52 🔗 ranma owner is/was quite a character (was in lieu of company shutting down)
02:11 🔗 kristian_ has quit IRC (Remote host closed the connection)
02:18 🔗 Start has joined #archiveteam-bs
03:52 🔗 FalconK Frogging: re. archivebot: it looks like it makes 3 synchronous calls for every log line, at pipeline/archivebot/control.py:273
03:53 🔗 FalconK hincrby, zadd, and publish
04:00 🔗 FalconK we could probably use twisted, though I am concerned it will create a great profusion of waiting tasks
04:09 🔗 FalconK actually it needs to have just one thread per job for sending logs, that has a queue in it
04:11 🔗 yipdw FalconK: threading gets tricky because it introduces the need for a supervisor
04:12 🔗 yipdw I wanted a way to integrate with the wpull event loop, which is what grab-site does
04:12 🔗 FalconK this seems to be the simplest possible case of it
04:12 🔗 yipdw then it was "well just use grab-site, all you need to do is switch the communication protocol to websockets" but then I never looked at it again
04:12 🔗 FalconK I'm not talking about anything at all except sending logs to redis asynchronously
04:13 🔗 FalconK so a single daemonic thread should do the trick
04:13 🔗 FalconK also the existing code just does pass on ConnectionError so it's not especially reliable as it stands anyway
04:14 🔗 yipdw it'll drop log lines, but there's no separate thread to monitor
04:14 🔗 yipdw the settings listener thread on the other hand does occasionally die for reasons I haven't been able to determine
04:16 🔗 FalconK yeah, I have no idea either
04:16 🔗 yipdw something else I was toying around with was using multiprocessing to start a redis log shipper process
04:16 🔗 yipdw as multiprocessing seems more robust in a Python program
04:17 🔗 yipdw but I never got around to looking into it further
04:17 🔗 yipdw if a separate thread works, though, that'd be good
04:17 🔗 FalconK it's certainly easier!
04:17 🔗 FalconK I'll write it up right now, make a new pipeline, and run some unimportant job on it
04:18 🔗 yipdw run some long unimportant job
04:18 🔗 FalconK I could re-do infinity.disney.com
04:18 🔗 FalconK or something
04:18 🔗 yipdw I only noticed the settings listener lockups on like month-long jobs, which is not what archivebot was ever designed to do but that complaint is like Tim Berners-Lee complaining about porn on the internet
04:20 🔗 yipdw I'd like some sort of highly-accelerated failure test suite for this sort of stuff but I think to get that you need some idea of what the failure causes are
04:20 🔗 yipdw and I don't
04:21 🔗 yipdw chfoo has a huhhttp library that might be that, though
04:21 🔗 yipdw https://github.com/chfoo/huhhttp maybe
04:29 🔗 FalconK hey, is it a sin to share the same redis connection between threads?
04:30 🔗 FalconK maybe I need to mutex it or open a second one
04:30 🔗 yipdw it's fine
04:30 🔗 yipdw https://github.com/andymccurdy/redis-py#thread-safety
04:33 🔗 FalconK awesome!
04:33 🔗 FalconK this is a really simple change then
04:44 🔗 FalconK changes are at https://github.com/falconkirtaran/ArchiveBot
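The idea discussed above, one daemon thread per job that drains a queue of log lines so the crawl never blocks on Redis, can be sketched roughly as follows. This is a minimal illustration and not the code in the linked repository: `LogShipper`, `FakeRedis`, and the key and channel names are hypothetical, and `FakeRedis` is an in-memory stand-in so the example runs without a server. With redis-py, the exception to catch would be `redis.exceptions.ConnectionError` rather than the built-in one used here.

```python
import queue
import threading


class FakeRedis:
    """Hypothetical in-memory stand-in that records the three calls per line."""

    def __init__(self):
        self.calls = []
        self.counter = 0

    def hincrby(self, name, key, amount=1):
        self.counter += amount
        self.calls.append(("hincrby", name, key))
        return self.counter

    def zadd(self, name, mapping):
        self.calls.append(("zadd", name, mapping))

    def publish(self, channel, message):
        self.calls.append(("publish", channel, message))


class LogShipper:
    """Sends log lines to a Redis-like client from a single background thread."""

    def __init__(self, client, job_id):
        self.client = client
        self.job_id = job_id
        self.lines = queue.Queue()
        # daemon=True: the thread will not keep the process alive on exit.
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def send(self, line):
        # Called by the crawl; only enqueues, so it returns immediately.
        self.lines.put(line)

    def close(self):
        # A None sentinel tells the worker to stop once the queue is drained.
        self.lines.put(None)
        self.thread.join()

    def _run(self):
        while True:
            line = self.lines.get()
            if line is None:
                break
            try:
                # The three calls named in the discussion; key names are made up.
                seq = self.client.hincrby(self.job_id, "log_score", 1)
                self.client.zadd(self.job_id + "_log", {line: seq})
                self.client.publish("updates", self.job_id)
            except ConnectionError:
                # Drops the line on a connection failure, as the existing
                # code is described as doing.
                pass
```

A caller would create one `LogShipper` per job, call `send()` for each log line, and call `close()` when the job ends. Sharing the client between the crawl and the shipper thread relies on the thread-safety notes linked above.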
04:49 🔗 Frogging yipdw: would you prefer that ArchiveBot not be used for large jobs?
04:49 🔗 yipdw Frogging: well, I can say it was never built to handle gazillions of URLs
04:49 🔗 yipdw the README states this
04:50 🔗 yipdw but design considerations and actual use align approximately never so what I prefer is irrelevant
04:50 🔗 yipdw a
04:52 🔗 Frogging I see
04:53 🔗 yipdw i'm sure people will keep tweaking it to handle !a internet though
04:57 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
04:57 🔗 MrRadar !ig 5tnqxpy5c5isnrjrgixu3jgcy ^https?://forum\.teksyndicate\.com/.*/INBOX\.EXE$
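To see what the ignore pattern in the `!ig` command above would match, it can be tried against a few URLs. This sketch assumes the pattern is a Python-style regular expression applied to the full URL; the sample URLs are made up for illustration.

```python
import re

# Pattern copied verbatim from the !ig command above.
IGNORE = re.compile(r"^https?://forum\.teksyndicate\.com/.*/INBOX\.EXE$")


def is_ignored(url):
    """Return True if the URL matches the ignore pattern."""
    return IGNORE.search(url) is not None


# Hypothetical URLs, not taken from the actual crawl.
samples = [
    "https://forum.teksyndicate.com/t/some-thread/123/INBOX.EXE",
    "https://forum.teksyndicate.com/t/some-thread/123",
]
for url in samples:
    print(url, is_ignored(url))
```

The pattern only catches URLs on that host whose final path segment is exactly `INBOX.EXE` and which have at least one path segment before it, so ordinary thread URLs are still crawled.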
05:02 🔗 tomwsmf_ has quit IRC (Ping timeout: 255 seconds)
05:04 🔗 Sk1d has joined #archiveteam-bs
05:20 🔗 ranma good enough reason to archive?
05:20 🔗 ranma "The US Secret Service censored YG's album Fuck Donald Trump. Some censored lyrics made it into this interview"
05:21 🔗 ranma or should that be left out of an !explain
05:21 🔗 Frogging sounds fine to me
05:49 🔗 joepie91 re-ask: what's up with teksyndicate?
05:52 🔗 joepie91 aha. https://www.reddit.com/r/TekSyndicate/comments/502aao/for_everyone_going_what_the_fuck_is_happening_on/
05:53 🔗 joepie91 http://www.twitlonger.com/show/n_1sp29to
06:14 🔗 Frogging well. this sounds like a bit of a shitstorm
06:14 🔗 Frogging (the Tek business)
06:15 🔗 Frogging sets mode: +o joepie91
06:48 🔗 joepie91 just a bit :)
07:10 🔗 metalcamp has joined #archiveteam-bs
07:38 🔗 schbirid has joined #archiveteam-bs
07:42 🔗 Genericen has joined #archiveteam-bs
07:45 🔗 ravetcofx has quit IRC (Ping timeout: 370 seconds)
07:48 🔗 Genericen has quit IRC (Quit: zzz)
08:08 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
08:10 🔗 RichardG has joined #archiveteam-bs
08:25 🔗 Honno has joined #archiveteam-bs
08:31 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
08:31 🔗 RichardG_ has joined #archiveteam-bs
09:06 🔗 kristian_ has joined #archiveteam-bs
09:53 🔗 brayden_ has joined #archiveteam-bs
09:53 🔗 swebb sets mode: +o brayden_
09:53 🔗 brayden has quit IRC (Read error: Connection reset by peer)
10:14 🔗 zerkalo has quit IRC (Ping timeout: 260 seconds)
10:19 🔗 zerkalo has joined #archiveteam-bs
11:21 🔗 VADemon has joined #archiveteam-bs
11:28 🔗 RichardG_ has quit IRC (Ping timeout: 370 seconds)
11:33 🔗 dashcloud has quit IRC (Read error: Operation timed out)
11:36 🔗 dashcloud has joined #archiveteam-bs
11:42 🔗 SketchCow I'm grabbing rave tapes because why not
11:43 🔗 SketchCow also, hiphop mixtape maniacs are helping me find missing ones
11:50 🔗 kristian_ has quit IRC (Quit: Leaving)
12:36 🔗 BlueMaxim has quit IRC (Quit: Leaving)
12:41 🔗 ranma can archivebot archive this? https://app.box.com/shared/8ch5r5nms1
12:42 🔗 ranma youtube user last active in 2008, supposedly dead
12:43 🔗 ranma "can...?" as in i have no idea how well it can deal with scripts
12:44 🔗 brayden_ has quit IRC (Read error: Operation timed out)
12:46 🔗 dashcloud has quit IRC (Read error: Operation timed out)
12:47 🔗 luckcolor i'll check ranma
12:49 🔗 dashcloud has joined #archiveteam-bs
12:50 🔗 brayden has joined #archiveteam-bs
12:50 🔗 swebb sets mode: +o brayden
12:52 🔗 ranma guess not?
12:52 🔗 ranma the archive should be about 48MB
12:52 🔗 ranma uploaded it to IA
12:53 🔗 ranma https://archive.org/details/Piano_201609
12:53 🔗 luckcolor ok
12:58 🔗 Genericen has joined #archiveteam-bs
13:04 🔗 RichardG has joined #archiveteam-bs
13:09 🔗 dashcloud has quit IRC (Read error: Operation timed out)
13:13 🔗 dashcloud has joined #archiveteam-bs
14:12 🔗 Aranje has quit IRC (Quit: Three sheets to the wind)
14:36 🔗 jspiros has joined #archiveteam-bs
14:40 🔗 kristian_ has joined #archiveteam-bs
16:05 🔗 dashcloud has quit IRC (Read error: Operation timed out)
16:09 🔗 jspiros has quit IRC (leaving)
16:09 🔗 dashcloud has joined #archiveteam-bs
16:13 🔗 jspiros has joined #archiveteam-bs
17:18 🔗 JesseW has joined #archiveteam-bs
18:10 🔗 VADemon has quit IRC (Quit: left4dead)
18:11 🔗 tomwsmf_ has joined #archiveteam-bs
18:14 🔗 VADemon has joined #archiveteam-bs
18:59 🔗 zenguy_pc has joined #archiveteam-bs
19:02 🔗 Matt_Lock has joined #archiveteam-bs
19:02 🔗 arkiver hi
19:02 🔗 Matt_Lock Hi again.
19:03 🔗 arkiver so when you go to the 'SHOW ALL' page on for example https://archive.org/details/archiveteam-fanfiction-warc-08 you get https://archive.org/download/archiveteam-fanfiction-warc-08
19:03 🔗 arkiver you can then have a look at the *.cdx.gz file, which contains a list of URLs that were saved in this WARC
19:03 🔗 arkiver https://archive.org/download/archiveteam-fanfiction-warc-08/archiveteam-fanfiction-warc-08.cdx.gz in this case
19:04 🔗 JesseW Matt_Lock: I was semi-involved in repackaging the text-only one.
19:04 🔗 Matt_Lock 187 megs... this is going to be a lot of downloading.
19:05 🔗 JesseW The repackaged form should be *somewhat* easier to find things in, although it's organized by fandom, not by author.
19:05 🔗 arkiver these URLs are also all in the wayback machine though
19:05 🔗 Matt_Lock I've looked at the text-only version. It is useful, but doesn't contain all the fics I want
19:05 🔗 Matt_Lock not wayback machine. Robots.txt is on the site
19:05 🔗 JesseW Matt_Lock: do you have specific URLs you are looking for?
19:06 🔗 Matt_Lock I have the url for the guy's profile page: https://www.fanfiction.net/u/1155973/MatrixExplosion
19:07 🔗 arkiver thanks
19:07 🔗 edsu has joined #archiveteam-bs
19:07 🔗 swebb sets mode: +o edsu
19:08 🔗 JesseW Matt_Lock: do you have titles of the stories you are looking for?
19:08 🔗 Matt_Lock https://www.fanfiction.net/s/9924634/
19:08 🔗 Matt_Lock Also this story
19:09 🔗 Matt_Lock titles are Unbound (by matrixExplosion), Time can't heal every pain (linked above), and anything else by matrixExplosion (if it exists)
19:09 🔗 Matt_Lock "Rise Of Naruto: Shinigami's Touch" was on the txt only archive, not the others
19:10 🔗 JesseW any idea about the dates they were written, and the dates they were deleted?
19:10 🔗 Matt_Lock I have it written somewhere
19:10 🔗 arkiver "I'm Gonna Be Hokage!"?
19:10 🔗 Matt_Lock one sec
19:10 🔗 Matt_Lock I have I'm gonna be hokage, thanks, though
19:11 🔗 schbirid has quit IRC (Quit: Leaving)
19:11 🔗 Matt_Lock Ah, here: this guy started writing Shinigami in 2008, and it was last updated in 2010. That fic for example, disappeared some time between Jul 15, 2013 and Jul 5, 2013, based on the dates of reviews
19:11 🔗 JesseW you may be able to use the .cdx.idx files (which are much smaller) to figure out which megawarc to look in
19:16 🔗 Matt_Lock Will do later, thanks. Computer's acting up right now.
19:16 🔗 Matt_Lock Thanks for the help.
19:17 🔗 Matt_Lock Hope this works out.
19:17 🔗 JesseW good luck -- if you do find copies, dump them on IA
19:17 🔗 Matt_Lock I'll leave asking how to do that for when (read: if) I do.
19:17 🔗 JesseW nods
19:18 🔗 JesseW at a minimum, you can dump it in a pastebin, then use web.archive.org/save/ to copy it into the Wayback Machine
19:19 🔗 Matt_Lock Got it.
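The `web.archive.org/save/` step mentioned above amounts to requesting the save prefix followed by the page's URL. A minimal sketch, assuming a plain GET is enough to trigger a capture; the function names and the pastebin URL in the usage note are hypothetical, and the service may rate-limit or change its behaviour.

```python
import urllib.request

SAVE_PREFIX = "https://web.archive.org/save/"


def save_url(target):
    """Build the Wayback Machine save URL for a target page."""
    return SAVE_PREFIX + target


def request_capture(target, timeout=60):
    """Ask the Wayback Machine to capture the page; returns the HTTP status."""
    with urllib.request.urlopen(save_url(target), timeout=timeout) as resp:
        return resp.status
```

For example, `request_capture("https://pastebin.com/EXAMPLE")` would request a capture of that (made-up) paste.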
19:23 🔗 Matt_Lock Actually, as it turns out... I have ~800 fics saved on my computer in epub and azw3 formats. 13 have been deleted. I *know* at least 1 of those 13 is on my computer, not on the text only archive. How do I add it to the archive?
19:28 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
19:29 🔗 JesseW has joined #archiveteam-bs
19:35 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
19:59 🔗 alembic has quit IRC (Ping timeout: 244 seconds)
20:01 🔗 alembic has joined #archiveteam-bs
20:46 🔗 Matt_Lock Is anybody still there who I was talking to? Because I've downloaded all 18 .cdx.idx files, and the guy's profile page doesn't seem to be listed in any of them, and nor are any of the fics that I know the names and/or IDs of. What's going on?
20:49 🔗 luckcolor it may be that they never were archived
20:50 🔗 luckcolor if the user url isn't in any of the cdx then it means it wasn't archived
20:50 🔗 alembic has quit IRC (Read error: Connection reset by peer)
20:51 🔗 arkiver the idx files are partial
20:51 🔗 Matt_Lock Seems plausible. Is a 830 gigabyte "Scrape" supposed to be just some files or as many as possible?
20:51 🔗 arkiver if it's not in the idx files, it doesn't mean it's not archived
20:51 🔗 Matt_Lock Is there a way to check that doesn't involve downloading 830 gigs??
20:53 🔗 alembic has joined #archiveteam-bs
20:53 🔗 luckcolor arkiver: really?
20:53 🔗 luckcolor why are those partial
20:54 🔗 arkiver the *.cdx.gz files are full
20:54 🔗 luckcolor ah k
20:54 🔗 luckcolor Matt_Lock: try to download those then
20:55 🔗 HCross2 luckcolor: that's what he's trying to avoid
20:56 🔗 luckcolor ah i wasn't thinking that those were 800 gb
20:56 🔗 luckcolor then yeah
20:56 🔗 * luckcolor derped
20:57 🔗 Matt_Lock The cdx.gz files seem to be just(!) 180 ish megs each, so probably only 3 or 4 gigs total, assuming part 0 is representative (I'm looking for files like archiveteam-fanfiction-warc-00.cdx.gz , right?)
20:58 🔗 luckcolor yeah
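Checking a downloaded `*.cdx.gz` file for a URL does not require unpacking it or touching the WARCs; it can be streamed line by line. A minimal sketch, assuming the index is gzip-compressed plain text with one capture per line and a header line starting with ` CDX`; `find_in_cdx` is a hypothetical helper, and it matches by substring so it does not depend on the exact field layout.

```python
import gzip


def find_in_cdx(cdx_gz_path, needle):
    """Yield index lines from a .cdx.gz file that contain the given text."""
    with gzip.open(cdx_gz_path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            if line.startswith(" CDX"):
                continue  # skip the header that names the fields
            if needle in line:
                yield line.rstrip("\n")
```

Searching for the numeric part of a URL, for example `/u/1155973/` from the profile link above or `/s/9924634/` from the story link, avoids problems with upper and lower case in the indexed form of the URL. Each matching line should also name the WARC file that holds the capture, which narrows down what needs downloading.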
21:03 🔗 Matt_Lock Have to go now. Bye, and thanks for the help
21:03 🔗 Matt_Lock has quit IRC (Quit: Page closed)
21:09 🔗 luckcolor Np :P
21:27 🔗 metalcamp has quit IRC (Ping timeout: 506 seconds)
21:30 🔗 JesseW has joined #archiveteam-bs
21:37 🔗 Aranje has joined #archiveteam-bs
21:41 🔗 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
21:42 🔗 dashcloud has joined #archiveteam-bs
21:46 🔗 tfgbd_znc has joined #archiveteam-bs
22:14 🔗 ravetcofx has joined #archiveteam-bs
22:26 🔗 VADemon has quit IRC (Quit: left4dead)
22:29 🔗 zenguy_pc has quit IRC (Excess Flood)
22:30 🔗 JesseW has quit IRC (Read error: Operation timed out)
22:31 🔗 Honno has quit IRC (Read error: Operation timed out)
22:32 🔗 zenguy_pc has joined #archiveteam-bs
22:37 🔗 Genericen has quit IRC (Remote host closed the connection)
22:56 🔗 zenguy_pc has quit IRC (Excess Flood)
22:58 🔗 zenguy_pc has joined #archiveteam-bs
23:04 🔗 zenguy_pc has quit IRC (Ping timeout: 255 seconds)
23:06 🔗 zenguy_pc has joined #archiveteam-bs
23:22 🔗 zenguy_pc has quit IRC (Ping timeout: 255 seconds)
23:23 🔗 dashcloud has quit IRC (Read error: Operation timed out)
23:26 🔗 JesseW has joined #archiveteam-bs
23:26 🔗 dashcloud has joined #archiveteam-bs
23:59 🔗 robink has quit IRC (Ping timeout: 246 seconds)
