#archiveteam 2013-08-17,Sat


Time Nickname Message
00:13 🔗 omf_ I am still cleaning up the data but it looks like I could have up to 152,510 yahoo groups
00:14 🔗 Baljem nice
00:15 🔗 Baljem do we already have infrastructure/any other projects where we do a historical grab and then try to keep it up to date?
00:15 🔗 omf_ fanfiction.net
00:15 🔗 Baljem (hmm, URLteam I guess needs to do something like that, so a solved problem I guess)
00:15 🔗 omf_ puush is doing that as well
00:15 🔗 Baljem cool
00:49 🔗 balrog omf_: did you get that downloader thing working yet?
00:49 🔗 balrog I'd like to test it at some point
00:52 🔗 balrog omf_: you're also indexing group names, right?
00:52 🔗 omf_ now that I have more than a handful of groups I can start working on it
00:52 🔗 omf_ ~152,510 yahoo groups
00:52 🔗 balrog wow
00:52 🔗 balrog and all of them require joining to access the "interesting" stuff.
00:52 🔗 balrog even so, archiving just messages isn't useless
02:21 🔗 omf_ everyone and balrog, this is what it feels like working on that yahoo groups code --> http://i.imgur.com/OVvEKxh.gif
02:24 🔗 balrog omf_: are you suggesting that it should be rewritten in (say) python ?
02:26 🔗 omf_ No
02:27 🔗 omf_ I have to fully understand how this thing works before a determination like that can be made
02:28 🔗 balrog I see :/
02:28 🔗 balrog and yes this thing might as well be obfuscated
02:31 🔗 omf_ maybe to the untrained eye. To me it just reeks of beginner coder
02:31 🔗 omf_ Too verbose, no language idioms, poor comments
02:31 🔗 omf_ lack of understanding of the language ecosystem
02:36 🔗 balrog ah :/
02:37 🔗 balrog and yeah... I'm not a perl programmer
14:24 🔗 balrog omf_: ya there? just got an email from the maintainer of that perl script
15:00 🔗 ersi Anyone grabbing the new Wikileaks insurance files? https://www.facebook.com/wikileaks/posts/561645433870573 http://wlstorage.net/torrent/wlinsurance-20130815-A.aes256.torrent (3.6GB) http://wlstorage.net/torrent/wlinsurance-20130815-B.aes256.torrent (49GB) http://wlstorage.net/torrent/wlinsurance-20130815-C.aes256.torrent (349GB)
15:33 🔗 omf_ here now balrog
15:45 🔗 Tephra ersi: will check if I have disk space left
16:03 🔗 Tephra ersi: first torrent going down, ~20 min left will get the others after it
16:04 🔗 omf_ yeah I don't have the space at present for 349gb on my server :(
16:08 🔗 Tephra found space in one of my externals (need to get more archiving media) so i'll grab it
16:08 🔗 Tephra wonder if my isp will notice :p
16:20 🔗 Tephra first down 2 to go
18:44 🔗 balrog omf_: http://sourceforge.net/p/grabyahoogroup/code/128/tree//trunk/yahoo_group/grabyahoogroup.pl?diff=50ee4b9834309d7d9055735b:127
18:54 🔗 omf_ did that fix the login issue with new tokens
18:55 🔗 godane http://tech.slashdot.org/story/13/08/17/1747228/yahoo-deletes-journalists-pre-paid-legacy-site-after-suicide?utm_source=rss1.0mainlinkanon&utm_medium=feed
18:57 🔗 DFJustin jesus christ
19:00 🔗 ersi Not surprising
19:03 🔗 omf_ I ran out of outrage :(
19:05 🔗 godane so i'm mirroring sportsinreview.com
19:05 🔗 godane i was having trouble doing a normal mirror
19:06 🔗 godane so i have to make an index.txt file with pages that should exist
19:06 🔗 godane like up to ?p=8679 urls
19:07 🔗 godane and cat is up to 60
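[Editor's sketch] godane's approach of listing every page that should exist can be roughed out as below. This is only an illustration, not the commands godane actually ran: the base URL, the `?cat=` parameter name (assumed from "cat is up to 60"), and the function names are assumptions. The limits 8679 and 60 are the ones quoted in the log.

```python
# Hypothetical sketch: build a list of candidate URLs for a blog whose
# pages are addressed by numeric query parameters.

def build_url_list(base, max_post, max_cat):
    """Return every ?p=N and ?cat=N URL from 1 up to the given limits."""
    urls = [f"{base}/?p={n}" for n in range(1, max_post + 1)]
    urls += [f"{base}/?cat={n}" for n in range(1, max_cat + 1)]
    return urls


def write_index(path, urls):
    """Write one URL per line, the layout wget expects for an input file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(urls) + "\n")


if __name__ == "__main__":
    # Base URL is an assumption; the limits come from the chat.
    urls = build_url_list("http://sportsinreview.com", 8679, 60)
    write_index("index.txt", urls)
```

The resulting file can be fed to wget with `wget -i index.txt`. Many of the numbered URLs will likely not exist, so a run like this should expect plenty of 404 responses.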
19:08 🔗 godane i'm glad that i could help mirror that blog
19:10 🔗 Tephra godane: i mirrored it
19:10 🔗 godane oh
19:11 🔗 godane link please?
19:11 🔗 Tephra godane: looking for it now, hope that I got the whole site
19:12 🔗 Tephra https://archive.org/details/Sportsinreview20130816
19:14 🔗 godane i'm going to download it to see the commands
19:14 🔗 godane used to mirror it
19:14 🔗 Tephra used http://archiveteam.org/index.php?title=User:Djsmiley2k
19:16 🔗 Tephra so yahoo used yanked it off....
19:21 🔗 Tephra *just
19:47 🔗 Tephra godane: does it look good?
20:01 🔗 godane i think its good
20:02 🔗 balrog omf_: nope.
20:03 🔗 balrog to get "tokens" what you do is programmatically log into yahoo then save the cookies
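[Editor's sketch] The "log in programmatically, then save the cookies" idea balrog describes might look roughly like this using only the Python standard library. It is not how grabyahoogroup.pl does it: the login URL and form field names are left as caller-supplied placeholders because the log does not give them, and the function names are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session(cookie_path):
    """Create a cookie jar backed by a Netscape-format file, plus an opener using it."""
    jar = http.cookiejar.MozillaCookieJar(cookie_path)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return jar, opener


def login_and_save(opener, jar, login_url, form_fields):
    """POST the login form, then persist whatever cookies the server set.

    login_url and form_fields are assumptions supplied by the caller;
    the real endpoint and field names are not stated in the log.
    """
    data = urllib.parse.urlencode(form_fields).encode("ascii")
    with opener.open(login_url, data=data, timeout=30) as resp:
        resp.read()
    # Keep session cookies too; login tokens are often sent without an expiry.
    jar.save(ignore_discard=True, ignore_expires=True)
```

A later run can reload the saved file with `jar.load(ignore_discard=True, ignore_expires=True)` and reuse the same opener, which avoids logging in again for every request.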
20:14 🔗 Tephra godane: had trouble starting the mirroring at the index so had to start it one step deep and only then wget would grab the whole site
20:22 🔗 godane ok
21:14 🔗 qwebirc20 Starting up automated wget's and uploads of various horticulture and gardening forums, particularly GardenWeb.com which has very little coverage of its actual posts (and member photos of their plants) in the WayBack Machine.
21:15 🔗 DFJustin save the plants
21:15 🔗 qwebirc20 https://archive.org/details/project_flowerpot_-_forums.gardenweb.com_-_california_gardening_-_2013-08-17/
21:16 🔗 Asparagir Er, that was me, with a messed up IRC nick. :-/
21:19 🔗 Asparagir There's an amazing amount of information about gardening online, freely shared on many long-running message boards, and it would be a shame not to preserve it.
21:19 🔗 SmileyG indeed.
21:28 🔗 SketchCow I called it (suicide leads to delete)
21:32 🔗 winr4r SketchCow: oh hi
21:35 🔗 Tephra SketchCow: we have his website that he put up prior to suicide http://archive.org/details/martinmanleylifeanddeath.com-20130816
21:36 🔗 Tephra SketchCow: his sports blog https://archive.org/details/Sportsinreview20130816
21:36 🔗 SketchCow Good.
21:36 🔗 SketchCow And his suicide page? Right?
21:37 🔗 Tephra yes
21:37 🔗 SketchCow ha ha
21:37 🔗 SketchCow Archive Team
21:37 🔗 Tephra both the zeroshare and the original
21:37 🔗 Tephra zeroshare mirror
21:37 🔗 SketchCow https://www.youtube.com/watch?v=dsboBHlzwcs
21:37 🔗 Tephra and i think antomatic got the blog that he had prior to sportsinreview
21:40 🔗 Tephra have looked for a twitter account but couldn't find one but seems we got his digital legacy archived!
21:42 🔗 SketchCow I won't crow about this one in case the family's behind the takedown.
21:43 🔗 ersi That seems like a good idea