#archiveteam-bs 2014-12-10,Wed

Time Nickname Message
00:02 www2 has quit IRC (Read error: Operation timed out)
00:03 GLaDOS has quit IRC (Ping timeout: 272 seconds)
00:04 GLaDOS has joined #archiveteam-bs
00:04 swebb sets mode: +o GLaDOS
00:15 cbb has joined #archiveteam-bs
00:17 www2 has joined #archiveteam-bs
00:26 cbb2 has joined #archiveteam-bs
00:27 cbb has quit IRC (Ping timeout: 265 seconds)
00:27 cbb2 is now known as cbb
00:29 www2 has quit IRC (Read error: Operation timed out)
00:31 kadercavd has joined #archiveteam-bs
00:31 kadercavd http://strawpoll.me/3100584/r vote for me and I'll sex chat with you, promise
00:32 chfoo sets mode: +b *!*KaderCavd@213.74.159.*
00:32 kadercavd was kicked by xmc (spammer)
00:32 yipdw TAG TEAM
00:33 xmc sets mode: +o yipdw
00:33 xmc TAG YOU'RE IT
00:33 xmc
00:33 xmc wow lag
00:33 yipdw sets mode: +o xmc
00:33 yipdw wait are we playing no tagbacks
00:34 xmc no lagbacks
00:35 yipdw if my students start to use that word I will be very D:
00:36 yipdw of course I write that and use "very D:" at the same time so haha fuck my neologism hypocrisy
00:46 www2 has joined #archiveteam-bs
01:26 www2 has quit IRC (Read error: Operation timed out)
01:30 primus104 has quit IRC (Leaving.)
01:31 mistym has quit IRC (Leaving...)
01:34 GLaDOS has quit IRC (Ping timeout: 272 seconds)
01:34 ersi has quit IRC (Read error: Operation timed out)
01:34 GLaDOS has joined #archiveteam-bs
01:34 swebb sets mode: +o GLaDOS
01:35 ersi has joined #archiveteam-bs
01:35 swebb sets mode: +o ersi
01:49 mistym has joined #archiveteam-bs
01:54 Lord_Nigh has quit IRC (Read error: Operation timed out)
01:56 cbb has quit IRC (Quit: cbb)
01:57 Lord_Nigh has joined #archiveteam-bs
02:09 LordNigh2 has joined #archiveteam-bs
02:11 Lord_Nigh has quit IRC (Ping timeout: 272 seconds)
02:11 LordNigh2 is now known as Lord_Nigh
02:35 Nertsy has joined #archiveteam-bs
02:37 chfoo has quit IRC (Remote host closed the connection)
02:55 Nertsy has quit IRC (Quit: Leaving)
02:56 chfoo has joined #archiveteam-bs
03:01 Nertsy has joined #archiveteam-bs
03:25 dx has quit IRC (Read error: Operation timed out)
03:41 hashtag has joined #archiveteam-bs
03:42 hashtag has left
03:53 godane uploaded: https://archive.org/details/G4_Icons_S01E01
04:11 RainbowCo has joined #archiveteam-bs
04:38 Lord_Nigh has quit IRC (Read error: Connection reset by peer)
04:39 Lord_Nigh has joined #archiveteam-bs
04:47 dx has joined #archiveteam-bs
04:58 aaaaaaaaa has quit IRC (Leaving)
05:14 rejon has joined #archiveteam-bs
05:58 Start is now known as StartAway
06:05 dx has quit IRC (Ping timeout: 369 seconds)
06:18 dx has joined #archiveteam-bs
06:56 Ctrl-S are there any projects to archive tumblr going on?
06:57 Ctrl-S I've got a script that sort of does it for up to a few hundred accounts
07:01 yipdw no because tumblr is massive and nobody is willing to pay for the cost of storing it
07:01 yipdw instead we grab individual tumblrs on a special-case basis
07:02 Ctrl-S okay
07:03 yipdw http://archive.fart.website/archivebot/viewer/?q=tumblr is a partial list
07:03 yipdw note that a significant portion of those are shallow grabs that target specific posts
07:09 Ctrl-S is there a standard method for scraping tumblr stuff you guys use?
07:10 yipdw #archivebot mostly
08:10 SketchCow There's a great statement by brokep about how unhappy he is where The Pirate Bay has gone
08:10 SketchCow (Not the fact it disappeared for the moment - he actually approves of that.)
08:10 SketchCow He thinks that it got handed down and handed down until it hit the lowest common denominator, and now it was just ads and stunts.
08:14 Ctrl-S what will replace the pirate bay?
08:14 Ctrl-S libre-piratebay?
08:17 mistym has quit IRC (Remote host closed the connection)
08:19 brayden has quit IRC (Read error: Operation timed out)
08:19 SketchCow Oh god who knows
08:21 godane i think a mesh network that works sort of like twitter/reddit/facebook is in order
08:21 godane i have been thinking of this for awhile
08:21 godane my idea is git-like mesh network
08:22 Ctrl-S main concern is spammers of various sorts. MPAA would need to be unable to break network just by setting up a few K VM instances
08:22 Lord_Nigh SketchCow: theres actually thepiratebay.ee which seems to be run by someone else (possibly as a honeypot since https doesn't work)?
08:22 Lord_Nigh and is still up
08:22 SketchCow It's down
08:22 Lord_Nigh .ee is?
08:23 Lord_Nigh http://thepiratebay.ee/recent looked up to me a few minutes ago
08:23 godane where the people you follow on a twitter-like app would be mirrored to your phone/desktop
08:24 Lord_Nigh but given that it doesn't appear to be run by the main tpb 'team' i can't vouch for the accuracy of the torrents there. then again, neither could tpb
08:24 Lord_Nigh is it worth scraping that?
08:25 Lord_Nigh (i'm guessing yes)
08:26 Lord_Nigh it looks like it might have a copy of the db up until < 6 hours before the main site was taken down
08:26 Lord_Nigh since i see nothing newer than 12/09 10:32
08:28 Lord_Nigh if that's the case, then .ee should be scraped as quickly as we can manage it
08:28 Lord_Nigh starting with id 10,000,000 and going up
08:28 Lord_Nigh since we have everything from 9,999,999 and down
08:29 Lord_Nigh from other sources
08:29 Lord_Nigh main discussion is in #yarharfiddlededee
08:30 midas it isnt really down anyway, the police grabbed a loadbalancer
08:30 midas and a dns server
08:31 Lord_Nigh so... maybe .ee is actually the 'real thing'? that doesn't make sense, since .ee has no https, and .ee has that weird $5-per-year popup when trying to get magnet links (which you can enter any 6 digit numerical password to and it will work)
08:31 midas nah
08:31 Lord_Nigh (696969 works)
08:31 midas ee can be anything, im not sure
08:31 midas the dns server runs the internal dns traffic
08:31 Lord_Nigh ee seems to follow the same numbering tpb did
08:34 Lord_Nigh .. and now .ee is down
08:34 Ctrl-S the domain name or ip or both?
08:34 Lord_Nigh no, its not down
08:34 Lord_Nigh its ... unstable
08:34 Lord_Nigh random 502 errors
08:34 Lord_Nigh seems other people have found it
08:35 Lord_Nigh the ids DO exactly match the post-10000000 ids from the actual .se site
08:35 Ctrl-S can i ask for help setting up the warrior thing here?
08:35 Lord_Nigh so i think scraping .ee is a VERY good idea
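
The plan Lord_Nigh describes (walk ids from 10,000,000 upward, and put up with the random 502 errors) could be sketched roughly as below. This is only an illustration, not anything the channel actually ran: `TransientError`, `id_range` and `fetch_with_retry` are hypothetical names, and the caller is assumed to supply a `fetch` callable that knows the real page URL (which the log does not give) and raises `TransientError` on a 502.

```python
import time
from typing import Callable, Iterator, Optional


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as an HTTP 502."""


def id_range(start: int = 10_000_000, stop: Optional[int] = None) -> Iterator[int]:
    """Yield ids from `start` upward; open-ended when `stop` is None."""
    current = start
    while stop is None or current < stop:
        yield current
        current += 1


def fetch_with_retry(
    fetch: Callable[[int], str],
    item_id: int,
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(item_id), backing off exponentially on TransientError."""
    for attempt in range(attempts):
        try:
            return fetch(item_id)
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts, let the caller record the failure
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` in as a parameter keeps the backoff testable without actually waiting; a real run would also want to log ids that never succeed so they can be retried later.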
08:37 arkiver All metro newspapers of brazil-brasilia of 2013 uploaded
08:37 Lord_Nigh its extremely sad that the .ee site, which has its origins (and may still technically be) a mirror of the original intended for scamming and phishing, may be the last running copy of all magnet ids after 10,000,000
08:39 Lord_Nigh the magnets are definitely valid!
08:39 Ctrl-S when I try to set the rate limit using the command from the wiki I get an error. "Syntax error: Invalid parameter '--name'"
08:40 Lord_Nigh but: it is missing the comments from thepiratebay.se
08:40 Lord_Nigh all of them
08:40 Lord_Nigh afaict
08:41 Lord_Nigh so it itself is a skimmed copy
08:41 Lord_Nigh well.... can we skim the skimmed copy?
08:43 Lord_Nigh we could also troll google cache until it comes down for the .se site and get comments etc
08:43 Lord_Nigh that's maybe a week tops
08:45 primus104 has joined #archiveteam-bs
08:55 joepie91 this is amazing stuff: http://vimeo.com/12672088
09:41 ivan has joined #archiveteam-bs
09:44 ivan I have been rsync'ing gentoo's distfiles very frequently for two years without deleting anything that they remove, but it is of low value to me and I don't have fast upstream to dump the 500GB+ somewhere
09:45 ivan if you want something from it, let me know before I delete it
09:45 joepie91 ivan: wait, what are distfiles?
09:46 ivan all the tarballs that the ebuilds grab
09:46 joepie91 are these available elsewhere in this format?
09:46 ivan old stuff gets removed from distfiles
09:46 joepie91 (historically)
09:46 ivan presumably 99.9% of it is in git repos elsewhere
09:47 joepie91 hrm. are they github-generated tarballs?
09:47 joepie91 or actual releases?
09:47 ivan they're mostly official releases
09:47 joepie91 any chance you can upload them over a long period of time?
09:47 joepie91 (I probably have an rsync target for it)
09:48 ivan no but maybe I can mail a drive to someone in the US
09:48 * joepie91 is not in the US
09:49 Ctrl-S what sort of content is this?
09:50 joepie91 Ctrl-S: basically, historical (source) releases of software
09:50 joepie91 definitely an archival value to it imo
09:50 Ctrl-S could you just upload the differences between versions?
09:50 joepie91 that's... actually not a bad idea
09:50 Ctrl-S sort of like wikis do
09:50 joepie91 ivan: are you aware of any methods for having a 'source' file and having delta'd "derived" files?
09:51 Ctrl-S every few hundred versions you upload the full version, and between those just the differences
09:51 joepie91 that seems like it could work here - take the first release as base, then delta every release after it
09:51 ivan I've programmed bizarre compression schemes like this before and it's not worth it
09:51 joepie91 I know that it's technically possible, but no idea if it already exists for this particular usecase
09:51 ivan I'm happy to pay the 6 bucks to mail it
09:51 joepie91 ivan: not so much compression, as diff'ing :P
09:51 joepie91 should be extremely efficient for this kind of data
09:52 joepie91 otoh
09:52 Ctrl-S is there a tool that does it?
09:52 joepie91 you could accomplish a near-identical result by just having one tarball per piece of software and having each release (uncompressed) in its own directory
09:52 joepie91 and using a compression format with a per-archive dictionary
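
As a rough illustration of the "base release plus deltas" idea discussed above, one stand-in technique is zlib's preset dictionary: compress each later release with the previous uncompressed release as the dictionary, so shared content is stored as back-references. This is a sketch under assumptions, not a tool anyone in the channel named: `compress_against` and `decompress_against` are hypothetical helpers, and deflate only looks back about 32 KB, so real tarballs would need a long-window format or a dedicated binary diff tool.

```python
import zlib


def compress_against(base: bytes, target: bytes) -> bytes:
    """Compress `target` using `base` (the previous release) as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=base)
    return comp.compress(target) + comp.flush()


def decompress_against(base: bytes, blob: bytes) -> bytes:
    """Rebuild the release; the same `base` must be supplied again."""
    decomp = zlib.decompressobj(zdict=base)
    return decomp.decompress(blob) + decomp.flush()
```

The delta is useless without its base, which is why Ctrl-S's suggestion of storing a full copy every so often matters: it bounds how many deltas have to be replayed (and how much can be lost) to recover any one release.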
09:53 joepie91 ivan: problem is mostly a mailing target :P
09:53 joepie91 I mean, unless you're planning on mailing it to NL... heh
09:54 ivan https://ludios.org/tmp/gentoo-distfiles.txt
09:55 ivan 37MB beware of browser crash
09:57 joepie91 that's a lot of packages :P
09:57 joepie91 ivan: should probably ask again in a few hours, when US-ians wake up
09:58 joepie91 (or ship it to NL)
09:58 ivan maybe SketchCow will take it
09:58 Void_ is this like all gentoo files ever
09:59 * joepie91 throws `sort` at it
09:59 * joepie91 watches it eat a core
10:00 joepie91 Void_: about 2 years, it seems
10:03 espes__ throw it in a git repo and let it handle the delta compression?
10:03 ivan that doesn't work for compressed tarballs
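
ivan's objection can be explored with a small experiment: a delta between two uncompressed releases is usually tiny, while a delta between their gzipped forms typically saves little, because a small change in the input tends to perturb the compressed byte stream from that point on. The sketch below is only a measuring aid with hypothetical names (`delta_size`, `compare`); it reuses zlib's preset dictionary as a crude delta and is not how git computes its deltas.

```python
import gzip
import zlib


def delta_size(base: bytes, target: bytes) -> int:
    """Size of `target` when deflated with `base` as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=base)
    return len(comp.compress(target) + comp.flush())


def compare(release_a: bytes, release_b: bytes) -> dict:
    """Report delta sizes for the raw releases and for their gzipped forms."""
    packed_a = gzip.compress(release_a)
    packed_b = gzip.compress(release_b)
    return {
        "delta_of_raw": delta_size(release_a, release_b),
        "delta_of_gzipped": delta_size(packed_a, packed_b),
        "gzipped_size": len(packed_b),
    }
```

If the numbers come out as expected, the practical consequence is to decompress tarballs before handing them to anything that deltas, at the cost of no longer keeping the byte-exact original archive unless it can be recreated.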
10:07 godane uploaded: https://archive.org/details/news.kbs.co.kr-search-news-code-1-to-10000-20141207
10:08 godane that has South Korean news from 1999-01-01 to 1999-02-21
10:10 godane that ups it by about 10k: https://web.archive.org/web/*/http://news.kbs.co.kr/news/NewsView.do?SEARCH_NEWS_CODE=*
10:10 godane only 2872 urls from those url types
10:10 brayden has joined #archiveteam-bs
10:55 schbirid has joined #archiveteam-bs
11:15 www2 has joined #archiveteam-bs
11:45 BlueMaxim has quit IRC (Quit: Leaving)
11:50 www2 has quit IRC (Read error: Operation timed out)
11:55 dx has quit IRC (Ping timeout: 265 seconds)
11:58 dx has joined #archiveteam-bs
12:06 Kadercavd has joined #archiveteam-bs
12:06 Kadercavd I swear to love Promise me here vote
12:06 Kadercavd My name is Mark Bass http://strawpoll.me/3100584 Vote
12:06 midas no.
12:06 Kadercavd has quit IRC (Client Quit)
12:07 Kadercavd has joined #archiveteam-bs
12:08 midas NO.
12:08 Kadercavd has quit IRC (Client Quit)
12:12 joepie91 lol.
12:29 www2 has joined #archiveteam-bs
12:32 joepie91 I.. wht
12:32 joepie91 what *
12:32 joepie91 http://m.lg.com/ph/inside-lg/christmas-beat
12:33 joepie91 apparently LG is using PDFy now?
12:35 schbirid oh god please goatse them
12:53 dashcloud isn't that a good thing that LG's using it? we'll have all of their manuals backed up to IA automatically then
13:14 midas joepie91: how much bandwidth is pdfy using nowadays?
13:18 sirkov has quit IRC (Ping timeout: 370 seconds)
13:20 sirkov has joined #archiveteam-bs
13:48 sankin has joined #archiveteam-bs
13:53 sankin has quit IRC (Client Quit)
14:04 sankin has joined #archiveteam-bs
14:24 lrkj has quit IRC (Ping timeout: 612 seconds)
15:38 StartAway has quit IRC (Read error: Operation timed out)
15:50 mistym has joined #archiveteam-bs
15:51 mistym has quit IRC (Remote host closed the connection)
15:57 BiggieJo1 has quit IRC (Read error: Connection reset by peer)
15:57 Nertsy has quit IRC (Quit: Nertsy)
16:07 Nertsy has joined #archiveteam-bs
16:13 Start has joined #archiveteam-bs
16:18 aaaaaaaaa has joined #archiveteam-bs
16:58 Start has quit IRC (Read error: No route to host)
16:58 dx has quit IRC (Ping timeout: 246 seconds)
17:02 Start has joined #archiveteam-bs
17:02 dx has joined #archiveteam-bs
17:15 mistym has joined #archiveteam-bs
17:40 Start has quit IRC (Read error: Connection reset by peer)
17:49 GLaDOS has quit IRC (Ping timeout: 272 seconds)
17:50 GLaDOS has joined #archiveteam-bs
17:50 swebb sets mode: +o GLaDOS
18:03 mistym has quit IRC (Remote host closed the connection)
18:04 mistym has joined #archiveteam-bs
18:27 joepie91 midas: 1.17TB last month
18:27 joepie91 peaked at 325mbps yesterday
18:27 joepie91 dashcloud: heheh
18:29 mistym has quit IRC (Remote host closed the connection)
18:30 mistym has joined #archiveteam-bs
18:31 brayden has quit IRC (Ping timeout: 607 seconds)
18:52 Start has joined #archiveteam-bs
18:58 rejon has quit IRC (Ping timeout: 480 seconds)
19:00 www2 has quit IRC (Ping timeout: 335 seconds)
19:01 SadDM Has anybody here looked at Google's newspaper archive?
19:01 SadDM I'm looking for a way to download parts of a page without crawling through the rendered DOM and figuring out what images I need
19:02 SadDM *And* then re-assembling them.
19:17 ete_ has joined #archiveteam-bs
19:23 Atluxity /buffer ccc
19:23 Atluxity freaking whitespaces
19:41 aaaaaaaa_ has joined #archiveteam-bs
19:45 phuzion has quit IRC (Read error: Operation timed out)
19:47 xtr-201 has quit IRC (Read error: Operation timed out)
19:47 aaaaaaaaa has quit IRC (Read error: Operation timed out)
19:47 aaaaaaaa_ has quit IRC (Client Quit)
19:47 Start has quit IRC (Read error: Operation timed out)
19:47 aaaaaaaa_ has joined #archiveteam-bs
19:47 phuzion has joined #archiveteam-bs
19:48 xtr-201 has joined #archiveteam-bs
19:57 aaaaaaaa_ has quit IRC (Ping timeout: 480 seconds)
20:02 BlueMaxim has joined #archiveteam-bs
20:05 mistym_ has joined #archiveteam-bs
20:28 logchfoo starts logging #archiveteam-bs at Wed Dec 10 20:28:35 2014
20:28 logchfoo has joined #archiveteam-bs
20:42 Arkiver2 is now known as arkiver
20:48 brayden has joined #archiveteam-bs
20:58 kyan has quit IRC (Read error: Connection reset by peer)
21:30 kyan_ has joined #archiveteam-bs
21:33 www2 has joined #archiveteam-bs
21:36 APerti has joined #archiveteam-bs
21:39 APerti_ has quit IRC (Ping timeout: 370 seconds)
21:49 Start has joined #archiveteam-bs
21:58 schbirid has quit IRC (Leaving)
22:06 * Void_ uses a dirty scanner to poke saddm
22:06 Void_ huh
22:08 mistym_ has quit IRC (Quit: Leaving...)
22:09 ivan- is now known as ivan`-
22:26 Start has quit IRC (Read error: Operation timed out)
22:34 SN4T14_ has joined #archiveteam-bs
22:39 SN4T14 has quit IRC (Ping timeout: 369 seconds)
23:19 mistym has joined #archiveteam-bs
23:23 dashcloud has quit IRC (Ping timeout: 265 seconds)
23:23 nico has quit IRC (Ping timeout: 265 seconds)
23:24 Insomnia1 has quit IRC (Ping timeout: 265 seconds)
23:24 Insomnia_ has joined #archiveteam-bs
23:24 wm_ has quit IRC (Ping timeout: 265 seconds)
23:28 dashcloud has joined #archiveteam-bs
23:39 nico has joined #archiveteam-bs
23:46 wm_ has joined #archiveteam-bs
23:46 Start has joined #archiveteam-bs
