| Time |
Nickname |
Message |
|
00:02
|
|
www2 has quit IRC (Read error: Operation timed out) |
|
00:03
|
|
GLaDOS has quit IRC (Ping timeout: 272 seconds) |
|
00:04
|
|
GLaDOS has joined #archiveteam-bs |
|
00:04
|
|
swebb sets mode: +o GLaDOS |
|
00:15
|
|
cbb has joined #archiveteam-bs |
|
00:17
|
|
www2 has joined #archiveteam-bs |
|
00:26
|
|
cbb2 has joined #archiveteam-bs |
|
00:27
|
|
cbb has quit IRC (Ping timeout: 265 seconds) |
|
00:27
|
|
cbb2 is now known as cbb |
|
00:29
|
|
www2 has quit IRC (Read error: Operation timed out) |
|
00:31
|
|
kadercavd has joined #archiveteam-bs |
|
00:31
|
kadercavd |
http://strawpoll.me/3100584/r vote for me and I'll sex-chat with you, I promise |
|
00:32
|
|
chfoo sets mode: +b *!*KaderCavd@213.74.159.* |
|
00:32
|
|
kadercavd was kicked by xmc (spammer) |
|
00:32
|
yipdw |
TAG TEAM |
|
00:33
|
|
xmc sets mode: +o yipdw |
|
00:33
|
xmc |
TAG YOU'RE IT |
|
00:33
|
xmc |
|
|
00:33
|
xmc |
wow lag |
|
00:33
|
|
yipdw sets mode: +o xmc |
|
00:33
|
yipdw |
wait are we playing no tagbacks |
|
00:34
|
xmc |
no lagbacks |
|
00:35
|
yipdw |
if my students start to use that word I will be very D: |
|
00:36
|
yipdw |
of course I write that and use "very D:" at the same time so haha fuck my neologism hypocrisy |
|
00:46
|
|
www2 has joined #archiveteam-bs |
|
01:26
|
|
www2 has quit IRC (Read error: Operation timed out) |
|
01:30
|
|
primus104 has quit IRC (Leaving.) |
|
01:31
|
|
mistym has quit IRC (Leaving...) |
|
01:34
|
|
GLaDOS has quit IRC (Ping timeout: 272 seconds) |
|
01:34
|
|
ersi has quit IRC (Read error: Operation timed out) |
|
01:34
|
|
GLaDOS has joined #archiveteam-bs |
|
01:34
|
|
swebb sets mode: +o GLaDOS |
|
01:35
|
|
ersi has joined #archiveteam-bs |
|
01:35
|
|
swebb sets mode: +o ersi |
|
01:49
|
|
mistym has joined #archiveteam-bs |
|
01:54
|
|
Lord_Nigh has quit IRC (Read error: Operation timed out) |
|
01:56
|
|
cbb has quit IRC (Quit: cbb) |
|
01:57
|
|
Lord_Nigh has joined #archiveteam-bs |
|
02:09
|
|
LordNigh2 has joined #archiveteam-bs |
|
02:11
|
|
Lord_Nigh has quit IRC (Ping timeout: 272 seconds) |
|
02:11
|
|
LordNigh2 is now known as Lord_Nigh |
|
02:35
|
|
Nertsy has joined #archiveteam-bs |
|
02:37
|
|
chfoo has quit IRC (Remote host closed the connection) |
|
02:55
|
|
Nertsy has quit IRC (Quit: Leaving) |
|
02:56
|
|
chfoo has joined #archiveteam-bs |
|
03:01
|
|
Nertsy has joined #archiveteam-bs |
|
03:25
|
|
dx has quit IRC (Read error: Operation timed out) |
|
03:41
|
|
hashtag has joined #archiveteam-bs |
|
03:42
|
|
hashtag has left |
|
03:53
|
godane |
uploaded: https://archive.org/details/G4_Icons_S01E01 |
|
04:11
|
|
RainbowCo has joined #archiveteam-bs |
|
04:38
|
|
Lord_Nigh has quit IRC (Read error: Connection reset by peer) |
|
04:39
|
|
Lord_Nigh has joined #archiveteam-bs |
|
04:47
|
|
dx has joined #archiveteam-bs |
|
04:58
|
|
aaaaaaaaa has quit IRC (Leaving) |
|
05:14
|
|
rejon has joined #archiveteam-bs |
|
05:58
|
|
Start is now known as StartAway |
|
06:05
|
|
dx has quit IRC (Ping timeout: 369 seconds) |
|
06:18
|
|
dx has joined #archiveteam-bs |
|
06:56
|
Ctrl-S |
are there any projects to archive tumblr going on? |
|
06:57
|
Ctrl-S |
I've got a script that sort of does it for up to a few hundred accounts |
|
07:01
|
yipdw |
no because tumblr is massive and nobody is willing to pay for the cost of storing it |
|
07:01
|
yipdw |
instead we grab individual tumblrs on a special-case basis |
|
07:02
|
Ctrl-S |
okay |
|
07:03
|
yipdw |
http://archive.fart.website/archivebot/viewer/?q=tumblr is a partial list |
|
07:03
|
yipdw |
note that a significant portion of those are shallow grabs that target specific posts |
|
07:09
|
Ctrl-S |
is there a standard method for scraping tumblr stuff you guys use? |
|
07:10
|
yipdw |
#archivebot mostly |
|
08:10
|
SketchCow |
There's a great statement by brokep about how unhappy he is where The Pirate Bay has gone |
|
08:10
|
SketchCow |
(Not the fact it disappeared for the moment - he actually approves of that.) |
|
08:10
|
SketchCow |
He thinks that it got handed down and handed down until it hit the lowest common denominator, and now it was just ads and stunts. |
|
08:14
|
Ctrl-S |
what will replace the pirate bay? |
|
08:14
|
Ctrl-S |
libre-piratebay? |
|
08:17
|
|
mistym has quit IRC (Remote host closed the connection) |
|
08:19
|
|
brayden has quit IRC (Read error: Operation timed out) |
|
08:19
|
SketchCow |
Oh god who knows |
|
08:21
|
godane |
i think a mesh network that works sort of like twitter/reddit/facebook is in order |
|
08:21
|
godane |
i have been thinking of this for awhile |
|
08:21
|
godane |
my idea is git-like mesh network |
|
08:22
|
Ctrl-S |
main concern is spammers of various sorts. MPAA would need to be unable to break network just by setting up a few K VM instances |
|
08:22
|
Lord_Nigh |
SketchCow: there's actually thepiratebay.ee which seems to be run by someone else (possibly as a honeypot since https doesn't work)? |
|
08:22
|
Lord_Nigh |
and is still up |
|
08:22
|
SketchCow |
It's down |
|
08:22
|
Lord_Nigh |
.ee is? |
|
08:23
|
Lord_Nigh |
http://thepiratebay.ee/recent looked up to me a few minutes ago |
|
08:23
|
godane |
where the people you follow on a twitter-like app would be mirrored to your phone/desktop |
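godane's "git-like mesh network" is only sketched in the chat. One possible reading of it is a per-author append-only log in which each post names its parent by hash, as git commits do, so a follower who mirrors the feed can check that the copy is intact. The minimal Python sketch below assumes that interpretation; `Post`, `append` and `verify` are hypothetical names, and it leaves out networking, signatures and the spam/Sybil problem Ctrl-S raises.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    parent: str  # id of the author's previous post ("" for the first post)
    author: str
    body: str

    @property
    def id(self) -> str:
        # Like a git commit id, this hashes the content *and* the parent id,
        # so rewriting any earlier post changes every id after it.
        payload = json.dumps([self.parent, self.author, self.body]).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append(feed: list[Post], author: str, body: str) -> list[Post]:
    """Return a new feed with one post added on top of the current head."""
    parent = feed[-1].id if feed else ""
    return feed + [Post(parent, author, body)]


def verify(feed: list[Post], head: str) -> bool:
    """Check that a mirrored feed is an unbroken hash chain ending at a trusted head id."""
    expected_parent = ""
    for post in feed:
        if post.parent != expected_parent:
            return False
        expected_parent = post.id
    return expected_parent == head
```

A follower only needs to learn the author's current head id from a trusted source; the rest of the feed can then come from any peer, since `verify` rejects a copy in which any post was altered.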
|
08:24
|
Lord_Nigh |
but given that it doesn't appear to be run by the main tpb 'team' i can't vouch for the accuracy of the torrents there. then again, neither could tpb |
|
08:24
|
Lord_Nigh |
is it worth scraping that? |
|
08:25
|
Lord_Nigh |
(i'm guessing yes) |
|
08:26
|
Lord_Nigh |
it looks like it might have a copy of the db up until < 6 hours before the main site was taken down |
|
08:26
|
Lord_Nigh |
since i see nothing newer than 12/09 10:32 |
|
08:28
|
Lord_Nigh |
if that's the case, then .ee should be scraped as quickly as we can manage it |
|
08:28
|
Lord_Nigh |
starting with id 10,000,000 and going up |
|
08:28
|
Lord_Nigh |
since we have everything from 9,999,999 and down |
|
08:29
|
Lord_Nigh |
from other sources |
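Lord_Nigh's plan splits the ID space at a known boundary: IDs up to 9,999,999 are already covered from other sources, so only IDs from 10,000,000 upward need fetching from the .ee mirror. A minimal Python sketch of turning that into batches of work, assuming a hypothetical `/torrent/<id>` URL pattern and a caller-supplied highest ID (the log gives neither):

```python
from typing import Iterator

# Everything at or below this ID is already covered "from other sources".
COVERED_UP_TO = 9_999_999

# Hypothetical URL pattern; the mirror's real page layout is not given in the log.
URL_TEMPLATE = "http://thepiratebay.ee/torrent/{id}"


def work_items(highest_id: int, chunk_size: int = 1000) -> Iterator[list[str]]:
    """Yield batches of URLs for every ID above COVERED_UP_TO up to highest_id (inclusive)."""
    start = COVERED_UP_TO + 1
    while start <= highest_id:
        end = min(start + chunk_size - 1, highest_id)
        yield [URL_TEMPLATE.format(id=i) for i in range(start, end + 1)]
        start = end + 1
```

Fixed-size batches make it easy to hand IDs out to several grabbers at once and to re-queue a batch that failed, which matters for a site throwing random 502s.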
|
08:29
|
Lord_Nigh |
main discussion is in #yarharfiddlededee |
|
08:30
|
midas |
it isnt really down anyway, the police grabbed a loadbalancer |
|
08:30
|
midas |
and a dns server |
|
08:31
|
Lord_Nigh |
so... maybe .ee is actually the 'real thing'? that doesn't make sense, since .ee has no https, and .ee has that weird $5-per-year popup when trying to get magnet links (which you can enter any 6 digit numerical password to and it will work) |
|
08:31
|
midas |
nah |
|
08:31
|
Lord_Nigh |
(696969 works) |
|
08:31
|
midas |
ee can be anything, im not sure |
|
08:31
|
midas |
the dns server runs the internal dns traffic |
|
08:31
|
Lord_Nigh |
ee seems to follow the same numbering tpb did |
|
08:34
|
Lord_Nigh |
.. and now .ee is down |
|
08:34
|
Ctrl-S |
the domain name or ip or both? |
|
08:34
|
Lord_Nigh |
no, its not down |
|
08:34
|
Lord_Nigh |
its ... unstable |
|
08:34
|
Lord_Nigh |
random 502 errors |
|
08:34
|
Lord_Nigh |
seems other people have found it |
|
08:35
|
Lord_Nigh |
the ids DO exactly match the post-10000000 ids from the actual .se site |
|
08:35
|
Ctrl-S |
can i ask for help setting up the warrior thing here? |
|
08:35
|
Lord_Nigh |
so i think scraping .ee is a VERY good idea |
|
08:37
|
arkiver |
All metro newspapers of brazil-brasilia of 2013 uploaded |
|
08:37
|
Lord_Nigh |
its extremely sad that the .ee site, which has its origins (and may still technically be) a mirror of the original intended for scamming and phishing, may be the last running copy of all magnet ids after 10,000,000 |
|
08:39
|
Lord_Nigh |
the magnets are definitely valid! |
|
08:39
|
Ctrl-S |
when I try to set the rate limit using the command from the wiki I get an error. "Syntax error: Invalid parameter '--name'" |
|
08:40
|
Lord_Nigh |
but: it is missing the comments from thepiratebay.se |
|
08:40
|
Lord_Nigh |
all of them |
|
08:40
|
Lord_Nigh |
afaict |
|
08:41
|
Lord_Nigh |
so it itself is a skimmed copy |
|
08:41
|
Lord_Nigh |
well.... can we skim the skimmed copy? |
|
08:43
|
Lord_Nigh |
we could also troll google cache until it comes down for the .se site and get comments etc |
|
08:43
|
Lord_Nigh |
that's maybe a week tops |
|
08:45
|
|
primus104 has joined #archiveteam-bs |
|
08:55
|
joepie91 |
this is amazing stuff: http://vimeo.com/12672088 |
|
09:41
|
|
ivan has joined #archiveteam-bs |
|
09:44
|
ivan |
I have been rsync'ing gentoo's distfiles very frequently for two years without deleting anything that they remove, but it is of low value to me and I don't have fast upstream to dump the 500GB+ somewhere |
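ivan's archive accumulates history because rsync only removes files on the receiving side when asked to with `--delete`; without it, every tarball that ever appeared upstream stays in the local copy. A minimal sketch of that invocation from Python, assuming a hypothetical mirror URL and destination path:

```python
import subprocess

# Hypothetical mirror URL; substitute a real Gentoo distfiles rsync mirror.
MIRROR = "rsync://mirror.example.org/gentoo/distfiles/"


def rsync_argv(source: str, dest: str) -> list[str]:
    # -a recurses and preserves timestamps/permissions; --partial keeps
    # half-transferred files so an interrupted run can resume.
    # --delete is deliberately absent: files removed upstream stay in dest.
    return ["rsync", "-a", "--partial", source, dest]


def sync(dest: str) -> None:
    """Run one pass of the accumulating mirror (raises if rsync fails)."""
    subprocess.run(rsync_argv(MIRROR, dest), check=True)
```

Run from cron "very frequently", each pass adds whatever is new upstream and never prunes, which is how the copy grows past 500GB over two years.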
|
09:45
|
ivan |
if you want something from it, let me know before I delete it |
|
09:45
|
joepie91 |
ivan: wait, what are distfiles? |
|
09:46
|
ivan |
all the tarballs that the ebuilds grab |
|
09:46
|
joepie91 |
are these available elsewhere in this format? |
|
09:46
|
ivan |
old stuff gets removed from distfiles |
|
09:46
|
joepie91 |
(historically) |
|
09:46
|
ivan |
presumably 99.9% of it is in git repos elsewhere |
|
09:47
|
joepie91 |
hrm. are they github-generated tarballs? |
|
09:47
|
joepie91 |
or actual releases? |
|
09:47
|
ivan |
they're mostly official releases |
|
09:47
|
joepie91 |
any chance you can upload them over a long period of time? |
|
09:47
|
joepie91 |
(I probably have an rsync target for it) |
|
09:48
|
ivan |
no but maybe I can mail a drive to someone in the US |
|
09:48
|
* |
joepie91 is not in the US |
|
09:49
|
Ctrl-S |
what sort of content is this? |
|
09:50
|
joepie91 |
Ctrl-S: basically, historical (source) releases of software |
|
09:50
|
joepie91 |
definitely an archival value to it imo |
|
09:50
|
Ctrl-S |
could you just upload the differences between versions? |
|
09:50
|
joepie91 |
that's... actually not a bad idea |
|
09:50
|
Ctrl-S |
sort of like wikis do |
|
09:50
|
joepie91 |
ivan: are you aware of any methods for having a 'source' file and having delta'd "derived" files? |
|
09:51
|
Ctrl-S |
every few hundred versions you upload the full version, and between those just the differences |
|
09:51
|
joepie91 |
that seems like it could work here - take the first release as base, then delta every release after it |
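Ctrl-S's scheme (a full copy every few hundred versions, differences in between) and joepie91's variant (one base, deltas for everything after it) are the same idea with different keyframe intervals. A minimal line-based Python sketch using `difflib.SequenceMatcher`, assuming the releases have already been unpacked to plain bytes (as ivan notes below, deltas against compressed tarballs do not work). `make_delta`, `apply_delta` and `KeyframeStore` are hypothetical names; a real tool would more likely use a binary delta format such as xdelta (VCDIFF).

```python
import difflib


def make_delta(base: bytes, target: bytes) -> list:
    """Describe `target` as line ranges copied from `base` plus literal new lines."""
    a = base.splitlines(keepends=True)
    b = target.splitlines(keepends=True)
    ops = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse lines i1..i2 of the base
        elif tag in ("replace", "insert"):
            ops.append(("insert", b[j1:j2]))  # store only the lines that changed
        # "delete": those base lines are simply not copied
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target from the base and the op list produced by make_delta."""
    a = base.splitlines(keepends=True)
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(a[op[1]:op[2]])
        else:
            out.extend(op[1])
    return b"".join(out)


class KeyframeStore:
    """Full copy every `interval` versions; deltas against the previous version in between."""

    def __init__(self, interval: int = 100):
        self.interval = interval
        self.entries = []  # ("full", data) or ("delta", ops)
        self._last = b""

    def add(self, data: bytes) -> None:
        if len(self.entries) % self.interval == 0:
            self.entries.append(("full", data))
        else:
            self.entries.append(("delta", make_delta(self._last, data)))
        self._last = data

    def get(self, n: int) -> bytes:
        # Start from the nearest full copy at or before n, then replay deltas forward.
        k = n - n % self.interval
        data = self.entries[k][1]
        for _, ops in self.entries[k + 1 : n + 1]:
            data = apply_delta(data, ops)
        return data
```

Reading version n costs at most `interval - 1` delta applications, so the interval trades storage against retrieval time; joepie91's "first release as base" is the case of an interval larger than the number of releases.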
|
09:51
|
ivan |
I've programmed bizarre compression schemes like this before and it's not worth it |
|
09:51
|
joepie91 |
I know that it's technically possible, but no idea if it already exists for this particular usecase |
|
09:51
|
ivan |
I'm happy to pay the 6 bucks to mail it |
|
09:51
|
joepie91 |
ivan: not so much compression, as diff'ing :P |
|
09:51
|
joepie91 |
should be extremely efficient for this kind of data |
|
09:52
|
joepie91 |
otoh |
|
09:52
|
Ctrl-S |
is there a tool that does it? |
|
09:52
|
joepie91 |
you could accomplish a near-identical result by just having one tarball per piece of software and having each release (uncompressed) in its own directory |
|
09:52
|
joepie91 |
and using a compression format with a per-archive dictionary |
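joepie91's alternative gets most of the benefit without a diff tool: put every release of a package, uncompressed, into one archive and compress it as a single stream, so the compressor's dictionary spans releases and repeated content is stored once. A minimal Python sketch comparing the two layouts with the standard `lzma` module (xz format). The random test data standing in for real releases is an assumption, as is the preset-6 dictionary being large enough to span all the releases compared.

```python
import lzma
import random


def separate_size(releases: list[bytes]) -> int:
    """Total size when every release is compressed on its own."""
    return sum(len(lzma.compress(r)) for r in releases)


def solid_size(releases: list[bytes]) -> int:
    """Size when all releases go through one compressor stream (a solid archive),
    so later releases can back-reference content already seen in earlier ones."""
    compressor = lzma.LZMACompressor(preset=6)
    out = b"".join(compressor.compress(r) for r in releases)
    return len(out + compressor.flush())


def fake_releases(count: int, size: int = 64 * 1024) -> list[bytes]:
    """Stand-in for successive releases: random (incompressible) bytes, one byte changed each time."""
    rng = random.Random(0)
    base = bytearray(rng.randbytes(size))
    releases = []
    for _ in range(count):
        base[rng.randrange(size)] = rng.randrange(256)
        releases.append(bytes(base))
    return releases
```

The catch is the dictionary window: once a package's releases together exceed it, older releases fall out of reach, which is why large-dictionary settings (xz `-9`, 7z solid mode) suit this layout.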
|
09:53
|
joepie91 |
ivan: problem is mostly a mailing target :P |
|
09:53
|
joepie91 |
I mean, unless you're planning on mailing it to NL... heh |
|
09:54
|
ivan |
https://ludios.org/tmp/gentoo-distfiles.txt |
|
09:55
|
ivan |
37MB beware of browser crash |
|
09:57
|
joepie91 |
that's a lot of packages :P |
|
09:57
|
joepie91 |
ivan: should probably ask again in a few hours, when US-ians wake up |
|
09:58
|
joepie91 |
(or ship it to NL) |
|
09:58
|
ivan |
maybe SketchCow will take it |
|
09:58
|
Void_ |
is this like all gentoo files ever |
|
09:59
|
* |
joepie91 throws `sort` at it |
|
09:59
|
* |
joepie91 watches it eat a core |
|
10:00
|
joepie91 |
Void_: about 2 years, it seems |
|
10:03
|
espes__ |
throw it in a git repo and let it handle the delta compression? |
|
10:03
|
ivan |
that doesn't work for compressed tarballs |
|
10:07
|
godane |
uploaded: https://archive.org/details/news.kbs.co.kr-search-news-code-1-to-10000-20141207 |
|
10:08
|
godane |
that has South Korean news from 1999-01-01 to 1999-02-21 |
|
10:10
|
godane |
that ups it by about 10k: https://web.archive.org/web/*/http://news.kbs.co.kr/news/NewsView.do?SEARCH_NEWS_CODE=* |
|
10:10
|
godane |
only 2872 urls from those url types |
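godane's count of distinct URLs comes from the Wayback wildcard search. A comparable count can be pulled programmatically from the Wayback Machine's CDX server. A minimal Python sketch, assuming the CDX parameters shown (`matchType`, `collapse`, `fl`, `output`) behave as documented and leaving the HTTP request itself out; `cdx_query_url` and `count_urls` are hypothetical helper names:

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(prefix: str) -> str:
    # matchType=prefix matches every captured URL starting with `prefix`, query string included;
    # collapse=urlkey keeps one row per distinct URL instead of one per capture.
    params = {
        "url": prefix,
        "matchType": "prefix",
        "collapse": "urlkey",
        "fl": "original",
        "output": "json",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def count_urls(cdx_json: str) -> int:
    """Count result rows in a CDX JSON response."""
    rows = json.loads(cdx_json)
    # With output=json the first row holds the field names, not a result;
    # an empty result set comes back as [].
    return max(len(rows) - 1, 0)
```

The query for this case would use the prefix `news.kbs.co.kr/news/NewsView.do?SEARCH_NEWS_CODE=`. Because the CDX server collapses on its normalized URL key, the count may differ slightly from what the web UI shows.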
|
10:10
|
|
brayden has joined #archiveteam-bs |
|
10:55
|
|
schbirid has joined #archiveteam-bs |
|
11:15
|
|
www2 has joined #archiveteam-bs |
|
11:45
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
|
11:50
|
|
www2 has quit IRC (Read error: Operation timed out) |
|
11:55
|
|
dx has quit IRC (Ping timeout: 265 seconds) |
|
11:58
|
|
dx has joined #archiveteam-bs |
|
12:06
|
|
Kadercavd has joined #archiveteam-bs |
|
12:06
|
Kadercavd |
I swear to love Promise me here vote |
|
12:06
|
Kadercavd |
My name is Mark Bass http://strawpoll.me/3100584 Vote |
|
12:06
|
midas |
no. |
|
12:06
|
|
Kadercavd has quit IRC (Client Quit) |
|
12:07
|
|
Kadercavd has joined #archiveteam-bs |
|
12:08
|
midas |
NO. |
|
12:08
|
|
Kadercavd has quit IRC (Client Quit) |
|
12:12
|
joepie91 |
lol. |
|
12:29
|
|
www2 has joined #archiveteam-bs |
|
12:32
|
joepie91 |
I.. wht |
|
12:32
|
joepie91 |
what * |
|
12:32
|
joepie91 |
http://m.lg.com/ph/inside-lg/christmas-beat |
|
12:33
|
joepie91 |
apparently LG is using PDFy now? |
|
12:35
|
schbirid |
oh god please goatse them |
|
12:53
|
dashcloud |
isn't that a good thing that LG's using it? we'll have all of their manuals backed up to IA automatically then |
|
13:14
|
midas |
joepie91: how much bandwidth is pdfy using nowadays? |
|
13:18
|
|
sirkov has quit IRC (Ping timeout: 370 seconds) |
|
13:20
|
|
sirkov has joined #archiveteam-bs |
|
13:48
|
|
sankin has joined #archiveteam-bs |
|
13:53
|
|
sankin has quit IRC (Client Quit) |
|
14:04
|
|
sankin has joined #archiveteam-bs |
|
14:24
|
|
lrkj has quit IRC (Ping timeout: 612 seconds) |
|
15:38
|
|
StartAway has quit IRC (Read error: Operation timed out) |
|
15:50
|
|
mistym has joined #archiveteam-bs |
|
15:51
|
|
mistym has quit IRC (Remote host closed the connection) |
|
15:57
|
|
BiggieJo1 has quit IRC (Read error: Connection reset by peer) |
|
15:57
|
|
Nertsy has quit IRC (Quit: Nertsy) |
|
16:07
|
|
Nertsy has joined #archiveteam-bs |
|
16:13
|
|
Start has joined #archiveteam-bs |
|
16:18
|
|
aaaaaaaaa has joined #archiveteam-bs |
|
16:58
|
|
Start has quit IRC (Read error: No route to host) |
|
16:58
|
|
dx has quit IRC (Ping timeout: 246 seconds) |
|
17:02
|
|
Start has joined #archiveteam-bs |
|
17:02
|
|
dx has joined #archiveteam-bs |
|
17:15
|
|
mistym has joined #archiveteam-bs |
|
17:40
|
|
Start has quit IRC (Read error: Connection reset by peer) |
|
17:49
|
|
GLaDOS has quit IRC (Ping timeout: 272 seconds) |
|
17:50
|
|
GLaDOS has joined #archiveteam-bs |
|
17:50
|
|
swebb sets mode: +o GLaDOS |
|
18:03
|
|
mistym has quit IRC (Remote host closed the connection) |
|
18:04
|
|
mistym has joined #archiveteam-bs |
|
18:27
|
joepie91 |
midas: 1.17TB last month |
|
18:27
|
joepie91 |
peaked at 325mbps yesterday |
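For scale, 1.17 TB in a month is a modest average rate next to the 325 Mbit/s peak. Assuming decimal units (1 TB = 10^12 bytes), a 30-day month, and reading "mbps" as megabits per second:

```latex
\bar{r} = \frac{1.17 \times 10^{12}\,\text{B} \times 8\,\text{bit/B}}{30 \times 86\,400\,\text{s}}
        = \frac{9.36 \times 10^{12}\,\text{bit}}{2.592 \times 10^{6}\,\text{s}}
        \approx 3.6\,\text{Mbit/s},
\qquad
\frac{325\,\text{Mbit/s}}{3.6\,\text{Mbit/s}} \approx 90
```

So the peak ran at roughly 90 times the monthly average, the kind of burst one popular embed can cause.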
|
18:27
|
joepie91 |
dashcloud: heheh |
|
18:29
|
|
mistym has quit IRC (Remote host closed the connection) |
|
18:30
|
|
mistym has joined #archiveteam-bs |
|
18:31
|
|
brayden has quit IRC (Ping timeout: 607 seconds) |
|
18:52
|
|
Start has joined #archiveteam-bs |
|
18:58
|
|
rejon has quit IRC (Ping timeout: 480 seconds) |
|
19:00
|
|
www2 has quit IRC (Ping timeout: 335 seconds) |
|
19:01
|
SadDM |
Has anybody here looked at Google's newspaper archive? |
|
19:01
|
SadDM |
I'm looking for a way to download parts of a page without crawling through the rendered DOM and figuring out what images I need |
|
19:02
|
SadDM |
*And* then re-assembling them. |
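SadDM's problem is the usual one for tiled page viewers: work out which tiles cover the part of the page you want, fetch only those, and paste them into one image. The tile layout of Google's newspaper archive is not described here, so the sketch below assumes square tiles of a fixed size on a grid starting at the page's top-left corner; `Tile` and `tiles_for_region` are hypothetical names, and the tile URLs are left out.

```python
from typing import NamedTuple


class Tile(NamedTuple):
    col: int
    row: int
    paste_x: int  # where this tile's top-left corner lands in the cropped output
    paste_y: int  # (negative when the tile starts before the requested region)


def tiles_for_region(x: int, y: int, width: int, height: int, tile_size: int) -> list[Tile]:
    """List only the tiles that overlap the requested region of the page, row by row."""
    first_col, last_col = x // tile_size, (x + width - 1) // tile_size
    first_row, last_row = y // tile_size, (y + height - 1) // tile_size
    return [
        Tile(col, row, col * tile_size - x, row * tile_size - y)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]
```

Reassembly is then one canvas of `width` x `height` (for example Pillow's `Image.new`) with each fetched tile pasted at `(paste_x, paste_y)`; Pillow's `paste` should clip the parts of edge tiles that fall outside the canvas.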
|
19:17
|
|
ete_ has joined #archiveteam-bs |
|
19:23
|
Atluxity |
/buffer ccc |
|
19:23
|
Atluxity |
freaking whitespaces |
|
19:41
|
|
aaaaaaaa_ has joined #archiveteam-bs |
|
19:45
|
|
phuzion has quit IRC (Read error: Operation timed out) |
|
19:47
|
|
xtr-201 has quit IRC (Read error: Operation timed out) |
|
19:47
|
|
aaaaaaaaa has quit IRC (Read error: Operation timed out) |
|
19:47
|
|
aaaaaaaa_ has quit IRC (Client Quit) |
|
19:47
|
|
Start has quit IRC (Read error: Operation timed out) |
|
19:47
|
|
aaaaaaaa_ has joined #archiveteam-bs |
|
19:47
|
|
phuzion has joined #archiveteam-bs |
|
19:48
|
|
xtr-201 has joined #archiveteam-bs |
|
19:57
|
|
aaaaaaaa_ has quit IRC (Ping timeout: 480 seconds) |
|
20:02
|
|
BlueMaxim has joined #archiveteam-bs |
|
20:05
|
|
mistym_ has joined #archiveteam-bs |
|
20:28
|
|
logchfoo starts logging #archiveteam-bs at Wed Dec 10 20:28:35 2014 |
|
20:28
|
|
logchfoo has joined #archiveteam-bs |
|
20:42
|
|
Arkiver2 is now known as arkiver |
|
20:48
|
|
brayden has joined #archiveteam-bs |
|
20:58
|
|
kyan has quit IRC (Read error: Connection reset by peer) |
|
21:30
|
|
kyan_ has joined #archiveteam-bs |
|
21:33
|
|
www2 has joined #archiveteam-bs |
|
21:36
|
|
APerti has joined #archiveteam-bs |
|
21:39
|
|
APerti_ has quit IRC (Ping timeout: 370 seconds) |
|
21:49
|
|
Start has joined #archiveteam-bs |
|
21:58
|
|
schbirid has quit IRC (Leaving) |
|
22:06
|
* |
Void_ uses a dirty scanner to poke saddm |
|
22:06
|
Void_ |
huh |
|
22:08
|
|
mistym_ has quit IRC (Quit: Leaving...) |
|
22:09
|
|
ivan- is now known as ivan`- |
|
22:26
|
|
Start has quit IRC (Read error: Operation timed out) |
|
22:34
|
|
SN4T14_ has joined #archiveteam-bs |
|
22:39
|
|
SN4T14 has quit IRC (Ping timeout: 369 seconds) |
|
23:19
|
|
mistym has joined #archiveteam-bs |
|
23:23
|
|
dashcloud has quit IRC (Ping timeout: 265 seconds) |
|
23:23
|
|
nico has quit IRC (Ping timeout: 265 seconds) |
|
23:24
|
|
Insomnia1 has quit IRC (Ping timeout: 265 seconds) |
|
23:24
|
|
Insomnia_ has joined #archiveteam-bs |
|
23:24
|
|
wm_ has quit IRC (Ping timeout: 265 seconds) |
|
23:28
|
|
dashcloud has joined #archiveteam-bs |
|
23:39
|
|
nico has joined #archiveteam-bs |
|
23:46
|
|
wm_ has joined #archiveteam-bs |
|
23:46
|
|
Start has joined #archiveteam-bs |