Time |
Nickname |
Message |
00:02
π
|
|
www2 has quit IRC (Read error: Operation timed out) |
00:03
π
|
|
GLaDOS has quit IRC (Ping timeout: 272 seconds) |
00:04
π
|
|
GLaDOS has joined #archiveteam-bs |
00:04
π
|
|
swebb sets mode: +o GLaDOS |
00:15
π
|
|
cbb has joined #archiveteam-bs |
00:17
π
|
|
www2 has joined #archiveteam-bs |
00:26
π
|
|
cbb2 has joined #archiveteam-bs |
00:27
π
|
|
cbb has quit IRC (Ping timeout: 265 seconds) |
00:27
π
|
|
cbb2 is now known as cbb |
00:29
π
|
|
www2 has quit IRC (Read error: Operation timed out) |
00:31
π
|
|
kadercavd has joined #archiveteam-bs |
00:31
π
|
kadercavd |
http://strawpoll.me/3100584/r vote for me and I'll sex chat with you, I promise |
00:32
π
|
|
chfoo sets mode: +b *!*KaderCavd@213.74.159.* |
00:32
π
|
|
kadercavd was kicked by xmc (spammer) |
00:32
π
|
yipdw |
TAG TEAM |
00:33
π
|
|
xmc sets mode: +o yipdw |
00:33
π
|
xmc |
TAG YOU'RE IT |
00:33
π
|
xmc |
|
00:33
π
|
xmc |
wow lag |
00:33
π
|
|
yipdw sets mode: +o xmc |
00:33
π
|
yipdw |
wait are we playing no tagbacks |
00:34
π
|
xmc |
no lagbacks |
00:35
π
|
yipdw |
if my students start to use that word I will be very D: |
00:36
π
|
yipdw |
of course I write that and use "very D:" at the same time so haha fuck my neologism hypocrisy |
00:46
π
|
|
www2 has joined #archiveteam-bs |
01:26
π
|
|
www2 has quit IRC (Read error: Operation timed out) |
01:30
π
|
|
primus104 has quit IRC (Leaving.) |
01:31
π
|
|
mistym has quit IRC (Leaving...) |
01:34
π
|
|
GLaDOS has quit IRC (Ping timeout: 272 seconds) |
01:34
π
|
|
ersi has quit IRC (Read error: Operation timed out) |
01:34
π
|
|
GLaDOS has joined #archiveteam-bs |
01:34
π
|
|
swebb sets mode: +o GLaDOS |
01:35
π
|
|
ersi has joined #archiveteam-bs |
01:35
π
|
|
swebb sets mode: +o ersi |
01:49
π
|
|
mistym has joined #archiveteam-bs |
01:54
π
|
|
Lord_Nigh has quit IRC (Read error: Operation timed out) |
01:56
π
|
|
cbb has quit IRC (Quit: cbb) |
01:57
π
|
|
Lord_Nigh has joined #archiveteam-bs |
02:09
π
|
|
LordNigh2 has joined #archiveteam-bs |
02:11
π
|
|
Lord_Nigh has quit IRC (Ping timeout: 272 seconds) |
02:11
π
|
|
LordNigh2 is now known as Lord_Nigh |
02:35
π
|
|
Nertsy has joined #archiveteam-bs |
02:37
π
|
|
chfoo has quit IRC (Remote host closed the connection) |
02:55
π
|
|
Nertsy has quit IRC (Quit: Leaving) |
02:56
π
|
|
chfoo has joined #archiveteam-bs |
03:01
π
|
|
Nertsy has joined #archiveteam-bs |
03:25
π
|
|
dx has quit IRC (Read error: Operation timed out) |
03:41
π
|
|
hashtag has joined #archiveteam-bs |
03:42
π
|
|
hashtag has left |
03:53
π
|
godane |
uploaded: https://archive.org/details/G4_Icons_S01E01 |
04:11
π
|
|
RainbowCo has joined #archiveteam-bs |
04:38
π
|
|
Lord_Nigh has quit IRC (Read error: Connection reset by peer) |
04:39
π
|
|
Lord_Nigh has joined #archiveteam-bs |
04:47
π
|
|
dx has joined #archiveteam-bs |
04:58
π
|
|
aaaaaaaaa has quit IRC (Leaving) |
05:14
π
|
|
rejon has joined #archiveteam-bs |
05:58
π
|
|
Start is now known as StartAway |
06:05
π
|
|
dx has quit IRC (Ping timeout: 369 seconds) |
06:18
π
|
|
dx has joined #archiveteam-bs |
06:56
π
|
Ctrl-S |
are there any projects to archive tumblr going on? |
06:57
π
|
Ctrl-S |
I've got a script that sort of does it for up to a few hundred accounts |
07:01
π
|
yipdw |
no because tumblr is massive and nobody is willing to pay for the cost of storing it |
07:01
π
|
yipdw |
instead we grab individual tumblrs on a special-case basis |
07:02
π
|
Ctrl-S |
okay |
07:03
π
|
yipdw |
http://archive.fart.website/archivebot/viewer/?q=tumblr is a partial list |
07:03
π
|
yipdw |
note that a significant portion of those are shallow grabs that target specific posts |
07:09
π
|
Ctrl-S |
is there a standard method for scraping tumblr stuff you guys use? |
07:10
π
|
yipdw |
#archivebot mostly |
08:10
π
|
SketchCow |
There's a great statement by brokep about how unhappy he is where The Pirate Bay has gone |
08:10
π
|
SketchCow |
(Not the fact it disappeared for the moment - he actually approves of that.) |
08:10
π
|
SketchCow |
He thinks that it got handed down and handed down until it hit the lowest common denominator, and now it was just ads and stunts. |
08:14
π
|
Ctrl-S |
what will replace the pirate bay? |
08:14
π
|
Ctrl-S |
libre-piratebay? |
08:17
π
|
|
mistym has quit IRC (Remote host closed the connection) |
08:19
π
|
|
brayden has quit IRC (Read error: Operation timed out) |
08:19
π
|
SketchCow |
Oh god who knows |
08:21
π
|
godane |
i think a mesh network that works sort of like twitter/reddit/facebook is in order |
08:21
π
|
godane |
i have been thinking of this for awhile |
08:21
π
|
godane |
my idea is git-like mesh network |
08:22
π
|
Ctrl-S |
main concern is spammers of various sorts. MPAA would need to be unable to break network just by setting up a few K VM instances |
08:22
π
|
Lord_Nigh |
SketchCow: there's actually thepiratebay.ee which seems to be run by someone else (possibly as a honeypot since https doesn't work)? |
08:22
π
|
Lord_Nigh |
and is still up |
08:22
π
|
SketchCow |
It's down |
08:22
π
|
Lord_Nigh |
.ee is? |
08:23
π
|
Lord_Nigh |
http://thepiratebay.ee/recent looked up to me a few minutes ago |
08:23
π
|
godane |
where the people you follow on a twitter-like app would be mirrored to your phone/desktop |
08:24
π
|
Lord_Nigh |
but given that it doesn't appear to be run by the main tpb 'team' i can't vouch for the accuracy of the torrents there. then again, neither could tpb |
08:24
π
|
Lord_Nigh |
is it worth scraping that? |
08:25
π
|
Lord_Nigh |
(i'm guessing yes) |
08:26
π
|
Lord_Nigh |
it looks like it might have a copy of the db up until < 6 hours before the main site was taken down |
08:26
π
|
Lord_Nigh |
since i see nothing newer than 12/09 10:32 |
08:28
π
|
Lord_Nigh |
if that's the case, then .ee should be scraped as quickly as we can manage it |
08:28
π
|
Lord_Nigh |
starting with id 10,000,000 and going up |
08:28
π
|
Lord_Nigh |
since we have everything from 9,999,999 and down |
08:29
π
|
Lord_Nigh |
from other sources |
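A minimal sketch of the "start at 10,000,000 and go up" plan, assuming a caller-supplied `fetch(id)` function. The names `crawl_ids`, `fetch`, and `TransientError` are hypothetical and nothing here reflects how the site is actually laid out. It walks ids upward, retries transient failures with backoff, and stops after a run of missing ids.

```python
import time


class TransientError(Exception):
    """Raised by the (hypothetical) fetcher for retryable failures, e.g. an HTTP 502."""


def crawl_ids(fetch, start_id, max_misses=3, retries=3, base_delay=0.0, sleep=time.sleep):
    """Yield (id, record) for ascending ids until `max_misses` ids in a row are absent.

    `fetch(id)` is supplied by the caller. It returns a record, or None when
    the id does not exist, and may raise TransientError.
    """
    misses = 0
    current = start_id
    while misses < max_misses:
        record = None
        for attempt in range(retries):
            try:
                record = fetch(current)
                break
            except TransientError:
                # Exponential backoff before the next attempt.
                sleep(base_delay * (2 ** attempt))
        if record is None:
            # Either the id is absent or every retry failed; both count as a miss.
            misses += 1
        else:
            misses = 0
            yield current, record
        current += 1
```

Passing `fetch` in keeps the loop testable without any network access; a real run would also want to persist the last id seen so it can resume.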
08:29
π
|
Lord_Nigh |
main discussion is in #yarharfiddlededee |
08:30
π
|
midas |
it isnt really down anyway, the police grabbed a loadbalancer |
08:30
π
|
midas |
and a dns server |
08:31
π
|
Lord_Nigh |
so... maybe .ee is actually the 'real thing'? that doesn't make sense, since .ee has no https, and .ee has that weird $5-per-year popup when trying to get magnet links (which you can enter any 6 digit numerical password to and it will work) |
08:31
π
|
midas |
nah |
08:31
π
|
Lord_Nigh |
(696969 works) |
08:31
π
|
midas |
ee can be anything, im not sure |
08:31
π
|
midas |
the dns server runs the internal dns traffic |
08:31
π
|
Lord_Nigh |
ee seems to follow the same numbering tpb did |
08:34
π
|
Lord_Nigh |
.. and now .ee is down |
08:34
π
|
Ctrl-S |
the domain name or ip or both? |
08:34
π
|
Lord_Nigh |
no, its not down |
08:34
π
|
Lord_Nigh |
its ... unstable |
08:34
π
|
Lord_Nigh |
random 502 errors |
08:34
π
|
Lord_Nigh |
seems other people have found it |
08:35
π
|
Lord_Nigh |
the ids DO exactly match the post-10000000 ids from the actual .se site |
08:35
π
|
Ctrl-S |
can i ask for help setting up the warrior thing here? |
08:35
π
|
Lord_Nigh |
so i think scraping .ee is a VERY good idea |
08:37
π
|
arkiver |
All metro newspapers of brazil-brasilia of 2013 uploaded |
08:37
π
|
Lord_Nigh |
it's extremely sad that the .ee site, which has its origins as (and may still technically be) a mirror of the original intended for scamming and phishing, may be the last running copy of all magnet ids after 10,000,000 |
08:39
π
|
Lord_Nigh |
the magnets are definitely valid! |
08:39
π
|
Ctrl-S |
when I try to set the rate limit using the command from the wiki I get an error. "Syntax error: Invalid parameter '--name'" |
08:40
π
|
Lord_Nigh |
but: it is missing the comments from thepiratebay.se |
08:40
π
|
Lord_Nigh |
all of them |
08:40
π
|
Lord_Nigh |
afaict |
08:41
π
|
Lord_Nigh |
so it itself is a skimmed copy |
08:41
π
|
Lord_Nigh |
well.... can we skim the skimmed copy? |
08:43
π
|
Lord_Nigh |
we could also troll google cache until it comes down for the .se site and get comments etc |
08:43
π
|
Lord_Nigh |
that's maybe a week tops |
08:45
π
|
|
primus104 has joined #archiveteam-bs |
08:55
π
|
joepie91 |
this is amazing stuff: http://vimeo.com/12672088 |
09:41
π
|
|
ivan has joined #archiveteam-bs |
09:44
π
|
ivan |
I have been rsync'ing gentoo's distfiles very frequently for two years without deleting anything that they remove, but it is of low value to me and I don't have fast upstream to dump the 500GB+ somewhere |
09:45
π
|
ivan |
if you want something from it, let me know before I delete it |
09:45
π
|
joepie91 |
ivan: wait, what are distfiles? |
09:46
π
|
ivan |
all the tarballs that the ebuilds grab |
09:46
π
|
joepie91 |
are these available elsewhere in this format? |
09:46
π
|
ivan |
old stuff gets removed from distfiles |
09:46
π
|
joepie91 |
(historically) |
09:46
π
|
ivan |
presumably 99.9% of it is in git repos elsewhere |
09:47
π
|
joepie91 |
hrm. are they github-generated tarballs? |
09:47
π
|
joepie91 |
or actual releases? |
09:47
π
|
ivan |
they're mostly official releases |
09:47
π
|
joepie91 |
any chance you can upload them over a long period of time? |
09:47
π
|
joepie91 |
(I probably have an rsync target for it) |
09:48
π
|
ivan |
no but maybe I can mail a drive to someone in the US |
09:48
π
|
* |
joepie91 is not in the US |
09:49
π
|
Ctrl-S |
what sort of content is this? |
09:50
π
|
joepie91 |
Ctrl-S: basically, historical (source) releases of software |
09:50
π
|
joepie91 |
definitely an archival value to it imo |
09:50
π
|
Ctrl-S |
could you just upload the differences between versions? |
09:50
π
|
joepie91 |
that's... actually not a bad idea |
09:50
π
|
Ctrl-S |
sort of like wikis do |
09:50
π
|
joepie91 |
ivan: are you aware of any methods for having a 'source' file and having delta'd "derived" files? |
09:51
π
|
Ctrl-S |
every few hundred versions you upload the full version, and between those just the differences |
09:51
π
|
joepie91 |
that seems like it could work here - take the first release as base, then delta every release after it |
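A rough sketch of the scheme described here: a full copy every so often, deltas in between. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for a real binary delta tool; the function names and the `keyframe_every` parameter are made up for illustration, and the matcher is far too slow for real tarballs.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes):
    """Return copy/data instructions that rebuild `new` from `old`."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("data", new[j1:j2]))
        # "delete" needs no instruction: those bytes are simply never copied.
    return ops


def apply_delta(old: bytes, ops) -> bytes:
    """Rebuild the newer version from `old` plus the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def store(versions, keyframe_every=3):
    """Keep a full copy every `keyframe_every` versions and deltas in between."""
    stored = []
    for n, data in enumerate(versions):
        if n % keyframe_every == 0:
            stored.append(("full", data))
        else:
            stored.append(("delta", make_delta(versions[n - 1], data)))
    return stored


def restore(stored, n) -> bytes:
    """Walk back to the nearest full copy, then replay deltas up to version n."""
    k = n
    while stored[k][0] != "full":
        k -= 1
    data = stored[k][1]
    for i in range(k + 1, n + 1):
        data = apply_delta(data, stored[i][1])
    return data
```

The periodic full copy bounds how many deltas have to be replayed to get any one version back, which is the trade-off Ctrl-S is describing.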
09:51
π
|
ivan |
I've programmed bizarre compression schemes like this before and it's not worth it |
09:51
π
|
joepie91 |
I know that it's technically possible, but no idea if it already exists for this particular usecase |
09:51
π
|
ivan |
I'm happy to pay the 6 bucks to mail it |
09:51
π
|
joepie91 |
ivan: not so much compression, as diff'ing :P |
09:51
π
|
joepie91 |
should be extremely efficient for this kind of data |
09:52
π
|
joepie91 |
otoh |
09:52
π
|
Ctrl-S |
is there a tool that does it? |
09:52
π
|
joepie91 |
you could accomplish a near-identical result by just having one tarball per piece of software and having each release (uncompressed) in its own directory |
09:52
π
|
joepie91 |
and using a compression format with a per-archive dictionary |
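A small sketch of that alternative, assuming the releases are already unpacked in memory as `{release: {path: bytes}}`. The function name and that layout are hypothetical. Everything goes into one tar stream that is xz-compressed as a whole, so later releases can reference bytes from earlier ones.

```python
import io
import tarfile


def pack_releases(releases):
    """Pack {release_name: {path: bytes}} into a single xz-compressed tar.

    Each release gets its own top-level directory; because the whole stream
    shares one compression window, near-identical files across releases
    cost very little extra space.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for release, files in sorted(releases.items()):
            for path, data in sorted(files.items()):
                info = tarfile.TarInfo(name=f"{release}/{path}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

This only helps while the repeated data fits inside the compressor's dictionary window, so very large packages would see less benefit.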
09:53
π
|
joepie91 |
ivan: problem is mostly a mailing target :P |
09:53
π
|
joepie91 |
I mean, unless you're planning on mailing it to NL... heh |
09:54
π
|
ivan |
https://ludios.org/tmp/gentoo-distfiles.txt |
09:55
π
|
ivan |
37MB beware of browser crash |
09:57
π
|
joepie91 |
that's a lot of packages :P |
09:57
π
|
joepie91 |
ivan: should probably ask again in a few hours, when US-ians wake up |
09:58
π
|
joepie91 |
(or ship it to NL) |
09:58
π
|
ivan |
maybe SketchCow will take it |
09:58
π
|
Void_ |
is this like all gentoo files ever |
09:59
π
|
* |
joepie91 throws `sort` at it |
09:59
π
|
* |
joepie91 watches it eat a core |
10:00
π
|
joepie91 |
Void_: about 2 years, it seems |
10:03
π
|
espes__ |
throw it in a git repo and let it handle the delta compression? |
10:03
π
|
ivan |
that doesn't work for compressed tarballs |
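A quick way to see ivan's point, as a sketch only. Here `delta_size` is a made-up proxy for delta cost (deflating the new file with the old one as a preset dictionary), not what git actually does, and the sample data is synthetic. A small edit leaves the raw files almost identical, but the compressed outputs share very little.

```python
import lzma
import random
import zlib


def delta_size(old: bytes, new: bytes) -> int:
    """Rough delta cost: size of `new` deflated with `old` as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=old)
    return len(comp.compress(new) + comp.flush())


# Synthetic "release": compressible text built from a small vocabulary.
rng = random.Random(0)
words = [f"word{n}" for n in range(64)]
v1 = " ".join(rng.choice(words) for _ in range(2500)).encode()
v2 = b"PATCHED " + v1  # a small change near the start of the file

raw = delta_size(v1, v2)                                     # delta between uncompressed files
packed = delta_size(lzma.compress(v1), lzma.compress(v2))    # delta between compressed files
print(f"uncompressed delta: {raw} bytes, compressed delta: {packed} bytes")
```

The usual workaround is to decompress before storing, so the delta step sees the similar bytes rather than the compressor's output.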
10:07
π
|
godane |
uploaded: https://archive.org/details/news.kbs.co.kr-search-news-code-1-to-10000-20141207 |
10:08
π
|
godane |
that has South Korean news from 1999-01-01 to 1999-02-21 |
10:10
π
|
godane |
that ups it by about 10k: https://web.archive.org/web/*/http://news.kbs.co.kr/news/NewsView.do?SEARCH_NEWS_CODE=* |
10:10
π
|
godane |
only 2872 urls from those url types |
10:10
π
|
|
brayden has joined #archiveteam-bs |
10:55
π
|
|
schbirid has joined #archiveteam-bs |
11:15
π
|
|
www2 has joined #archiveteam-bs |
11:45
π
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
11:50
π
|
|
www2 has quit IRC (Read error: Operation timed out) |
11:55
π
|
|
dx has quit IRC (Ping timeout: 265 seconds) |
11:58
π
|
|
dx has joined #archiveteam-bs |
12:06
π
|
|
Kadercavd has joined #archiveteam-bs |
12:06
π
|
Kadercavd |
I swear to love Promise me here vote |
12:06
π
|
Kadercavd |
My name is Mark Bass http://strawpoll.me/3100584 Vote |
12:06
π
|
midas |
no. |
12:06
π
|
|
Kadercavd has quit IRC (Client Quit) |
12:07
π
|
|
Kadercavd has joined #archiveteam-bs |
12:08
π
|
midas |
NO. |
12:08
π
|
|
Kadercavd has quit IRC (Client Quit) |
12:12
π
|
joepie91 |
lol. |
12:29
π
|
|
www2 has joined #archiveteam-bs |
12:32
π
|
joepie91 |
I.. wht |
12:32
π
|
joepie91 |
what * |
12:32
π
|
joepie91 |
http://m.lg.com/ph/inside-lg/christmas-beat |
12:33
π
|
joepie91 |
apparently LG is using PDFy now? |
12:35
π
|
schbirid |
oh god please goatse them |
12:53
π
|
dashcloud |
isn't that a good thing that LG's using it? we'll have all of their manuals backed up to IA automatically then |
13:14
π
|
midas |
joepie91: how much bandwidth is pdfy using nowadays? |
13:18
π
|
|
sirkov has quit IRC (Ping timeout: 370 seconds) |
13:20
π
|
|
sirkov has joined #archiveteam-bs |
13:48
π
|
|
sankin has joined #archiveteam-bs |
13:53
π
|
|
sankin has quit IRC (Client Quit) |
14:04
π
|
|
sankin has joined #archiveteam-bs |
14:24
π
|
|
lrkj has quit IRC (Ping timeout: 612 seconds) |
15:38
π
|
|
StartAway has quit IRC (Read error: Operation timed out) |
15:50
π
|
|
mistym has joined #archiveteam-bs |
15:51
π
|
|
mistym has quit IRC (Remote host closed the connection) |
15:57
π
|
|
BiggieJo1 has quit IRC (Read error: Connection reset by peer) |
15:57
π
|
|
Nertsy has quit IRC (Quit: Nertsy) |
16:07
π
|
|
Nertsy has joined #archiveteam-bs |
16:13
π
|
|
Start has joined #archiveteam-bs |
16:18
π
|
|
aaaaaaaaa has joined #archiveteam-bs |
16:58
π
|
|
Start has quit IRC (Read error: No route to host) |
16:58
π
|
|
dx has quit IRC (Ping timeout: 246 seconds) |
17:02
π
|
|
Start has joined #archiveteam-bs |
17:02
π
|
|
dx has joined #archiveteam-bs |
17:15
π
|
|
mistym has joined #archiveteam-bs |
17:40
π
|
|
Start has quit IRC (Read error: Connection reset by peer) |
17:49
π
|
|
GLaDOS has quit IRC (Ping timeout: 272 seconds) |
17:50
π
|
|
GLaDOS has joined #archiveteam-bs |
17:50
π
|
|
swebb sets mode: +o GLaDOS |
18:03
π
|
|
mistym has quit IRC (Remote host closed the connection) |
18:04
π
|
|
mistym has joined #archiveteam-bs |
18:27
π
|
joepie91 |
midas: 1.17TB last month |
18:27
π
|
joepie91 |
peaked at 325mbps yesterday |
18:27
π
|
joepie91 |
dashcloud: heheh |
18:29
π
|
|
mistym has quit IRC (Remote host closed the connection) |
18:30
π
|
|
mistym has joined #archiveteam-bs |
18:31
π
|
|
brayden has quit IRC (Ping timeout: 607 seconds) |
18:52
π
|
|
Start has joined #archiveteam-bs |
18:58
π
|
|
rejon has quit IRC (Ping timeout: 480 seconds) |
19:00
π
|
|
www2 has quit IRC (Ping timeout: 335 seconds) |
19:01
π
|
SadDM |
Has anybody here looked at Google's newspaper archive? |
19:01
π
|
SadDM |
I'm looking for a way to download parts of a page without crawling through the rendered DOM and figuring out what images I need |
19:02
π
|
SadDM |
*And* then re-assembling them. |
19:17
π
|
|
ete_ has joined #archiveteam-bs |
19:23
π
|
Atluxity |
/buffer ccc |
19:23
π
|
Atluxity |
freaking whitespaces |
19:41
π
|
|
aaaaaaaa_ has joined #archiveteam-bs |
19:45
π
|
|
phuzion has quit IRC (Read error: Operation timed out) |
19:47
π
|
|
xtr-201 has quit IRC (Read error: Operation timed out) |
19:47
π
|
|
aaaaaaaaa has quit IRC (Read error: Operation timed out) |
19:47
π
|
|
aaaaaaaa_ has quit IRC (Client Quit) |
19:47
π
|
|
Start has quit IRC (Read error: Operation timed out) |
19:47
π
|
|
aaaaaaaa_ has joined #archiveteam-bs |
19:47
π
|
|
phuzion has joined #archiveteam-bs |
19:48
π
|
|
xtr-201 has joined #archiveteam-bs |
19:57
π
|
|
aaaaaaaa_ has quit IRC (Ping timeout: 480 seconds) |
20:02
π
|
|
BlueMaxim has joined #archiveteam-bs |
20:05
π
|
|
mistym_ has joined #archiveteam-bs |
20:28
π
|
|
logchfoo starts logging #archiveteam-bs at Wed Dec 10 20:28:35 2014 |
20:28
π
|
|
logchfoo has joined #archiveteam-bs |
20:42
π
|
|
Arkiver2 is now known as arkiver |
20:48
π
|
|
brayden has joined #archiveteam-bs |
20:58
π
|
|
kyan has quit IRC (Read error: Connection reset by peer) |
21:30
π
|
|
kyan_ has joined #archiveteam-bs |
21:33
π
|
|
www2 has joined #archiveteam-bs |
21:36
π
|
|
APerti has joined #archiveteam-bs |
21:39
π
|
|
APerti_ has quit IRC (Ping timeout: 370 seconds) |
21:49
π
|
|
Start has joined #archiveteam-bs |
21:58
π
|
|
schbirid has quit IRC (Leaving) |
22:06
π
|
* |
Void_ uses a dirty scanner to poke saddm |
22:06
π
|
Void_ |
huh |
22:08
π
|
|
mistym_ has quit IRC (Quit: Leaving...) |
22:09
π
|
|
ivan- is now known as ivan`- |
22:26
π
|
|
Start has quit IRC (Read error: Operation timed out) |
22:34
π
|
|
SN4T14_ has joined #archiveteam-bs |
22:39
π
|
|
SN4T14 has quit IRC (Ping timeout: 369 seconds) |
23:19
π
|
|
mistym has joined #archiveteam-bs |
23:23
π
|
|
dashcloud has quit IRC (Ping timeout: 265 seconds) |
23:23
π
|
|
nico has quit IRC (Ping timeout: 265 seconds) |
23:24
π
|
|
Insomnia1 has quit IRC (Ping timeout: 265 seconds) |
23:24
π
|
|
Insomnia_ has joined #archiveteam-bs |
23:24
π
|
|
wm_ has quit IRC (Ping timeout: 265 seconds) |
23:28
π
|
|
dashcloud has joined #archiveteam-bs |
23:39
π
|
|
nico has joined #archiveteam-bs |
23:46
π
|
|
wm_ has joined #archiveteam-bs |
23:46
π
|
|
Start has joined #archiveteam-bs |