Time |
Nickname |
Message |
00:00
🔗
|
godane |
i was hoping for a fuller mirror of xkcd.org |
00:00
🔗
|
godane |
not just index files and xkcd.org |
00:00
🔗
|
godane |
*html files |
03:28
🔗
|
bsmith093 |
is anyone working on the knol scrape? |
04:34
🔗
|
db48x |
! |
04:34
🔗
|
db48x |
@ERROR: Unknown module 'db48x' |
04:41
🔗
|
Hydriz |
heh |
05:43
🔗
|
Coderjoe |
ah, wikipedia... |
05:43
🔗
|
Coderjoe |
http://imgur.com/gallery/sZA5k |
05:48
🔗
|
NotGLaDOS |
Someone doesn't like me |
05:48
🔗
|
NotGLaDOS |
I did a netstat, and there's about 1000 SYN_SENTs |
06:00
🔗
|
bsmith093 |
meaning what? |
06:04
🔗
|
arrith |
one kind of DDoS is a synflood, might be that |
06:04
🔗
|
arrith |
or just DoS |
06:05
🔗
|
bsmith093 |
is that what half-open connections mean |
06:07
🔗
|
db48x |
bsmith093: yep |
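A minimal sketch of how the SYN_SENT count mentioned above could be tallied from saved `netstat` output. SYN_SENT is the state of a socket that has sent a SYN and is still waiting for the reply. The column layout is an assumption (Linux net-tools `netstat -tan`), and the function name and sample addresses are made up for illustration.

```python
from collections import Counter

def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state in `netstat -tan` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states

# Hypothetical sample; the addresses are documentation-only ranges.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      1 10.0.0.5:51514          203.0.113.7:80          SYN_SENT
tcp        0      1 10.0.0.5:51516          203.0.113.9:80          SYN_SENT
tcp        0      0 10.0.0.5:22             198.51.100.2:50022      ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(state, count)
```

A large count in a single half-open state is what prompts the synflood guess in the messages above.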
06:37
🔗
|
arrith |
should AT be looking into pastebin-type sites? |
06:37
🔗
|
arrith |
since quite a few pastes are marked as "forever" for "how long to keep" |
06:41
🔗
|
balrog |
arrith: there are quite a few pastes that should not have been pasted as forever |
06:45
🔗
|
bsmith093 |
isnt that sort of random even for us |
06:46
🔗
|
arrith |
balrog: couldn't the same be said for geocities sites? |
06:46
🔗
|
arrith |
bsmith093: i'm sure people think FF.net archiving is random :P |
06:46
🔗
|
balrog |
probably not to the same extent |
06:47
🔗
|
bsmith093 |
besides pastebin id= 8 chars mixedcase thats 52 factorial combos |
06:47
🔗
|
bsmith093 |
i think |
06:48
🔗
|
bsmith093 |
point is even if they're each one bit, thats a hell of a lot of data, makes a tb look like nothing |
06:49
🔗
|
bsmith093 |
3.03423e+13 |
06:49
🔗
|
bsmith093 |
combos |
06:54
🔗
|
db48x |
hmm |
06:54
🔗
|
db48x |
bsmith093: if the average is 1kb, that's only 27 petabytes |
06:55
🔗
|
db48x |
bsmith093: but to be honest, you're assuming that the namespace is anywhere near fully used |
06:55
🔗
|
bsmith093 |
oh well, exCUSE me , mr disk space is *free*, do u have several multi tb drives to fill to the brim? |
06:56
🔗
|
bsmith093 |
its probably not anywhere near exhausted but still thats a lot |
06:56
🔗
|
db48x |
;) |
06:57
🔗
|
bsmith093 |
even if its only 1 bit per, and 25percent full, thats 7.5 tb |
06:57
🔗
|
bsmith093 |
30tb max/4 |
06:58
🔗
|
bsmith093 |
besides imagine brute forcing THAT keyspace |
06:59
🔗
|
db48x |
I'd be surprised if it was 0.25% full |
07:01
🔗
|
bsmith093 |
1000q/s, which i doubt u could sustain, means ~961 **years** |
07:01
🔗
|
arrith |
i'm not sure how to estimate it but yeah, people putting even a few petabytes onto a pastebin seems strange |
07:01
🔗
|
arrith |
in terms of storage, you can compress it. just recently i got 300MB of text to 800KB |
07:06
🔗
|
bsmith093 |
2.405370053 years at 1000q/s if .25%full |
07:06
🔗
|
bsmith093 |
what text |
07:09
🔗
|
db48x |
that gives us a good yard stick |
07:09
🔗
|
Coderjoe |
bsmith093: 53459728531456 possibilities, given 52 choices and 8 positions |
07:09
🔗
|
db48x |
has pastebin received 1000 submissions per second in the last few years of operation? |
07:09
🔗
|
db48x |
doubtful |
07:09
🔗
|
bsmith093 |
probably not |
07:10
🔗
|
db48x |
probably 10 or 20 an hour |
07:10
🔗
|
bsmith093 |
i meant how fast we could pull them |
07:10
🔗
|
db48x |
yea, but I'm trying to gauge how full it might be |
07:10
🔗
|
Coderjoe |
for example: 2 possible symbols (0 and 1), 8 positions: 2 to the 8th power = 256 |
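The two keyspace figures quoted in this exchange can be reproduced with a short sketch. bsmith093's 3.03423e+13 matches counting 8-character IDs with no repeated letters (52!/44!), while Coderjoe's 53459728531456 is 52 to the 8th power, which allows repeats. The 52-letter alphabet, the 8-character length and the 1000 queries per second are the chat's own assumptions, not confirmed facts about pastebin; the function names are illustrative.

```python
import math

ALPHABET_SIZE = 52                      # a-z plus A-Z, as assumed in the chat
ID_LENGTH = 8                           # characters per paste ID, as assumed
SECONDS_PER_YEAR = 8765.81277 * 3600    # hours-per-year figure quoted in the chat

def keyspace_with_repetition(symbols: int, positions: int) -> int:
    """Each position may use any symbol: symbols ** positions."""
    return symbols ** positions

def keyspace_without_repetition(symbols: int, positions: int) -> int:
    """No symbol may repeat: symbols! / (symbols - positions)!."""
    return math.perm(symbols, positions)

def years_to_scan(ids: int, queries_per_second: float) -> float:
    """Time to request every ID once at a constant rate."""
    return ids / queries_per_second / SECONDS_PER_YEAR

if __name__ == "__main__":
    with_rep = keyspace_with_repetition(ALPHABET_SIZE, ID_LENGTH)
    without_rep = keyspace_without_repetition(ALPHABET_SIZE, ID_LENGTH)
    print(with_rep, without_rep)
    print(round(years_to_scan(without_rep, 1000), 1), "years at 1000 q/s")
```

At 1000 queries per second the smaller figure works out to roughly 961 years, which is the number quoted earlier; the full 52^8 space would take longer still.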
07:10
🔗
|
db48x |
even .25% seems like a large overestimate |
07:11
🔗
|
bsmith093 |
1 year = 8 765.81277 hours |
07:11
🔗
|
db48x |
wikipedia says that it's been around for 9 years |
07:11
🔗
|
db48x |
google says (20 per hour) * (9 years) = 1 577 846.3 |
07:11
🔗
|
db48x |
that would be trivial to archive |
07:12
🔗
|
bsmith093 |
couple gig tops |
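The back-of-the-envelope estimate above can be written out as a small sketch. The inputs are the guesses made in the chat (20 pastes per hour, 9 years of operation, 1 kb average paste), not measured values, and the function names are illustrative.

```python
HOURS_PER_YEAR = 8765.81277   # figure quoted in the chat

def estimated_pastes(per_hour: float, years: float) -> float:
    """Total pastes at a constant submission rate."""
    return per_hour * HOURS_PER_YEAR * years

def estimated_size_gib(pastes: float, avg_bytes: int = 1024) -> float:
    """Uncompressed size in GiB for a given average paste size."""
    return pastes * avg_bytes / 2**30

if __name__ == "__main__":
    pastes = estimated_pastes(per_hour=20, years=9)
    print(f"{pastes:,.1f} pastes")
    print(f"{estimated_size_gib(pastes):.2f} GiB at 1 KiB each")
    # Share of the 52**8 ID space that this many pastes would occupy.
    print(f"{pastes / 52**8:.2e} of the keyspace")
```

Under these guesses the total comes to about 1.5 GiB before compression, consistent with "couple gig tops", and the occupied share of the ID space is far below the 0.25% discussed earlier.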
07:12
🔗
|
db48x |
just about the only thing we would have to do is modify our new universal tracker to show kilobytes instead of megabytes |
07:12
🔗
|
bsmith093 |
im impressed at the human races, (first world) lack of abuse of the commons of a free anything paste site |
07:12
🔗
|
bsmith093 |
:) |
07:13
🔗
|
bsmith093 |
we have a universal tracker |
07:13
🔗
|
bsmith093 |
??? |
07:13
🔗
|
db48x |
yea, it runs splinder.heroku.com, memac.heroku.com, etc |
07:26
🔗
|
bsmith093 |
ffnet.heroku.com ? |
07:29
🔗
|
db48x |
yea, we could probably set something up for fanfiction.net |
07:32
🔗
|
bsmith093 |
how fast can u run a curl script, were using this script http://pastebin.com/M2dgrAUE with this number list to sort good ids from bad ones id list 0000000-9999999 |
07:34
🔗
|
bsmith093 |
its not really distributable, or parallelizable, but give it a fat pipe and itll go to town |
07:35
🔗
|
bsmith093 |
the total #of stories is probably <3million |
07:35
🔗
|
bsmith093 |
maybe 5m |
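One way a numeric ID list like 0000000-9999999 could be split up for several workers is sketched below. This is not the script at the pastebin link, whose contents are not shown here; the chunk size, function names and 7-digit padding are assumptions for illustration only.

```python
from typing import Iterator, Tuple

def id_chunks(start: int, stop: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) ranges that together cover start..stop."""
    for first in range(start, stop + 1, chunk_size):
        yield first, min(first + chunk_size - 1, stop)

def format_id(n: int, width: int = 7) -> str:
    """Zero-pad an ID to the width used in the list (assumed 7 digits)."""
    return str(n).zfill(width)

if __name__ == "__main__":
    # Ten hypothetical work units of one million IDs each.
    for first, last in id_chunks(0, 9_999_999, 1_000_000):
        print(format_id(first), format_id(last))
```

Each range could then be handed to a separate client, which is roughly the kind of work-unit assignment a tracker would need to do.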
07:50
🔗
|
arrith |
bsmith093: once underscor is done with his thing i'd rather rewrite it in python or perl then get it working with the universal tracker |
07:51
🔗
|
bsmith093 |
hmm, so how long might that take, on his end |
07:51
🔗
|
bsmith093 |
also is he still here? |
07:51
🔗
|
arrith |
the download.py from fanfictiondownloader misses a lot of the site and doesn't put it in as nice of a format as underscor's |
07:51
🔗
|
bsmith093 |
true, but im not using that anymore |
07:51
🔗
|
arrith |
i'm not sure if he's still here. he's working at that thing though |
07:52
🔗
|
arrith |
oh? using wget-warc? |
07:55
🔗
|
bsmith093 |
umm no, using this www.fanficdowloader.net app |
07:55
🔗
|
bsmith093 |
its a blob, but it can read in link lists, which is perfect for me |
07:58
🔗
|
bsmith093 |
www.fanfictiondownloader.net app |
08:13
🔗
|
Hydriz |
Hi guys, whats the latest? |
08:15
🔗
|
Hydriz |
Something about FanFiction.Net? |
08:21
🔗
|
db48x |
yes, possibly |
08:22
🔗
|
Hydriz |
when can we start archiving it? |
08:23
🔗
|
* |
Hydriz loves to archive random stuff |
08:26
🔗
|
bsmith093 |
http://www.fanfictiondownloader.net |
08:27
🔗
|
Hydriz |
eh, no tracker? |
08:27
🔗
|
bsmith093 |
no, not yet, turns out distributed stuff is harder than it looks |
08:28
🔗
|
Hydriz |
I see |
08:28
🔗
|
bsmith093 |
heres the list of links to put through the app |
08:28
🔗
|
Hydriz |
catch you guys another day, need to go now |
10:07
🔗
|
godane |
good news everyone on crankygeeks |
10:09
🔗
|
godane |
i may be able to get episodes 119-125 of crankygeeks |
10:10
🔗
|
godane |
they're still hosted on pcmag.com it looks like |
10:11
🔗
|
godane |
just no web page for those episodes for some reason |
11:20
🔗
|
Schbirid |
fun project idea: video advert (from magazines) collection |
12:10
🔗
|
emijrp |
A friend of mine call us Diogenes Team. I laugh. |
12:18
🔗
|
Schbirid |
sisyphus would fit |
12:38
🔗
|
emijrp |
Did you know THE ARCHIVERS? http://thearchivers.blogspot.com/ |
16:20
🔗
|
rude___ |
hey hey |
16:20
🔗
|
rude___ |
any book archiving enthusi-asts in here? |
16:23
🔗
|
emijrp |
surprise me |
16:32
🔗
|
rude___ |
uploading scans of mutt n jeff comics circa 1914 right now |
17:49
🔗
|
emijrp |
416 days since I started to download Jamendo. |
17:49
🔗
|
emijrp |
28000 albums, 1.1TB. |
17:50
🔗
|
emijrp |
I'm about 50%. |
17:53
🔗
|
underscor |
wow |
18:00
🔗
|
Schbirid |
:) |
18:00
🔗
|
Schbirid |
i got mine spread to 2 disks |
18:01
🔗
|
Schbirid |
that reminds me, i uploaded a filelist for you some weeks ago |
18:07
🔗
|
Schbirid |
man, just thinking of jamendo makes me sad and angry |
18:07
🔗
|
Schbirid |
such incompetent idiots |
18:09
🔗
|
Schbirid |
ouch, good thing i am not root |
18:09
🔗
|
Schbirid |
mv: cannot move `sbin' to a subdirectory of itself, `sbin/sbin' |
18:09
🔗
|
emijrp |
why incompetent? |
18:10
🔗
|
Schbirid |
failure to get the platform stable in years |
18:10
🔗
|
Schbirid |
everything is buggy |
18:10
🔗
|
Schbirid |
and ugly |
18:11
🔗
|
Schbirid |
oh look "Jamendo is currently under maintainance, sad isn't it ?"... |
18:11
🔗
|
emijrp |
lul |
18:11
🔗
|
Schbirid |
perfect timing :) |
18:33
🔗
|
emijrp |
Internet Archive doesn't host porn. When historians in the future look back, they will be asexual Internet society. |
18:33
🔗
|
emijrp |
Thanks Internet Archive. |
18:33
🔗
|
emijrp |
will see* |
18:49
🔗
|
underscor |
emijrp: They do |
18:49
🔗
|
underscor |
They just don't disseminate it |
18:51
🔗
|
emijrp |
And when are they going to disseminate it? 100 years? 250? |
18:53
🔗
|
emijrp |
Another problem with Wayback Machine is that you find what you know it existed. |
18:53
🔗
|
emijrp |
You can "google" the wayback machine, right? |
18:53
🔗
|
emijrp |
can't* |
18:53
🔗
|
dnova |
http://i.imgur.com/JakW6.jpg |
18:54
🔗
|
underscor |
emijrp: It falls under the same thing as the copyrighted music and movie archive |
18:54
🔗
|
underscor |
Whenever it becomes legal to distribute |
18:55
🔗
|
Schbirid |
please tell me archive.org grabs all scene releases |
18:55
🔗
|
underscor |
and, no, as far as I know you can't google the wayback machine |
18:55
🔗
|
emijrp |
And copyrighted websites? Wayback Machine is full of that |
18:56
🔗
|
underscor |
Hey, I'm not the decision maker here |
18:56
🔗
|
ersi |
emijrp: What's so hard to understand? |
18:56
🔗
|
underscor |
Also, if you update your robots.txt, your website will automatically disappear |
18:56
🔗
|
ersi |
*We* don't care about copyrighted material, IA have to. |
18:56
🔗
|
underscor |
Also, Joe Schmoe is a lot easier to deal with than Universal Pictures |
18:56
🔗
|
Schbirid |
i never really got it either |
18:56
🔗
|
Schbirid |
especially when jason started uploading all those magazines and manuals |
18:57
🔗
|
underscor |
We should stop discussing this here |
18:57
🔗
|
Schbirid |
seems to be a abandonware-ish view |
18:57
🔗
|
underscor |
#archiverights if you want |
18:57
🔗
|
Schbirid |
oO |
18:57
🔗
|
underscor |
Jason prefers to not have these discussions in #archiveteam |
18:58
🔗
|
ersi |
It's very simple; We don't host crap, IA does. Hence they need to care about stuff we don't. IA still stores crap it can't display |
18:58
🔗
|
underscor |
Yeah |
18:58
🔗
|
Schbirid |
aye |
18:59
🔗
|
underscor |
At least a quarter of the stuff IA stores is not publicly available due to rights encumberment. |
18:59
🔗
|
Schbirid |
also they surely respond to dmca and what not so they are doing nothing wrong (on the contrary :) ) |
19:00
🔗
|
emijrp |
I know all that. |
19:02
🔗
|
underscor |
Schbirid: A scene archive would be really cool |
19:02
🔗
|
underscor |
I've contemplated it |
19:02
🔗
|
underscor |
But the current release rate is too fast |
19:02
🔗
|
underscor |
IA doesn't want to dedicate that much money/resources to something like that when there are older artifacts to preserve |
19:03
🔗
|
dnova |
200tb of mobileme though ... heh |
19:03
🔗
|
underscor |
Hahah |
19:03
🔗
|
underscor |
Yeah |
19:03
🔗
|
underscor |
That one'll be interesting |
19:04
🔗
|
underscor |
We're currently burning at ~26TB a week |
19:04
🔗
|
underscor |
(IA is) |
19:05
🔗
|
Schbirid |
i love reading "IA" |
19:05
🔗
|
Schbirid |
in german it is the name of the donkey from winnie the pooh :) |
19:05
🔗
|
Schbirid |
eeyore or what it is originally called |
19:06
🔗
|
underscor |
hahaha really? |
19:07
🔗
|
emijrp |
http://www.archive.org/details/archiveteam-yahoovideo |
19:08
🔗
|
underscor |
where: &w_collection=archiveteam-yahoovideo | size: 11,161,347,515 KB| redrows: 0 (0.077 seconds) |
19:09
🔗
|
underscor |
Nice size! |
19:09
🔗
|
ersi |
emijrp: For knowing that, you sure seem to ask a lot about it. |
19:09
🔗
|
emijrp |
Why is that public? |
19:09
🔗
|
ersi |
Because no one has whined |
19:09
🔗
|
emijrp |
They are copyrighted videos. |
19:09
🔗
|
ersi |
Again, what's so hard to understand? |
19:10
🔗
|
ersi |
If copyright owners complain, the collection/files will be delisted. IA will still have it stored though. |
19:10
🔗
|
dnova |
they may be copyrighted videos that were uploaded to a public website for the purpose of sharing them |
19:11
🔗
|
emijrp |
Lawyers Team. Shut up. |
19:11
🔗
|
underscor |
No one has complained. |
19:11
🔗
|
underscor |
That's the biggest reason |
19:11
🔗
|
underscor |
Same with the magazines |
19:11
🔗
|
underscor |
and the manuals |
19:12
🔗
|
ersi |
If we give him 500 examples more, maybe he'll get it |
19:12
🔗
|
underscor |
lol |
19:12
🔗
|
ersi |
Just maybe |
19:12
🔗
|
emijrp |
Give me 500 examples of porn sites that have complained to IA about distributing their content. |
19:13
🔗
|
emijrp |
underscor said that it is the reason why porn is not public |
19:13
🔗
|
ersi |
It's their own business what they choose to list or not |
19:13
🔗
|
ersi |
No he didn't |
19:13
🔗
|
ersi |
It's ONE reason it MAY be delisted |
19:14
🔗
|
underscor |
emijrp: In the case of porn |
19:14
🔗
|
underscor |
It isn't listed because the administration feels that it "tarnishes" the image of IA |
19:14
🔗
|
underscor |
to be entirely frank. |
19:14
🔗
|
dnova |
and they don't want to be the go-to place for easy porn access, I would imagine |
19:15
🔗
|
underscor |
^ |
19:15
🔗
|
dnova |
I don't blame them one bit. |
19:15
🔗
|
emijrp |
underscor: thats the point |
19:15
🔗
|
underscor |
Ok, so we make it available |
19:15
🔗
|
underscor |
All the porn IA has is "stolen" subscription content from sites that are still up |
19:15
🔗
|
emijrp |
Don't give me lessons of law. I came from Wikipedia, the copyright-smarty-lawyers-trollish community. |
19:15
🔗
|
underscor |
Within a week it would all be down |
19:15
🔗
|
dnova |
emijrp: physical museums have only a small fraction of their collection available for public viewing at any given time. It's the same kind of idea. It's still THERE. If someone needs access to it for a good reason, they can have it. |
19:16
🔗
|
underscor |
^ |
19:16
🔗
|
dnova |
Just not everyone and their grandma every day all the time. |
19:16
🔗
|
* |
underscor imagines dnova and his grandma searching through porn archives |
19:16
🔗
|
dnova |
hah |
19:16
🔗
|
dnova |
not QUITE what I was hoping to put in anyone's mind |
19:16
🔗
|
underscor |
hahahaha |
19:17
🔗
|
underscor |
emijrp: Plus, the request volume for archived porn is low. |
19:17
🔗
|
underscor |
If we had content from a site that no longer existed, AND if someone asked about it |
19:17
🔗
|
underscor |
then we' |
19:17
🔗
|
underscor |
d disseminate it |
19:19
🔗
|
underscor |
And, in fact, before I started "volunteering" at the archive |
19:20
🔗
|
underscor |
They weren't saving copyrighted music or movies. |
19:20
🔗
|
underscor |
At all. |
19:22
🔗
|
underscor |
Aside from the standard definition TV recording |
19:46
🔗
|
underscor |
*cricket cricket cricket* |
19:46
🔗
|
underscor |
haha |
19:52
🔗
|
Schbirid |
we are all searching for porn on ia |
19:53
🔗
|
underscor |
Too bad there's no PD porn yet |
19:54
🔗
|
underscor |
I mean, porn existed in 1923 didn't it? |
19:56
🔗
|
Paradoks |
What? There's been porn since the first camera. Well, before, if you count various physical artifacts. |
19:56
🔗
|
emijrp |
There was a Creative Commons porn clip, but it was deleted in the official website. |
19:56
🔗
|
Paradoks |
So there's definitely PD porn out there. |
19:57
🔗
|
emijrp |
That is the only open porn I heard ever. |
19:57
🔗
|
emijrp |
Obviously, pre-1923 porn materials (movies, pics) are PD. |
19:59
🔗
|
emijrp |
Afghanistan is the sole country in the world without copyright laws. |
20:00
🔗
|
emijrp |
But America went there to give democracy. |
20:01
🔗
|
emijrp |
https://en.wikipedia.org/wiki/Afghanistan_and_copyright_issues |
20:03
🔗
|
Paradoks |
http://www.freedomporn.org/smut/Category:public_domain |
20:04
🔗
|
emijrp |
Interesting wiki. |
20:13
🔗
|
Schbirid |
any one got an idea about "CIX conferences" and if they were archived somewhere? |
20:14
🔗
|
Schbirid |
ah compulink |
20:15
🔗
|
Schbirid |
http://web.archive.org/web/19971211045936/http://www.compulink.co.uk/ |
20:46
🔗
|
db48x |
oops |
20:50
🔗
|
bsmith094 |
.join #archiverights |
23:54
🔗
|
db48x2 |
hrm |
23:54
🔗
|
db48x2 |
wget is using so much memory |
23:54
🔗
|
dnova |
for splinder? |
23:54
🔗
|
dnova |
wget-warc uses a lot with the huge profiles |
23:54
🔗
|
db48x2 |
this is mobileme, actually |
23:54
🔗
|
db48x2 |
finishing up my incompletes |
23:55
🔗
|
db48x2 |
this one has 43k lines in its urls.txt |
23:55
🔗
|
dnova |
how much memory? |
23:55
🔗
|
underscor |
grrrrrr |
23:55
🔗
|
underscor |
220 ftp.nodc.noaa.gov FTP server hello there friendly person |
23:55
🔗
|
underscor |
331 Guest login ok, send your complete e-mail address as password. |
23:55
🔗
|
underscor |
530- |
23:55
🔗
|
underscor |
Name (ftp.nodc.noaa.gov:abuie): anonymous |
23:55
🔗
|
underscor |
Password: |
23:55
🔗
|
underscor |
530- Sorry, there is currently a limit of 10 ftp users |
23:55
🔗
|
underscor |
530- on this system. Try again later. |
23:55
🔗
|
db48x2 |
10.2 gigs |
23:55
🔗
|
underscor |
530- |
23:55
🔗
|
underscor |
530 Login incorrect. |
23:55
🔗
|
underscor |
Login failed. |
23:55
🔗
|
underscor |
Really? |
23:55
🔗
|
underscor |
10 users? |
23:55
🔗
|
dnova |
oh shit man |
23:55
🔗
|
underscor |
What the hell |
23:56
🔗
|
dnova |
lol underscor |
23:56
🔗
|
db48x2 |
out of the 8 gigs that the machine has |
23:56
🔗
|
dnova |
they haven't changed the settings since 1994 |
23:56
🔗
|
underscor |
dnova: hahaha |
23:56
🔗
|
underscor |
db48x2: ouch :( |
23:57
🔗
|
dnova |
I thought this was bad |
23:57
🔗
|
dnova |
26664 splinda 15 0 330m 268m 1316 S 0.0 15.7 2:33.98 wget-warc |
23:57
🔗
|
dnova |
guess it could be worse . |
23:57
🔗
|
db48x2 |
heh |