Time |
Nickname |
Message |
00:01
🔗
|
|
benuski has quit IRC (Quit: Leaving) |
00:18
🔗
|
|
maelstrom has quit IRC (Quit: Leaving) |
00:22
🔗
|
|
maelstrom has joined #archiveteam |
00:26
🔗
|
|
hive-mind has quit IRC (Ping timeout: 260 seconds) |
00:26
🔗
|
|
hive-mind has joined #archiveteam |
00:31
🔗
|
|
jrwr has quit IRC (Remote host closed the connection) |
00:33
🔗
|
|
jrwr has joined #archiveteam |
00:49
🔗
|
|
powerKitt has joined #archiveteam |
00:59
🔗
|
|
powerKitt has quit IRC (Quit: Page closed) |
01:06
🔗
|
|
JesseW has joined #archiveteam |
01:45
🔗
|
|
kristian_ has quit IRC (Quit: Leaving) |
02:01
🔗
|
|
ravetcofx has quit IRC (Ping timeout: 506 seconds) |
02:10
🔗
|
|
ravetcofx has joined #archiveteam |
02:24
🔗
|
|
rudolphos has joined #archiveteam |
02:25
🔗
|
|
jrwr has quit IRC (Remote host closed the connection) |
02:29
🔗
|
|
rudolphos has quit IRC (Leaving) |
02:52
🔗
|
|
ndiddy has quit IRC (Quit: Leaving) |
02:53
🔗
|
|
Froggypwn has quit IRC (Read error: Operation timed out) |
02:53
🔗
|
|
Froggypwn has joined #archiveteam |
02:53
🔗
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
02:54
🔗
|
|
BlueMaxim has joined #archiveteam |
03:54
🔗
|
|
GLaDOS has quit IRC (Quit: Oh crap, I died.) |
04:18
🔗
|
|
maelstrom has quit IRC (Remote host closed the connection) |
05:12
🔗
|
|
balrog has quit IRC (Ping timeout: 260 seconds) |
05:21
🔗
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
05:27
🔗
|
|
Sk1d has joined #archiveteam |
05:52
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
05:55
🔗
|
|
Start has joined #archiveteam |
06:15
🔗
|
|
balrog has joined #archiveteam |
06:15
🔗
|
|
swebb sets mode: +o balrog |
06:30
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
08:00
🔗
|
|
Observer has quit IRC (Ping timeout: 268 seconds) |
08:15
🔗
|
|
WinterFox has joined #archiveteam |
08:36
🔗
|
|
khaoohs_ has quit IRC (Read error: Connection reset by peer) |
08:37
🔗
|
|
khaoohs_ has joined #archiveteam |
08:42
🔗
|
|
W1nterFox has joined #archiveteam |
08:48
🔗
|
|
WinterFox has quit IRC (Read error: Operation timed out) |
09:31
🔗
|
Medowar0 |
who was doing the home.arcor.de discovery? Some more google scraping, again, raw output, no dedup etc. https://www.medowar.de/lab/at/arcor/liste2.txt |
10:04
🔗
|
PurpleSym |
That would be me, Medowar0. |
10:05
🔗
|
|
ravetcofx has quit IRC (Ping timeout: 506 seconds) |
10:19
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
10:43
🔗
|
|
antomati_ has joined #archiveteam |
10:43
🔗
|
|
swebb sets mode: +o antomati_ |
10:49
🔗
|
|
antomatic has quit IRC (Read error: Operation timed out) |
11:39
🔗
|
|
Budgiebra has joined #archiveteam |
11:53
🔗
|
Medowar0 |
rip. DNShistory is now officially offline. I was crawling it very slowly, but it is now officially dead. |
12:25
🔗
|
|
Budgiebra has left |
12:32
🔗
|
|
khaoohs__ has joined #archiveteam |
12:34
🔗
|
|
khaoohs_ has quit IRC (Read error: Operation timed out) |
12:57
🔗
|
|
tatata has joined #archiveteam |
12:58
🔗
|
|
tatata has quit IRC (Client Quit) |
13:23
🔗
|
|
bRick5772 has joined #archiveteam |
13:39
🔗
|
|
W1nterFox has quit IRC (Read error: Operation timed out) |
13:51
🔗
|
|
sep332 has joined #archiveteam |
14:25
🔗
|
|
ndiddy has joined #archiveteam |
14:26
🔗
|
|
ndizzle has joined #archiveteam |
14:26
🔗
|
|
ndizzle has quit IRC (Read error: Connection reset by peer) |
15:25
🔗
|
|
arkiver sets mode: +o HCross |
15:48
🔗
|
|
JesseW has joined #archiveteam |
16:08
🔗
|
|
RichardG has joined #archiveteam |
16:09
🔗
|
|
atomotic has joined #archiveteam |
16:33
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
16:48
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
17:16
🔗
|
SketchCow |
Define "Megawarc seems stuck" |
17:19
🔗
|
SketchCow |
Greetings, I'm home |
17:19
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
17:19
🔗
|
SketchCow |
I expect to drink 45 5-hour energy drinks and go through our (and other) backlogs |
17:19
🔗
|
|
BartoCH has joined #archiveteam |
17:19
🔗
|
SketchCow |
POSTIMAGE |
17:19
🔗
|
SketchCow |
HELLO POSTIMAGE |
17:20
🔗
|
|
kristian_ has joined #archiveteam |
17:21
🔗
|
SketchCow |
Hello ArchiveTeam, |
17:21
🔗
|
SketchCow |
Our project hosts over 140 million images used in ~450k websites all over the web, including a number of vibrant communities and bulletin boards. |
17:21
🔗
|
SketchCow |
We have recently found ourselves in financial dire straits, and would like to investigate the opportunities for archiving our collection in case we do not survive after all [although there's still a good chance that we do]. Our total image database is nearly 100Tb large, but almost 40% of that is adult imagery that we believe can be safely sacrificed. |
17:21
🔗
|
SketchCow |
What do you think about this? |
17:21
🔗
|
xmc |
let's take it |
17:21
🔗
|
xmc |
it's uh |
17:22
🔗
|
SketchCow |
I'm going to write them about it |
17:22
🔗
|
SketchCow |
And request a phone call, etc. |
17:22
🔗
|
xmc |
20x as big as gitorious, and i was the only person willing to host gitorious |
17:22
🔗
|
SketchCow |
I just want hard drives |
17:22
🔗
|
xmc |
aye |
17:27
🔗
|
|
JW_work has joined #archiveteam |
17:33
🔗
|
SketchCow |
Anyway, that's on the mantle |
17:39
🔗
|
|
powerKitt has joined #archiveteam |
17:39
🔗
|
arkiver |
SketchCow: I think they announced on their page they're not in trouble anymore |
17:40
🔗
|
SketchCow |
Which, Postimage? |
17:47
🔗
|
DFJustin |
won't somebody think of the adult imagery |
17:49
🔗
|
arkiver |
http://postimage.org/ |
17:49
🔗
|
arkiver |
yeah |
17:50
🔗
|
arkiver |
but it looks like it's changed/removed now |
17:50
🔗
|
arkiver |
do we still want to grab it? |
17:58
🔗
|
powerKitt |
Looks like they're still saying they're in danger of closing. |
18:00
🔗
|
|
Aranje has joined #archiveteam |
18:04
🔗
|
|
PepsiMax has joined #archiveteam |
18:09
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
18:13
🔗
|
SketchCow |
They've mailed me and I'm working to get a con call |
18:13
🔗
|
xmc |
rad |
18:16
🔗
|
Kenshin |
SketchCow: i'm going to discuss internally to see if we can step in to help postimage with our cdn |
18:20
🔗
|
SketchCow |
You got it. |
18:20
🔗
|
SketchCow |
But ideally they just send us 50 hard drives. |
18:20
🔗
|
SketchCow |
We have tons of hard drives. |
18:21
🔗
|
Kaz |
jesus, reading their blog post |
18:21
🔗
|
Kenshin |
true, but then project goes into read only mode over at archive.org. not a bad thing if it's kept alive instead |
18:22
🔗
|
Kenshin |
Kaz: there's plenty of other traffic heavy sites that are hiding behind cloudflare, just a matter of time before they get snuffed out |
18:22
🔗
|
Kaz |
I guess |
18:22
🔗
|
Kenshin |
most people just assume cloudflare would offer them free bw forever |
18:22
🔗
|
xmc |
lol |
18:22
🔗
|
Kaz |
just the fact that they really had no plans at all, "We didn't pay enough attention to making money off Postimage" |
18:22
🔗
|
Kenshin |
they're probably developers > business people |
18:30
🔗
|
|
bwn has joined #archiveteam |
18:40
🔗
|
|
bwn has quit IRC (Ping timeout: 244 seconds) |
18:47
🔗
|
|
bwn has joined #archiveteam |
18:51
🔗
|
Nemo_bis |
Kenshin: do you mean heroku can provide a cheaper option when cloudflare stops paying all the bills? |
18:51
🔗
|
Kenshin |
Nemo_bis: don't get you |
18:53
🔗
|
Yoshimura |
There are a lot of cheaper providers. |
18:53
🔗
|
Yoshimura |
The only question is if they can handle the load. |
18:57
🔗
|
|
cadbury_ has quit IRC (Read error: Operation timed out) |
18:57
🔗
|
yipdw |
i have a lot of nice things to say about Heroku but "cheap" is not one of them |
18:58
🔗
|
yipdw |
well, maybe at company budgets it is |
18:58
🔗
|
yipdw |
on individual scale though |
19:05
🔗
|
|
ravetcofx has joined #archiveteam |
19:06
🔗
|
|
bsmith093 has quit IRC (Read error: Operation timed out) |
19:08
🔗
|
|
cadbury_ has joined #archiveteam |
19:25
🔗
|
|
BlackoutI has joined #archiveteam |
19:27
🔗
|
|
BlackoutI has left |
19:27
🔗
|
|
Blackout has joined #archiveteam |
19:28
🔗
|
Blackout |
So is there any point in setting my warrior to vine rn? |
19:31
🔗
|
Yoshimura |
Blackout: Nope. Use Archiveteam's choice |
19:32
🔗
|
Yoshimura |
That's always the best, unless you are around your instance all the time and obsessed. |
19:37
🔗
|
Blackout |
@Yoshimura I have a gigabit line and I figured I'd probably want to run multiple projects. I'm assuming a ton of warriors is inefficient? |
19:39
🔗
|
Yoshimura |
Glad you ask, running a lot of VMs, is inefficient, yes. Warrior itself is inefficient, yes. So you got it squared. |
19:39
🔗
|
Yoshimura |
Simplest thing is to run multiple warriors with modified code in a docker. |
19:39
🔗
|
xmc |
also. you shouldn't run more than one warrior per IP |
19:39
🔗
|
Yoshimura |
xmc: Why not? |
19:39
🔗
|
Blackout |
Even for different projects? |
19:39
🔗
|
xmc |
if you stack multiple warriors on the same ip address, you're twice as likely to get ip-banned |
19:39
🔗
|
xmc |
etc |
19:40
🔗
|
xmc |
for the same project, that is |
19:40
🔗
|
Yoshimura |
Yeah, ip bans are obvious thing. |
19:40
🔗
|
Blackout |
So basically only have one on auto |
19:40
🔗
|
xmc |
because we try to make it so that each warrior sneaks in under IP bans by whoever we're archiving |
19:40
🔗
|
Yoshimura |
Blackout: Is that Linux? |
19:40
🔗
|
Blackout |
Well my main box is Win 10 With hyperV but I have an ESXi host as well on my lan |
19:42
🔗
|
Blackout |
I would say I'm new here but I've been around once before for a tracker I can't quite recall the name of |
19:42
🔗
|
Yoshimura |
Linux host and Warrior dockers (just forward UI to different port), one per project. And use mounting parameters: relatime. And ideally also writeback (need filesystem tweak first). Or mount /data in the containers to a ramdisk. |
19:42
🔗
|
Yoshimura |
Each wget thread competes for IO, plus syslog, so it is pretty inefficient without tweaks (which the VM has) |
19:43
🔗
|
Blackout |
How much data do they download before shipping it back? |
19:48
🔗
|
|
bsmith093 has joined #archiveteam |
19:49
🔗
|
Blackout |
o/ |
19:49
🔗
|
Kaz |
you could also just run the scripts for each project if you're comfortable with that. means you can run a lot higher in terms of concurrency etc |
19:50
🔗
|
Yoshimura |
You can just modify a single number in code and it will work for Warrior also, but you need to watch the IP bans. |
19:51
🔗
|
Yoshimura |
Like Panoramio will not care, but with more threads and process context switching you do not get much performance above about 10-20 threads. |
19:52
🔗
|
Blackout |
What kind of disk space should I allocate though? |
19:57
🔗
|
Yoshimura |
Depends on project. Panoramio items are small, so running that off tmpfs is fine. tmpfs can swap if needed. I run on gigabytes, but keep a close eye, and I run on a 100Mbit line. |
19:58
🔗
|
Yoshimura |
Panoramio needs few dozen MB per thread. |
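A rough way to turn "a few dozen MB per thread" into a tmpfs size is to multiply it out and add headroom. The sketch below is only an illustration: the function name `tmpfs_budget_mb`, the 36 MB per-thread figure, and the 2x headroom factor are assumptions, not numbers from the channel or from the Warrior code.

```python
def tmpfs_budget_mb(threads: int, mb_per_thread: int, headroom: float = 2.0) -> int:
    """Rough scratch-space estimate (in MB) for concurrent download threads.

    All inputs are guesses supplied by the operator; the headroom factor
    is a hypothetical safety margin for items larger than average.
    """
    if threads <= 0 or mb_per_thread <= 0:
        raise ValueError("threads and mb_per_thread must be positive")
    return int(threads * mb_per_thread * headroom)


# 20 threads at an assumed 36 MB each, doubled for headroom
print(tmpfs_budget_mb(20, 36))  # 1440
```

With those assumed inputs the estimate lands around 1.4 GB, which is in line with the "I run on gigabytes" remark above, but real usage depends on the project.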
19:58
🔗
|
SketchCow |
The Internet Archive S3 infrastructure just got a boost |
19:58
🔗
|
Blackout |
They use S3? |
20:00
🔗
|
Yoshimura |
Great to hear. |
20:00
🔗
|
Yoshimura |
Blackout: S3 is API. S3 compatible products are not a rare thing. |
20:01
🔗
|
Blackout |
Oh ok right that's a widely adopted api. Gotcha |
20:05
🔗
|
SketchCow |
We tend to use "S3-like" but most people in here get it. It's the moving of the term from S3 as an Amazon brand to "S3" as a format. |
20:05
🔗
|
SketchCow |
There was an FTP company once, after all |
20:05
🔗
|
SketchCow |
Fuck those guys |
20:07
🔗
|
Frogging |
TIL |
20:09
🔗
|
Kaz |
what's the 'boost'? |
20:13
🔗
|
SketchCow |
Additional 16cpu machine with 10gig connection |
20:13
🔗
|
SketchCow |
I mean, you and Kenshin are going to assault it to within an inch of its life anyway |
20:13
🔗
|
SketchCow |
but there it is |
20:14
🔗
|
Kaz |
whee |
20:22
🔗
|
Blackout |
Is that an ingest server you're talking about @SketchCow ? |
20:22
🔗
|
godane |
i thought was just me 'assault it to within an inch of its life' :P |
20:23
🔗
|
SketchCow |
I think you all can be blamed |
20:23
🔗
|
SketchCow |
You're all monsters |
20:34
🔗
|
xmc |
the kleenexing of amazon |
20:46
🔗
|
SketchCow |
http://prawfsblawg.blogs.com/.a/6a00d8341c6a7953ef0134851907f7970c-500wi |
20:48
🔗
|
Aoede |
https://www.adobe.com/legal/permissions/trademarks.html |
20:48
🔗
|
xmc |
Aoede: what? |
20:49
🔗
|
Aoede |
Adobe has same problem with trademarks |
20:49
🔗
|
xmc |
oh |
20:49
🔗
|
powerKitt |
Specifically, with the usage of "photoshop" to mean "edit an image" |
20:49
🔗
|
Aoede |
"Correct: The image was enhanced using Adobe® Photoshop® software." |
20:50
🔗
|
Aoede |
" Incorrect: The image was photoshopped." |
20:50
🔗
|
powerKitt |
"Incorrect: My hobby is photoshopping." |
20:50
🔗
|
Blackout |
I love that |
20:50
🔗
|
SketchCow |
Adobe: OUR NEW BARN DOOR IS GOING TO COMPLETELY CONTAIN THE ESCAPED HORSE |
20:50
🔗
|
Blackout |
Good luck Adobe |
20:50
🔗
|
powerKitt |
"Incorrect: The photoshop pokes fun at the Senator." |
20:50
🔗
|
xmc |
is it better or worse if i call it a shoop |
20:51
🔗
|
SketchCow |
Better |
20:51
🔗
|
xmc |
gr8 |
20:52
🔗
|
powerKitt |
shoop the woop |
20:54
🔗
|
|
maelstrom has joined #archiveteam |
21:01
🔗
|
SketchCow |
Postimage guy gave me his skype. |
21:01
🔗
|
SketchCow |
We'll talk |
21:10
🔗
|
Blackout |
How do you set max rsync jobs with run-pipeline? |
21:12
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
21:13
🔗
|
|
Start has joined #archiveteam |
21:19
🔗
|
Nemo_bis |
Blackout: that's just a legal safeguard for the trademark |
21:20
🔗
|
Nemo_bis |
Just like kleenex tries not to lose the trademark due to the word becoming a common noun |
21:21
🔗
|
Nemo_bis |
Wikimedia Foundation makes the same stupid request for that reason. </endlegalintermezzo> |
21:24
🔗
|
xmc |
woop woop woop off-topic siren |
21:27
🔗
|
HCross2 |
SketchCow: are things still bumpy? 40Mbps up atm, 20tb to shift |
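For a sense of scale, the upload quoted above can be worked out directly. This is a back-of-the-envelope sketch that assumes "20tb" means 20 decimal terabytes, that the 40 Mbps rate is sustained the whole time, and that protocol overhead is ignored; `transfer_days` is a hypothetical helper name.

```python
def transfer_days(terabytes: float, megabits_per_second: float) -> float:
    """Days needed to move `terabytes` (decimal TB) at a sustained rate."""
    bits = terabytes * 1e12 * 8               # TB -> bytes -> bits
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 86400                    # seconds per day


print(round(transfer_days(20, 40), 1))  # 46.3
```

So under those assumptions the upload takes roughly a month and a half, before any retries or slowdowns.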
21:56
🔗
|
|
BlueMaxim has joined #archiveteam |
22:03
🔗
|
SketchCow |
Yes |
22:16
🔗
|
|
db48x has joined #archiveteam |
22:20
🔗
|
|
maelstrom has quit IRC (Remote host closed the connection) |
22:29
🔗
|
|
RichardG_ has joined #archiveteam |
22:29
🔗
|
|
RichardG has quit IRC (Ping timeout: 370 seconds) |
22:33
🔗
|
|
maelstrom has joined #archiveteam |
22:35
🔗
|
|
maelstrom has quit IRC (Client Quit) |
22:51
🔗
|
SketchCow |
https://pbs.twimg.com/media/CwIL6ezWAAAC0id.jpg:large |
22:51
🔗
|
SketchCow |
Everyone got that? |
22:52
🔗
|
Blackout |
Nice |
22:54
🔗
|
|
powerKitt has quit IRC (Ping timeout: 268 seconds) |
23:03
🔗
|
xmc |
hrmph |
23:07
🔗
|
|
atomotic has joined #archiveteam |
23:13
🔗
|
JW_work |
That's a lot of broken links, though... |
23:16
🔗
|
JW_work |
more detail: https://www.whitehouse.gov/participate/opening-our-data-public |
23:16
🔗
|
JW_work |
https://www.whitehouse.gov/blog/2016/10/31/digital-transition-how-presidential-transition-works-social-media-age |
23:18
🔗
|
xmc |
so i'm going to assume they are getting special dispensation from twitter to enable them to migrate tweets from one account to another |
23:18
🔗
|
xmc |
it would be the only way to keep ids and timestamps reasonably the same, which is necessary for any archival at all amio |
23:18
🔗
|
xmc |
*imo |
23:19
🔗
|
xmc |
but we should def throw a scraper or two at them |
23:20
🔗
|
joepie91 |
https://www.reddit.com/r/trackers/comments/5aew97/sciencehd_says_farewell_on_november_31/ |
23:21
🔗
|
joepie91 |
apparently big private(-ish?) torrent tracker closing with sciencey(?) stuff |
23:21
🔗
|
joepie91 |
enabled site-wide freeleech until shutdown |
23:21
🔗
|
joepie91 |
unsure if within scope, I'd imagine there's a lot of rare materials |
23:21
🔗
|
joepie91 |
sounds sciencey, no idea what it really is |
23:21
🔗
|
joepie91 |
seems it's not free signup though |
23:24
🔗
|
Yoshimura |
Would need someone with account |
23:24
🔗
|
Yoshimura |
The applications are closed. |
23:24
🔗
|
Yoshimura |
https://sciencehd.me/applications.php |
23:27
🔗
|
|
RichardG_ has quit IRC (Read error: Connection reset by peer) |
23:27
🔗
|
|
RichardG has joined #archiveteam |
23:29
🔗
|
|
bRick5772 has quit IRC (Quit: Leaving.) |
23:34
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
23:38
🔗
|
godane |
do people not know that there is no november 31st |
23:38
🔗
|
godane |
thats the second time a closing site has november 31st in the closing post |
23:39
🔗
|
xmc |
yeah! there was one last week, too |
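The November 31st slip is easy to catch mechanically. A minimal sketch, using only Python's standard `datetime.date`, which raises `ValueError` for calendar dates that do not exist; the helper name `is_real_date` is made up for illustration, and the year 2016 is taken from the links earlier in the log.

```python
from datetime import date


def is_real_date(year: int, month: int, day: int) -> bool:
    """Return True if the calendar date exists, False otherwise."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


print(is_real_date(2016, 11, 30))  # True
print(is_real_date(2016, 11, 31))  # False
```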