Time |
Nickname |
Message |
00:00
🔗
|
yipdw |
FalconK: I think you can try it out just by uploading to the Community Texts collection |
00:00
🔗
|
yipdw |
collection id is opensource, I think |
00:01
🔗
|
yipdw |
that's open to anyone with an IA account |
00:01
🔗
|
yipdw |
I can't remember if your account needs special privileges to upload with mediatype web, SketchCow would know more |
00:02
🔗
|
dxrt |
Should be able to set mediatype to web on any standard account - from my experience. |
00:10
🔗
|
MrRadar |
When I've tried to set mediatype through the web interface it has blocked me |
00:10
🔗
|
MrRadar |
At least for setting it to web |
00:11
🔗
|
SketchCow |
YPi cam |
00:11
🔗
|
dxrt |
I do it through curl, no probs |
00:11
🔗
|
SketchCow |
You can't load directly into web. |
00:11
🔗
|
|
zenguy has quit IRC (Read error: Operation timed out) |
00:12
🔗
|
dxrt |
collection opensource, mediatype web? |
00:12
🔗
|
SketchCow |
Try, but I don't know |
00:12
🔗
|
dxrt |
works for me at least. |
00:15
🔗
|
JW_work |
there's a *web* collection ( https://archive.org/details/web ) which is different than the "web" mediatype. This is particularly confusing because there appears to be magic that makes the few items whose identifiers are the name of a mediatype also appear to contain all the items with that mediatype. |
00:16
🔗
|
|
zenguy has joined #archiveteam-bs |
00:24
🔗
|
JW_work |
it certainly looks like there is no restriction on giving things the web mediatype. See for example, this: https://archive.org/details/warc-files.tjw.moe |
00:25
🔗
|
JW_work |
or even more so: https://archive.org/details/heckert_gnu_png |
00:34
🔗
|
FalconK |
well, alright. I'll test it with opensource, and then submit a pull request. |
00:35
🔗
|
FalconK |
I also have a pull request on the megawarc assembler - don't use cleartext HTTP and IA's authorization header at the same time! |
00:35
🔗
|
FalconK |
unless you believe in passing cleartext passwords over networks ;) |
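A minimal sketch of the point above: refuse to attach credentials to anything but an `https://` URL. This is not the megawarc assembler's code. The function name is hypothetical, and the header names (`authorization: LOW key:secret`, `x-archive-meta-*`) are assumed from IA's S3-like upload API rather than stated in the channel.

```python
from urllib.parse import urlparse


def build_upload_request(url, access_key, secret_key,
                         collection="opensource", mediatype="web"):
    """Return (url, headers) for an upload, refusing cleartext HTTP.

    Hypothetical helper: header names are assumptions about IA's
    S3-like API, not taken from the log.
    """
    if urlparse(url).scheme != "https":
        # The authorization header carries the secret verbatim, so
        # sending it over plain HTTP would expose it on the wire.
        raise ValueError("refusing to send credentials over cleartext: " + url)
    headers = {
        "authorization": "LOW {}:{}".format(access_key, secret_key),
        "x-archive-meta-collection": collection,
        "x-archive-meta-mediatype": mediatype,
    }
    return url, headers
```

The check sits in front of header construction so that a misconfigured `http://` endpoint fails loudly instead of silently leaking the key.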
00:35
🔗
|
|
ohhdemgir has quit IRC (Read error: Operation timed out) |
00:36
🔗
|
|
w0rp has quit IRC (Read error: Operation timed out) |
00:37
🔗
|
|
ohhdemgir has joined #archiveteam-bs |
00:38
🔗
|
|
w0rp has joined #archiveteam-bs |
01:16
🔗
|
|
Stiletto has quit IRC () |
01:17
🔗
|
|
Stiletto has joined #archiveteam-bs |
02:20
🔗
|
|
zenguy has quit IRC (Read error: Operation timed out) |
02:23
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
02:23
🔗
|
|
zenguy has joined #archiveteam-bs |
02:27
🔗
|
|
dashcloud has joined #archiveteam-bs |
02:51
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
02:55
🔗
|
|
dashcloud has joined #archiveteam-bs |
03:11
🔗
|
|
JesseW has joined #archiveteam-bs |
03:34
🔗
|
ErkDog |
http://puu.sh/nyMLP/74d28d17ac.png wheee |
03:37
🔗
|
|
tomwsmf-a has joined #archiveteam-bs |
03:39
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
03:40
🔗
|
|
Start has joined #archiveteam-bs |
03:40
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
03:47
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
03:49
🔗
|
|
Start has joined #archiveteam-bs |
03:57
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
04:04
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
04:17
🔗
|
|
JesseW has joined #archiveteam-bs |
04:20
🔗
|
|
fie has quit IRC (Read error: Connection reset by peer) |
04:20
🔗
|
|
bwn has joined #archiveteam-bs |
04:34
🔗
|
ErkDog |
sadness :( |
04:34
🔗
|
ErkDog |
http://puu.sh/nyPYB/16d1d24031.png |
04:35
🔗
|
ErkDog |
http://puu.sh/nyQ1e/e493a50a76.png 17 hours to upload one work unit :( |
04:49
🔗
|
yipdw |
are you still hung up on the rsync thing |
04:50
🔗
|
xmc |
computerwise or ontologically |
04:50
🔗
|
yipdw |
yes |
05:12
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
05:19
🔗
|
|
Sk1d has joined #archiveteam-bs |
06:02
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
06:02
🔗
|
|
Stiletto has joined #archiveteam-bs |
06:17
🔗
|
|
Sk2d has joined #archiveteam-bs |
06:22
🔗
|
|
Sk1d has quit IRC (hub.se irc.du.se) |
06:32
🔗
|
FalconK |
ErkDog: the patch for that is in |
06:33
🔗
|
FalconK |
just need to bust out the login creating and permission granting and server updating |
06:33
🔗
|
FalconK |
as tempting as it is to direct all the ananiel pipeline stuff (full disk :/) to collection opensource |
06:33
🔗
|
FalconK |
also, I tried to commit into archivebot, and yes, one needs the blessing |
06:34
🔗
|
FalconK |
SketchCow: WTB 1x commit access to collection archivebot for user FalconK |
06:37
🔗
|
|
Sk2d is now known as Sk1d |
06:43
🔗
|
yipdw |
FalconK: ErkDog's thing looks like a Warrior project, which doesn't use the ArchiveBot uploader |
06:52
🔗
|
|
metalcamp has joined #archiveteam-bs |
06:57
🔗
|
JesseW |
I don't think there's anything wrong with putting the stuff in the opensource collection. |
06:57
🔗
|
JesseW |
It can be moved later. |
06:57
🔗
|
xmc |
FalconK: right. you should add a tag or whatever it's called 'archivebot' |
06:59
🔗
|
JesseW |
I think tags are called "subject" |
07:00
🔗
|
xmc |
maybe |
07:00
🔗
|
xmc |
or keywords |
07:07
🔗
|
|
vitzli has joined #archiveteam-bs |
07:16
🔗
|
|
mismatch_ has quit IRC (Remote host closed the connection) |
07:17
🔗
|
|
mismatch_ has joined #archiveteam-bs |
07:26
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
08:24
🔗
|
|
schbirid has joined #archiveteam-bs |
08:36
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
08:52
🔗
|
|
Boppen has joined #archiveteam-bs |
09:02
🔗
|
|
bwn has joined #archiveteam-bs |
09:10
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
09:33
🔗
|
|
godane has joined #archiveteam-bs |
10:12
🔗
|
|
metalcamp has quit IRC (Ping timeout: 258 seconds) |
11:20
🔗
|
|
jspiros has quit IRC (leaving) |
11:26
🔗
|
|
jspiros has joined #archiveteam-bs |
11:47
🔗
|
|
metalcamp has joined #archiveteam-bs |
11:49
🔗
|
|
godane has quit IRC (Quit: Leaving.) |
12:15
🔗
|
|
metalcamp has quit IRC (Ping timeout: 258 seconds) |
12:19
🔗
|
|
godane has joined #archiveteam-bs |
12:21
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
12:47
🔗
|
Smiley |
midas: for some reason it didn't show up lols |
12:55
🔗
|
|
Smiley has quit IRC (Remote host closed the connection) |
13:02
🔗
|
|
Smiley has joined #archiveteam-bs |
13:42
🔗
|
|
RichardG has joined #archiveteam-bs |
14:18
🔗
|
|
wacky has joined #archiveteam-bs |
14:54
🔗
|
|
pgoetz has quit IRC (Remote host closed the connection) |
15:01
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
15:03
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
15:04
🔗
|
|
w0rp has quit IRC (Read error: Operation timed out) |
15:04
🔗
|
|
closure has quit IRC (Read error: Operation timed out) |
15:04
🔗
|
|
godane has joined #archiveteam-bs |
15:04
🔗
|
|
beardicus has quit IRC (Read error: Operation timed out) |
15:05
🔗
|
|
closure has joined #archiveteam-bs |
15:05
🔗
|
|
midas sets mode: +o closure |
15:05
🔗
|
|
beardicus has joined #archiveteam-bs |
15:06
🔗
|
|
w0rp has joined #archiveteam-bs |
15:20
🔗
|
|
pgoetz has joined #archiveteam-bs |
15:29
🔗
|
SketchCow |
I got a new Ultra-High-Def monitor, so you're all doomed. |
15:29
🔗
|
SketchCow |
I see EVERYTHING |
15:30
🔗
|
midas |
has ultra-high-def monitor, still runs mame at 640x480 |
15:44
🔗
|
|
Start has joined #archiveteam-bs |
16:15
🔗
|
|
pgoetz has quit IRC (Remote host closed the connection) |
16:18
🔗
|
|
ersi has quit IRC (Read error: Operation timed out) |
16:20
🔗
|
|
ersi has joined #archiveteam-bs |
16:20
🔗
|
|
midas sets mode: +o ersi |
16:20
🔗
|
|
swebb sets mode: +o ersi |
16:48
🔗
|
|
JesseW has joined #archiveteam-bs |
16:49
🔗
|
|
pgoetz has joined #archiveteam-bs |
17:07
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
17:11
🔗
|
|
JesseW has quit IRC (Quit: Leaving.) |
17:11
🔗
|
|
vitzli has quit IRC (Leaving) |
17:16
🔗
|
|
metalcamp has joined #archiveteam-bs |
17:46
🔗
|
SimpBrain |
wow scaleway not mucking about with cloud server prices |
18:13
🔗
|
ErkDog |
holy crap yeah |
18:13
🔗
|
HCross |
Yea, but their network speed isn't good |
18:13
🔗
|
ErkDog |
cause of the 300Mbit? |
18:14
🔗
|
HCross |
nope, because they oversell |
18:15
🔗
|
ErkDog |
ahhh so saturated |
18:15
🔗
|
Frogging |
overselling can hit I/O and CPU performance too |
18:15
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
18:15
🔗
|
ErkDog |
LOL plus they publicly advertise hey run torrents |
18:15
🔗
|
ErkDog |
https://www.scaleway.com/imagehub/torrents/ |
18:16
🔗
|
SimpBrain |
good for private sites |
18:16
🔗
|
ErkDog |
yeah unmetered servers, run torrents, that will make a good experience for all |
18:16
🔗
|
Frogging |
cloud to butt is fun |
18:16
🔗
|
Frogging |
http://archiveteam.org/index.php?title=User_talk:Jscott |
18:17
🔗
|
Frogging |
"This is partly "fuck my butt" and partly "archive team" related" |
18:17
🔗
|
SimpBrain |
saturated pipes everywhere |
18:25
🔗
|
|
tomwsmf-a has joined #archiveteam-bs |
18:28
🔗
|
joepie91 |
SimpBrain: how dare you use the word 'cloud' |
18:28
🔗
|
joepie91 |
;) |
18:28
🔗
|
SimpBrain |
well it's not a physical dedicated server :P |
18:29
🔗
|
Frogging |
butt server |
18:30
🔗
|
SimpBrain |
tbh going into the future, it should be like cloud dedis especially for tiny companies and individuals, why do you physically need something physical to say it's yours, |
18:33
🔗
|
Frogging |
That's pretty much the way it is already. But dedicated physical servers have advantages, such as not sharing system resources with other users, and you can usually get a whole hard disk to yuorself |
18:33
🔗
|
Frogging |
yourself* |
18:34
🔗
|
Frogging |
I have a VPS and a dedicated server, because sometimes I need more than 24GB of disk space and I don't want to pay $100/month for a higher VPS tier |
18:35
🔗
|
SimpBrain |
yeah hdd space is what is killing vps for small time use |
18:36
🔗
|
yipdw |
dedicated physical server is so nice because I can be super-lazy in my Xen allocations and not give a shit |
18:36
🔗
|
yipdw |
"how much for gitlab? fuck it, 8 gigs" |
18:37
🔗
|
Frogging |
VPS definitely has its place though. They're very flexible and scalable |
18:38
🔗
|
Frogging |
as with most things, it's not black-and-white "doing it this way is unquestionably better at everything" |
18:38
🔗
|
yipdw |
these days I read "flexible" as "fuck you, do it yourself" and "scalable" as "fuck you, pay us more for more nodes" |
18:39
🔗
|
yipdw |
if you're on EC2 both are literally that |
18:39
🔗
|
Frogging |
yipdw: Hah. Yeah it's a bit buzzwordy |
18:39
🔗
|
xmc |
hahaha yes |
18:39
🔗
|
joepie91 |
[19:28] <SimpBrain> well it's not a physical dedicated server :P |
18:39
🔗
|
Frogging |
But I more meant that you can start an instance and do some stuff and then get rid of it without paying a setup fee up front |
18:39
🔗
|
joepie91 |
scaleway? it absolutely is |
18:39
🔗
|
joepie91 |
the ARM pxes anyway |
18:39
🔗
|
joepie91 |
er |
18:39
🔗
|
joepie91 |
boxes |
18:40
🔗
|
joepie91 |
yep |
18:40
🔗
|
Frogging |
If I want to test some shit on a clean system with a clean connection, I just click "new Linode" |
18:40
🔗
|
yipdw |
I am also very annoyed at tracking down this one memory leak that is causing a load balancer to trigger scaling notifications which is causing an autoscaling group to go haywire |
18:40
🔗
|
joepie91 |
still ARM boxes |
18:40
🔗
|
yipdw |
so I am probably biased |
18:40
🔗
|
joepie91 |
SimpBrain: anyhow, "cloud" doesn't mean anything anyway |
18:40
🔗
|
joepie91 |
it's either a physical server, or a VM, and it might have hourly billing, or have an API for spinning them up |
18:40
🔗
|
SimpBrain |
yeah |
18:40
🔗
|
joepie91 |
or have high availability |
18:40
🔗
|
joepie91 |
or geographic redundancy |
18:40
🔗
|
joepie91 |
or a SAN |
18:40
🔗
|
joepie91 |
and any of these things might be indicated with 'cloud' |
18:40
🔗
|
Frogging |
joepie91: nah man |
18:40
🔗
|
joepie91 |
in any combination |
18:40
🔗
|
|
schbirid has joined #archiveteam-bs |
18:40
🔗
|
joepie91 |
:p |
18:40
🔗
|
Frogging |
it's literally in the clouds |
18:40
🔗
|
Frogging |
there's nothing physical about it |
18:41
🔗
|
joepie91 |
it's a meaningless buzzword basically |
18:41
🔗
|
Frogging |
to be fair, it has some degree of meaning. Unlike "internet of things" |
18:41
🔗
|
joepie91 |
no, it really doesn't |
18:41
🔗
|
yipdw |
I store my files in a bong |
18:41
🔗
|
yipdw |
personal cloud |
18:42
🔗
|
Frogging |
i store my files in my butt |
18:42
🔗
|
yipdw |
anyway I don't know where this conversation started, what is it about |
18:42
🔗
|
Frogging |
don't remember :p |
18:42
🔗
|
* |
Frogging scrolls up |
18:43
🔗
|
Frogging |
SimpBrain said something about Scaleway |
18:43
🔗
|
ErkDog |
FOS Makes me so sad :( |
18:43
🔗
|
Frogging |
[12:46:43] <@SimpBrain> wow scaleway not mucking about with cloud server prices |
18:43
🔗
|
ErkDog |
80Kbps :( |
18:43
🔗
|
yipdw |
fos has served us all well for years |
18:43
🔗
|
* |
SimpBrain hides |
18:43
🔗
|
ErkDog |
I've got 45G of data waiting to be dumped.... :-/ |
18:44
🔗
|
ErkDog |
like Wiki and GameTracker would be done if we could dump it somewhere, lol |
18:44
🔗
|
ErkDog |
or at least "caught up" |
18:47
🔗
|
SimpBrain |
gametrailers really hit fos hard |
18:47
🔗
|
SimpBrain |
didnt help we was archiving 4 sites at the time i think |
18:47
🔗
|
ErkDog |
lol gametrailers is a massive amount of data |
18:47
🔗
|
yipdw |
fos is not getting slammed like it was, maybe there's been some controls put on it |
18:48
🔗
|
yipdw |
anyway the fos-to-ErkDog connection doesn't seem like the best either https://gist.github.com/yipdw/07994326c74c7ffa16e6 |
18:48
🔗
|
ErkDog |
well I get 80K/sec here and about 125 from the server I am using |
18:48
🔗
|
ErkDog |
skyrim.towfowi.net |
18:48
🔗
|
SketchCow |
I'm going to revisit FOS and its connection when I get there. |
18:48
🔗
|
ErkDog |
ohhhh, yeah that's the trace you did |
18:48
🔗
|
yipdw |
it could be either end, I think blaming it on fos is premature |
18:48
🔗
|
ErkDog |
i'm on ha.wa.ecansol.net |
18:49
🔗
|
SketchCow |
No, FOS is definitely doing something. |
18:49
🔗
|
SketchCow |
Something bad. |
18:49
🔗
|
ErkDog |
poor FOSy :( |
18:49
🔗
|
ErkDog |
or BAD FOSy whatever the case may be ;-D |
18:49
🔗
|
SketchCow |
Part of it, of course, are the people going "Oh, it's not working fast, LET ME PUT 20 SIMULTANEOUS CONNECTIONS ON IT THAT WILL FIX IT" |
18:49
🔗
|
phuzion |
Yeah, I'm hovering between 115 and 130KB/s going to FOS. |
18:49
🔗
|
SketchCow |
Luckily I can't tell who does this, which is why they are still alive |
18:49
🔗
|
ErkDog |
yeah cause they don't understand the idea of IO thrashing |
18:49
🔗
|
ErkDog |
SketchCow you should be able to |
18:49
🔗
|
ErkDog |
an incoming rsynch shows as a process doesn't ? |
18:50
🔗
|
SketchCow |
No, if I do it, I'll just start murdering you fucks |
18:50
🔗
|
SketchCow |
All of you |
18:50
🔗
|
ErkDog |
LOL |
18:50
🔗
|
ErkDog |
well you can only complain so much, if you want to download all the internet, you have to give us a place to put it bro |
18:50
🔗
|
xmc |
eep |
18:50
🔗
|
SketchCow |
It'll be me and a room of corpses and me with a machete going "good meeting, good meeting" and chewing a sour patch kid |
18:50
🔗
|
phuzion |
hahaha |
18:51
🔗
|
ErkDog |
netstat -alnp|grep #### where #### is the port of your incoming rsynch connections will tell you too |
18:51
🔗
|
ErkDog |
at least it would tell you the # of connections from that IP, but not who owns it |
18:51
🔗
|
ErkDog |
but you could firewall off people who have 1,000 processes running and when they ask why they can't upload stuff, we can explain to them, that they need ONE process per Project, per server, at -most- |
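The per-IP count ErkDog describes can be sketched as a small parser over `netstat -an` style output. This is only an illustration: the function name and the sample addresses are made up, port 873 (the usual rsync daemon port) is an assumption about the setup, and only IPv4 lines are handled.

```python
from collections import Counter

# Made-up sample in the shape of `netstat -alnp` output (documentation IPs).
SAMPLE = """\
tcp        0      0 192.0.2.10:873          198.51.100.7:50412      ESTABLISHED 1234/rsync
tcp        0      0 192.0.2.10:873          198.51.100.7:50413      ESTABLISHED 1235/rsync
tcp        0      0 192.0.2.10:873          203.0.113.9:40100       ESTABLISHED 1236/rsync
tcp        0      0 192.0.2.10:22           203.0.113.9:40101       ESTABLISHED 900/sshd
tcp        0      0 0.0.0.0:873             0.0.0.0:*               LISTEN      800/rsync
"""


def connections_per_ip(netstat_output, port):
    """Count ESTABLISHED TCP connections to local `port`, keyed by remote IP."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers and non-TCP sockets
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue  # ignore LISTEN, TIME_WAIT, and so on
        if local.rsplit(":", 1)[-1] != str(port):
            continue  # connection belongs to some other service
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts
```

As noted in the channel, this shows how many connections an address holds, not who owns it, and a per-IP view says little about someone spreading warriors across a large subnet.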
18:52
🔗
|
yipdw |
I considered doing that and it is much harder to maintain than just finding the people and asking them to back off a bit |
18:53
🔗
|
ErkDog |
couldn't you just tell the rsync/ssh protocol to only allow 2 connections per IP? |
18:53
🔗
|
yipdw |
yes but it's not a relevant defense |
18:54
🔗
|
yipdw |
not when you have some people who have access to large subnets and are running warriors on all of them |
18:54
🔗
|
ErkDog |
true |
18:54
🔗
|
ErkDog |
but when I look at the trackers |
18:54
🔗
|
yipdw |
anyway, Atluxity is running a lot of traffic to fotolog |
18:54
🔗
|
ErkDog |
I only see like 5 or so people active on any given project |
18:54
🔗
|
yipdw |
yeah it's one person with a large number of nodes |
18:55
🔗
|
Frogging |
so it's people running a bunch of warriors on one machine that's hammering FOS? |
18:55
🔗
|
yipdw |
many warriors on many machines |
18:55
🔗
|
Frogging |
Is more warriors not better? |
18:55
🔗
|
Frogging |
Or are they doing it wrong |
18:55
🔗
|
yipdw |
more warriors is fine but there are limits to how fast we can take stuff in |
18:55
🔗
|
yipdw |
this is just a limit |
18:55
🔗
|
yipdw |
find why and work around it, etc |
18:56
🔗
|
SketchCow |
I'm going to reboot the box. |
18:56
🔗
|
yipdw |
I also hate the word "scalable" because it gets people excited for no fucking reason |
18:56
🔗
|
SketchCow |
I do see that the upload speed just skyrocketed. |
18:56
🔗
|
Frogging |
Perhaps the system could be adjusted so that FOS coordinates who is uploading what and when |
18:57
🔗
|
yipdw |
SketchCow: you might want to hold off, it looks like DFJustin's doing a compile |
18:57
🔗
|
ErkDog |
well likely the bottleneck is disk I/O |
18:57
🔗
|
SketchCow |
He is ALWAYS doing a compile |
18:57
🔗
|
yipdw |
oh ok never mind |
18:57
🔗
|
SketchCow |
STOP BEING MY MECHANICS FOR A MOMENT |
18:57
🔗
|
SketchCow |
I have two torrents going on the box, I'm trying to shut them down and avoid living a pile of buff |
18:58
🔗
|
Frogging |
Maybe instead of warriors uploading things ASAP they could upload when FOS asks them to, to limit load |
18:58
🔗
|
ErkDog |
because as you add additional incoming rsynchs, the speed of all the existing transfers is diminished significantly, so 10 RSynchs take more than 10 times as long to complete as a single RSynch |
18:58
🔗
|
yipdw |
so |
18:59
🔗
|
SketchCow |
No, no. The problem is just a matter of the fact that the machine got extended at one point and it never, ever goes back. |
18:59
🔗
|
SketchCow |
And then people "do things" |
18:59
🔗
|
SketchCow |
I wish I knew the command in rtorrent to say "and delete the data" |
18:59
🔗
|
Frogging |
https://www.youtube.com/watch?v=EHybN9UbhWM |
19:01
🔗
|
* |
ersi scales yipdw |
19:03
🔗
|
ErkDog |
If you want to delete data on remove I would suggest adding the below to your rtorrent.rc. It will be both faster and more robust than rutorrent's delete function (which relies on php and a forked process) and has the benefit of not crashing rtorrent since it remembers state instead. |
19:03
🔗
|
ErkDog |
method.set_key = event.download.erased, remove_file,"execute={rm,-drf,--,$d.get_base_path=}" |
19:03
🔗
|
PurpleSym |
rsyncd is able to execute a script before starting a transfer. One could check the current load and stop the transfer if it is too high. |
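A rough sketch of PurpleSym's idea, assuming the script is wired up through the `pre-xfer exec` option in `rsyncd.conf` (a non-zero exit status aborts the transfer before it begins). The threshold and function names are hypothetical, and nothing here reflects how FOS is actually configured.

```python
import os
import sys

# Hypothetical threshold: refuse new transfers above this 1-minute
# load average per CPU core.
MAX_LOAD_PER_CPU = 1.5


def should_accept(load_1min, cpu_count, max_per_cpu=MAX_LOAD_PER_CPU):
    """Return True if the machine is idle enough to take another transfer."""
    return load_1min <= max_per_cpu * cpu_count


def main():
    """Exit-status logic for a pre-xfer hook: 0 accepts, 1 rejects."""
    load_1min = os.getloadavg()[0]  # Unix-only call
    cpus = os.cpu_count() or 1
    if should_accept(load_1min, cpus):
        return 0
    print("server busy (load %.2f), try again later" % load_1min,
          file=sys.stderr)
    return 1

# When installed as the hook script, the file would end with:
#   sys.exit(main())
```

Load average is a coarse proxy; if disk I/O is the real bottleneck, as suggested earlier in the channel, a check on I/O wait might be a better signal.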
19:05
🔗
|
SketchCow |
Ha ha ha |
19:05
🔗
|
SketchCow |
HEY GUESS WHAT GUYS |
19:05
🔗
|
SketchCow |
I just found out there's a scheduled reboot of FOS anyway at 7pm EST |
19:08
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
19:09
🔗
|
xmc |
bhahaha |
19:12
🔗
|
ErkDog |
lol |
19:22
🔗
|
|
Start has joined #archiveteam-bs |
19:27
🔗
|
|
bwn has joined #archiveteam-bs |
19:48
🔗
|
SketchCow |
OK, so I'm going to see about shutting down my torrenting, cleaning up a few things, and then we get the reboot |
19:59
🔗
|
|
SN4T14 has quit IRC (Remote host closed the connection) |
20:02
🔗
|
|
SN4T14 has joined #archiveteam-bs |
20:23
🔗
|
wacky |
Don't suppose anyone from the IA could gimme 5 min of time to hit a few questions off of them |
20:24
🔗
|
ErkDog |
sweet SketchCow thanks :-D |
20:26
🔗
|
JW_work |
wacky: toss your questions here — the worst that will happen is none of us will know or be willing to answer. |
20:29
🔗
|
wacky |
I work for a commercial archiving solution. We have a client (the end user who owns the originally archived content) who is looking to get some content. As the original content owner, is it possible for them to get a warc/warc export? |
20:29
🔗
|
wacky |
They would have no problem paying for such a service |
20:29
🔗
|
MrRadar |
That's a question that would need to be addressed to the IA directly. |
20:30
🔗
|
MrRadar |
If it's content from the IA's Wayback Machine |
20:30
🔗
|
JW_work |
I'd suggest sending that question to info@archive.org, providing (in the initial email) the specific URLs you are interested in, and whatever proof you have that you represent the original content owner. I have no idea whether that would be feasible, but it seems reasonable to me. |
20:32
🔗
|
MrRadar |
If it's something that we (the Archiveteam) archived then the WARCs should already be available for download from the IA |
20:34
🔗
|
JW_work |
good point. You can look up archivebot stuff with the viewer; for other stuff … probably search the wiki to see if it was a project. |
20:35
🔗
|
MrRadar |
For reference, the ArchiveBot viewer is here: http://archive.fart.website/archivebot/viewer/ |
20:37
🔗
|
wacky |
Awesome - thanks all! I'll give the suggestions a shot |
20:38
🔗
|
JW_work |
cool, glad we were able to give you some pointers |
20:39
🔗
|
ErkDog |
sigh fart.website |
20:40
🔗
|
ErkDog |
lol a lot of the things archivebot is working on don't seem like "small" websites |
20:40
🔗
|
ErkDog |
one is @ 54 gigs, lol |
20:41
🔗
|
MrRadar |
Scroll down to the bottom of the dashboard to see some *really* big jobs |
20:41
🔗
|
ErkDog |
yeah lol one is 100 gigs, that one is 999 gigs? |
20:44
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
20:46
🔗
|
SketchCow |
54gb is small |
20:49
🔗
|
phuzion |
54gb is tiny. I have a flash drive with more than 54gb of usable capacity. |
20:49
🔗
|
phuzion |
Actually, I have like 3 or 4 laying around. |
20:50
🔗
|
ErkDog |
LOL well I guess it depends on how you look at it |
20:50
🔗
|
ErkDog |
since -most- websites are like super tiny compared to that |
20:50
🔗
|
ErkDog |
we run a hosting company |
20:50
🔗
|
ErkDog |
our customer's largest site is 1.5 gigs, and it's eCommerce |
20:51
🔗
|
MrRadar |
Keep in mind that the ArchiveBot saves web requests not necessarily what would be stored on the server |
20:51
🔗
|
MrRadar |
If you had a PHP script that printed an endless stream of random numbers that would be small on disk but the response would be huge |
20:52
🔗
|
MrRadar |
For full-site grabs we also tend to target sites that have lots of interesting stuff to save |
20:55
🔗
|
ErkDog |
hmmm true |
21:04
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
21:35
🔗
|
FalconK |
SketchCow: so do you want me to upload things to opensource with a special tag? or somewhere else? |
21:41
🔗
|
FalconK |
meh whatever I'll just upload them with subject: archivebot for now and we can always make more changes if desirable. |
21:43
🔗
|
|
VADemon has joined #archiveteam-bs |
21:45
🔗
|
xmc |
yep |
21:45
🔗
|
xmc |
as long as they're separable from everything else |
21:52
🔗
|
arkiver |
SketchCow: any taks this year in the Netherlands? |
21:52
🔗
|
arkiver |
talks* |
21:53
🔗
|
SketchCow |
None planned, but then again this is the year I planned for not doing much speaking/travel except the Japan trip |
21:54
🔗
|
godane |
i figure a telethon at the end of the year at IA |
21:57
🔗
|
|
fie has joined #archiveteam-bs |
21:57
🔗
|
FalconK |
ok, much, much better |
21:57
🔗
|
FalconK |
getting 5 mbit up into IA |
21:57
🔗
|
FalconK |
the uploads are collection: opensource, subject: archivebot |
21:58
🔗
|
FalconK |
content-type: web |
21:58
🔗
|
FalconK |
who moves them? |
22:00
🔗
|
yipdw |
if you can hold off the uploads until we can get that sorted out, that'd be nice |
22:00
🔗
|
yipdw |
I don't think the viewer will find those |
22:00
🔗
|
yipdw |
(until they get in the right place) |
22:07
🔗
|
|
metalcamp has quit IRC (Ping timeout: 258 seconds) |
22:10
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
22:14
🔗
|
|
dashcloud has joined #archiveteam-bs |
22:15
🔗
|
FalconK |
argh, since I already started, I can't. |
22:16
🔗
|
xmc |
no worries |
22:16
🔗
|
xmc |
items can always be moved |
22:16
🔗
|
xmc |
it's easy |
22:17
🔗
|
FalconK |
it looks like mostly a metadata change yes |
22:17
🔗
|
xmc |
yup |
22:17
🔗
|
xmc |
metamgr can do it i think? |
22:17
🔗
|
FalconK |
on the bright side, my pipeline is emptying out now and actually crawling things again |
22:18
🔗
|
FalconK |
so |
22:18
🔗
|
FalconK |
who are the individuals that are needed to sort it out? |
22:18
🔗
|
xmc |
what's your IA account email address? |
22:18
🔗
|
FalconK |
falcon@falconk.rocks |
22:18
🔗
|
xmc |
nerd |
22:19
🔗
|
xmc |
https://archive.org/details/archiveteam_archivebot_go_falconk_test_20160307www_youtube_com_20160306 this thingy |
22:19
🔗
|
xmc |
your item name is kind of fucky |
22:19
🔗
|
FalconK |
yes, that was the test item |
22:19
🔗
|
xmc |
ah |
22:20
🔗
|
FalconK |
there is another, which was just uploaded, but isn't showing up under my uploads page |
22:20
🔗
|
FalconK |
though I recall it taking a moment |
22:21
🔗
|
yipdw |
oh, right, that's the main problem with distributed upload |
22:21
🔗
|
yipdw |
s |
22:21
🔗
|
yipdw |
naming |
22:21
🔗
|
xmc |
can you get to metamgr with your account http://archive.org/metamgr.php?&w_uploader=falcon@falconk.rocks |
22:22
🔗
|
dxrt |
Just my 2c on this whole thing - I don't really want all the random crap my pipeline has grabbed to show up under my user account and linked to me - especially if something questionable is discovered later, it kind of seems like it'll be my liability and 'my upload' rather than the current system. |
22:23
🔗
|
FalconK |
xmc: not authorized |
22:23
🔗
|
xmc |
ok |
22:23
🔗
|
yipdw |
dxrt: as far as I can tell, the rsync mode still exists |
22:23
🔗
|
FalconK |
yes |
22:23
🔗
|
FalconK |
this change is very optional |
22:23
🔗
|
yipdw |
I am however wondering how to name these items |
22:23
🔗
|
yipdw |
the time-sequence thing doesn't work anymore |
22:23
🔗
|
yipdw |
and UUID is not a solution |
22:23
🔗
|
dxrt |
Right! I thought it was a current re-work of the current uploader, but I'm happy to hear that! |
22:23
🔗
|
xmc |
archivebot_username_date ? |
22:24
🔗
|
yipdw |
maybe, assuming username keeps all their clocks in sync |
22:24
🔗
|
FalconK |
so the way I am naming them now is like archiveteam_archivebot_go_falconk_content_radiosega_net_20160307 |
22:24
🔗
|
xmc |
well is it a problem to put them in somewhat incorrect items |
22:24
🔗
|
xmc |
because timestamps exist in the datas |
22:24
🔗
|
FalconK |
for a crawl of content.radiosega.net which the crawler named with 20160307 as the date in the filename |
22:24
🔗
|
arkiver |
SketchCow: ok, the little archiveteam meeting last year was nice. We got some new project out of it too |
22:24
🔗
|
xmc |
i thought you were doing one item per day per pipeliner |
22:25
🔗
|
FalconK |
well I thought of doing that and then I wondered why I was associating items which had no logical association except that they were gathered proximally |
22:26
🔗
|
FalconK |
I mean the item name is pretty arbitrary right? |
22:26
🔗
|
xmc |
yes |
22:26
🔗
|
xmc |
it comes down to semantics really |
22:26
🔗
|
xmc |
i guess there's nothing wrong with item per archivebot job |
22:26
🔗
|
yipdw |
yeah I guess in the end I'm ok with that |
22:26
🔗
|
xmc |
but we do a bunch of single-page grabs too |
22:26
🔗
|
* |
xmc shrug |
22:27
🔗
|
yipdw |
provided the viewer can find them |
22:27
🔗
|
* |
yipdw checks |
22:27
🔗
|
FalconK |
there would be something wrong with one item per 5gb chunk |
22:27
🔗
|
FalconK |
I doubt the viewer will find them until they are moved into a blessed collection |
22:27
🔗
|
xmc |
an item per job, containing one or many warcs |
22:27
🔗
|
FalconK |
they're in opensource with type web currently |
22:27
🔗
|
xmc |
sounds good to me |
22:27
🔗
|
yipdw |
oh, I meant that the viewer doesn't add additional criteria on top of collection |
22:27
🔗
|
yipdw |
like /[0-9]+/ |
22:27
🔗
|
FalconK |
oh |
22:27
🔗
|
yipdw |
I think the answer is no? but I haven't checked |
22:27
🔗
|
FalconK |
I hope it doesn't! |
22:28
🔗
|
yipdw |
ok the answer is probably "it's fine" |
22:28
🔗
|
FalconK |
it would be good to know but I have no way to verify |
22:28
🔗
|
FalconK |
cool |
22:28
🔗
|
FalconK |
I can help with whatever bulk crap needs doing as a result of this |
22:28
🔗
|
yipdw |
at least https://github.com/ArchiveTeam/ArchiveBot/blob/master/viewer/archivebotviewer/database.py#L417, to me, indicates that we're clear |
22:29
🔗
|
FalconK |
archivebot identifiers already have _, and I am doing some string translation |
22:29
🔗
|
|
ndiddy has quit IRC (Read error: Connection reset by peer) |
22:29
🔗
|
yipdw |
it just occurred to me because we do have some tools that do that check |
22:30
🔗
|
FalconK |
the translation is re.sub(r'[^0-9a-zA-Z-]+', '_', basename) |
22:30
🔗
|
FalconK |
so DNS characters or _ |
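The translation FalconK quotes can be tried in isolation. The wrapper function below is hypothetical; only the `re.sub` call comes from the log.

```python
import re


def sanitize_identifier(basename):
    """Collapse every run of characters outside [0-9a-zA-Z-] into one '_'."""
    return re.sub(r'[^0-9a-zA-Z-]+', '_', basename)


# Dots become underscores, hyphens survive.
print(sanitize_identifier("content.radiosega.net"))    # content_radiosega_net
print(sanitize_identifier("content.radiosega.net-inf"))  # content_radiosega_net-inf
# The same rule applied to a full filename also rewrites extension dots.
print(sanitize_identifier("example.warc.gz"))          # example_warc_gz
```

Because the pattern uses `+`, consecutive disallowed characters such as `..` or `. ` yield a single underscore rather than several.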
22:30
🔗
|
|
ndiddy has joined #archiveteam-bs |
22:30
🔗
|
yipdw |
yeah those'll be fine |
22:30
🔗
|
yipdw |
AFAICT |
22:30
🔗
|
FalconK |
:) |
22:30
🔗
|
FalconK |
if not, we'll see. |
22:32
🔗
|
FalconK |
and... wow, I have made a thing that uploads over 1GB per hour of internet into the archive. |
22:32
🔗
|
yipdw |
nice |
22:32
🔗
|
* |
FalconK enjoys this |
22:32
🔗
|
yipdw |
the only other place I've seen that is on another rsync target we have |
22:33
🔗
|
yipdw |
it's Kenshin's |
22:33
🔗
|
FalconK |
well actually 1GB per 15min |
22:33
🔗
|
yipdw |
as far as I can tell Kenshin basically owns Singapore |
22:33
🔗
|
FalconK |
this just has 1gbps upstream |
22:33
🔗
|
FalconK |
nothing special about it besides that |
22:34
🔗
|
FalconK |
the transfer rate is really, really fluttery though |
22:34
🔗
|
FalconK |
they end up looking like this: |
22:34
🔗
|
FalconK |
https://archive.org/details/archiveteam_archivebot_go_falconk_content_radiosega_net_20160307 |
22:36
🔗
|
FalconK |
the non-viewability seems to be common to WARCs in opensource |
22:36
🔗
|
yipdw |
yeah |
22:37
🔗
|
yipdw |
the WARC also doesn't have extension .warc.gz for some reason |
22:37
🔗
|
FalconK |
it is _warc_gz |
22:37
🔗
|
FalconK |
hmm. |
22:37
🔗
|
FalconK |
is that my doing? |
22:38
🔗
|
yipdw |
it could be; IIRC wpull does .warc.gz |
22:38
🔗
|
FalconK |
yes, it is my doing |
22:38
🔗
|
yipdw |
I don't think it matters for derives (though maybe it does), but it can matter for browser downloads |
22:38
🔗
|
yipdw |
and etc |
22:38
🔗
|
FalconK |
let me fix that. |
22:38
🔗
|
arkiver |
it matters for derives |
22:38
🔗
|
yipdw |
oh |
22:42
🔗
|
FalconK |
fixed. targets are now like /archiveteam_archivebot_go_falconk_content_radiosega_net_20160307/content.radiosega.net-inf-20160307-051602-1qvpq-00001.warc.gz |
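The "one item per job, containing one or many WARCs" layout agreed on earlier could be sketched as a grouping step over chunk filenames. This assumes, from the single filename quoted above, that chunks end in a five-digit sequence number followed by `.warc.gz`. The function name and the second chunk in the usage example are made up.

```python
import re
from collections import defaultdict

# Assumed chunk suffix, inferred from "...-1qvpq-00001.warc.gz" above.
CHUNK_SUFFIX = re.compile(r'-\d{5}\.warc\.gz$')


def group_by_job(filenames):
    """Map each job prefix to the sorted list of its WARC chunk files."""
    jobs = defaultdict(list)
    for name in filenames:
        job = CHUNK_SUFFIX.sub('', name)  # strip "-NNNNN.warc.gz" if present
        jobs[job].append(name)
    return {job: sorted(names) for job, names in jobs.items()}


files = [
    "content.radiosega.net-inf-20160307-051602-1qvpq-00002.warc.gz",  # made up
    "content.radiosega.net-inf-20160307-051602-1qvpq-00001.warc.gz",
]
for job, chunks in group_by_job(files).items():
    print(job, len(chunks))
```

Each key would then map to one item, with the original filenames (extension intact) uploaded inside it.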
22:42
🔗
|
yipdw |
cool |
22:42
🔗
|
FalconK |
now is there some way to rename the one extant misnamed file |
22:44
🔗
|
FalconK |
... probably not. |
22:44
🔗
|
FalconK |
not by me anyway. |
23:16
🔗
|
VADemon |
Does anyone know, is 1GB softlimit per WARC file still recommended for mirrors or should it be raised? |
23:21
🔗
|
ErkDog |
ftp ftp RSynch target is fast |
23:21
🔗
|
ErkDog |
the* |
23:21
🔗
|
ErkDog |
I can dump @ 15M/sec from 2 different systems all day |
23:28
🔗
|
FalconK |
** rsync |
23:29
🔗
|
|
tomwsmf-a has quit IRC (Read error: Operation timed out) |
23:30
🔗
|
ErkDog |
so you made it so you can upload directly into IA FalconK instead of having to rsynch it somewhere? |
23:37
🔗
|
|
xXx_ndidd has joined #archiveteam-bs |
23:38
🔗
|
|
fie_ has joined #archiveteam-bs |
23:38
🔗
|
|
hawc145 has joined #archiveteam-bs |
23:39
🔗
|
|
RichardG_ has joined #archiveteam-bs |
23:42
🔗
|
|
phuz has joined #archiveteam-bs |
23:42
🔗
|
|
Start has joined #archiveteam-bs |
23:42
🔗
|
|
is-_ has joined #archiveteam-bs |
23:42
🔗
|
|
ndiddy has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
dashcloud has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
fie has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
RichardG has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
ohhdemgir has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
yipdw has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
signius has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
HCross has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
ErkDog has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
chfoo has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
toad1 has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
JW_work has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
phuzion has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
is- has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
MrRadar has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
chazchaz has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
Laverne has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
SimpBrain has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
zino_ has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
Infreq has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
Darkstar has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
slyphic has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
Frogging has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
dcmorton has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
Cameron_D has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
dxrt has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
atlogbot has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
swebb has quit IRC (hub.efnet.us irc.servercentral.net) |
23:42
🔗
|
|
Famicoma1 has quit IRC (Ping timeout: 270 seconds) |
23:43
🔗
|
|
chazchaz_ has joined #archiveteam-bs |
23:44
🔗
|
|
yipdw_ has joined #archiveteam-bs |
23:44
🔗
|
* |
FalconK looks at the sadness that is efnet |
23:44
🔗
|
|
dxrt_ has joined #archiveteam-bs |
23:45
🔗
|
|
Infreq_ has joined #archiveteam-bs |
23:45
🔗
|
|
ErkDog_ has joined #archiveteam-bs |
23:45
🔗
|
|
swebb_ has joined #archiveteam-bs |
23:45
🔗
|
|
Frogging_ has joined #archiveteam-bs |
23:45
🔗
|
|
chfoo0 has joined #archiveteam-bs |
23:46
🔗
|
|
zino__ has joined #archiveteam-bs |
23:46
🔗
|
|
SimpBrai1 has joined #archiveteam-bs |
23:48
🔗
|
FalconK |
ErkDog_: yes, I did. |
23:48
🔗
|
FalconK |
(so we can forget that the correct spelling of the project name is rsync, or that it even exists, perhaps...) ;) |
23:51
🔗
|
|
pi has joined #archiveteam-bs |
23:55
🔗
|
|
pi is now known as MrRadar_ |
23:56
🔗
|
ErkDog_ |
lol |
23:56
🔗
|
ErkDog_ |
soz :-D |
23:57
🔗
|
|
ErkDog_ is now known as ErkDog |
23:57
🔗
|
|
dashcloud has joined #archiveteam-bs |
23:57
🔗
|
|
swebb_ is now known as swebb |
23:57
🔗
|
|
Frogging_ is now known as Frogging |
23:58
🔗
|
|
JW_work has joined #archiveteam-bs |
23:58
🔗
|
|
MrRadar_ is now known as MrRadar |
23:59
🔗
|
|
toad1 has joined #archiveteam-bs |
23:59
🔗
|
|
slyphic has joined #archiveteam-bs |