Time |
Nickname |
Message |
00:00
|
|
DogsRNice has joined #archiveteam-ot |
00:08
|
|
sHATNER has quit IRC (hub.efnet.us irc.efnet.nl) |
00:08
|
|
Tenebrae has quit IRC (hub.efnet.us irc.efnet.nl) |
00:08
|
|
Fusl has quit IRC (hub.efnet.us irc.efnet.nl) |
00:08
|
|
Dallas has quit IRC (hub.efnet.us irc.efnet.nl) |
00:08
|
|
jodizzle has quit IRC (hub.efnet.us irc.efnet.nl) |
00:08
|
|
MrRadar2 has quit IRC (hub.efnet.us irc.efnet.nl) |
00:08
|
|
BnAboyZ has quit IRC (hub.efnet.us irc.efnet.nl) |
00:08
|
|
nyaomi has quit IRC (hub.efnet.us irc.efnet.nl) |
00:09
|
|
yano has quit IRC (Read error: Operation timed out) |
00:10
|
|
schbirid has quit IRC (Read error: Operation timed out) |
00:13
|
|
sHATNER has joined #archiveteam-ot |
00:13
|
|
Tenebrae has joined #archiveteam-ot |
00:13
|
|
Fusl has joined #archiveteam-ot |
00:13
|
|
Dallas has joined #archiveteam-ot |
00:13
|
|
jodizzle has joined #archiveteam-ot |
00:13
|
|
MrRadar2 has joined #archiveteam-ot |
00:13
|
|
BnAboyZ has joined #archiveteam-ot |
00:13
|
|
nyaomi has joined #archiveteam-ot |
00:13
|
|
irc.efnet.nl sets mode: +oo Fusl MrRadar2 |
00:14
|
|
Fusl_ sets mode: +o Fusl |
00:15
|
|
yano has joined #archiveteam-ot |
00:18
|
|
schbirid has joined #archiveteam-ot |
00:18
|
|
Fusl sets mode: +o kiska1 |
00:18
|
|
Fusl sets mode: +o kiskabak |
00:18
|
|
Fusl sets mode: +o kiska |
00:18
|
|
Fusl sets mode: +o JAA |
00:18
|
|
Fusl sets mode: +o svchfoo3 |
00:18
|
|
Fusl sets mode: +o jrwr |
00:18
|
|
Fusl sets mode: +o astrid |
00:18
|
|
Fusl sets mode: +o dxrt |
00:18
|
|
Fusl sets mode: +o chfoo |
00:18
|
|
Fusl sets mode: +o svchfoo1 |
00:18
|
|
Fusl sets mode: +o ivan_ |
00:19
|
|
Fusl sets mode: +o dxrt_ |
00:19
|
|
Fusl sets mode: +o AlsoJAA |
00:19
|
|
Fusl sets mode: +o arkiver |
00:19
|
|
Fusl sets mode: +o Kenshin |
00:19
|
|
Fusl sets mode: +o SketchCow |
00:19
|
|
Fusl sets mode: +o Fusl_ |
00:19
|
|
Fusl sets mode: +o hook54321 |
00:19
|
|
Fusl sets mode: +o Kaz |
00:19
|
|
Fusl sets mode: +o HCross |
00:24
|
|
Dragnog2 has quit IRC (Quit: Connection closed for inactivity) |
02:18
|
|
qw3rty111 has joined #archiveteam-ot |
02:24
|
|
qw3rty119 has quit IRC (Read error: Operation timed out) |
02:59
|
|
Flashfire has quit IRC (Remote host closed the connection) |
02:59
|
|
kiska has quit IRC (Remote host closed the connection) |
02:59
|
|
Flashfire has joined #archiveteam-ot |
03:00
|
|
kiska has joined #archiveteam-ot |
03:00
|
|
Fusl sets mode: +o kiska |
03:00
|
|
Fusl_ sets mode: +o kiska |
03:29
|
|
qw3rty112 has joined #archiveteam-ot |
03:35
|
|
qw3rty111 has quit IRC (Read error: Operation timed out) |
03:37
|
|
killsushi has quit IRC (Quit: Leaving) |
03:51
|
|
odemg has quit IRC (Read error: Operation timed out) |
04:06
|
|
odemg has joined #archiveteam-ot |
04:07
|
|
DogsRNice has quit IRC (Read error: Connection reset by peer) |
05:48
|
|
godane has quit IRC (Ping timeout: 506 seconds) |
05:54
|
|
Dragnog2 has joined #archiveteam-ot |
06:01
|
|
killsushi has joined #archiveteam-ot |
07:08
|
|
schbirid has quit IRC (Read error: Operation timed out) |
08:20
|
|
bluefoo has quit IRC (Read error: Operation timed out) |
08:41
|
|
godane has joined #archiveteam-ot |
09:24
|
|
Dragnog2 has quit IRC (Quit: Connection closed for inactivity) |
09:45
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
11:24
|
|
bluefoo has joined #archiveteam-ot |
13:02
|
|
VerifiedJ has joined #archiveteam-ot |
13:05
|
|
bluefoo has quit IRC (Ping timeout: 615 seconds) |
15:45
|
|
Dragnog2 has joined #archiveteam-ot |
16:27
|
|
schbirid has joined #archiveteam-ot |
17:36
|
|
Dj-Wawa has quit IRC (Quit: Connection closed for inactivity) |
17:41
|
|
systwi_ is now known as systwi |
17:49
|
|
bluefoo has joined #archiveteam-ot |
18:18
|
|
Dj-Wawa has joined #archiveteam-ot |
18:24
|
yano |
https://www.phoronix.com/scan.php?page=news_item&px=CVE-2019-1125-SWAPGS |
18:28
|
|
nataraj has joined #archiveteam-ot |
18:38
|
systwi |
Is there anywhere I can find a list of network protocols? I'm trying to find ones like http:// https:// ftp:// etc. I'm writing a script & need to check if any of those have a . in them |
18:39
|
systwi |
And before you mention it I did look it up earlier |
18:40
|
systwi |
I was just having a difficult time finding a list that explicitly stated if a protocol can or cannot have a "." in the name |
18:43
|
Fusl |
systwi: not really. you can probably find a list of common URL protocols but since technically any arbitrary URL protocol can be registered and the database be extended, there can always be others |
18:43
|
Fusl |
https://www.mediawiki.org/wiki/Manual:$wgUrlProtocols is a good list |
18:49
|
|
nataraj has quit IRC (Read error: Operation timed out) |
18:55
|
coderobe |
systwi: according to the spec it can be any ascii string https://url.spec.whatwg.org/#url-representation |
18:56
|
coderobe |
it cannot be null but it can be an empty string too |
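On the question of dots: RFC 3986 (section 3.1) defines a scheme as a letter followed by any mix of letters, digits, "+", "-" and ".", so both `ssh+git` and a dotted name are syntactically legal even if nobody has registered them. A minimal Python sketch of that grammar check, assuming only the scheme part needs validating. `is_valid_scheme` is a hypothetical helper name, and it follows the RFC 3986 grammar rather than the WHATWG model, so it rejects the empty scheme:

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

def is_valid_scheme(scheme: str) -> bool:
    """Return True if `scheme` matches the RFC 3986 scheme grammar."""
    return SCHEME_RE.fullmatch(scheme) is not None

for s in ["http", "ssh+git", "akjh.sf.dsfa", "9p", ""]:
    print(repr(s), is_valid_scheme(s))
```

Note that `akjh.sf.dsfa` passes: a grammar check can only rule out malformed schemes, not decide which ones a script is willing to handle.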
19:00
|
h3ndr1k |
Git can work with ssh+git:// |
19:01
|
h3ndr1k |
Better use a proper lib to parse the url and escape all chars you can't handle |
19:05
|
|
deevious has quit IRC (Ping timeout: 252 seconds) |
19:19
|
|
Stiletto has quit IRC (Read error: Connection reset by peer) |
19:28
|
|
Stiletto has joined #archiveteam-ot |
19:28
|
systwi |
Ugh, well that's going to be difficult. See, in my script I need it to accept urls like: https://www.example.com/ , but also urls like www.example.com |
19:29
|
systwi |
And to filter out something that's junk, i.e. akjh.sf.dsfa://www.example.com/ |
19:31
|
h3ndr1k |
Technically that is not junk. But if you can't handle some schemes, you should just ignore them. And when there is no scheme, you have to define a default one for your program |
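The approach h3ndr1k describes (ignore schemes the program can't handle, assume a default when none is given) can be sketched with Python's standard `urllib.parse.urlsplit`. The allowed set `{"http", "https"}`, the `https` default and the helper name `normalize_url` are illustrative assumptions, not anything settled in the discussion:

```python
from typing import Optional
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: the schemes this script handles
DEFAULT_SCHEME = "https"             # assumption: used when the input has no scheme

def normalize_url(raw: str) -> Optional[str]:
    """Return the URL with an explicit scheme, or None if it should be rejected."""
    parts = urlsplit(raw)
    if not parts.scheme:
        # "www.example.com" parses as a bare path with no host,
        # so prefix the default scheme and parse again.
        parts = urlsplit(f"{DEFAULT_SCHEME}://{raw}")
    if parts.scheme not in ALLOWED_SCHEMES or not parts.netloc:
        return None  # unsupported scheme or missing host: report as bad input
    return parts.geturl()

for raw in ["https://www.example.com/", "www.example.com", "akjh.sf.dsfa://www.example.com/"]:
    print(raw, "->", normalize_url(raw))
```

Scheme-less `host:port` input is not handled reliably by this sketch, since `urlsplit` may read the host part as a scheme.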
19:32
|
h3ndr1k |
urls are hard to handle universally |
19:32
|
systwi |
There might be a *nix program I can pass the URL into but I don't know of any |
19:33
|
h3ndr1k |
I remember a video of a talk, where 3 different python libraries parsed 3 different host information from the same specially crafted url |
19:34
|
systwi |
I guess I can just do that, blacklist all protocols that I know of that I don't want to use, and then for unrecognized ones tack on "https://" to the beginning and if it doesn't work, then report to the user that the URL is bad |
19:34
|
h3ndr1k |
I don't quite understand your problem. A url parsing lib should be all you need. |
19:35
|
systwi |
I'm sorry I don't understand that. This is a shell script btw |
19:37
|
h3ndr1k |
Uh ok. Kind of hard to do that with shell. Maybe a small python script which parses a url and returns all parts (scheme, host, etc) line separated? |
19:37
|
h3ndr1k |
Whatever, you will figure something out. |
19:37
|
systwi |
Sounds like a good idea except for the fact I know nothing about python :-/ But yeah I will keep looking around, thanks |
19:50
|
h3ndr1k |
systwi: This might help you: https://pastebin.com/N7xiCErW |
19:51
|
h3ndr1k |
That's how you would parse a url in Python. You can then print whatever properties you want and use them in your shell script |
19:55
|
|
VerifiedJ has quit IRC (Quit: Leaving) |
20:04
|
systwi |
Thanks h3ndr1k I'll give that a try |
20:05
|
h3ndr1k |
np |
20:13
|
systwi |
So, I'm probably doing it wrong, but I saved that pastebin content into a .py file and ran it in my Terminal like so: $ ./url_test.py |
20:14
|
systwi |
That gave an exit code of 1 which would make sense since there is no URL |
20:14
|
systwi |
Trying: $ ./url_test.py irc://example.com |
20:14
|
systwi |
Outputted some text to the screen giving an exit code of 0. |
20:15
|
systwi |
The exact same result happened with http://example.com |
20:15
|
systwi |
:-/ |
20:24
|
h3ndr1k |
Yeah, it is just a template. At the moment a constant string is passed to the parser instead of the argument |
20:25
|
h3ndr1k |
You just have to replace 'ab...://......' with url |
20:26
|
h3ndr1k |
systwi: ^ so that you have urlparse(url) in there |
20:27
|
systwi |
Ok thanks, let me try that. Is there a way I can pass argument 1 into the (url) place? e.g. in bash it would be "$1" |
20:28
|
h3ndr1k |
sys.argv[1] is exactly $1 |
20:28
|
h3ndr1k |
And the line url=sys.argv[1] essentially assigns $1 to $url (In shell terms :) ) |
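Putting those pieces together, a script along the lines h3ndr1k describes might read the URL from `sys.argv[1]` (the shell's `$1`) and print one component per line for the shell script to consume. This is a sketch, not the pastebin template (whose contents aren't reproduced in the log); the field order and the helper names `url_fields` and `main` are assumptions:

```python
#!/usr/bin/env python3
import sys
from typing import List
from urllib.parse import urlparse

def url_fields(url: str) -> List[str]:
    """Split a URL into parts a shell script can read, one per output line."""
    parsed = urlparse(url)
    # .hostname is lowercased and may be None; use "" so every field gets a line.
    # .port is deliberately left out: it raises ValueError on a malformed port.
    return [parsed.scheme, parsed.hostname or "", parsed.path, parsed.query]

def main(argv: List[str]) -> int:
    """Print the fields of argv[1] (the shell's "$1") and return an exit status."""
    if len(argv) < 2:
        print("usage: url_test.py URL", file=sys.stderr)
        return 1  # no URL given
    url = argv[1]  # like url="$1" in shell
    print("\n".join(url_fields(url)))
    return 0

# In url_test.py, end the file with: sys.exit(main(sys.argv))
main(["url_test.py", "irc://example.com"])
```

From the shell, a single field can then be picked off by line number, e.g. `host=$(./url_test.py "$1" | sed -n 2p)`.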
20:29
|
systwi |
Ohhh okay cool let me try that |
20:30
|
systwi |
Thanks for the shell terms btw makes more sense to me :) |
20:31
|
h3ndr1k |
:) |
20:48
|
systwi |
It looks like it's working well so far, however I tried filling in a junk input ( $ ./url_test.py htt://shttp:example.com/ ) and it kinda killed python: |
20:49
|
systwi |
$ ./url.py htt://shttp:example.com/ |
20:49
|
systwi |
htt |
20:49
|
systwi |
shttp |
20:49
|
systwi |
Traceback (most recent call last): |
20:49
|
systwi |
File "./url.py", line 16, in <module> |
20:49
|
systwi |
print(parsed.port) |
20:49
|
systwi |
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/parse.py", line 169, in port |
20:49
|
systwi |
port = int(port, 10) |
20:49
|
systwi |
ValueError: invalid literal for int() with base 10: 'example.com' |
20:49
|
systwi |
That's okay, though, I can just check the user input beforehand to make sure a bad url like that isn't passed to url_test.py |
21:04
|
h3ndr1k |
systwi: We could wrap the parsing in a try-catch-block and exit with 1 in case of such an error. But I'm about to go to bed now. I may find something tomorrow. |
21:05
|
h3ndr1k |
The error says, that the port (after the colon) is not a number btw. |
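The try/except wrapper h3ndr1k suggests could look like this. `urlparse` itself accepts the junk input (which is why `htt` and `shttp` were printed before the traceback); it is the `.port` property that raises `ValueError` when the text after the host's colon is not a valid port number, so the check wraps that access. The helper name `port_status` is hypothetical, and returning 1 on failure mirrors the shell exit-code convention used earlier in the conversation:

```python
import sys
from urllib.parse import urlparse

def port_status(url: str) -> int:
    """Return a shell-style exit status: 0 if the URL's port is usable, 1 if not."""
    try:
        parsed = urlparse(url)  # can itself raise ValueError, e.g. on a broken IPv6 host
        parsed.port             # accessing .port is what validates the port text
    except ValueError as err:
        print(f"bad URL {url!r}: {err}", file=sys.stderr)
        return 1
    return 0

print(port_status("http://example.com:8080/"))  # numeric port
print(port_status("htt://shttp:example.com/"))  # port text is "example.com"
```

In the script itself, `sys.exit(port_status(sys.argv[1]))` would hand that status straight back to the calling shell.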
21:05
|
systwi |
No worries, see you tomorrow, thanks! |
21:19
|
yano |
i found this in my irc logs from a few years ago; it seems to be a collection of over a billion reddit comments from 2007 to 2015 according to the readme, magnet:?xt=urn:btih:7690f71ea949b868080401c749e878f98de34d3d&dn=reddit_data |
21:20
|
systwi |
yano: What format? |
21:20
|
yano |
"Each file is compressed with bzip2 compression. When uncompressed, each file |
21:20
|
yano |
is a series of JSON blocks delimited by new lines (\n). The name of each file |
21:20
|
yano |
follows the format RC_yyyy-mm.bz2 where yyyy is the year and mm is the month. |
21:20
|
yano |
RC stands for "reddit comments."" |
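The file layout the readme describes (bzip2-compressed files holding one JSON object per line) can be streamed comment by comment with the standard library, without first unpacking the ~150 GB to disk. A sketch under that assumption; `iter_comments` is a hypothetical helper, and the two-record sample file is made up for the demo rather than taken from the dump:

```python
import bz2
import json
import os
import tempfile
from typing import Iterator

def iter_comments(path: str) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty line of a .bz2 file."""
    # "rt" decompresses on the fly and decodes the bytes as text
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)

# Demo on a tiny made-up file named after the RC_yyyy-mm.bz2 pattern
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "RC_2015-01.bz2")
    with bz2.open(sample, "wt", encoding="utf-8") as f:
        f.write('{"id": "a"}\n\n{"id": "b"}\n')
    demo_ids = [c["id"] for c in iter_comments(sample)]
print(demo_ids)
```

Because the generator reads one line at a time, memory use stays flat regardless of how large a monthly file is.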
21:21
|
yano |
this is the readme file, https://0bin.net/paste/qzH7ebO+eKGBGwKn#6m7gsYDKUNcX0pbKg56Sfh4qLGWfuC2IEEBz26HnSva |
21:22
|
yano |
it's about 150 GB |
21:23
|
|
Jens has quit IRC (Remote host closed the connection) |
21:24
|
|
Jens has joined #archiveteam-ot |
23:54
|
|
Dragnog2 has quit IRC (Quit: Connection closed for inactivity) |
23:57
|
|
Flashfire has quit IRC (Remote host closed the connection) |
23:57
|
|
kiska has quit IRC (Remote host closed the connection) |
23:57
|
|
Flashfire has joined #archiveteam-ot |
23:58
|
|
kiska has joined #archiveteam-ot |
23:58
|
|
Fusl sets mode: +o kiska |
23:58
|
|
Fusl_ sets mode: +o kiska |